Regular Expressions Cookbook, 2nd Edition

Book description

Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.

This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You’ll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.

  • Learn regular expression basics through a detailed tutorial
  • Use code listings to implement regular expressions with your language of choice
  • Understand how regular expressions differ from language to language
  • Handle common user input with recipes for validation and formatting
  • Find and manipulate words, special characters, and lines of text
  • Detect integers, floating-point numbers, and other numerical formats
  • Parse source code and process log files
  • Use regular expressions in URLs, paths, and IP addresses
  • Manipulate HTML, XML, and data exchange formats
  • Discover little-known regular expression tricks and techniques

Table of contents

  1. Regular Expressions Cookbook
  2. Preface
    1. Caught in the Snarls of Different Versions
    2. Intended Audience
    3. Technology Covered
    4. Organization of This Book
    5. Conventions Used in This Book
    6. Using Code Examples
    7. Safari® Books Online
    8. How to Contact Us
    9. Acknowledgments
  3. 1. Introduction to Regular Expressions
    1. Regular Expressions Defined
      1. Many Flavors of Regular Expressions
      2. Regex Flavors Covered by This Book
    2. Search and Replace with Regular Expressions
      1. Many Flavors of Replacement Text
    3. Tools for Working with Regular Expressions
      1. RegexBuddy
      2. RegexPal
      3. RegexMagic
      4. More Online Regex Testers
        1. RegexPlanet
        2. regex.larsolavtorvik.com
        3. Nregex
        4. Rubular
        5. myregexp.com
      5. More Desktop Regular Expression Testers
        1. Expresso
        2. The Regulator
        3. SDL Regex Fuzzer
      6. grep
        1. PowerGREP
        2. Windows Grep
        3. RegexRenamer
      7. Popular Text Editors
  4. 2. Basic Regular Expression Skills
    1. 2.1. Match Literal Text
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
        1. Block escape
        2. Case-insensitive matching
      5. See Also
    2. 2.2. Match Nonprintable Characters
      1. Problem
      2. Solution
      3. Discussion
      4. Variations on Representations of Nonprinting Characters
        1. The 26 control characters
        2. The 7-bit character set
      5. See Also
    3. 2.3. Match One of Many Characters
      1. Problem
      2. Solution
        1. Calendar with misspellings
        2. Hexadecimal character
        3. Nonhexadecimal character
      3. Discussion
      4. Variations
        1. Shorthands
        2. Case insensitivity
      5. Flavor-Specific Features
        1. .NET character class subtraction
        2. Java character class union, intersection, and subtraction
      6. See Also
    4. 2.4. Match Any Character
      1. Problem
      2. Solution
        1. Any character except line breaks
        2. Any character including line breaks
      3. Discussion
        1. Any character except line breaks
        2. Any character including line breaks
        3. Dot abuse
      4. Variations
      5. See Also
    5. 2.5. Match Something at the Start and/or the End of a Line
      1. Problem
      2. Solution
        1. Start of the subject
        2. End of the subject
        3. Start of a line
        4. End of a line
      3. Discussion
        1. Anchors and lines
        2. Start of the subject
        3. End of the subject
        4. Start of a line
        5. End of a line
        6. Zero-length matches
      4. Variations
      5. See Also
    6. 2.6. Match Whole Words
      1. Problem
      2. Solution
        1. Word boundaries
        2. Nonboundaries
      3. Discussion
        1. Word boundaries
        2. Nonboundaries
      4. Word Characters
      5. See Also
    7. 2.7. Unicode Code Points, Categories, Blocks, and Scripts
      1. Problem
      2. Solution
        1. Unicode code point
        2. Unicode category
        3. Unicode block
        4. Unicode script
        5. Unicode grapheme
      3. Discussion
        1. Unicode code point
        2. Unicode category
        3. Unicode block
        4. Unicode script
        5. Unicode grapheme
      4. Variations
        1. Negated variant
        2. Character classes
        3. Listing all characters
      5. See Also
    8. 2.8. Match One of Several Alternatives
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    9. 2.9. Group and Capture Parts of the Match
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
        1. Noncapturing groups
        2. Group with mode modifiers
      5. See Also
    10. 2.10. Match Previously Matched Text Again
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    11. 2.11. Capture and Name Parts of the Match
      1. Problem
      2. Solution
        1. Named capture
        2. Named backreferences
      3. Discussion
        1. Named capture
        2. Named backreferences
        3. Groups with the same name
      4. See Also
    12. 2.12. Repeat Part of the Regex a Certain Number of Times
      1. Problem
      2. Solution
        1. Googol
        2. Hexadecimal number
        3. Hexadecimal number with optional suffix
        4. Floating-point number
      3. Discussion
        1. Fixed repetition
        2. Variable repetition
        3. Infinite repetition
        4. Making something optional
        5. Repeating groups
      4. See Also
    13. 2.13. Choose Minimal or Maximal Repetition
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    14. 2.14. Eliminate Needless Backtracking
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    15. 2.15. Prevent Runaway Repetition
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    16. 2.16. Test for a Match Without Adding It to the Overall Match
      1. Problem
      2. Solution
      3. Discussion
        1. Lookaround
        2. Negative lookaround
        3. Different levels of lookbehind
        4. Matching the same text twice
        5. Lookaround is atomic
      4. Alternative to Lookbehind
      5. Solution Without Lookbehind
      6. See Also
    17. 2.17. Match One of Two Alternatives Based on a Condition
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    18. 2.18. Add Comments to a Regular Expression
      1. Problem
      2. Solution
      3. Discussion
        1. Free-spacing mode
        2. Java has free-spacing character classes
      4. Variations
    19. 2.19. Insert Literal Text into the Replacement Text
      1. Problem
      2. Solution
      3. Discussion
        1. When and how to escape characters in replacement text
        2. .NET and JavaScript
        3. Java
        4. PHP
        5. Perl
        6. Python and Ruby
        7. More escape rules for string literals
      4. See Also
    20. 2.20. Insert the Regex Match into the Replacement Text
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
      3. Discussion
      4. See Also
    21. 2.21. Insert Part of the Regex Match into the Replacement Text
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
      3. Discussion
        1. Replacements using capturing groups
        2. $10 and higher
        3. References to nonexistent groups
      4. Solution Using Named Capture
        1. Regular expression
        2. Replacement
        3. Flavors that support named capture
      5. See Also
    22. 2.22. Insert Match Context into the Replacement Text
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
  5. 3. Programming with Regular Expressions
    1. Programming Languages and Regex Flavors
      1. Languages Covered in This Chapter
      2. More Programming Languages
    2. 3.1. Literal Regular Expressions in Source Code
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      4. See Also
    3. 3.2. Import the Regular Expression Library
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. XRegExp
        4. Java
        5. Python
      3. Discussion
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
    4. 3.3. Create Regular Expression Objects
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. Compiling a Regular Expression Down to CIL
        1. C#
        2. VB.NET
      5. Discussion
      6. See Also
    5. 3.4. Set Regular Expression Options
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. Additional Language-Specific Options
        1. .NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      5. See Also
    6. 3.5. Test If a Match Can Be Found Within a Subject String
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. C# and VB.NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    7. 3.6. Test Whether a Regex Matches the Subject String Entirely
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. C# and VB.NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    8. 3.7. Retrieve the Matched Text
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    9. 3.8. Determine the Position and Length of the Match
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    10. 3.9. Retrieve Part of the Matched Text
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. Named Capture
        1. C#
        2. VB.NET
        3. Java
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      5. See Also
    11. 3.10. Retrieve a List of All Matches
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    12. 3.11. Iterate over All Matches
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. See Also
    13. 3.12. Validate Matches in Procedural Code
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
      4. See Also
    14. 3.13. Find a Match Within Another Match
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
      4. See Also
    15. 3.14. Replace All Matches
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. See Also
    16. 3.15. Replace Matches Reusing Parts of the Match
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. PHP
        5. Perl
        6. Python
        7. Ruby
      4. Named Capture
        1. C#
        2. VB.NET
        3. Java 7
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      5. See Also
    17. 3.16. Replace Matches with Replacements Generated in Code
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. See Also
    18. 3.17. Replace All Matches Within the Matches of Another Regex
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
      4. See Also
    19. 3.18. Replace All Matches Between the Matches of Another Regex
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
        1. Perl and Ruby
        2. Python
      4. See Also
    20. 3.19. Split a String
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
        1. C# and VB.NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. See Also
    21. 3.20. Split a String, Keeping the Regex Matches
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. PHP
        7. Perl
        8. Python
        9. Ruby
      3. Discussion
        1. .NET
        2. Java
        3. JavaScript
        4. XRegExp
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      4. See Also
    22. 3.21. Search Line by Line
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. PHP
        6. Perl
        7. Python
        8. Ruby
      3. Discussion
      4. See Also
    23. 3.22. Construct a Parser
      1. Problem
      2. Solution
        1. C#
        2. VB.NET
        3. Java
        4. JavaScript
        5. XRegExp
        6. Perl
        7. Python
        8. PHP
        9. Ruby
      3. Discussion
      4. See Also
  6. 4. Validation and Formatting
    1. 4.1. Validate Email Addresses
      1. Problem
      2. Solution
        1. Simple
        2. Simple, with restrictions on characters
        3. Simple, with all valid local part characters
        4. No leading, trailing, or consecutive dots
        5. Top-level domain has two to six letters
      3. Discussion
        1. About email addresses
        2. Regular expression syntax
        3. Building a regex step-by-step
      4. Variations
      5. See Also
    2. 4.2. Validate and Format North American Phone Numbers
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
        3. C# example
        4. JavaScript example
        5. Other programming languages
      3. Discussion
      4. Variations
        1. Eliminate invalid phone numbers
        2. Find phone numbers in documents
        3. Allow a leading “1”
        4. Allow seven-digit phone numbers
      5. See Also
    3. 4.3. Validate International Phone Numbers
      1. Problem
      2. Solution
        1. Regular expression
        2. JavaScript example
      3. Discussion
      4. Variations
        1. Validate international phone numbers in EPP format
      5. See Also
    4. 4.4. Validate Traditional Date Formats
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    5. 4.5. Validate Traditional Date Formats, Excluding Invalid Dates
      1. Problem
      2. Solution
        1. C#
        2. Perl
        3. Pure regular expression
      3. Discussion
        1. Regex with procedural code
        2. Pure regular expression
      4. Variations
      5. See Also
    6. 4.6. Validate Traditional Time Formats
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    7. 4.7. Validate ISO 8601 Dates and Times
      1. Problem
      2. Solution
        1. Dates
        2. Weeks
        3. Times
        4. Date and time
        5. XML Schema dates and times
      3. Discussion
      4. See Also
    8. 4.8. Limit Input to Alphanumeric Characters
      1. Problem
      2. Solution
        1. Regular expression
        2. Ruby example
      3. Discussion
      4. Variations
        1. Limit input to ASCII characters
        2. Limit input to ASCII noncontrol characters and line breaks
        3. Limit input to shared ISO-8859-1 and Windows-1252 characters
        4. Limit input to alphanumeric characters in any language
      5. See Also
    9. 4.9. Limit the Length of Text
      1. Problem
      2. Solution
        1. Regular expression
        2. Perl example
      3. Discussion
      4. Variations
        1. Limit the length of an arbitrary pattern
        2. Limit the number of nonwhitespace characters
        3. Limit the number of words
      5. See Also
    10. 4.10. Limit the Number of Lines in Text
      1. Problem
      2. Solution
        1. Regular expression
        2. PHP (PCRE) example
      3. Discussion
      4. Variations
        1. Working with esoteric line separators
      5. See Also
    11. 4.11. Validate Affirmative Responses
      1. Problem
      2. Solution
        1. Regular expression
        2. JavaScript example
      3. Discussion
      4. See Also
    12. 4.12. Validate Social Security Numbers
      1. Problem
      2. Solution
        1. Regular expression
        2. Python example
      3. Discussion
      4. Variations
        1. Find Social Security numbers in documents
      5. See Also
    13. 4.13. Validate ISBNs
      1. Problem
      2. Solution
        1. Regular expressions
        2. JavaScript example, with checksum validation
        3. Python example, with checksum validation
      3. Discussion
        1. ISBN-10 checksum
        2. ISBN-13 checksum
      4. Variations
        1. Find ISBNs in documents
        2. Eliminate incorrect ISBN identifiers
      5. See Also
    14. 4.14. Validate ZIP Codes
      1. Problem
      2. Solution
        1. Regular expression
        2. VB.NET example
      3. Discussion
      4. See Also
    15. 4.15. Validate Canadian Postal Codes
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    16. 4.16. Validate U.K. Postcodes
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    17. 4.17. Find Addresses with Post Office Boxes
      1. Problem
      2. Solution
        1. Regular expression
        2. C# example
      3. Discussion
      4. See Also
    18. 4.18. Reformat Names From “FirstName LastName” to “LastName, FirstName”
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
        3. JavaScript example
      3. Discussion
      4. Variations
        1. List surname particles at the beginning of the name
      5. See Also
    19. 4.19. Validate Password Complexity
      1. Problem
      2. Solution
        1. Length between 8 and 32 characters
        2. ASCII visible and space characters only
        3. One or more uppercase letters
        4. One or more lowercase letters
        5. One or more numbers
        6. One or more special characters
        7. Disallow three or more sequential identical characters
        8. Example JavaScript solution, basic
        9. Example JavaScript solution, with x out of y validation
        10. Example JavaScript solution, with password security ranking
      3. Discussion
        1. Example JavaScript solutions
      4. Variations
        1. Validate multiple password rules with a single regex
      5. See Also
    20. 4.20. Validate Credit Card Numbers
      1. Problem
      2. Solution
        1. Strip spaces and hyphens
        2. Validate the number
        3. Example web page with JavaScript
      3. Discussion
        1. Strip spaces and hyphens
        2. Validate the number
        3. Incorporating the solution into a web page
      4. Extra Validation with the Luhn Algorithm
      5. See Also
    21. 4.21. European VAT Numbers
      1. Problem
      2. Solution
        1. Strip whitespace and punctuation
        2. Validate the number
      3. Discussion
        1. Strip whitespace and punctuation
        2. Validate the number
      4. Variations
      5. See Also
  7. 5. Words, Lines, and Special Characters
    1. 5.1. Find a Specific Word
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    2. 5.2. Find Any of Multiple Words
      1. Problem
      2. Solution
        1. Using alternation
        2. Example JavaScript solution
      3. Discussion
        1. Using alternation
        2. Example JavaScript solution
      4. See Also
    3. 5.3. Find Similar Words
      1. Problem
      2. Solution
        1. Color or colour
        2. Bat, cat, or rat
        3. Words ending with “phobia”
        4. Steve, Steven, or Stephen
        5. Variations of “regular expression”
      3. Discussion
        1. Use word boundaries to match complete words
        2. Color or colour
        3. Bat, cat, or rat
        4. Words ending with “phobia”
        5. Steve, Steven, or Stephen
        6. Variations of “regular expression”
      4. See Also
    4. 5.4. Find All Except a Specific Word
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
        1. Find words that don’t contain another word
      5. See Also
    5. 5.5. Find Any Word Not Followed by a Specific Word
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    6. 5.6. Find Any Word Not Preceded by a Specific Word
      1. Problem
      2. Solution
        1. Lookbehind you
        2. Words not preceded by “cat”
        3. Simulate lookbehind
      3. Discussion
        1. Fixed, finite, and infinite length lookbehind
        2. Simulate lookbehind
      4. Variations
      5. See Also
    7. 5.7. Find Words Near Each Other
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
        1. Using a conditional
        2. Match three or more words near each other
          1. Exponentially increasing permutations
          2. The ugly solution
          3. Exploiting empty backreferences
          4. JavaScript backreferences by its own rules
        3. Multiple words, any distance from each other
      5. See Also
    8. 5.8. Find Repeated Words
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    9. 5.9. Remove Duplicate Lines
      1. Problem
      2. Solution
        1. Option 1: Sort lines and remove adjacent duplicates
        2. Option 2: Keep the last occurrence of each duplicate line in an unsorted file
        3. Option 3: Keep the first occurrence of each duplicate line in an unsorted file
      3. Discussion
        1. Option 1: Sort lines and remove adjacent duplicates
        2. Option 2: Keep the last occurrence of each duplicate line in an unsorted file
        3. Option 3: Keep the first occurrence of each duplicate line in an unsorted file
      4. See Also
    10. 5.10. Match Complete Lines That Contain a Word
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    11. 5.11. Match Complete Lines That Do Not Contain a Word
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    12. 5.12. Trim Leading and Trailing Whitespace
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    13. 5.13. Replace Repeated Whitespace with a Single Space
      1. Problem
      2. Solution
        1. Clean any whitespace characters
        2. Clean horizontal whitespace characters
      3. Discussion
        1. Clean any whitespace characters
        2. Clean horizontal whitespace characters
      4. See Also
    14. 5.14. Escape Regular Expression Metacharacters
      1. Problem
      2. Solution
        1. Built-in solutions
        2. Regular expression
        3. Replacement
        4. Example JavaScript function
      3. Discussion
      4. Variations
      5. See Also
  8. 6. Numbers
    1. 6.1. Integer Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    2. 6.2. Hexadecimal Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    3. 6.3. Binary Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    4. 6.4. Octal Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    5. 6.5. Decimal Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    6. 6.6. Strip Leading Zeros
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
        3. Getting the numbers in Perl
        4. Stripping leading zeros in PHP
      3. Discussion
      4. See Also
    7. 6.7. Numbers Within a Certain Range
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    8. 6.8. Hexadecimal Numbers Within a Certain Range
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    9. 6.9. Integer Numbers with Separators
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    10. 6.10. Floating-Point Numbers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    11. 6.11. Numbers with Thousand Separators
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    12. 6.12. Add Thousand Separators to Numbers
      1. Problem
      2. Solution
        1. Basic solution
        2. Match separator positions only, using lookbehind
      3. Discussion
        1. Introduction
        2. Basic solution
        3. Match separator positions only, using lookbehind
      4. Variations
        1. Don’t add commas after a decimal point
          1. Use infinite lookbehind
          2. Search-and-replace within matched numbers
      5. See Also
    13. 6.13. Roman Numerals
      1. Problem
      2. Solution
      3. Discussion
      4. Convert Roman Numerals to Decimal
      5. See Also
  9. 7. Source Code and Log Files
    1. 7.1. Keywords
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    2. 7.2. Identifiers
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    3. 7.3. Numeric Constants
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    4. 7.4. Operators
      1. Problem
      2. Solution
      3. Discussion
    5. 7.5. Single-Line Comments
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    6. 7.6. Multiline Comments
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    7. 7.7. All Comments
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    8. 7.8. Strings
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    9. 7.9. Strings with Escapes
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    10. 7.10. Regex Literals
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    11. 7.11. Here Documents
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    12. 7.12. Common Log Format
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    13. 7.13. Combined Log Format
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    14. 7.14. Broken Links Reported in Web Logs
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
  10. 8. URLs, Paths, and Internet Addresses
    1. 8.1. Validating URLs
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    2. 8.2. Finding URLs Within Full Text
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    3. 8.3. Finding Quoted URLs in Full Text
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    4. 8.4. Finding URLs with Parentheses in Full Text
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    5. 8.5. Turn URLs into Links
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    6. 8.6. Validating URNs
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    7. 8.7. Validating Generic URLs
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    8. 8.8. Extracting the Scheme from a URL
      1. Problem
      2. Solution
        1. Extract the scheme from a URL known to be valid
        2. Extract the scheme while validating the URL
      3. Discussion
      4. See Also
    9. 8.9. Extracting the User from a URL
      1. Problem
      2. Solution
        1. Extract the user from a URL known to be valid
        2. Extract the user while validating the URL
      3. Discussion
      4. See Also
    10. 8.10. Extracting the Host from a URL
      1. Problem
      2. Solution
        1. Extract the host from a URL known to be valid
        2. Extract the host while validating the URL
      3. Discussion
      4. See Also
    11. 8.11. Extracting the Port from a URL
      1. Problem
      2. Solution
        1. Extract the port from a URL known to be valid
        2. Extract the port while validating the URL
      3. Discussion
      4. See Also
    12. 8.12. Extracting the Path from a URL
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    13. 8.13. Extracting the Query from a URL
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    14. 8.14. Extracting the Fragment from a URL
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    15. 8.15. Validating Domain Names
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    16. 8.16. Matching IPv4 Addresses
      1. Problem
      2. Solution
        1. Regular expression
        2. Perl
      3. Discussion
      4. See Also
    17. 8.17. Matching IPv6 Addresses
      1. Problem
      2. Solution
        1. Standard notation
        2. Mixed notation
        3. Standard or mixed notation
        4. Compressed notation
        5. Compressed mixed notation
        6. Standard, mixed, or compressed notation
      3. Discussion
        1. Standard notation
        2. Mixed notation
        3. Standard or mixed notation
        4. Compressed notation
        5. Compressed mixed notation
        6. Standard, mixed, or compressed notation
      4. See Also
    18. 8.18. Validate Windows Paths
      1. Problem
      2. Solution
        1. Drive letter paths
        2. Drive letter and UNC paths
        3. Drive letter, UNC, and relative paths
      3. Discussion
        1. Drive letter paths
        2. Drive letter and UNC paths
        3. Drive letter, UNC, and relative paths
      4. See Also
    19. 8.19. Split Windows Paths into Their Parts
      1. Problem
      2. Solution
        1. Drive letter paths
        2. Drive letter and UNC paths
        3. Drive letter, UNC, and relative paths
      3. Discussion
        1. Drive letter paths
        2. Drive letter and UNC paths
        3. Drive letter, UNC, and relative paths
      4. See Also
    20. 8.20. Extract the Drive Letter from a Windows Path
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    21. 8.21. Extract the Server and Share from a UNC Path
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    22. 8.22. Extract the Folder from a Windows Path
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    23. 8.23. Extract the Filename from a Windows Path
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    24. 8.24. Extract the File Extension from a Windows Path
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    25. 8.25. Strip Invalid Characters from Filenames
      1. Problem
      2. Solution
        1. Regular expression
        2. Replacement
      3. Discussion
      4. See Also
  11. 9. Markup and Data Formats
    1. Processing Markup and Data Formats with Regular Expressions
      1. Basic Rules for Formats Covered in This Chapter
    2. 9.1. Find XML-Style Tags
      1. Problem
      2. Solution
        1. Quick and dirty
        2. Allow > in attribute values
        3. (X)HTML tags (loose)
        4. (X)HTML tags (strict)
        5. XML tags (strict)
      3. Discussion
        1. A few words of caution
        2. Quick and dirty
        3. Allow > in attribute values
        4. (X)HTML tags (loose)
        5. (X)HTML tags (strict)
        6. XML tags (strict)
      4. Skip Tricky (X)HTML and XML Sections
        1. Outer regex for (X)HTML
        2. Outer regex for XML
      5. See Also
    3. 9.2. Replace <b> Tags with <strong>
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
        1. Replace a list of tags
      5. See Also
    4. 9.3. Remove All XML-Style Tags Except <em> and <strong>
      1. Problem
      2. Solution
        1. Solution 1: Match tags except <em> and <strong>
        2. Solution 2: Match tags except <em> and <strong>, and any tags that contain attributes
      3. Discussion
      4. Variations
        1. Whitelist specific attributes
      5. See Also
    5. 9.4. Match XML Names
      1. Problem
      2. Solution
        1. XML 1.0 names (approximate)
        2. XML 1.1 names (exact)
      3. Discussion
        1. XML 1.0 names
        2. XML 1.1 names
      4. Variations
      5. See Also
    6. 9.5. Convert Plain Text to HTML by Adding <p> and <br> Tags
      1. Problem
      2. Solution
        1. Step 1: Replace HTML special characters with named character references
        2. Step 2: Replace all line breaks with <br>
        3. Step 3: Replace double <br> tags with </p><p>
        4. Step 4: Wrap the entire string with <p>⋯</p>
        5. Example JavaScript solution
      3. Discussion
        1. Step 1: Replace HTML special characters with named character references
        2. Step 2: Replace all line breaks with <br>
        3. Step 3: Replace double <br> tags with </p><p>
        4. Step 4: Wrap the entire string with <p>⋯</p>
      4. See Also
    7. 9.6. Decode XML Entities
      1. Problem
      2. Solution
        1. Regular expression
        2. Replace matches with their corresponding literal characters
        3. Example JavaScript solution
      3. Discussion
      4. See Also
    8. 9.7. Find a Specific Attribute in XML-Style Tags
      1. Problem
      2. Solution
        1. Tags that contain an id attribute (quick and dirty)
        2. Tags that contain an id attribute (more reliable)
        3. <div> tags that contain an id attribute
        4. Tags that contain an id attribute with the value “my-id”
        5. Tags that contain “my-class” within their class attribute value
      3. Discussion
      4. See Also
    9. 9.8. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
      1. Problem
      2. Solution
        1. Solution 1, simplistic
        2. Solution 2, more reliable
        3. Insert the new attribute
      3. Discussion
      4. See Also
    10. 9.9. Remove XML-Style Comments
      1. Problem
      2. Solution
      3. Discussion
        1. How it works
        2. When comments can’t be removed
      4. Variations
        1. Find valid XML comments
        2. Find valid HTML comments
      5. See Also
    11. 9.10. Find Words Within XML-Style Comments
      1. Problem
      2. Solution
        1. Two-step approach
        2. Single-step approach
      3. Discussion
        1. Two-step approach
        2. Single-step approach
      4. Variations
      5. See Also
    12. 9.11. Change the Delimiter Used in CSV Files
      1. Problem
      2. Solution
        1. Example web page with JavaScript
      3. Discussion
      4. See Also
    13. 9.12. Extract CSV Fields from a Specific Column
      1. Problem
      2. Solution
        1. Example web page with JavaScript
      3. Discussion
      4. Variations
        1. Match a CSV record and capture the field in column 1 to backreference 1
        2. Match a CSV record and capture the field in column 2 to backreference 1
        3. Match a CSV record and capture the field in column 3 or higher to backreference 1
        4. Replacement string
      5. See Also
    14. 9.13. Match INI Section Headers
      1. Problem
      2. Solution
      3. Discussion
      4. Variations
      5. See Also
    15. 9.14. Match INI Section Blocks
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
    16. 9.15. Match INI Name-Value Pairs
      1. Problem
      2. Solution
      3. Discussion
      4. See Also
  12. Index
  13. About the Authors
  14. Colophon
  15. Copyright

Product information

  • Title: Regular Expressions Cookbook, 2nd Edition
  • Author(s): Jan Goyvaerts, Steven Levithan
  • Release date: August 2012
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781449319434