Regular Expressions Cookbook

Book description

This cookbook provides more than 100 recipes to help you crunch data and manipulate text with regular expressions. Every programmer can find uses for regular expressions, but their power doesn't come worry-free. Even seasoned users often suffer from poor performance, false positives, false negatives, or perplexing bugs. Regular Expressions Cookbook offers step-by-step instructions for some of the most common tasks involving this tool, with recipes for C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. With this book, you will:

  • Understand the basics of regular expressions through a concise tutorial

  • Use regular expressions effectively in several programming and scripting languages

  • Learn how to validate and format input

  • Manage words, lines, special characters, and numerical values

  • Find solutions for using regular expressions in URLs, paths, markup, and data exchange

  • Learn the nuances of more advanced regex features

  • Understand how regular expressions' APIs, syntax, and behavior differ from language to language

  • Write better regular expressions for custom needs

Whether you're a novice or an experienced user, Regular Expressions Cookbook will help deepen your knowledge of this unique and irreplaceable tool. You'll learn powerful new tricks, avoid language-specific gotchas, and save valuable time with this huge library of proven solutions to difficult, real-world problems.

    Table of contents

    1. Regular Expressions Cookbook
      1. SPECIAL OFFER: Upgrade this ebook with O’Reilly
      2. Preface
        1. Caught in the Snarls of Different Versions
        2. Intended Audience
        3. Technology Covered
        4. Organization of This Book
        5. Conventions Used in This Book
        6. Using Code Examples
        7. Safari® Books Online
        8. How to Contact Us
        9. Acknowledgments
      3. 1. Introduction to Regular Expressions
        1. Regular Expressions Defined
          1. Many Flavors of Regular Expressions
          2. Regex Flavors Covered by This Book
        2. Searching and Replacing with Regular Expressions
          1. Many Flavors of Replacement Text
        3. Tools for Working with Regular Expressions
          1. RegexBuddy
          2. RegexPal
          3. More Online Regex Testers
            1. regex.larsolavtorvik.com
            2. Nregex
            3. Rubular
            4. myregexp.com
            5. reAnimator
          4. More Desktop Regular Expression Testers
            1. Expresso
            2. The Regulator
          5. grep
            1. PowerGREP
            2. Windows Grep
            3. RegexRenamer
          6. Popular Text Editors
      4. 2. Basic Regular Expression Skills
        1. 2.1. Match Literal Text
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
            1. Block escape
            2. Case-insensitive matching
          5. See Also
        2. 2.2. Match Nonprintable Characters
          1. Problem
          2. Solution
          3. Discussion
          4. Variations on Representations of Nonprinting Characters
            1. The 26 control characters
            2. The 7-bit character set
          5. See Also
        3. 2.3. Match One of Many Characters
          1. Problem
          2. Solution
            1. Calendar with misspellings
            2. Hexadecimal character
            3. Nonhexadecimal character
          3. Discussion
          4. Variations
            1. Shorthands
            2. Case insensitivity
          5. Flavor-Specific Features
            1. .NET character class subtraction
            2. Java character class union, subtraction, and intersection
          6. See Also
        4. 2.4. Match Any Character
          1. Problem
          2. Solution
            1. Any character except line breaks
            2. Any character including line breaks
          3. Discussion
            1. Any character except line breaks
            2. Any character including line breaks
            3. Dot abuse
          4. Variations
          5. See Also
        5. 2.5. Match Something at the Start and/or the End of a Line
          1. Problem
          2. Solution
            1. Start of the subject
            2. End of the subject
            3. Start of a line
            4. End of a line
          3. Discussion
            1. Anchors and lines
            2. Start of the subject
            3. End of the subject
            4. Start of a line
            5. End of a line
            6. Zero-length matches
          4. Variations
          5. See Also
        6. 2.6. Match Whole Words
          1. Problem
          2. Solution
            1. Word boundaries
            2. Nonboundaries
          3. Discussion
            1. Word boundaries
            2. Nonboundaries
          4. Word Characters
          5. See Also
        7. 2.7. Unicode Code Points, Properties, Blocks, and Scripts
          1. Problem
          2. Solution
            1. Unicode code point
            2. Unicode property or category
            3. Unicode block
            4. Unicode script
            5. Unicode grapheme
          3. Discussion
            1. Unicode code point
            2. Unicode property or category
            3. Unicode block
            4. Unicode script
            5. Unicode grapheme
          4. Variations
            1. Negated variant
            2. Character classes
            3. Listing all characters
          5. See Also
        8. 2.8. Match One of Several Alternatives
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        9. 2.9. Group and Capture Parts of the Match
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
            1. Noncapturing groups
            2. Group with mode modifiers
          5. See Also
        10. 2.10. Match Previously Matched Text Again
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        11. 2.11. Capture and Name Parts of the Match
          1. Problem
          2. Solution
            1. Named capture
            2. Named backreferences
          3. Discussion
            1. Named capture
            2. Named backreferences
          4. See Also
        12. 2.12. Repeat Part of the Regex a Certain Number of Times
          1. Problem
          2. Solution
            1. Googol
            2. Hexadecimal number
            3. Hexadecimal number
            4. Floating-point number
          3. Discussion
            1. Fixed repetition
            2. Variable repetition
            3. Infinite repetition
            4. Making something optional
            5. Repeating groups
          4. See Also
        13. 2.13. Choose Minimal or Maximal Repetition
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        14. 2.14. Eliminate Needless Backtracking
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        15. 2.15. Prevent Runaway Repetition
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        16. 2.16. Test for a Match Without Adding It to the Overall Match
          1. Problem
          2. Solution
          3. Discussion
            1. Lookaround
            2. Negative lookaround
            3. Different levels of lookbehind
            4. Matching the same text twice
            5. Lookaround is atomic
          4. Solution Without Lookbehind
          5. See Also
        17. 2.17. Match One of Two Alternatives Based on a Condition
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        18. 2.18. Add Comments to a Regular Expression
          1. Problem
          2. Solution
          3. Discussion
            1. Free-spacing mode
            2. Java has free-spacing character classes
          4. Variations
        19. 2.19. Insert Literal Text into the Replacement Text
          1. Problem
          2. Solution
          3. Discussion
            1. When and how to escape characters in replacement text
            2. .NET and JavaScript
            3. Java
            4. PHP
            5. Perl
            6. Python and Ruby
            7. More escape rules for string literals
          4. See Also
        20. 2.20. Insert the Regex Match into the Replacement Text
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
          3. Discussion
          4. See Also
        21. 2.21. Insert Part of the Regex Match into the Replacement Text
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
          3. Discussion
            1. Replacements using capturing groups
            2. $10 and higher
            3. References to nonexistent groups
          4. Solution Using Named Capture
            1. Regular expression
            2. Replacement
            3. Flavors that support named capture
          5. See Also
        22. 2.22. Insert Match Context into the Replacement Text
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
      5. 3. Programming with Regular Expressions
        1. Programming Languages and Regex Flavors
          1. Languages Covered in This Chapter
          2. More Programming Languages
        2. 3.1. Literal Regular Expressions in Source Code
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          4. See Also
        3. 3.2. Import the Regular Expression Library
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. Python
          3. Discussion
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
        4. 3.3. Creating Regular Expression Objects
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. Perl
            6. Python
            7. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. Compiling a Regular Expression Down to CIL
            1. C#
            2. VB.NET
          5. Discussion
          6. See Also
        5. 3.4. Setting Regular Expression Options
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. Additional Language-Specific Options
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          5. See Also
        6. 3.5. Test Whether a Match Can Be Found Within a Subject String
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. C# and VB.NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        7. 3.6. Test Whether a Regex Matches the Subject String Entirely
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. C# and VB.NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        8. 3.7. Retrieve the Matched Text
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        9. 3.8. Determine the Position and Length of the Match
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        10. 3.9. Retrieve Part of the Matched Text
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. Named Capture
            1. C#
            2. VB.NET
            3. PHP
            4. Perl
            5. Python
          5. See Also
        11. 3.10. Retrieve a List of All Matches
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        12. 3.11. Iterate over All Matches
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        13. 3.12. Validate Matches in Procedural Code
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
          4. See Also
        14. 3.13. Find a Match Within Another Match
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
          4. See Also
        15. 3.14. Replace All Matches
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        16. 3.15. Replace Matches Reusing Parts of the Match
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. Named Capture
            1. C#
            2. VB.NET
            3. PHP
            4. Perl
            5. Python
            6. Ruby
          5. See Also
        17. 3.16. Replace Matches with Replacements Generated in Code
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          4. See Also
        18. 3.17. Replace All Matches Within the Matches of Another Regex
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
          4. See Also
        19. 3.18. Replace All Matches Between the Matches of Another Regex
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. Perl and Ruby
            2. Python
          4. See Also
        20. 3.19. Split a String
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. C# and VB.NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        21. 3.20. Split a String, Keeping the Regex Matches
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
            1. .NET
            2. Java
            3. JavaScript
            4. PHP
            5. Perl
            6. Python
            7. Ruby
          4. See Also
        22. 3.21. Search Line by Line
          1. Problem
          2. Solution
            1. C#
            2. VB.NET
            3. Java
            4. JavaScript
            5. PHP
            6. Perl
            7. Python
            8. Ruby
          3. Discussion
          4. See Also
      6. 4. Validation and Formatting
        1. 4.1. Validate Email Addresses
          1. Problem
          2. Solution
            1. Simple
            2. Simple, with restrictions on characters
            3. Simple, with all characters
            4. No leading, trailing, or consecutive dots
            5. Top-level domain has two to six letters
          3. Discussion
            1. About email addresses
            2. Regular expression syntax
            3. Building a regex step-by-step
          4. Variations
          5. See Also
        2. 4.2. Validate and Format North American Phone Numbers
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
            3. C#
            4. JavaScript
            5. Other programming languages
          3. Discussion
          4. Variations
            1. Eliminate invalid phone numbers
            2. Find phone numbers in documents
            3. Allow a leading “1”
            4. Allow seven-digit phone numbers
          5. See Also
        3. 4.3. Validate International Phone Numbers
          1. Problem
          2. Solution
            1. Regular expression
            2. JavaScript
            3. Other programming languages
          3. Discussion
          4. Variations
            1. Validate international phone numbers in EPP format
          5. See Also
        4. 4.4. Validate Traditional Date Formats
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        5. 4.5. Accurately Validate Traditional Date Formats
          1. Problem
          2. Solution
            1. C#
            2. Perl
            3. Pure regular expression
          3. Discussion
          4. See Also
        6. 4.6. Validate Traditional Time Formats
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        7. 4.7. Validate ISO 8601 Dates and Times
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        8. 4.8. Limit Input to Alphanumeric Characters
          1. Problem
          2. Solution
            1. Regular expression
            2. Ruby
            3. Other programming languages
          3. Discussion
          4. Variations
            1. Limit input to ASCII characters
            2. Limit input to ASCII non-control characters and line breaks
            3. Limit input to shared ISO-8859-1 and Windows-1252 characters
            4. Limit input to alphanumeric characters in any language
          5. See Also
        9. 4.9. Limit the Length of Text
          1. Problem
          2. Solution
            1. Regular expression
            2. Perl
            3. Other programming languages
          3. Discussion
          4. Variations
            1. Limit the length of an arbitrary pattern
            2. Limit the number of nonwhitespace characters
            3. Limit the number of words
          5. See Also
        10. 4.10. Limit the Number of Lines in Text
          1. Problem
          2. Solution
            1. Regular expression
            2. PHP (PCRE)
            3. Other programming languages
          3. Discussion
          4. Variations
            1. Working with esoteric line separators
          5. See Also
        11. 4.11. Validate Affirmative Responses
          1. Problem
          2. Solution
            1. Regular expression
            2. JavaScript
            3. Other programming languages
          3. Discussion
          4. See Also
        12. 4.12. Validate Social Security Numbers
          1. Problem
          2. Solution
            1. Regular expression
            2. Python
            3. Other programming languages
          3. Discussion
          4. Variations
            1. Find Social Security numbers in documents
          5. See Also
        13. 4.13. Validate ISBNs
          1. Problem
          2. Solution
            1. Regular expressions
            2. JavaScript
            3. Python
            4. Other programming languages
          3. Discussion
            1. ISBN-10 checksum
            2. ISBN-13 checksum
          4. Variations
            1. Find ISBNs in documents
            2. Eliminate incorrect ISBN identifiers
          5. See Also
        14. 4.14. Validate ZIP Codes
          1. Problem
          2. Solution
            1. Regular expression
            2. VB.NET
            3. Other programming languages
          3. Discussion
          4. See Also
        15. 4.15. Validate Canadian Postal Codes
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        16. 4.16. Validate U.K. Postcodes
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        17. 4.17. Find Addresses with Post Office Boxes
          1. Problem
          2. Solution
            1. Regular expression
            2. C#
            3. Other programming languages
          3. Discussion
          4. See Also
        18. 4.18. Reformat Names From “FirstName LastName” to “LastName, FirstName”
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
            3. JavaScript
            4. Other programming languages
          3. Discussion
          4. Variations
            1. List surname particles at the beginning of the name
        19. 4.19. Validate Credit Card Numbers
          1. Problem
          2. Solution
            1. Strip spaces and hyphens
            2. Validate the number
            3. Example web page with JavaScript
          3. Discussion
            1. Strip spaces and hyphens
            2. Validate the number
            3. Incorporating the solution into a web page
          4. Extra Validation with the Luhn Algorithm
        20. 4.20. European VAT Numbers
          1. Problem
          2. Solution
            1. Strip whitespace and punctuation
            2. Validate the number
          3. Discussion
            1. Strip whitespace and punctuation
            2. Validate the number
          4. Variations
          5. See Also
      7. 5. Words, Lines, and Special Characters
        1. 5.1. Find a Specific Word
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        2. 5.2. Find Any of Multiple Words
          1. Problem
          2. Solution
            1. Using alternation
            2. Example JavaScript solution
          3. Discussion
            1. Using alternation
            2. Example JavaScript solution
          4. See Also
        3. 5.3. Find Similar Words
          1. Problem
          2. Solution
            1. Color or colour
            2. Bat, cat, or rat
            3. Words ending with “phobia”
            4. Steve, Steven, or Stephen
            5. Variations of “regular expression”
          3. Discussion
            1. Use word boundaries to match complete words
            2. Color or colour
            3. Bat, cat, or rat
            4. Words ending with “phobia”
            5. Steve, Steven, or Stephen
            6. Variations of “regular expression”
          4. See Also
        4. 5.4. Find All Except a Specific Word
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
            1. Find words that don’t contain another word
          5. See Also
        5. 5.5. Find Any Word Not Followed by a Specific Word
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        6. 5.6. Find Any Word Not Preceded by a Specific Word
          1. Problem
          2. Solution
            1. Lookbehind you
            2. Words not preceded by “cat”
            3. Simulate lookbehind
          3. Discussion
            1. Fixed, finite, and infinite length lookbehind
            2. Simulate lookbehind
          4. Variations
          5. See Also
        7. 5.7. Find Words Near Each Other
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
            1. Using a conditional
            2. Match three or more words near each other
              1. Exponentially increasing permutations
              2. The ugly solution
              3. Exploiting empty backreferences
              4. JavaScript backreferences by its own rules
            3. Multiple words, any distance from each other
          5. See Also
        8. 5.8. Find Repeated Words
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        9. 5.9. Remove Duplicate Lines
          1. Problem
          2. Solution
            1. Option 1: Sort lines and remove adjacent duplicates
            2. Option 2: Keep the last occurrence of each duplicate line in an unsorted file
            3. Option 3: Keep the first occurrence of each duplicate line in an unsorted file
          3. Discussion
            1. Option 1: Sort lines and remove adjacent duplicates
            2. Option 2: Keep the last occurrence of each duplicate line in an unsorted file
            3. Option 3: Keep the first occurrence of each duplicate line in an unsorted file
          4. See Also
        10. 5.10. Match Complete Lines That Contain a Word
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        11. 5.11. Match Complete Lines That Do Not Contain a Word
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        12. 5.12. Trim Leading and Trailing Whitespace
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
          5. See Also
        13. 5.13. Replace Repeated Whitespace with a Single Space
          1. Problem
          2. Solution
            1. Clean any whitespace characters
            2. Clean horizontal whitespace characters
          3. Discussion
            1. Clean any whitespace characters
            2. Clean horizontal whitespace characters
          4. See Also
        14. 5.14. Escape Regular Expression Metacharacters
          1. Problem
          2. Solution
            1. Built-in solutions
            2. Regular expression
            3. Replacement
            4. Example JavaScript function
          3. Discussion
          4. Variations
          5. See Also
      8. 6. Numbers
        1. 6.1. Integer Numbers
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        2. 6.2. Hexadecimal Numbers
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        3. 6.3. Binary Numbers
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        4. 6.4. Strip Leading Zeros
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
            3. Getting the numbers in Perl
            4. Stripping leading zeros in PHP
          3. Discussion
          4. See Also
        5. 6.5. Numbers Within a Certain Range
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        6. 6.6. Hexadecimal Numbers Within a Certain Range
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        7. 6.7. Floating Point Numbers
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        8. 6.8. Numbers with Thousand Separators
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        9. 6.9. Roman Numerals
          1. Problem
          2. Solution
          3. Discussion
          4. Convert Roman Numerals to Decimal
          5. See Also
      9. 7. URLs, Paths, and Internet Addresses
        1. 7.1. Validating URLs
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        2. 7.2. Finding URLs Within Full Text
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        3. 7.3. Finding Quoted URLs in Full Text
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        4. 7.4. Finding URLs with Parentheses in Full Text
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        5. 7.5. Turn URLs into Links
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        6. 7.6. Validating URNs
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        7. 7.7. Validating Generic URLs
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        8. 7.8. Extracting the Scheme from a URL
          1. Problem
          2. Solution
            1. Extract the scheme from a URL known to be valid
            2. Extract the scheme while validating the URL
          3. Discussion
          4. See Also
        9. 7.9. Extracting the User from a URL
          1. Problem
          2. Solution
            1. Extract the user from a URL known to be valid
            2. Extract the user while validating the URL
          3. Discussion
          4. See Also
        10. 7.10. Extracting the Host from a URL
          1. Problem
          2. Solution
            1. Extract the host from a URL known to be valid
            2. Extract the host while validating the URL
          3. Discussion
          4. See Also
        11. 7.11. Extracting the Port from a URL
          1. Problem
          2. Solution
            1. Extract the port from a URL known to be valid
            2. Extract the port while validating the URL
          3. Discussion
          4. See Also
        12. 7.12. Extracting the Path from a URL
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        13. 7.13. Extracting the Query from a URL
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        14. 7.14. Extracting the Fragment from a URL
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        15. 7.15. Validating Domain Names
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        16. 7.16. Matching IPv4 Addresses
          1. Problem
          2. Solution
            1. Regular expression
            2. Perl
          3. Discussion
          4. See Also
        17. 7.17. Matching IPv6 Addresses
          1. Problem
          2. Solution
            1. Standard notation
            2. Mixed notation
            3. Standard or mixed notation
            4. Compressed notation
            5. Compressed mixed notation
            6. Standard, mixed, or compressed notation
          3. Discussion
            1. Standard notation
            2. Mixed notation
            3. Standard or mixed notation
            4. Compressed notation
            5. Compressed mixed notation
            6. Standard, mixed, or compressed notation
          4. See Also
        18. 7.18. Validate Windows Paths
          1. Problem
          2. Solution
            1. Drive letter paths
            2. Drive letter and UNC paths
            3. Drive letter, UNC, and relative paths
          3. Discussion
            1. Drive letter paths
            2. Drive letter and UNC paths
            3. Drive letter, UNC, and relative paths
          4. See Also
        19. 7.19. Split Windows Paths into Their Parts
          1. Problem
          2. Solution
            1. Drive letter paths
            2. Drive letter and UNC paths
            3. Drive letter, UNC, and relative paths
          3. Discussion
            1. Drive letter paths
            2. Drive letter and UNC paths
            3. Drive letter, UNC, and relative paths
          4. See Also
        20. 7.20. Extract the Drive Letter from a Windows Path
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        21. 7.21. Extract the Server and Share from a UNC Path
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        22. 7.22. Extract the Folder from a Windows Path
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        23. 7.23. Extract the Filename from a Windows Path
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        24. 7.24. Extract the File Extension from a Windows Path
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        25. 7.25. Strip Invalid Characters from Filenames
          1. Problem
          2. Solution
            1. Regular expression
            2. Replacement
          3. Discussion
          4. See Also
      10. 8. Markup and Data Interchange
        1. 8.1. Find XML-Style Tags
          1. Problem
          2. Solution
            1. Quick and dirty
            2. Allow > in attribute values
            3. (X)HTML tags (loose)
            4. (X)HTML tags (strict)
            5. XML tags (strict)
          3. Discussion
            1. A few words of caution
            2. Quick and dirty
            3. Allow > in attribute values
            4. (X)HTML tags (loose)
            5. (X)HTML tags (strict)
            6. XML tags (strict)
            7. Skip tricky (X)HTML and XML sections
              1. Outer regex for (X)HTML
              2. Outer regex for XML
          4. Variations
            1. Match valid HTML 4 tags
          5. See Also
        2. 8.2. Replace <b> Tags with <strong>
          1. Problem
          2. Solution
          3. Discussion
          4. Variations
            1. Replace a list of tags
          5. See Also
        3. 8.3. Remove All XML-Style Tags Except <em> and <strong>
          1. Problem
          2. Solution
            1. Solution 1: Match tags except <em> and <strong>
            2. Solution 2: Match tags except <em> and <strong>, and any tags that contain attributes
          3. Discussion
          4. Variations
            1. Whitelist specific attributes
          5. See Also
        4. 8.4. Match XML Names
          1. Problem
          2. Solution
            1. XML 1.0 names (approximate)
            2. XML 1.1 names (exact)
          3. Discussion
            1. XML 1.0 names
            2. XML 1.1 names
          4. Variations
          5. See Also
        5. 8.5. Convert Plain Text to HTML by Adding <p> and <br> Tags
          1. Problem
          2. Solution
            1. Step 1: Replace HTML special characters with character entity references
            2. Step 2: Replace all line breaks with <br>
            3. Step 3: Replace double <br> tags with </p><p>
            4. Step 4: Wrap the entire string with <p>⋯</p>
            5. JavaScript example
          3. Discussion
            1. Step 1: Replace HTML special characters with character entity references
            2. Step 2: Replace all line breaks with <br>
            3. Step 3: Replace double <br> tags with </p><p>
            4. Step 4: Wrap the entire string with <p>⋯</p>
          4. See Also
        6. 8.6. Find a Specific Attribute in XML-Style Tags
          1. Problem
          2. Solution
            1. Tags that contain an id attribute (quick and dirty)
            2. Tags that contain an id attribute (more reliable)
            3. <div> tags that contain an id attribute
            4. Tags that contain an id attribute with the value “my-id”
            5. Tags that contain “my-class” within their class attribute value
          3. Discussion
          4. See Also
        7. 8.7. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
          1. Problem
          2. Solution
            1. Regex 1: Simplistic solution
            2. Regex 2: More reliable solution
            3. Insert the new attribute
          3. Discussion
          4. See Also
        8. 8.8. Remove XML-Style Comments
          1. Problem
          2. Solution
          3. Discussion
            1. How it works
            2. When comments can’t be removed
          4. Variations
            1. Find valid XML-style comments
            2. Find C-style comments
          5. See Also
        9. 8.9. Find Words Within XML-Style Comments
          1. Problem
          2. Solution
            1. Two-step approach
            2. Single-step approach
          3. Discussion
            1. Two-step approach
            2. Single-step approach
          4. Variations
          5. See Also
        10. 8.10. Change the Delimiter Used in CSV Files
          1. Problem
          2. Solution
            1. JavaScript example
          3. Discussion
          4. See Also
        11. 8.11. Extract CSV Fields from a Specific Column
          1. Problem
          2. Solution
            1. JavaScript example
          3. Discussion
          4. Variations
            1. Match a CSV record and capture the field in column 1 to backreference 1
            2. Match a CSV record and capture the field in column 2 to backreference 1
            3. Match a CSV record and capture the field in column 3 or higher to backreference 1
            4. Replacement string
        12. 8.12. Match INI Section Headers
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        13. 8.13. Match INI Section Blocks
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
        14. 8.14. Match INI Name-Value Pairs
          1. Problem
          2. Solution
          3. Discussion
          4. See Also
      11. Index
      12. About the Authors
      13. Colophon

    Product information

    • Title: Regular Expressions Cookbook
    • Author(s):
    • Release date: May 2009
    • Publisher(s): O'Reilly Media, Inc.
    • ISBN: 9780596520687