Book: Mastering Regular Expressions
- 1: Introduction to Regular Expressions
- Solving Real Problems
- Regular Expressions as a Language
- The Filename Analogy
- The Language Analogy
- The Regular-Expression Frame of Mind
- If You Have Some Regular-Expression Experience
- Searching Text Files: Egrep
- Egrep Metacharacters
- Start and End of the Line
- Character Classes
- Matching Any Character with Dot
- Alternation
- Ignoring Differences in Capitalization
- Word Boundaries
- In a Nutshell
- Optional Items
- Other Quantifiers: Repetition
- Parentheses and Backreferences
- The Great Escape
- Expanding the Foundation
- Linguistic Diversification
- The Goal of a Regular Expression
- A Few More Examples
- Regular Expression Nomenclature
- Improving on the Status Quo
- Summary
- Personal Glimpses
- 2: Extended Introductory Examples
- About the Examples
- A Short Introduction to Perl
- Matching Text with Regular Expressions
- Toward a More Real-World Example
- Side Effects of a Successful Match
- Intertwined Regular Expressions
- Intermission
- Modifying Text with Regular Expressions
- Example: Form Letter
- Example: Prettifying a Stock Price
- Automated Editing
- A Small Mail Utility
- Adding Commas to a Number with Lookaround
- Text-to-HTML Conversion
- That Doubled-Word Thing
- 3: Overview of Regular Expression Features and Flavors
- A Casual Stroll Across the Regex Landscape
- The Origins of Regular Expressions
- At a Glance
- Care and Handling of Regular Expressions
- Integrated Handling
- Procedural and Object-Oriented Handling
- A Search-and-Replace Example
- Search and Replace in Other Languages
- Care and Handling: Summary
- Strings, Character Encodings, and Modes
- Strings as Regular Expressions
- Character-Encoding Issues
- Unicode
- Regex Modes and Match Modes
- Common Metacharacters and Features
- Character Representations
- Character Classes and Class-Like Constructs
- Anchors and Other "Zero-Width Assertions"
- Comments and Mode Modifiers
- Grouping, Capturing, Conditionals, and Control
- Guide to the Advanced Chapters
- 4: The Mechanics of Expression Processing
- Start Your Engines!
- Two Kinds of Engines
- New Standards
- Regex Engine Types
- From the Department of Redundancy Department
- Testing the Engine Type
- Match Basics
- About the Examples
- Rule 1: The Match That Begins Earliest Wins
- Engine Pieces and Parts
- Rule 2: The Standard Quantifiers Are Greedy
- Regex-Directed Versus Text-Directed
- NFA Engine: Regex-Directed
- DFA Engine: Text-Directed
- First Thoughts: NFA and DFA in Comparison
- Backtracking
- A Really Crummy Analogy
- Two Important Points on Backtracking
- Saved States
- Backtracking and Greediness
- More About Greediness and Backtracking
- Problems of Greediness
- Multi-Character Quotes
- Using Lazy Quantifiers
- Greediness and Laziness Always Favor a Match
- The Essence of Greediness, Laziness, and Backtracking
- Possessive Quantifiers and Atomic Grouping
- Possessive Quantifiers, ?+, *+, ++, and {m,n}+
- The Backtracking of Lookaround
- Is Alternation Greedy?
- Taking Advantage of Ordered Alternation
- NFA, DFA, and POSIX
- The Longest-Leftmost
- POSIX and the Longest-Leftmost Rule
- Speed and Efficiency
- Summary: NFA and DFA in Comparison
- Summary
- 5: Practical Regex Techniques
- Regex Balancing Act
- A Few Short Examples
- Continuing with Continuation Lines
- Matching an IP Address
- Working with Filenames
- Matching Balanced Sets of Parentheses
- Watching Out for Unwanted Matches
- Matching Delimited Text
- Knowing Your Data and Making Assumptions
- Stripping Leading and Trailing Whitespace
- HTML-Related Examples
- Matching an HTML Tag
- Matching an HTML Link
- Examining an HTTP URL
- Validating a Hostname
- Plucking Out a URL in the Real World
- Extended Examples
- Keeping in Sync with Your Data
- Parsing CSV Files
- 6: Crafting an Efficient Expression
- A Sobering Example
- A Simple Change: Placing Your Best Foot Forward
- Efficiency Versus Correctness
- Advancing Further: Localizing the Greediness
- Reality Check
- A Global View of Backtracking
- More Work for a POSIX NFA
- Work Required During a Non-Match
- Being More Specific
- Alternation Can Be Expensive
- Benchmarking
- Know What You're Measuring
- Benchmarking with PHP
- Benchmarking with Java
- Benchmarking with VB.NET
- Benchmarking with Ruby
- Benchmarking with Python
- Benchmarking with Tcl
- Common Optimizations
- No Free Lunch
- Everyone's Lunch is Different
- The Mechanics of Regex Application
- Pre-Application Optimizations
- Optimizations with the Transmission
- Optimizations of the Regex Itself
- Techniques for Faster Expressions
- Common Sense Techniques
- Expose Literal Text
- Expose Anchors
- Lazy Versus Greedy: Be Specific
- Split Into Multiple Regular Expressions
- Mimic Initial-Character Discrimination
- Use Atomic Grouping and Possessive Quantifiers
- Lead the Engine to a Match
- Unrolling the Loop
- Method 1: Building a Regex From Past Experiences
- The Real Unrolling-the-Loop Pattern
- Method 2: A Top-Down View
- Method 3: An Internet Hostname
- Observations
- Using Atomic Grouping and Possessive Quantifiers
- Short Unrolling Examples
- Unrolling C Comments
- The Freeflowing Regex
- A Helping Hand to Guide the Match
- A Well-Guided Regex is a Fast Regex
- Wrapup
- In Summary: Think!
- 7: Perl
- Regular Expressions as a Language Component
- Perl's Greatest Strength
- Perl's Greatest Weakness
- Perl's Regex Flavor
- Regex Operands and Regex Literals
- How Regex Literals Are Parsed
- Regex Modifiers
- Regex-Related Perlisms
- Expression Context
- Dynamic Scope and Regex Match Effects
- Special Variables Modified by a Match
- The qr/.../ Operator and Regex Objects
- Building and Using Regex Objects
- Viewing Regex Objects
- Using Regex Objects for Efficiency
- The Match Operator
- Match's Regex Operand
- Specifying the Match Target Operand
- Different Uses of the Match Operator
- Iterative Matching: Scalar Context, with /g
- The Match Operator's Environmental Relations
- The Substitution Operator
- The Replacement Operand
- The /e Modifier
- Context and Return Value
- The Split Operator
- Basic Split
- Returning Empty Elements
- Split's Special Regex Operands
- Split's Match Operand with Capturing Parentheses
- Fun with Perl Enhancements
- Using a Dynamic Regex to Match Nested Pairs
- Using the Embedded-Code Construct
- Using local in an Embedded-Code Construct
- A Warning About Embedded Code and my Variables
- Matching Nested Constructs with Embedded Code
- Overloading Regex Literals
- Problems with Regex-Literal Overloading
- Mimicking Named Capture
- Perl Efficiency Issues
- There's More Than One Way to Do It
- Regex Compilation, the /o Modifier, qr and Efficiency
- Understanding the Pre-Match Copy
- The Study Function
- Benchmarking
- Regex Debugging Information
- Final Comments
- 8: Java
- Java's Regex Flavor
- Java Support for \p{} and \P{}
- Unicode Line Terminators
- Using java.util.regex
- The Pattern.compile() Factory
- Pattern's matcher method
- The Matcher Object
- Applying the Regex
- Querying Match Results
- Simple Search and Replace
- Advanced Search and Replace
- In-Place Search and Replace
- The Matcher's Region
- Method Chaining
- Methods for Building a Scanner
- Other Matcher Methods
- Other Pattern Methods
- Pattern's split Method, with One Argument
- Pattern's split Method, with Two Arguments
- Additional Examples
- Adding Width and Height Attributes to Image Tags
- Validating HTML with Multiple Patterns Per Matcher
- Parsing Comma-Separated Values (CSV) Text
- Java Version Differences
- Differences Between 1.4.2 and 1.5.0
- Differences Between 1.5.0 and 1.6
- 9: .NET
- .NET's Regex Flavor
- Additional Comments on the Flavor
- Using .NET Regular Expressions
- Regex Quickstart
- Package Overview
- Core Object Overview
- Core Object Details
- Creating Regex Objects
- Using Regex Objects
- Using Match Objects
- Using Group Objects
- Static Convenience Functions
- Regex Caching
- Support Functions
- Advanced .NET
- Regex Assemblies
- Matching Nested Constructs
- Capture Objects
- 10: PHP
- PHP's Regex Flavor
- The Preg Function Interface
- Pattern Arguments
- The Preg Functions
- preg_match
- preg_match_all
- preg_replace
- preg_replace_callback
- preg_split
- preg_grep
- preg_quote
- Missing Preg Functions
- preg_regex_to_pattern
- Syntax-Checking an Unknown Pattern Argument
- Syntax-Checking an Unknown Regex
- Recursive Expressions
- Matching Text with Nested Parentheses
- No Backtracking Into Recursion
- Matching a Set of Nested Parentheses
- PHP Efficiency Issues
- The S Pattern Modifier: Study
- Extended Examples
- CSV Parsing with PHP
- Checking Tagged Data for Proper Nesting