22 Regular Expressions
Regular expressions, commonly referred to as “regex”, are an abstract specification for creating patterns that perform advanced search and replace operations on strings. They act as a specification for a mini-language for string (text) operations.
There are multiple implementations of this abstract specification with many common features but subtle differences for different use cases.
Python itself offers several flavors/implementations of regular expressions:
- Generic pattern matching in Python
  - `re` module in standard library: simple tutorial, exhaustive documentation
  - `regex` package in PyPI, extension of `re` module: homepage
- Unix style pathname pattern matching, used in bash commands
  - `glob` module
  - `pathlib` module in standard library provides `Path.glob(ptrn)`
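As a small illustration of the search and replace operations that the `re` module supports, the following sketch finds dates in a sentence and rewrites them in a different order. The sample text and the date format are assumptions made up for this example.

```python
import re

# Hypothetical sample text, made up for this illustration.
text = "Invoice 2023-01-15 was paid on 2023-02-01."

# Pattern for dates written as YYYY-MM-DD; the parentheses capture
# year, month, and day as separate groups.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Search: collect every match as a (year, month, day) tuple.
dates = date_pattern.findall(text)

# Replace: rewrite each date as DD/MM/YYYY by reordering the groups.
rewritten = date_pattern.sub(r"\3/\2/\1", text)

print(dates)
print(rewritten)
```

Doing the same with plain string methods would require splitting and validating each candidate by hand, which is where a pattern language starts to pay off.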
The key idea is to provide much more powerful pattern searching than regular string methods do. For example, to find all PDF files in a folder, you can match the file names against the glob pattern `*.pdf`. Or you can use the regex pattern `^project` to find all files whose names start with the word project.
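The two patterns mentioned above can be tried on a plain list of file names. This is a minimal sketch: the file names are invented for illustration, `fnmatch` from the standard library is used here to apply the glob-style pattern to strings, and `re` handles the regex pattern.

```python
import fnmatch
import re

# Hypothetical file names, invented for this illustration.
file_names = [
    "project_plan.pdf",
    "notes.txt",
    "project_budget.xlsx",
    "summary.pdf",
]

# Glob-style pattern: "*" matches any run of characters.
pdf_files = fnmatch.filter(file_names, "*.pdf")

# Regex pattern: "^" anchors the match at the start of the string.
project_files = [name for name in file_names if re.search(r"^project", name)]

print(pdf_files)
print(project_files)
```

Note that the same character can mean different things in the two flavors: `*` is a wildcard in a glob pattern but a repetition operator in a regex.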
Regular expressions provide much more advanced features but are generally slower than regular string methods. The Python tutorial has a section dedicated to the choice between string methods and regular expressions.
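To make the trade-off concrete, here is a rough sketch that performs the same checks with string methods and with regex, followed by one task where a regex is arguably the simpler tool. The file names and the `_v<number>` naming convention are assumptions for this example.

```python
import re

# Hypothetical file name, invented for this illustration.
name = "project_report_final.pdf"

# Simple prefix/suffix checks: string methods are enough and easy to read.
starts_str = name.startswith("project")
ends_str = name.endswith(".pdf")

# The same checks with regex work, but add nothing here.
starts_re = re.match(r"project", name) is not None
ends_re = re.search(r"\.pdf$", name) is not None

# Extracting a version number from an assumed "_v<digits>.pdf" suffix
# is awkward with string methods and short with a regex.
versioned = "project_report_v12.pdf"
match = re.search(r"_v(\d+)\.pdf$", versioned)
version = int(match.group(1)) if match else None

print(starts_str, ends_str, starts_re, ends_re, version)
```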
Regular expressions can get messy very quickly. So, in the beginning, use only the basics of regex, and only when it is clear that no suitable string method is available or that using one would be too complex. The most common situation is using regex for path-related operations.
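A path-related use might look like the sketch below, which combines a glob-style pattern from `pathlib` with a basic regex. It creates a temporary folder so that it can run anywhere; the file names and the `report_<year>.pdf` naming scheme are assumptions for illustration only.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical file names, invented for this illustration.
sample_names = ["report_2023.pdf", "report_2024.pdf", "report_draft.pdf", "notes.txt"]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for sample in sample_names:
        (folder / sample).touch()  # create empty placeholder files

    # Glob-style pattern: every PDF in the folder.
    pdf_names = sorted(p.name for p in folder.glob("*.pdf"))

    # Basic regex: only PDFs named report_<four digits>.pdf.
    dated_pattern = re.compile(r"report_\d{4}\.pdf")
    dated_names = sorted(
        p.name for p in folder.iterdir() if dated_pattern.fullmatch(p.name)
    )

print(pdf_names)
print(dated_names)
```

The glob pattern handles the broad selection, and the regex is kept to a single, readable rule, in line with the advice to start with the basics.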