| # | Python regex | What it matches |
|---|---|---|
| 1 | `(?:\(\d+\))?\d+(?:-\d+)*` | Phone numbers like `(0123)123456`, `(0123)123-456-789`, `123-456`, and `123456`. |
| 2 | `[\w.-]+@[\w.-]+\.[a-zA-Z]{2,3}` | Email addresses. |
| 3 | `\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}` | IPv4 addresses. |
| 4 | `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}` | Timestamps like `2006-08-14 02:34:56`. |
| 5 | `(\$\d{1,3}(?:,\d{3})*(?:\.\d+)?)` | Comma-separated currency like `$234`, `$1,234`, and `$1,234.56`. |
| 6 | `\d{2,10}` | Numbers having 2 to 10 digits. |

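
As a rough sketch of how the patterns in the table behave with Python's standard `re` module, the snippet below runs each one through `re.findall`. The dictionary keys, the helper name `find_all`, and the sample strings are made up for illustration and are not part of the original page.

```python
import re

# Patterns copied from the table; the keys are labels invented for this sketch.
PATTERNS = {
    "phone": r"(?:\(\d+\))?\d+(?:-\d+)*",
    "email": r"[\w.-]+@[\w.-]+\.[a-zA-Z]{2,3}",
    "ipv4": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "timestamp": r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}",
    "currency": r"(\$\d{1,3}(?:,\d{3})*(?:\.\d+)?)",
    "number": r"\d{2,10}",
}

# Made-up sample strings, one per pattern.
SAMPLES = {
    "phone": "Call (0123)123-456-789 or 123-456.",
    "email": "Mail bob@example.com or alice.smith@mail.example.org today.",
    "ipv4": "Server 192.168.0.1 replied.",
    "timestamp": "Logged at 2006-08-14 02:34:56 UTC.",
    "currency": "Prices: $234, $1,234, and $1,234.56.",
    "number": "7 apples, 42 pears, 2024 seeds.",
}


def find_all(name, text):
    """Return every non-overlapping match of the named pattern in text."""
    return re.findall(PATTERNS[name], text)


results = {name: find_all(name, text) for name, text in SAMPLES.items()}
for name, matches in results.items():
    print(f"{name}: {matches}")
```

The patterns are unanchored, so they find candidates inside larger text rather than validate input. For example, the IPv4 pattern does not check that each octet is at most 255, and the email pattern only consumes two or three letters of the top-level domain, so `user@site.info` yields `user@site.inf`. For strict validation, something like `re.fullmatch` with a tighter pattern would be a better fit.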
Find matching text using Python regex online
- awk isn't handy when you want to extract different columns from different rows.
- sed is line-oriented, but you may want to extract text regardless of whether it sits on a single line.
- Regex syntax varies slightly between awk, grep, and sed, which can make it painful to get a pattern right.
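
To illustrate the point about text that is not confined to one line, here is a minimal sketch using Python's `re` module with the `re.DOTALL` flag. The `BEGIN`/`END` markers and the sample text are hypothetical, chosen only to show a match spanning several lines.

```python
import re

# Hypothetical input: the part we want sits between two marker lines.
text = """header
BEGIN
line one
line two
END
footer"""

# re.DOTALL lets "." match newlines, so the lazy group can span several lines.
match = re.search(r"BEGIN\n(.*?)\nEND", text, re.DOTALL)
block = match.group(1) if match else None
print(block)
```

The lazy quantifier `.*?` stops at the first `END` marker, which keeps the match from running on to a later block if the markers appear more than once.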
My use cases (see details in this blog post):
- Find all words beginning with a specific prefix.
- Extract fields from structured text, such as Protobuf message definitions.
- Find specific attributes like hrefs of links in HTML/XML.
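
The three use cases above could be sketched roughly as follows. The function names, patterns, and sample inputs are assumptions made for illustration, not the ones used in the blog post, and the patterns only cover simple inputs (for instance, double-quoted `href` values and single-line Protobuf field declarations).

```python
import re


def words_with_prefix(prefix, text):
    """Find all words that begin with the given prefix."""
    # re.escape guards against regex metacharacters in the prefix.
    return re.findall(r"\b" + re.escape(prefix) + r"\w*", text)


def proto_fields(definition):
    """Extract (type, name, number) triples from simple Protobuf field lines."""
    pattern = r"^[ \t]*(?:repeated[ \t]+)?(\w+)[ \t]+(\w+)[ \t]*=[ \t]*(\d+);"
    # re.MULTILINE makes "^" match at the start of every line.
    return re.findall(pattern, definition, re.MULTILINE)


def hrefs(html):
    """Collect the values of double-quoted href attributes."""
    return re.findall(r'href="([^"]*)"', html)


# Hypothetical sample inputs.
sentence = "preview the prefix and suppress presets"
message = """message Person {
  string name = 1;
  int32 id = 2;
  repeated string emails = 3;
}"""
page = '<a href="https://example.com">x</a> <a class="c" href="/about">y</a>'

print(words_with_prefix("pre", sentence))
print(proto_fields(message))
print(hrefs(page))
```

Regex works for quick extraction like this, but it is not a full parser. For arbitrary HTML/XML or complete Protobuf grammars, a dedicated parser is usually more reliable.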
Additional resources:
- Python Regex Cheat Sheet | ShortcutFoo
- Pythex: a Python regular expression editor, useful to test regex patterns