Regular Expression Basics
- Tony Mattke
- Scripts
- June 4, 2009
Before I even get started, I want to mention that not all regular expression metacharacters are supported in every application. Keep this in mind when building your matches.
Regular expressions are made up of normal characters and metacharacters. Normal characters include upper and lower case letters and numerals. Metacharacters have special meanings: depending on the character, they can stand for a set of characters, a position in the line, or a number of repetitions.
In the simplest case, a regular expression looks like a standard search string. For example, the regular expression “test” contains no metacharacters. It will match “test” and “test123”, but it will not match “Testing123”, because matching is normally case-sensitive. Metacharacters help solve these simple dilemmas. Here is a table of such characters.
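Before moving on to the metacharacters, here is a minimal sketch of that literal match in Python. Python's re module is my choice for illustration only; the names pattern and relaxed are made up for this example, and other tools spell the case-insensitive option differently.

```python
import re

# A pattern with no metacharacters matches itself literally, anywhere in the text.
pattern = re.compile("test")

for text in ["test", "test123", "Testing123"]:
    result = "match" if pattern.search(text) else "no match"
    print(f"{text!r}: {result}")

# Assumption: re.IGNORECASE is how Python relaxes case sensitivity;
# it is not something the table below covers.
relaxed = re.compile("test", re.IGNORECASE)
print(relaxed.search("Testing123") is not None)
```

The uppercase T is the only reason “Testing123” is rejected by the first pattern, which is why the case-insensitive version accepts it.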
Regex | Match |
---|---|
. | Matches any single character. For example, b.. could match ban, boy, or b u, but not boot as a whole word, since boot has one character too many. |
$ | Matches the end of a line. For example, the regular expression home$ would match the end of the string “I’m going home” but not the string “We’re going home.”, where a period follows home. |
^ | Matches the beginning of a line. For example, the regular expression ^With the would match the beginning of the string “With the power of” but would not match “What and When the”. |
* | Matches zero or more occurrences of the character immediately preceding. For example, the regular expression .* means match any number of any characters. |
\ | This is the quoting character; use it to treat the following character as an ordinary character. For example, \$ is used to match the dollar sign character ($) rather than the end of a line. Similarly, the expression \. is used to match the period character rather than any single character. |
[] | Matches any one of the characters between the brackets. For example, the regular expression r[aou]t matches rat, rot, and rut, but not ret. Ranges of characters can be specified by using a hyphen. For example, the regular expression [0-9] means match any digit. |
[^c1-c2] | Multiple ranges can be specified as well. The regular expression [A-Za-z] means match any upper or lower case letter. To match any character except those in the range (the complement of the range), use the caret as the first character after the opening bracket. For example, the expression [^269A-Z] will match any character except 2, 6, 9, and upper case letters. |
\< \> | Matches the beginning (\<) or end (\>) of a word. For example, \<the matches “the” in the string “for the wise” but does not match “the” in “otherwise”. Not supported by all applications. |
( ) | Treats the expression between ( and ) as a group and saves the characters matched by the expression into temporary holding areas. Up to nine pattern matches can be saved in a single regular expression. They can be referenced as \1 through \9. |
| | Ors two conditions together: the match succeeds if either alternative matches. For example (him |
+ | Matches one or more occurrences of the character or regular expression immediately preceding. For example, the regular expression 9+ matches 9, 99, 999. Not supported by all applications. |
? | Matches 0 or 1 occurrence of the character or regular expression immediately preceding. Not supported by all applications. |
{i} | Matches a specific number of instances, or a number of instances within a range, of the preceding character. For example, the expression A[0-9]{3} will match “A” followed by exactly 3 digits. That is, it will match A123 but not A1234. The expression [0-9]{4,6} matches any sequence of 4, 5, or 6 digits. Not supported by all applications. |
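
To see the table in action, here is a rough sketch that runs most of its examples through Python's re module. Treat it as one possible illustration rather than a reference: the helper names found, CASES, and run_cases are made up for this example, and a few test strings (“home is far”, “cost: $5”, cat|dog, colou?r, “the the”) are my own additions. Python has no \< or \>, so \b (word boundary) stands in for them here. I also anchor patterns with ^ and $ wherever the table talks about matching a whole word, because an unanchored search would happily find b.. inside boot.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# (pattern, text, expected result) for the metacharacters in the table.
CASES = [
    (r"^b..$", "ban", True),                 # . is any single character
    (r"^b..$", "b u", True),
    (r"^b..$", "boot", False),               # one character too many
    (r"home$", "I'm going home", True),      # $ anchors at the end
    (r"home$", "home is far", False),
    (r"^With the", "With the power of", True),   # ^ anchors at the start
    (r"^With the", "What and When the", False),
    (r"^b.*t$", "boot", True),               # .* is any run of characters
    (r"^b.*t$", "bt", True),                 # ... including none at all
    (r"\$", "cost: $5", True),               # \ quotes the next character
    (r"\.", "no period here", False),
    (r"^r[aou]t$", "rot", True),             # [] is a set of characters
    (r"^r[aou]t$", "ret", False),
    (r"[^269A-Z]", "269XYZ", False),         # every character is excluded
    (r"[^269A-Z]", "269a", True),
    (r"\bthe", "for the wise", True),        # \b stands in for \<
    (r"\bthe", "otherwise", False),
    (r"([a-z]+) \1", "the the", True),       # \1 repeats the first group
    (r"([a-z]+) \1", "the cat", False),
    (r"^(cat|dog)$", "dog", True),           # | is alternation
    (r"^(cat|dog)$", "cow", False),
    (r"^9+$", "999", True),                  # + is one or more
    (r"^9+$", "", False),
    (r"^colou?r$", "color", True),           # ? is zero or one
    (r"^colou?r$", "colour", True),
    (r"^A[0-9]{3}$", "A123", True),          # {3} is exactly three
    (r"^A[0-9]{3}$", "A1234", False),
    (r"^[0-9]{4,6}$", "12345", True),        # {4,6} is four to six
    (r"^[0-9]{4,6}$", "123", False),
]


def run_cases():
    """Return the (pattern, text) pairs whose result differs from the expectation."""
    return [(p, t) for p, t, expected in CASES if found(p, t) != expected]


if __name__ == "__main__":
    failures = run_cases()
    print("all examples behave as described" if not failures else failures)
```

If you try these patterns in another tool, expect some differences; as mentioned at the top, not every application supports every metacharacter.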
That's all for now. Keep your eyes open for a follow-up article on regular expressions.