To match a pattern of characters, the simplest way to express the characters to be matched
is just to write them down. For example, if you want to match the characters main, then you can use
the regular expression:
main
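Blanks are treated just like any other character in a regular expression, and regular expressions are case-sensitive (uppercase and lowercase letters are different). For example, written with a blank before and after it, the regular expression:
 main 
will match main when it stands between blanks as a separate word,
but not when it appears as part of another word like domain or mainEvent.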
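As a minimal sketch of literal matching, the following uses Python's re module, which is an assumption here (the text itself concerns Unix utilities). The helper name contains is hypothetical. For literal characters, blanks, and case, Python's behavior agrees with what is described above.

```python
import re

def contains(pattern, text):
    """Return True if the regular expression matches anywhere in text."""
    return re.search(pattern, text) is not None

# Literal characters match themselves, anywhere in the text.
in_domain = contains("main", "domain")            # True
# Matching is case-sensitive.
in_upper = contains("main", "MAIN")               # False
# A blank is an ordinary character: " main " needs a blank on each side.
as_word = contains(" main ", "the main event")    # True
in_word = contains(" main ", "mainEvent")         # False

print(in_domain, in_upper, as_word, in_word)      # True False True False
```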
Certain characters have special meanings when used in regular expressions. These are:
\ ^ $ . [ ] * + ? ( ) |
If you need to match a pattern that includes some of these characters, then you can force the character NOT to be treated as a metacharacter by placing the \ character in front of it.
For example, to match the characters $main, you must use the regular expression:
\$main
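The effect of the backslash can be checked with a short sketch. It assumes Python's re module rather than a Unix utility, and the sample text is made up for illustration.

```python
import re

text = "cost: $main"

# Unescaped, $ is the end-of-line anchor, so "$main" finds nothing here.
unescaped = re.search("$main", text)

# Escaped, \$ stands for a literal dollar sign.
escaped = re.search(r"\$main", text)

print(unescaped)            # None
print(escaped.group())      # $main

# re.escape inserts the backslashes for any metacharacters in a string.
print(re.escape("$main"))   # \$main
```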
In a regular expression, the . (period or dot) character matches any single character. If you want to match the character patterns:
message1
message2
message8
messageX
You can simply use the regular expression:
message.
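A small sketch of the dot, assuming Python's re module. The extra name message (with nothing after it) is added only to show that the dot must match exactly one character.

```python
import re

names = ["message1", "message2", "message8", "messageX", "message"]

# re.fullmatch requires the whole string to match the pattern.
matched = [n for n in names if re.fullmatch(r"message.", n)]

print(matched)  # ['message1', 'message2', 'message8', 'messageX']
```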
If you want to match a character pattern but only if it occurs at the beginning of a line, then place the ^
character before the pattern. For example, to match the word apple but only if it appears at the beginning
of the line, you would use the regular expression:
^apple
If you want to match a character pattern but only if it occurs at the end of a line, then place a $
character after the pattern. For example, to match the word event but only if it occurs at the end
of a line, you would use the regular expression:
event$
Note: The regular expression ^$ will match any empty line, that is, a line with no characters on it at all (not even blanks).
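The two anchors can be tried out with the sketch below. It assumes Python's re module, and the sample lines are invented for illustration. Each string stands for one line of input.

```python
import re

lines = ["apple pie", "pineapple", "main event", "", "eventual"]

# ^ ties the pattern to the beginning of the line.
starts_with_apple = [l for l in lines if re.search(r"^apple", l)]
# $ ties the pattern to the end of the line.
ends_with_event = [l for l in lines if re.search(r"event$", l)]
# ^$ leaves no room for any character between the two anchors.
empty = [l for l in lines if re.search(r"^$", l)]

print(starts_with_apple)  # ['apple pie']
print(ends_with_event)    # ['main event']
print(empty)              # ['']
```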
The '*' character is used to match zero or more of the preceding character or regular expression.
For example, the regular expression:
file2*
will match any of the following:
file file2 file22
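The sketch below checks the * example, assuming Python's re module. The names file3 and fil are added only as counterexamples.

```python
import re

candidates = ["file", "file2", "file22", "file3", "fil"]

# The * applies only to the 2 immediately before it, not to "file" as a whole.
matched = [c for c in candidates if re.fullmatch(r"file2*", c)]

print(matched)  # ['file', 'file2', 'file22']
```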
If you wanted to match the patterns
LwindowMargin RwindowMargin
but not the patterns
TwindowMargin BwindowMargin
the tools we have seen so far offer no way to do it other than listing each acceptable pattern separately. The left and right bracket characters, [ and ], handle this case. They enclose a set of characters, any one of which may match at that position in the regular expression.
For example, to match either of the letters L or R, we can use the regular expression:
[LR]
Using this, the regular expression for the example above becomes:
[LR]windowMargin
The brackets can be used anywhere in a regular expression. For example, to match any pattern that starts with
the characters icon, followed by one of the digits 1, 2, or 3, and then the characters file, we
could use the regular expression:
icon[123]file
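Both bracket examples can be verified with a short sketch, assuming Python's re module. The names icon4file and iconfile are added only as counterexamples.

```python
import re

margins = ["LwindowMargin", "RwindowMargin", "TwindowMargin", "BwindowMargin"]
icons = ["icon1file", "icon3file", "icon4file", "iconfile"]

# [LR] matches exactly one character, which must be L or R.
margin_matches = [m for m in margins if re.fullmatch(r"[LR]windowMargin", m)]
# [123] matches exactly one of the digits 1, 2, or 3.
icon_matches = [i for i in icons if re.fullmatch(r"icon[123]file", i)]

print(margin_matches)  # ['LwindowMargin', 'RwindowMargin']
print(icon_matches)    # ['icon1file', 'icon3file']
```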
There are some shortcuts that we can use with the brackets. We can specify a range of characters (successive
ASCII codes) by using the - symbol. For example, if the numerical part of the previous example could
be any digit from 0 to 9, then the appropriate regular expression would be:
icon[0-9]file
You can have several components within the brackets. For example
[a-z123]
will match any lowercase letter or the digits 1, 2, or 3.
There is one more shortcut that you can use. Inside the brackets (but not outside), when ^ is the first character, it means that you want the complement of the characters that follow within the brackets, that is, any character except those shown.
For example, if you want to match any single character that is not a digit, then you can use:
[^0-9]
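Ranges, mixed sets, and complements are sketched below, assuming Python's re module. The sample strings are invented for illustration. Note that each bracket expression still matches only one character at a time.

```python
import re

icons = ["icon0file", "icon9file", "iconAfile"]

# A range: any one digit from 0 to 9.
digit_icons = [i for i in icons if re.fullmatch(r"icon[0-9]file", i)]
# Several components: any lowercase letter, or one of the digits 1, 2, 3.
mixed = re.findall(r"[a-z123]", "aZ1 49b")
# Complement: every single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "a1b2-3")

print(digit_icons)  # ['icon0file', 'icon9file']
print(mixed)        # ['a', '1', 'b']
print(non_digits)   # ['a', 'b', '-']
```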
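Some Unix utilities, such as awk, allow some extensions to what we have seen so far for working with regular expressions.
For example, parentheses can be used to group part of a regular expression so that an operator applies to the whole group. If you want to allow zero or more repetitions of the pattern a4, then you could
use the regular expression: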
(a4)*
Likewise, in awk, you can use + to mean match the previous character or regular (sub)expression
one or more times (instead of zero or more times).
Also you can use ? to mean match the previous character or regular (sub)expression zero or one
time only.
Finally you can use the | character as a logical "or" for matching either of the regular
(sub)expressions on either side.
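These extended operators can be tried with the sketch below. It assumes Python's re module, whose syntax for grouping, +, ?, and | agrees with the description above. The helper name full and the test strings are hypothetical.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches the regular expression."""
    return re.fullmatch(pattern, text) is not None

# ( ) groups a4 so that * repeats the pair, not just the 4.
group_results = [full(r"(a4)*", s) for s in ["", "a4", "a4a4", "a44"]]
# + means one or more of the preceding item.
plus_results = [full(r"file2+", s) for s in ["file", "file2", "file22"]]
# ? means zero or one of the preceding item.
optional_results = [full(r"file2?", s) for s in ["file", "file2", "file22"]]
# | matches the expression on either side.
either_results = [full(r"apple|event", s) for s in ["apple", "event", "main"]]

print(group_results)     # [True, True, True, False]
print(plus_results)      # [False, True, True]
print(optional_results)  # [True, True, False]
print(either_results)    # [True, True, False]
```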
Here are some useful regular expressions that you may wish to use:
[A-Za-z][A-Za-z]*
The above will match any string of one or more letters (a word containing no digits or other characters).
[+\-][0-9][0-9]*
The above will match any integer with a preceding + or -.
.*
This will match any string of characters, including the empty string.
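The three expressions above can be exercised with the following sketch, assuming Python's re module. The constant names and sample strings are hypothetical. re.findall returns every non-overlapping match, which shows where each expression starts and stops.

```python
import re

LETTERS = r"[A-Za-z][A-Za-z]*"      # one or more letters
SIGNED_INT = r"[+\-][0-9][0-9]*"    # a sign followed by one or more digits
ANYTHING = r".*"                    # zero or more of any character

words = re.findall(LETTERS, "file22 and mainEvent")
signed = re.findall(SIGNED_INT, "x = -42, y = +7, z = 13")
any_text = re.fullmatch(ANYTHING, "anything at all") is not None
any_empty = re.fullmatch(ANYTHING, "") is not None

print(words)                # ['file', 'and', 'mainEvent']
print(signed)               # ['-42', '+7']  (13 has no sign, so it is skipped)
print(any_text, any_empty)  # True True
```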
. is a one-character re that matches any single character.
* matches 0 or more occurrences of the preceding one-character re.
[ ] matches any one of the enclosed characters. A range can be specified
with a "-", so [a-d] is equivalent to [abcd]. If the first character following
the "[" is a "^", then any character not enclosed is matched.
^ at the beginning of the re forces the re to match at the beginning of a line.
$ at the end of the re forces the re to match at the end of a line.
* . ^ $ [ ] /
+ Matches 1 or more of the preceding regular expression
? Matches 0 or 1 of the preceding regular expression
| Between two regular expressions | will match if either expression matches.
( ) Expressions may be enclosed in parentheses for grouping
Note: Not all re rules work with all grep utilities. See the man pages when in doubt.