archie uses ed(1) regular expressions in a number of
commands.

A regular expression is, on the one hand, a string like
any other: a sequence of characters. On the other
hand, special characters within the string have certain
functions which make regular expressions useful when
trying to match portions of other strings. In the
following discussion and examples, a string containing a
regular expression will be called the ``pattern'', and
the string against which it is to be matched is called
the ``reference string''.

Regular expressions allow one to search for "all strings
ending with the letters 'ize'" or "all strings beginning
with a number between 1 and 3 and ending in a comma".

In order to accomplish this, regular expressions co-opt
some characters to have special meaning.
They also provide for these characters to lose their
special meaning if the user so desires. The rules for
regular expressions are:

c  Any character c matches itself unless it has been
assigned a special meaning as listed below. Most
special characters can be escaped (made to lose their
special meaning) by placing the character '\' in front
of them. This doesn't apply to '{', which is non-special
until it is escaped. Thus, although '*' normally has a
special meaning, the pattern '\*' matches a literal '*'.

Example:

The pattern

    acdef

matches

    s83acdeffff or acdefsecs

but not

    accdef or aacde1f

That is, it will match any reference string that contains
``acdef'' anywhere within it.

Example:

Normally the characters '*' and '$' are special,
but the pattern

    a\*bse\$

acts as above. That is, any reference string containing
``a*bse$'' as a substring will be flagged as a match.

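The literal-matching and escaping rules above can be tried out with a small sketch. It uses Python's `re` module, which is an assumption made purely for illustration: archie itself uses ed(1) expressions, and Python's dialect differs from ed(1) in places, although ordinary characters and the escapes `\*` and `\$` behave the same way in both. The helper name `matches` is hypothetical.

```python
import re

def matches(pattern: str, reference: str) -> bool:
    """Return True if pattern matches anywhere in the reference string."""
    return re.search(pattern, reference) is not None

# A pattern of ordinary characters matches as a substring.
literal_hits = [s for s in ["s83acdeffff", "acdefsecs", "accdef", "aacde1f"]
                if matches("acdef", s)]

# Escaped '*' and '$' lose their special meaning and match themselves.
escaped_hit = matches(r"a\*bse\$", "xxa*bse$yy")
escaped_miss = matches(r"a\*bse\$", "aaabse")
```

Here `literal_hits` keeps only the two reference strings that contain ``acdef'' as a substring, and the escaped pattern matches only where the literal text ``a*bse$'' occurs.
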
.  A period matches any character except the newline
character. This is known as the wildcard character.

Example:

The pattern

    ....

will match any four consecutive characters in the
reference string, none of which is a newline character.

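A minimal sketch of the wildcard rule, again assuming Python's `re` module as a stand-in for ed(1) matching (by default its `.` also refuses to match a newline). The variable names are illustrative only.

```python
import re

# Four dots match any four consecutive characters that are not newlines.
four = re.compile(r"....")

hit = four.search("ab1?") is not None        # four characters on one line
too_short = four.search("abc") is not None   # only three characters
split = four.search("ab\ncd") is not None    # the newline breaks the run
```

Only the first reference string contains four non-newline characters in a row, so only `hit` is true.
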
^  If '^' appears at the beginning of the pattern then it
is said to ``anchor'' the match to the beginning of the
line. That is, the reference string must start with the
pattern following the '^'. If this character appears
anywhere other than at the beginning of the pattern,
then it is no longer considered special, and matches
itself as any non-special character would. Similarly, if
it starts the pattern but is escaped, it matches itself.

Example:

The pattern

    ^efghi

will match

    efghi or efghijlk

but not

    abcefghi

That is, the pattern will match only those reference
strings starting with ``efghi''. Just containing the
substring is not sufficient.

$  Occurring at the end of the pattern, this character
``anchors'' the pattern to the end of the line (reference
string). A '$' occurring anywhere else in the pattern
is regarded as non-special. Similarly, if it is
at the end of the pattern but is escaped, it is
non-special.

Example:

The pattern

    efghi$

will match

    efghi or abcdefghi

but not

    efghijkl

That is, the pattern will match only those reference
strings ending with ``efghi''. Just containing the
substring is not sufficient.

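The two anchors described above can be checked with a short sketch. Python's `re` module is assumed here for illustration only. It differs from ed(1) in that an unescaped '^' or '$' in the middle of a pattern is still treated as an anchor, so the sketch exercises only the cases where both dialects agree.

```python
import re

starts = re.compile(r"^efghi")   # anchored to the start of the string
ends = re.compile(r"efghi$")     # anchored to the end of the string

start_results = {s: starts.search(s) is not None
                 for s in ["efghi", "efghijlk", "abcefghi"]}
end_results = {s: ends.search(s) is not None
               for s in ["efghi", "abcdefghi", "efghijkl"]}

# An escaped caret is an ordinary character.
literal_caret = re.search(r"\^efghi", "x^efghi") is not None
```

The dictionaries record, for each reference string from the examples, whether the anchored pattern matched.
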
\<  This sequence in the pattern causes the one-character
regular expression following it to match only
at the beginning of a ``word'': that is, either at the
beginning of a line, or just before a letter, digit or
underline character and just after a character which is
not one of these.

Example:

The pattern

    \<abc

would match the last 'abc' in the reference string

    @hijabc#+abc

but not the first, since the first 'abc' does not start
on a ``word'' boundary.

\>  Constrains the one-character regular expression
preceding it to be at the end of a ``word'' as defined
above.

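The word-boundary rules can be approximated in a sketch, with one loud caveat: Python's `re` module, assumed here for illustration, has no `\<` or `\>` at all. The nearest equivalent is `\b`, which matches at either edge of a word, so the patterns below are translations and not ed(1) syntax.

```python
import re

# Python's re has no \< or \>; \b (a boundary between a word character
# and a non-word character) is used here as the nearest equivalent.
word_start = re.compile(r"\babc")   # stands in for the ed(1) pattern \<abc
word_end = re.compile(r"abc\b")     # stands in for the ed(1) pattern abc\>

reference = "@hijabc#+abc"
start_positions = [m.start() for m in word_start.finditer(reference)]
end_positions = [m.start() for m in word_end.finditer(reference)]
```

In the reference string only the second 'abc' (at offset 9) begins a word, while both occurrences end one, because the first is followed by '#' and the second by the end of the string.
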
[string]

One or more characters within square brackets. This
pattern matches any single character within the
brackets. The caret, '^', has a special meaning if it is the
first character in the series: the pattern will match
any character other than one in the list.

Example:

The pattern

    [^abc]

will match any character except 'a', 'b' or 'c'.

To match a right bracket, ']', in the list it must be
put first:

    []ab01]

A caret, '^', can appear anywhere in the list
but first. In

    [ab^01]

the caret loses its special meaning.

The '-' character is special within square brackets. It
denotes a range of characters (in the ASCII
character set), and the pattern will match any single
character within that range. '[a-z]' matches any lower
case letter. The '-' can be made non-special by placing it
first or last within the square brackets.

The characters '$', '*' and '.' are not special within
square brackets.

Example:

The pattern

    [ab01]

matches a single occurrence of a character from the set
'a', 'b', '0', '1'.

Example:

The pattern

    [^ab01]

will match any single character other than 'a', 'b',
'0', '1'.

Example:

The pattern

    [a0-9b]

matches one of 'a', 'b' or a digit between 0 and
9 inclusive.

Example:

The pattern

    [^a0-9b.$]

matches any single character other than 'a', 'b', '.', '$'
or a digit between 0 and 9 inclusive.

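The bracket examples above can be exercised against a fixed set of candidate characters. This is a sketch assuming Python's `re` module, whose bracket syntax agrees with ed(1) for the cases shown; the helper `chars_matching` and the candidate string are hypothetical.

```python
import re

def chars_matching(pattern: str, candidates: str) -> str:
    """Return the characters of candidates that the one-character pattern matches."""
    return "".join(ch for ch in candidates if re.fullmatch(pattern, ch))

candidates = "ab01c9^].$-"

in_set = chars_matching(r"[ab01]", candidates)
not_in_set = chars_matching(r"[^ab01]", candidates)
with_range = chars_matching(r"[a0-9b]", candidates)
negated_mix = chars_matching(r"[^a0-9b.$]", candidates)
bracket_first = chars_matching(r"[]ab01]", candidates)
caret_inside = chars_matching(r"[ab^01]", candidates)
```

Each result lists, in order, the candidate characters accepted by that bracket expression, which makes it easy to see that '.', '$' and a non-leading '^' are treated as ordinary members of the list.
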
*  An asterisk following a regular expression in the
pattern has the effect of matching zero or more
occurrences of that expression.

Example:

The pattern

    a*

means zero or more occurrences of the character 'a'.

Example:

The pattern

    [A-Z]*

means zero or more occurrences of an upper case
letter.

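Because "zero or more" includes zero, a starred expression can match the empty string. The sketch below shows this, assuming Python's `re` module for illustration; the variable names are not part of archie.

```python
import re

# 'a*' can match the empty string, so it succeeds even with no 'a' present.
empty_ok = re.fullmatch(r"a*", "") is not None
run_of_a = re.fullmatch(r"a*", "aaaa") is not None

# '[A-Z]*' matches the longest run of upper case letters at the start.
leading_caps = re.match(r"[A-Z]*", "ABCdef").group()
no_caps = re.match(r"[A-Z]*", "abc").group()
```

Note that `no_caps` is a successful match of length zero, which is why a pattern consisting only of a starred expression matches every reference string.
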
\{m\}
\{m,\}
\{m,n\}

A one-character regular expression followed by one of
these three constructions causes a range of
occurrences of that regular expression to be matched.
If it is followed by \{m\}, where m is an
integer between 0 and 255 (inclusive), then exactly m
occurrences of that regular expression are matched. If
followed by \{m,\}, then at least m occurrences are
matched. Finally, if it is followed by \{m,n\} (where
n is an integer between 0 and 255 and where
n > m), then between m and n occurrences of the
expression are matched.

Example:

The pattern

    ab\{3\}

would match any substring in the reference string of an
'a' followed by exactly 3 'b's.

Example:

The pattern

    ab\{3,\}

would match any substring in the reference string of an
'a' followed by at least 3 'b's.

Example:

The pattern

    ab\{3,5\}

would match any substring in the reference string of an
'a' followed by at least 3 but at most 5 'b's.

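The three interval forms can be illustrated with a sketch, assuming Python's `re` module. Python writes intervals with bare braces, `{m,n}`, where ed(1) uses `\{m,n\}`, so a hypothetical and deliberately naive helper, `bre_interval_to_python`, rewrites the braces first. It is not a general ed(1)-to-Python converter.

```python
import re

def bre_interval_to_python(pattern: str) -> str:
    """Rewrite ed(1) interval braces (backslash-brace) as Python's bare braces."""
    # Naive: assumes the pattern contains no other use of braces.
    return pattern.replace(r"\{", "{").replace(r"\}", "}")

exactly = re.compile(bre_interval_to_python(r"ab\{3\}"))
at_least = re.compile(bre_interval_to_python(r"ab\{3,\}"))
between = re.compile(bre_interval_to_python(r"ab\{3,5\}"))

reference = "xa" + "b" * 7 + "y"   # an 'a' followed by seven 'b's
matched = {
    "exactly": exactly.search(reference).group(),
    "at_least": at_least.search(reference).group(),
    "between": between.search(reference).group(),
}
too_few = exactly.search("abb") is None   # only two 'b's, so no match
```

Against a run of seven 'b's, the three patterns consume three, seven and five 'b's respectively, since each interval matches as many occurrences as its upper bound allows.
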
Common Problems with Regular Expressions

(1) When matching a substring it is not necessary to use
the wildcard character to match the parts of the
reference string preceding and following the substring.

Example:

The pattern

    abcd

will match any reference string containing this
pattern. It is not necessary to use

    .*abcd.*

as the pattern.

(2) In order to constrain a pattern to match the entire
reference string, use the construction:

    ^pattern$

(3) The easiest way to obtain case insensitivity in a
regular expression is to use the '[]' operator. For
example, a pattern to match the word ``hello'' regardless of
the case of the letters would be:

    [Hh][Ee][Ll][Ll][Oo]
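
The three points above can be sketched together, assuming Python's `re` module for illustration. The helper `case_insensitive` is hypothetical: it builds the bracket form mechanically and assumes plain ASCII words.

```python
import re

def case_insensitive(word: str) -> str:
    """Build a bracket-based pattern matching word in any mix of cases."""
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[" + ch.upper() + ch.lower() + "]")
        else:
            parts.append(re.escape(ch))  # non-letters are matched literally
    return "".join(parts)

hello = case_insensitive("hello")

# (1) A bare substring pattern finds the same strings as the .* version.
bare = re.search(r"abcd", "xxabcdyy") is not None
padded = re.search(r".*abcd.*", "xxabcdyy") is not None

# (2) Anchoring both ends makes the pattern cover the whole string.
whole_only = [s for s in ["HeLLo", "say hello"]
              if re.search("^" + hello + "$", s)]
```

Generating the bracket form from the plain word avoids typing each letter pair by hand, and wrapping the result in '^' and '$' rejects reference strings that merely contain the word.
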