add release dir
341
release/base/help/english/regex/=
Normal file
@@ -0,0 +1,341 @@

archie uses ed(1) regular expressions in a number of commands.

A regular expression, on the one hand, is a string like any
other; a sequence of characters. On the other hand, special
characters within the string have certain functions which make
regular expressions useful when trying to match portions of
other strings. In the following discussion and examples, a
string containing a regular expression will be called the
``pattern'', and the string against which it is to be matched
is called the ``reference string''.

Regular expressions allow one to search for "all strings
ending with the letters 'ize'" or "all strings beginning with
a number between 1 and 3 and ending in a comma".

In order to accomplish this, regular expressions co-opt some
characters to have special meaning. They also provide for
these characters to lose their special meaning if the user so
desires. The rules for regular expressions are:


c    Any character c matches itself unless it has been
     assigned other special meaning as listed below. Most
     special characters can be escaped (made to lose their
     special meaning) by placing the character '\' in front
     of them. This doesn't apply to '{', which is non-special
     until it is escaped. Thus, although '*' normally has
     special meaning, the pattern '\*' matches a literal '*'.

     Example:

     The pattern

         acdef

     matches

         s83acdeffff or acdefsecs

     but not

         accdef or aacde1f

     That is, it will match any string that contains ``acdef''
     anywhere in the reference string.

     Example:

     Normally the characters '*' and '$' are special, but the
     pattern

         a\*bse\$

     acts as above. That is, any reference string containing
     ``a*bse$'' as a substring will be flagged as a match.


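The two examples above can be tried out with a minimal Python sketch. This is an illustration only, not part of archie: Python's `re` module uses a different dialect from ed(1), but ordinary characters and the escapes `\*` and `\$` behave the same way in both. The helper name `contains` is hypothetical.

```python
import re


def contains(pattern, reference):
    # True if the pattern matches anywhere in the reference string.
    return re.search(pattern, reference) is not None


# An ordinary pattern matches wherever it occurs as a substring.
print(contains("acdef", "s83acdeffff"))   # True
print(contains("acdef", "accdef"))        # False

# Escaped '*' and '$' stand for themselves.
print(contains(r"a\*bse\$", "xxa*bse$yy"))  # True
```
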
.    A period matches any character except the newline
     character. This is known as the wildcard character.

     Example:

     The pattern

         ....

     will match any 4 characters in the reference string,
     except a newline character.


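As a small Python illustration (Python's `re` is assumed here as a stand-in for ed(1); by default its period also refuses to match a newline):

```python
import re

# '....' needs four consecutive characters with no newline among them,
# so the search skips past the newline and picks up "cdef".
m = re.search(r"....", "ab\ncdefg")
print(m.group())  # cdef

# Fewer than four characters: no match at all.
short = re.search(r"....", "abc")
print(short)  # None
```
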
^    If `^' appears at the beginning of the pattern then it
     is said to ``anchor'' the match to the beginning of the
     line. That is, the reference string must start with the
     pattern following the `^'. If this character appears
     anywhere other than at the beginning of the pattern,
     then it is no longer considered special, and matches
     itself as any non-special character would. Similarly, if
     it starts a pattern but is escaped, it matches itself.

     Example:

     The pattern

         ^efghi

     will match

         efghi or efghijlk

     but not

         abcefghi

     That is, the pattern will match only those reference
     strings starting with ``efghi''. Just containing the
     substring is not sufficient.


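The anchoring behaviour can be checked with a short Python sketch. One caveat, stated as a known difference rather than something archie does: in Python's `re` an unescaped `^` remains an anchor even in the middle of a pattern, whereas the rule above treats it as an ordinary character there. Only the leading and escaped cases are shown.

```python
import re

pattern = r"^efghi"
starts = {ref: bool(re.search(pattern, ref))
          for ref in ["efghi", "efghijlk", "abcefghi"]}
print(starts)  # only the first two are True

# An escaped caret is an ordinary character and matches a literal '^'.
literal_caret = bool(re.search(r"\^efghi", "x^efghi"))
print(literal_caret)  # True
```
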
$    Occurring at the end of the pattern, this character
     ``anchors'' the pattern to the end of the line (reference
     string). A '$' occurring anywhere else in the pattern is
     regarded as non-special. Similarly, if it is at the end
     of the pattern but is escaped, it is non-special.

     Example:

     The pattern

         efghi$

     will match

         efghi or abcdefghi

     but not

         efghijkl

     That is, the pattern will match only those reference
     strings ending with ``efghi''. Just containing the
     substring is not sufficient.


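A corresponding Python sketch for the end anchor follows. It is an approximation: Python's `$` also matches just before a trailing newline, and it stays special in mid-pattern, neither of which is claimed for archie. The reference strings below contain no newlines, so the results line up with the example above.

```python
import re

pattern = r"efghi$"
ends = {ref: bool(re.search(pattern, ref))
        for ref in ["efghi", "abcdefghi", "efghijkl"]}
print(ends)  # only the first two are True

# An escaped dollar sign matches a literal '$' anywhere.
literal_dollar = bool(re.search(r"efghi\$", "efghi$x"))
print(literal_dollar)  # True
```
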
\<   This sequence in the pattern causes the one-character
     regular expression following it to match only at the
     beginning of a word: either at the beginning of a line,
     or just before a letter, digit or underline character
     and just after a character which is not one of these.

     Example:

     The pattern

         \<abc

     would match the last 'abc' in the reference string

         @hijabc#+abc

     but not the first, since the first 'abc' does not start
     on a ``word'' boundary.


\>   Constrains the one-character regular expression
     following it to be at the end of a ``word'' as defined
     above.


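Python's `re` module has no `\<` or `\>`, so the sketch below substitutes two assumed equivalents built from `\b` and a lookaround; the names `WORD_START` and `WORD_END` are hypothetical. The `re.ASCII` flag restricts "word" characters to letters, digits and underscore, as in the definition above. In Python the end-of-word assertion is written after the text it constrains.

```python
import re

WORD_START = r"\b(?=\w)"   # stand-in for \<: boundary followed by a word character
WORD_END = r"\b(?<=\w)"    # stand-in for \>: boundary preceded by a word character

ref = "@hijabc#+abc"

# Only the second 'abc' starts a word; the first is preceded by 'j'.
m = re.search(WORD_START + "abc", ref, re.ASCII)
print(m.start())  # 9

# The first 'abc' does end a word, since '#' follows it.
e = re.search("abc" + WORD_END, ref, re.ASCII)
print(e.start())  # 4
```
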
[string]

     One or more characters within square brackets. This
     pattern matches any single character within the
     brackets. The caret, '^', has a special meaning if it is
     the first character in the series: the pattern will match
     any character other than one in the list.

     Example:

     The pattern

         [^abc]

     will match any character except 'a', 'b' or 'c'.

     To match a right bracket, ']', in the list it must be
     put first:

         []ab01]

     To include a caret, '^', in the list, place it anywhere
     but first. In

         [ab^01]

     the caret loses its special meaning.

     The '-' character is special within square brackets. It
     denotes a range of characters (in the ASCII character
     set), and the bracket expression will match any single
     character within that range. '[a-z]' matches any lower
     case letter. The '-' can be made non-special by placing
     it first or last within the square brackets.

     The characters '$', '*' and '.' are not special within
     square brackets.

     Example:

     The pattern

         [ab01]

     matches a single occurrence of a character from the set
     'a', 'b', '0', '1'.

     Example:

     The pattern

         [^ab01]

     will match any single character other than 'a', 'b',
     '0', '1'.

     Example:

     The pattern

         [a0-9b]

     matches one of 'a', 'b' or a digit between 0 and 9
     inclusive.

     Example:

     The pattern

         [^a0-9b.$]

     means any single character other than 'a', 'b', '.',
     '$' or a digit between 0 and 9 inclusive.


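The bracket rules above can be exercised one character at a time. The following is a sketch in Python, whose bracket syntax agrees with the rules described here for these particular patterns; the helper `chars_matching` and the candidate string are made up for illustration.

```python
import re


def chars_matching(pattern, candidates):
    # Return the candidate characters that the bracket pattern accepts.
    return [ch for ch in candidates if re.fullmatch(pattern, ch)]


candidates = "ab01c9]^-.$"

print(chars_matching(r"[ab01]", candidates))      # ['a', 'b', '0', '1']
print(chars_matching(r"[a0-9b]", candidates))     # ['a', 'b', '0', '1', '9']
print(chars_matching(r"[^a0-9b.$]", candidates))  # ['c', ']', '^', '-']
print(chars_matching(r"[]ab01]", candidates))     # ['a', 'b', '0', '1', ']']
print(chars_matching(r"[ab^01]", candidates))     # ['a', 'b', '0', '1', '^']
```
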
*    An asterisk following a regular expression in the
     pattern has the effect of matching zero or more
     occurrences of that expression.

     Example:

     The pattern

         a*

     means zero or more occurrences of the character 'a'.

     Example:

     The pattern

         [A-Z]*

     means zero or more occurrences of upper case letters.


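One consequence of "zero or more" is easy to miss: a starred expression on its own matches even when nothing is there. A short Python sketch (Python's `*` is assumed to behave like the one described above for these simple patterns):

```python
import re

# Zero occurrences are allowed, so 'a*' matches the empty string.
print(re.fullmatch(r"a*", "") is not None)      # True
print(re.fullmatch(r"a*", "aaaa") is not None)  # True

# The match takes as many characters as it can from where it starts.
upper = re.search(r"[A-Z]*", "HELLOworld").group()
print(upper)  # HELLO

# No upper case letter at the start: the match succeeds but is empty.
empty = re.search(r"[A-Z]*", "helloWORLD").group()
print(repr(empty))  # ''
```
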
\{m\}
\{m,\}
\{m,n\}

     A one-character regular expression followed by one of
     these three constructions causes a range of occurrences
     of that regular expression to be matched. If it is
     followed by \{m\}, where m is a non-negative integer
     between 0 and 255 (inclusive), then exactly m occurrences
     of that regular expression are matched. If followed by
     \{m,\}, then at least m occurrences are matched. Finally,
     if it is followed by \{m,n\} (where n is a non-negative
     integer between 0 and 255 and where n > m), then between
     m and n occurrences of the expression are matched.

     Example:

     The pattern

         ab\{3\}

     would match any substring in the reference string
     consisting of an 'a' followed by exactly 3 'b's.

     Example:

     The pattern

         ab\{3,\}

     would match any substring in the reference string
     consisting of an 'a' followed by at least 3 'b's.

     Example:

     The pattern

         ab\{3,5\}

     would match any substring in the reference string
     consisting of an 'a' followed by at least 3 but at most
     5 'b's.


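Python writes these intervals without the backslashes (`{3}`, `{3,}`, `{3,5}`), so the sketch below first rewrites the ed-style braces. The helper `bre_interval_to_python` is hypothetical and deliberately naive: it only swaps the brace escapes and assumes the rest of the pattern means the same in both dialects.

```python
import re


def bre_interval_to_python(pattern):
    # Replace the two-character sequences backslash-brace with bare braces.
    return pattern.replace(r"\{", "{").replace(r"\}", "}")


exactly = re.search(bre_interval_to_python(r"ab\{3\}"), "xabbbby").group()
at_least = re.search(bre_interval_to_python(r"ab\{3,\}"), "xabbbby").group()
bounded = re.search(bre_interval_to_python(r"ab\{3,5\}"), "abbbbbbb").group()

print(exactly)   # abbb   (a substring match: the fourth 'b' is simply not consumed)
print(at_least)  # abbbb
print(bounded)   # abbbbb (at most five 'b's are taken)
```

Note that `ab\{3\}` still matches inside a longer run of 'b's, because the pattern only has to match a substring of the reference string.
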
Common Problems with Regular Expressions


(1)  When matching a substring it is not necessary to use
     the wildcard character to match the parts of the
     reference string preceding and following the substring.

     Example:

     The pattern

         abcd

     will match any reference string containing this
     pattern. It is not necessary to use

         .*abcd.*

     as the pattern.


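A quick Python check of this point, using `re.search` as an assumed stand-in for archie's substring matching: the plain and the padded pattern accept exactly the same reference strings.

```python
import re

refs = ["abcd", "xxabcdyy", "abxcd"]

plain = [bool(re.search(r"abcd", ref)) for ref in refs]
padded = [bool(re.search(r".*abcd.*", ref)) for ref in refs]

print(plain)            # [True, True, False]
print(plain == padded)  # True
```
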
(2)  In order to constrain a pattern to match the entire
     reference string, use the construction:

         ^pattern$


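In Python terms, and only as a sketch, anchoring at both ends rejects reference strings with anything before or after the pattern. Python's `$` tolerates a trailing newline, a quirk of that dialect which is not claimed for archie; `re.fullmatch` is the stricter Python spelling.

```python
import re

pattern = r"^abcd$"
results = {ref: bool(re.search(pattern, ref))
           for ref in ["abcd", "xabcd", "abcde"]}
print(results)  # only "abcd" itself is True

# Python-specific difference: '$' accepts a trailing newline, fullmatch does not.
print(bool(re.search(pattern, "abcd\n")))    # True
print(re.fullmatch(r"abcd", "abcd\n"))       # None
```
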
(3)  The easiest way to obtain case insensitivity in a
     regular expression is to use the '[]' operator. For
     example, a pattern to match the word ``hello''
     regardless of the case of the letters would be:

         [Hh][Ee][Ll][Ll][Oo]

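Writing such patterns by hand gets tedious for longer words, so they can be generated. The helper below is a hypothetical convenience, not an archie feature. It assumes plain ASCII words; non-letters are escaped with Python's `re.escape`, which follows Python's escaping rules and may not suit ed(1) in every case.

```python
import re


def case_insensitive(word):
    # Turn each letter into a two-character bracket expression, e.g. h -> [Hh].
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append("[" + ch.upper() + ch.lower() + "]")
        else:
            # Assumption: Python-style escaping is acceptable for non-letters.
            parts.append(re.escape(ch))
    return "".join(parts)


pattern = case_insensitive("hello")
print(pattern)  # [Hh][Ee][Ll][Ll][Oo]
print(bool(re.search(pattern, "say HeLLo")))  # True
```
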