regexp.h
RegExp is a C++ class to handle regular expressions.
Regular expressions
are a powerful method of string pattern matching. The RegExp class is
the foundation for adding string pattern matching capabilities
to programs such as grep, text editors, awk, and sed. The regular expression
language is the one in common use; however, some of the more
advanced forms may behave slightly differently.
The linkable runtime library version of RegExp is for ASCII chars. To use
it with Unicode, compile the file \dm\src\core\regexp.cpp:
sc -c regexp -D_UNICODE
RegExp class members are:
RegExp();
~RegExp();
unsigned re_nsub;
regmatch_t *pmatch;
int compile(char *pattern, char *attributes, int ref);
int test(char *string, int startindex = 0);
char *replace(char *format);
char *replace2(char *format);
static char *replace3(char *format, char *input,
unsigned re_nsub, regmatch_t *pmatch);
static char *replace4(char *input, regmatch_t *text, char *replacement);
- struct regmatch_t
-
Is a simple type representing a match. It contains the members:
- int rm_so;
- index into the input string of the start of a match
- int rm_eo;
- index just past the end of the match
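Because rm_eo is the index just past the end of the match, the two members describe a half-open range whose length is rm_eo - rm_so. The following is a minimal sketch of copying the matched characters out of an input string. The struct is a local stand-in that mirrors the documented members so the example compiles on its own, and matched_text() is a hypothetical helper, not part of regexp.h.

```cpp
#include <cassert>
#include <string>

// Stand-in mirroring the documented members; the real type comes from regexp.h.
struct regmatch_t
{
    int rm_so;  // index into the input string of the start of the match
    int rm_eo;  // index just past the end of the match
};

// Hypothetical helper: return the characters input[rm_so .. rm_eo).
std::string matched_text(const char *input, const regmatch_t &m)
{
    assert(m.rm_so >= 0 && m.rm_so <= m.rm_eo);
    return std::string(input + m.rm_so,
                       static_cast<std::string::size_type>(m.rm_eo - m.rm_so));
}
```

With the real class, the same helper could be applied to pmatch[0] after a successful call to test(). An empty match has rm_so equal to rm_eo and yields an empty string.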
- RegExp();
-
This is the constructor. It builds a regular expression object.
- ~RegExp();
-
This is the destructor.
- unsigned re_nsub;
- regmatch_t *pmatch;
-
re_nsub is the number of parenthesized subexpressions in the regular
expression. pmatch[0] contains the match for the entire regular expression.
If the regular expression contains parenthesized subexpressions, the matches
for those subexpressions are in pmatch[1..re_nsub].
- int compile(char *pattern, char *attributes, int ref);
-
Compiles a regular expression given by pattern and modified by
attributes into an internal format.
A regular expression must be compiled
before it is used. Separating the compilation step from the match step
means that the time consuming compilation needs to be done only once, and then
the fast executing internal format can be used for repeated matches.
- pattern
- is the regular expression string.
- attributes
- is NULL or a string containing either or both of the
characters i and g:
- i
- the regular expression is case insensitive
- g
- the regular expression is global
- ref
- is a flag to the RegExp object:
- 0
- The RegExp object should make its own copy of
pattern.
- 1
- The RegExp object can refer to the caller's copy
of the pattern string.
This means that the pattern string must not be
freed or deleted until the RegExp object has been destroyed.
Return Value
- !=0
- Successful compilation
- 0
- Failed to compile - the regular expression was not
valid
- int test(char *string, int startindex = 0);
-
Scans a string starting at position startindex looking for
a match against the previously compiled regular expression.
Return Value
- !=0
- Successful match. Member pmatch[0] is set to the
location of the match, and pmatch[1..re_nsub] holds
any subexpression matches.
- 0
- Failed to find a match.
- char *replace(char *format);
-
Once a regular expression has been run through compile() and matched
against a source string with test(), then replace() can be used
to merge a format string with the matched text.
The format string consists of:
- &
- replaced with the matched text, that is, the entire
match described by pmatch[0].
- \n
- replaced with the nth subexpression, where n is 1..9,
specified by member pmatch[n].
- \c
- replace with the character c.
- c
- any other characters c are copied to the
output string.
Return Value
The merged output string is returned. It is allocated with malloc(),
so the caller should release it with free().
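The format rules for replace() can be illustrated with a small stand-alone sketch. It is not the library's implementation: expand_basic() is a hypothetical helper that takes the input string and match array as parameters and returns a std::string, whereas the real replace() works on the RegExp object and returns a malloc()'d buffer. Two behaviours here are assumptions the text does not settle: a \n with n greater than re_nsub adds nothing, and a trailing backslash is copied as is.

```cpp
#include <cassert>
#include <string>

// Stand-in mirroring the documented members; the real type comes from regexp.h.
struct regmatch_t
{
    int rm_so;  // index of the start of the match
    int rm_eo;  // index just past the end of the match
};

// Append input[m.rm_so .. m.rm_eo) to out.
static void append_match(std::string &out, const char *input, const regmatch_t &m)
{
    assert(m.rm_so >= 0 && m.rm_so <= m.rm_eo);
    out.append(input + m.rm_so,
               static_cast<std::string::size_type>(m.rm_eo - m.rm_so));
}

// Hypothetical helper, not part of regexp.h.
std::string expand_basic(const char *format, const char *input,
                         unsigned re_nsub, const regmatch_t *pmatch)
{
    std::string out;
    for (const char *f = format; *f != '\0'; ++f)
    {
        if (*f == '&')
        {
            append_match(out, input, pmatch[0]);       // & : the entire match
        }
        else if (*f == '\\' && f[1] != '\0')
        {
            ++f;                                       // look at the escaped character
            if (*f >= '1' && *f <= '9')
            {
                unsigned n = static_cast<unsigned>(*f - '0');
                if (n <= re_nsub)                      // assumption: out-of-range \n adds nothing
                    append_match(out, input, pmatch[n]);
            }
            else
            {
                out += *f;                             // \c : the character c itself
            }
        }
        else
        {
            out += *f;                                 // ordinary character (or trailing backslash)
        }
    }
    return out;
}
```

For the input "xxabbbcyy" matched by a pattern such as a(b+)c, the format `[&] \1` would expand to `[abbbc] bbb` under these rules.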
- char *replace2(char *format);
-
This is more advanced than replace() in that it can handle more than
9 subexpressions.
The format string consists of:
- $$
- A literal $ character.
- $&
- The matched substring.
- $`
- The portion of string that precedes the matched substring.
- $'
- The portion of string that follows the matched substring.
- $n
- The nth capture, where n is a single digit 1-9
and $n is not followed by a decimal digit.
If n <= re_nsub and the nth capture is undefined, use the empty
string instead. If n > re_nsub,
no characters are copied to the output.
- $nn
- The nnth capture, where nn
is a two-digit decimal number 01-99.
If nn <= re_nsub and the nnth capture is undefined, use the empty
string instead. If nn > re_nsub,
no characters are copied to the output.
- $c
- If c is any other character, $c is copied
unchanged to the output string.
- c
- any other characters c are copied to the
output string.
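The replace2() format rules can be sketched in stand-alone form as follows. This is an illustration of the rules listed above, not the library's code: expand_dollar() is a hypothetical helper that takes the input and match array explicitly and returns a std::string. It assumes that an undefined capture is marked by a negative rm_so, and that $0, $00, and a trailing $ are copied unchanged; the text above does not specify any of these.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in mirroring the documented members; the real type comes from regexp.h.
struct regmatch_t
{
    int rm_so;  // index of the start of the match
    int rm_eo;  // index just past the end of the match
};

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper, not part of regexp.h.
std::string expand_dollar(const char *format, const char *input,
                          unsigned re_nsub, const regmatch_t *pmatch)
{
    const std::size_t len = std::strlen(input);
    assert(pmatch[0].rm_so >= 0 && pmatch[0].rm_so <= pmatch[0].rm_eo);
    const std::size_t so = static_cast<std::size_t>(pmatch[0].rm_so);
    const std::size_t eo = static_cast<std::size_t>(pmatch[0].rm_eo);
    assert(eo <= len);

    std::string out;
    for (const char *f = format; *f != '\0'; ++f)
    {
        if (*f != '$' || f[1] == '\0')
        {
            out += *f;                                  // ordinary character, or a trailing '$'
            continue;
        }
        const char c = *++f;                            // character following '$'
        if (c == '$')       out += '$';                 // $$ : literal $
        else if (c == '&')  out.append(input + so, eo - so);   // matched substring
        else if (c == '`')  out.append(input, so);             // text before the match
        else if (c == '\'') out.append(input + eo, len - eo);  // text after the match
        else if (is_digit(c))
        {
            unsigned n = static_cast<unsigned>(c - '0');
            const bool two = is_digit(f[1]);
            if (two)                                    // $nn : two-digit capture number
                n = n * 10 + static_cast<unsigned>(*++f - '0');
            if (n == 0)
            {
                out += "$0";                            // assumption: $0 and $00 are copied as is
                if (two) out += '0';
            }
            else if (n <= re_nsub && pmatch[n].rm_so >= 0)
            {
                out.append(input + pmatch[n].rm_so,
                           static_cast<std::size_t>(pmatch[n].rm_eo - pmatch[n].rm_so));
            }
            // n > re_nsub, or an undefined capture: nothing is copied
        }
        else
        {
            out += '$';                                 // $c : copied unchanged
            out += c;
        }
    }
    return out;
}
```

Note that under the rules above `$10` is always read as capture 10, never as capture 1 followed by the character 0, so it expands to nothing when re_nsub is less than 10.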
- static char *replace3(char *format, char *input,
unsigned re_nsub, regmatch_t *pmatch);
-
replace3() is the same as replace2(), except that it
does not need a RegExp instance. Instead, it requires the parameters:
- input
- The source string that the matched
text comes from.
- re_nsub
- Number of subexpression matches.
- pmatch[1+re_nsub]
- Array of those subexpression matches,
with pmatch[0] being the match for the entire expression.
- static char *replace4(char *input, regmatch_t *text, char *replacement);
-
replace4() does not require a RegExp instance. It performs a simple
merge of input with replacement: the characters from
input[text->rm_so] up to, but not including, input[text->rm_eo] are
replaced with replacement.
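The merge that replace4() is described as performing can be sketched as below. This is a stand-alone illustration rather than the library's code: merge_replacement() is a hypothetical helper that returns a std::string, and the struct is a local stand-in for the type declared in regexp.h. It treats rm_eo as the index just past the match, as documented for regmatch_t.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in mirroring the documented members; the real type comes from regexp.h.
struct regmatch_t
{
    int rm_so;  // index of the start of the match
    int rm_eo;  // index just past the end of the match
};

// Hypothetical helper: input with [rm_so, rm_eo) swapped for replacement.
std::string merge_replacement(const char *input, const regmatch_t &text,
                              const char *replacement)
{
    assert(text.rm_so >= 0 && text.rm_so <= text.rm_eo);
    assert(static_cast<std::size_t>(text.rm_eo) <= std::strlen(input));

    std::string out(input, static_cast<std::size_t>(text.rm_so)); // text before the match
    out += replacement;                                           // the replacement itself
    out += input + text.rm_eo;                                    // text after the match
    return out;
}
```

For example, merging "xxabbbcyy" with the match {2, 7} and the replacement "Z" gives "xxZyy".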
Copyright © 1995-2001 Digital Mars. All Rights Reserved.