Jakarta-Regexp 1.2: RE

RE is an efficient, lightweight regular expression evaluator/matcher class. Regular expressions are pattern descriptions which enable sophisticated matching of strings. Details on the syntax of regular expression patterns are given below.


 
Characters
unicodeChar Matches any identical unicode character \ Used to quote a meta-character (like '*') \\ Matches a single '\' character \0nnn Matches a given octal character \xhh Matches a given 8-bit hexadecimal character \\uhhhh Matches a given 16-bit hexadecimal character \t Matches an ASCII tab character \n Matches an ASCII newline character \r Matches an ASCII return character \f Matches an ASCII form feed character
Character Classes
[abc] Simple character class [a-zA-Z] Character class with ranges [^abc] Negated character class
Standard POSIX Character Classes
[:alnum:] Alphanumeric characters. [:alpha:] Alphabetic characters. [:blank:] Space and tab characters. [:cntrl:] Control characters. [:digit:] Numeric characters. [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.) [:lower:] Lower-case alphabetic characters. [:print:] Printable characters (characters that are not control characters.) [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters). [:space:] Space characters (such as space, tab, and formfeed, to name a few). [:upper:] Upper-case alphabetic characters. [:xdigit:] Characters that are hexadecimal digits.
Non-standard POSIX-style Character Classes
[:javastart:] Start of a Java identifier [:javapart:] Part of a Java identifier
Predefined Classes
. Matches any character other than newline \w Matches a "word" character (alphanumeric plus "_") \W Matches a non-word character \s Matches a whitespace character \S Matches a non-whitespace character \d Matches a digit character \D Matches a non-digit character
Boundary Matchers
^ Matches only at the beginning of a line $ Matches only at the end of a line \b Matches only at a word boundary \B Matches only at a non-word boundary
Greedy Closures
A* Matches A 0 or more times (greedy) A+ Matches A 1 or more times (greedy) A? Matches A 1 or 0 times (greedy) A{n} Matches A exactly n times (greedy) A{n,} Matches A at least n times (greedy) A{n,m} Matches A at least n but not more than m times (greedy)
Reluctant Closures
A*? Matches A 0 or more times (reluctant) A+? Matches A 1 or more times (reluctant) A?? Matches A 0 or 1 times (reluctant)
Logical Operators
AB Matches A followed by B A|B Matches either A or B (A) Used for subexpression grouping
Backreferences
\1 Backreference to 1st parenthesized subexpression \2 Backreference to 2nd parenthesized subexpression \3 Backreference to 3rd parenthesized subexpression \4 Backreference to 4th parenthesized subexpression \5 Backreference to 5th parenthesized subexpression \6 Backreference to 6th parenthesized subexpression \7 Backreference to 7th parenthesized subexpression \8 Backreference to 8th parenthesized subexpression \9 Backreference to 9th parenthesized subexpression

All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), you can simply follow it with a '?'. A reluctant closure will match as few elements of the string as possible when finding matches. {m,n} closures don't currently support reluctancy.


Copyright © 2000 Apache Software Foundation. All Rights Reserved.