Jakarta-Regexp 1.2: RE

RE is an efficient, lightweight regular expression evaluator/matcher class. Regular expressions are pattern descriptions that enable sophisticated matching of strings. The syntax of regular expression patterns is detailed below.


 

  Characters
 

    unicodeChar          Matches any identical unicode character
    \                    Used to quote a meta-character (like '*')
    \\                   Matches a single '\' character
    \0nnn                Matches a given octal character
    \xhh                 Matches a given 8-bit hexadecimal character
    \uhhhh               Matches a given 16-bit hexadecimal character
    \t                   Matches an ASCII tab character
    \n                   Matches an ASCII newline character
    \r                   Matches an ASCII return character
    \f                   Matches an ASCII form feed character
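
The escapes in this table can be tried out with a short sketch. It is an illustration only: it uses the JDK's java.util.regex package as a stand-in so that it runs without the Jakarta-Regexp jar, on the assumption that these particular escapes behave the same way in both engines. The class name EscapeDemo and the helper matchesAll are hypothetical names introduced for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class EscapeDemo {
    // Returns true when the entire input matches the pattern.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Inside a Java string literal, every regex backslash is doubled.
        System.out.println(matchesAll("\\x41", "A"));    // 8-bit hex escape for 'A'
        System.out.println(matchesAll("\\0101", "A"));   // octal escape for 'A'
        System.out.println(matchesAll("\\t", "\t"));     // tab escape
        System.out.println(matchesAll("a\\*b", "a*b"));  // quoted meta-character
        System.out.println(matchesAll("\\\\", "\\"));    // one literal backslash
    }
}
```

Each call is expected to print true. Note that a quoted '*' no longer acts as a closure, so the pattern a\*b does not match the string "aab".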

 

  Character Classes
 

    [abc]                Simple character class
    [a-zA-Z]             Character class with ranges
    [^abc]               Negated character class

 

  Standard POSIX Character Classes
 

    [:alnum:]            Alphanumeric characters.
    [:alpha:]            Alphabetic characters.
    [:blank:]            Space and tab characters.
    [:cntrl:]            Control characters.
    [:digit:]            Numeric characters.
    [:graph:]            Characters that are both printable and visible. (A space is printable but not visible, while an 'a' is both.)
    [:lower:]            Lower-case alphabetic characters.
    [:print:]            Printable characters (characters that are not control characters).
    [:punct:]            Punctuation characters (characters that are not letters, digits, control characters, or space characters).
    [:space:]            Space characters (such as space, tab, and formfeed, to name a few).
    [:upper:]            Upper-case alphabetic characters.
    [:xdigit:]           Characters that are hexadecimal digits.
         
 

  Non-standard POSIX-style Character Classes
 

    [:javastart:]        Start of a Java identifier
    [:javapart:]         Part of a Java identifier

 

  Predefined Classes
 

    .                    Matches any character other than newline
    \w                   Matches a "word" character (alphanumeric plus "_")
    \W                   Matches a non-word character
    \s                   Matches a whitespace character
    \S                   Matches a non-whitespace character
    \d                   Matches a digit character
    \D                   Matches a non-digit character
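
As a rough illustration of the bracket classes and predefined classes listed so far, the sketch below finds the first substring matched by a class followed by '+'. It uses java.util.regex rather than RE itself (an assumption made so the example needs no extra jar); PredefinedClassDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class PredefinedClassDemo {
    // Returns the first substring matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "order 4521 shipped")); // 4521
        System.out.println(firstMatch("\\w+", "  hello_world!"));     // hello_world
        System.out.println(firstMatch("[a-c]+", "xxabcabdz"));        // abcab
        System.out.println(firstMatch("[^abc]+", "abcxyzab"));        // xyz
    }
}
```

The \w example shows that the underscore counts as a "word" character, while the trailing '!' does not.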

 

  Boundary Matchers
 

    ^                    Matches only at the beginning of a line
    $                    Matches only at the end of a line
    \b                   Matches only at a word boundary
    \B                   Matches only at a non-word boundary
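
The effect of the boundary matchers is easiest to see by counting matches with and without them. The following is a minimal sketch under the assumption that java.util.regex treats \b, \B, ^ and $ the same way as RE does for a single-line input; BoundaryDemo and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class BoundaryDemo {
    // Counts the non-overlapping matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat concat cat.";
        System.out.println(count("cat", text));        // 3: every occurrence
        System.out.println(count("\\bcat\\b", text));  // 2: whole words only
        System.out.println(count("\\Bcat", text));     // 1: only inside "concat"
        System.out.println(count("^cat", text));       // 1: only at the start
        System.out.println(count("cat$", text));       // 0: the text ends with '.'
    }
}
```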

 

  Greedy Closures
 

    A*                   Matches A 0 or more times (greedy)
    A+                   Matches A 1 or more times (greedy)
    A?                   Matches A 0 or 1 times (greedy)
    A{n}                 Matches A exactly n times (greedy)
    A{n,}                Matches A at least n times (greedy)
    A{n,m}               Matches A at least n but not more than m times (greedy)

 

  Reluctant Closures
 

    A*?                  Matches A 0 or more times (reluctant)
    A+?                  Matches A 1 or more times (reluctant)
    A??                  Matches A 0 or 1 times (reluctant)

 

  Logical Operators
 

    AB                   Matches A followed by B
    A|B                  Matches either A or B
    (A)                  Used for subexpression grouping
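
Because alternation applies to everything on either side of the '|', grouping is usually needed to limit its scope. The sketch below illustrates the difference; it relies on java.util.regex as a stand-in for RE (an assumption, made so the example is runnable on a plain JDK), and LogicalOperatorDemo and matchesAll are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class LogicalOperatorDemo {
    // Returns true when the entire input matches the pattern.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Grouped: either "cat" or "dog", followed by "s".
        System.out.println(matchesAll("(cat|dog)s", "cats")); // true
        System.out.println(matchesAll("(cat|dog)s", "dogs")); // true
        // Ungrouped: either "cat" or "dogs", so "cats" is not matched.
        System.out.println(matchesAll("cat|dogs", "cats"));   // false
        System.out.println(matchesAll("cat|dogs", "cat"));    // true
    }
}
```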

 

  Backreferences
 

    \1                   Backreference to 1st parenthesized subexpression
    \2                   Backreference to 2nd parenthesized subexpression
    \3                   Backreference to 3rd parenthesized subexpression
    \4                   Backreference to 4th parenthesized subexpression
    \5                   Backreference to 5th parenthesized subexpression
    \6                   Backreference to 6th parenthesized subexpression
    \7                   Backreference to 7th parenthesized subexpression
    \8                   Backreference to 8th parenthesized subexpression
    \9                   Backreference to 9th parenthesized subexpression
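
A common use of backreferences is spotting an accidentally doubled word: \1 must match the same text that the first parenthesized subexpression matched. This is a minimal sketch using java.util.regex in place of RE (an assumption; the backreference syntax is the same in both); DoubledWordDemo and findDoubledWord are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class DoubledWordDemo {
    // A word, one space, then the same word again, ending at a word boundary.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first doubled word in text, or null if there is none.
    static String findDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findDoubledWord("this is is a test")); // is
        System.out.println(findDoubledWord("hell hello"));        // null
    }
}
```

The trailing \b keeps "hell hello" from being reported, since the repeated text would otherwise end in the middle of a longer word.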

All closure operators (+, *, ?, {n,m}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. To make a closure reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures do not currently support reluctant matching.
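
The difference between greedy and reluctant closures can be seen by running the same input through both forms. The sketch below is an illustration only: it uses java.util.regex as a stand-in for RE (an assumption made so it runs on a plain JDK), and it sticks to the *, + and ? closures, since reluctant {n,m} is not supported by RE according to the note above. ClosureDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; java.util.regex stands in for Jakarta-Regexp's RE.
class ClosureDemo {
    // Returns the first substring matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .+ runs to the last '>' that still lets the match succeed.
        System.out.println(firstMatch("<.+>", html));   // <b>bold</b>
        // Reluctant: .+? stops at the first '>' it can.
        System.out.println(firstMatch("<.+?>", html));  // <b>
        // Greedy a+ takes all three characters, reluctant a+? takes one.
        System.out.println(firstMatch("a+", "aaa"));    // aaa
        System.out.println(firstMatch("a+?", "aaa"));   // a
    }
}
```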