| Perl Regex Feature | Required | PCRE | Java | XPath | GNU ERE | XML | POSIX ERE | GNU BRE | POSIX BRE |
|---|---|---|---|---|---|---|---|---|---|
| \Q...\E escapes a string of metacharacters | | | | | | | | | |
| \x00 through \xFF (ASCII character) | | | | | | | | | |
| \n (LF), \r (CR) and \t (tab) | | | | | | | | | |
| \f (form feed) and \v (vtab) | | | | | | | | | |
| \a (bell) and \e (escape) | | | | | | | | | |
| \cA through \cZ (control character) | | | | | | | | | |
| \ca through \cz (control character) | | | | | | | | | |
| Hyphen in [\d-z] is a literal | | | | | | | | | |
| Backslash escapes one character class metacharacter | | | | | | | | | |
| \Q...\E escapes a string of character class metacharacters | | Java 6 | | | | | | | |
| \d shorthand for digits | | ascii | ascii | | | | | | |
| [\b] backspace | | | | | | | | | |
| \A (start of string) | | | | | | | | | |
| \Z (end of string, before final line break) | | | | | | | | | |
| \z (end of string) | | | | | | | | | |
| ? after any of the above quantifiers to make it "lazy" | | | | | | | | | |
| (?:regex) (non-capturing group) | | | | | | | | | |
| \10 through \99 (backreferences) | | | | | n/a | n/a | | | |
| Forward references \1 through \9 | | | | | n/a | n/a | | | |
| Nested references \1 through \9 | | | | | n/a | n/a | | | |
| (?i) (case insensitive) | | | flag | | | | | | |
| (?s) (dot matches newlines) | | | flag | | | | | | |
| (?m) (^ and $ match at line breaks) | | | flag | | | | | | |
| (?x) (free-spacing mode) | | | flag | | | | | | |
| (?-ismxn) (turn off mode modifiers) | | | | | | | | | |
| (?ismxn:group) (mode modifiers local to group) | | | | | | | | | |
| (?>regex) (atomic group) | | | | | | | | | |
| (?=regex) (positive lookahead) | | | | | | | | | |
| (?!regex) (negative lookahead) | | | | | | | | | |
| (?<=text) (fixed length positive lookbehind) | | finite length | | | | | | | |
| (?<!text) (fixed length negative lookbehind) | | finite length | | | | | | | |
| \G (start of match attempt) | | | | | | | | | |
| (?(?=regex)then\|else) (using any lookaround) | | | | | | | | | |
| (?(1)then\|else) | | | | | | | | | |
| (?#comment) | | | | | | | | | |
| Free-spacing syntax supported | | | | | | | | | |
| \X (Unicode grapheme) | | | | | | | | | |
| \x{0} through \x{FFFF} (Unicode character) | | | | | | | | | |
| \pL through \pC (Unicode properties) | | | | | | | | | |
| \p{L} through \p{C} (Unicode properties) | | | | | | | | | |
| \p{Lu} through \p{Cn} (Unicode property) | | | | | | | | | |
| \p{L&} and \p{Letter&} (equivalent of [\p{Lu}\p{Ll}\p{Lt}] Unicode properties) | | | | | | | | | |
| \p{IsL} through \p{IsC} (Unicode properties) | | | | | | | | | |
| \p{IsLu} through \p{IsCn} (Unicode property) | | | | | | | | | |
| \p{Letter} through \p{Other} (Unicode properties) | | | | | | | | | |
| \p{Lowercase_Letter} through \p{Not_Assigned} (Unicode property) | | | | | | | | | |
| \p{IsLetter} through \p{IsOther} (Unicode properties) | | | | | | | | | |
| \p{IsLowercase_Letter} through \p{IsNot_Assigned} (Unicode property) | | | | | | | | | |
| \p{Arabic} through \p{Yi} (Unicode script) | | | | | | | | | |
| \p{IsArabic} through \p{IsYi} (Unicode script) | | | | | | | | | |
| \p{BasicLatin} through \p{Specials} (Unicode block) | | | | | | | | | |
| \p{InBasicLatin} through \p{InSpecials} (Unicode block) | | | | | | | | | |
| \p{IsBasicLatin} through \p{IsSpecials} (Unicode block) | | | | | | | | | |
| Part between {} in all of the above is case insensitive | | | | | | | | | |
| Spaces, hyphens and underscores allowed in all long names listed above (e.g. BasicLatin can be written as Basic-Latin or Basic_Latin or Basic Latin) | | Java 5 | | | | | | | |
| \P (negated variants of all \p as listed above) | | | | | | | | | |
| \p{^...} (negated variants of all \p{...} as listed above) | | | | | | | | | |
| \p{IsAlpha} POSIX character class | | | | | | | | | |
| Backslash escapes one metacharacter | | | | | | | | | |
| [abc] character class | | | | | | | | | |
| [^abc] negated character class | | | | | | | | | |
| [a-z] character class range | | | | | | | | | |
| \w shorthand for word characters | | ascii | ascii | | | | | | |
| \s shorthand for whitespace | | ascii | ascii | ascii | | ascii | | | |
| \D, \W and \S shorthand negated character classes | | | | | | | | | |
| . (dot; any character except line break) | | | | | | | | | |
| ^ (start of string/line) | | | | | | | | | |
| $ (end of string/line) | | | | | | | | | |
| \b (at the beginning or end of a word) | | ascii | | | | | | | |
| \B (NOT at the beginning or end of a word) | ascii | | | | | | | | |
| \| (alternation) | | | | | | | | \| | |
| ? (0 or 1) | | | | | | | | \? | |
| * (0 or more) | | | | | | | | | |
| + (1 or more) | | | | | | | | \+ | |
| {n} (exactly n) | | | | | | | | \{n\} | \{n\} |
| {n,m} (between n and m) | | | | | | | | \{n,m\} | \{n,m\} |
| {n,} (n or more) | | | | | | | | \{n,\} | \{n,\} |
| (regex) (numbered capturing group) | | | | | | | \( \) | \( \) | |
| \1 through \9 (backreferences) | | | | | | | | | |
| Backreferences to non-existent groups are an error | | | | | n/a | n/a | | | |
| Backreferences to failed groups also fail | | | | | n/a | n/a | | | |
| [:alpha:] POSIX character class | ascii | | | | | | | | |
| Character class is a single token | | | | n/a | n/a | n/a | n/a | n/a | |
| # starts a comment | | | | n/a | n/a | n/a | n/a | n/a | |
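
A few of the constructs listed in the table can be tried out in a short script. The sketch below uses Python's `re` module purely as a convenient test bed; Python is not one of the flavors compared above, so it only illustrates the syntax and says nothing about which column supports what. The sample strings and variable names are made up for the illustration.

```python
import re

# Lazy quantifier: "?" after "+" makes it match as little as possible.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

# Non-capturing group: (?:...) groups without creating a backreference,
# so \1 refers to the (\w) group that follows it.
doubled = re.search(r"(?:x|y)(\w)\1", "xaa").group(1)

# Fixed-length lookbehind and positive lookahead assert context
# without including it in the match.
price = re.search(r"(?<=\$)\d+(?=\.)", "cost: $42.50").group()

# Inline mode modifier (?i) turns on case-insensitive matching.
case_insensitive = re.fullmatch(r"(?i)perl", "PERL") is not None

print(greedy, lazy, doubled, price, case_insensitive)
```

The greedy and lazy patterns differ only in the trailing `?`, which is the quickest way to see the effect of that row of the table.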
