Strategy for Regular Expression Support

Foswiki strives to support the rich Perl regular expression syntax wherever regular expressions are required. However, because Foswiki has to interface with third-party tools and libraries, it is not always possible to support all the features of Perl regular expressions in all places.

Any developer who implements an interface to such a third-party tool must make every effort to map all the functionality of Perl regular expressions to the tool. It will not always be possible to support everything, so the following table lists the features of regular expressions that are required to be available. The features are chosen from those described in http://www.regular-expressions.info/refflavors.html, which compares the regular expression support provided in several important environments. The table also documents the level of support for Perl regular expressions in a number of popular implementations.

Perl Regex Feature Required PCRE Java XPath GNU ERE XML POSIX ERE GNU BRE POSIX BRE
\| (alternation) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \| choice-no
(regex) (numbered capturing group)   choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \( \) \( \)
+ (1 or more) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \+ choice-no
? (0 or 1) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \? choice-no
{n,} (n or more) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \{n,\} \{n,\}
{n,m} (between n and m) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \{n,m\} \{n,m\}
{n} (exactly n) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes \{n\} \{n\}
\Q...\E escapes a string of metacharacters   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\x00 through \xFF (ASCII character) choice-yes choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\n (LF), \r (CR) and \t (tab) choice-yes choice-yes choice-yes choice-yes choice-no choice-yes choice-no choice-no choice-no
\f (form feed) and \v (vtab)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\a (bell) and \e (escape)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\cA through \cZ (control character)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\ca through \cz (control character)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
Hyphen in [\d-z] is a literal   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
Backslash escapes one character class metacharacter   choice-yes choice-yes choice-yes choice-no choice-yes choice-no choice-no choice-no
\Q...\E escapes a string of character class metacharacters   choice-yes Java 6 choice-no choice-no choice-no choice-no choice-no choice-no
\d shorthand for digits choice-yes ascii ascii choice-yes choice-no choice-yes choice-no choice-no choice-no
[\b] backspace   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\A (start of string)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\Z (end of string, before final line break)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\z (end of string)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
? after any of the above quantifiers to make it "lazy"   choice-yes choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no
(?:regex) (non-capturing group)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\10 through \99 (backreferences)   choice-yes choice-yes choice-yes choice-no n/a n/a choice-no choice-no
Forward references \1 through \9   choice-yes choice-yes choice-no choice-no n/a n/a choice-no choice-no
Nested references \1 through \9   choice-yes choice-yes choice-no choice-no n/a n/a choice-no choice-no
(?i) (case insensitive)   choice-yes choice-yes flag choice-no choice-no choice-no choice-no choice-no
(?s) (dot matches newlines)   choice-yes choice-yes flag choice-no choice-no choice-no choice-no choice-no
(?m) (^ and $ match at line breaks)   choice-yes choice-yes flag choice-no choice-no choice-no choice-no choice-no
(?x) (free-spacing mode)   choice-yes choice-yes flag choice-no choice-no choice-no choice-no choice-no
(?-ismxn) (turn off mode modifiers)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?ismxn:group) (mode modifiers local to group)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?>regex) (atomic group)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?=regex) (positive lookahead)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?!regex) (negative lookahead)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?<=text) (fixed length positive lookbehind)   choice-yes finite length choice-no choice-no choice-no choice-no choice-no choice-no
(?<!text) (fixed length negative lookbehind)   choice-yes finite length choice-no choice-no choice-no choice-no choice-no choice-no
\G (start of match attempt)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
(?(?=regex)then|else) (using any lookaround)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
(?(1)then|else)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
(?#comment)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
Free-spacing syntax supported   choice-yes choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no
\X (Unicode grapheme)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\x{0} through \x{FFFF} (Unicode character)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\pL through \pC (Unicode properties)   choice-yes choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\p{L} through \p{C} (Unicode properties)   choice-yes choice-yes choice-yes choice-no choice-yes choice-no choice-no choice-no
\p{Lu} through \p{Cn} (Unicode property)   choice-yes choice-yes choice-yes choice-no choice-yes choice-no choice-no choice-no
\p{L&} and \p{Letter&} (equivalent of [\p{Lu}\p{Ll}\p{Lt}] Unicode properties)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsL} through \p{IsC} (Unicode properties)   choice-no choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsLu} through \p{IsCn} (Unicode property)   choice-no choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\p{Letter} through \p{Other} (Unicode properties)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{Lowercase_Letter} through \p{Not_Assigned} (Unicode property)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsLetter} through \p{IsOther} (Unicode properties)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsLowercase_Letter} through \p{IsNot_Assigned} (Unicode property)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{Arabic} through \p{Yi} (Unicode script)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsArabic} through \p{IsYi} (Unicode script)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{BasicLatin} through \p{Specials} (Unicode block)   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{InBasicLatin} through \p{InSpecials} (Unicode block)   choice-no choice-yes choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsBasicLatin} through \p{IsSpecials} (Unicode block)   choice-no choice-no choice-yes choice-no choice-yes choice-no choice-no choice-no
Part between {} in all of the above is case insensitive   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
Spaces, hyphens and underscores allowed in all long names listed above (e.g. BasicLatin can be written as Basic-Latin or Basic_Latin or Basic Latin)   choice-no Java 5 choice-no choice-no choice-no choice-no choice-no choice-no
\P (negated variants of all \p as listed above)   choice-yes choice-yes choice-yes choice-no choice-yes choice-no choice-no choice-no
\p{^...} (negated variants of all \p{...} as listed above)   choice-yes choice-no choice-no choice-no choice-no choice-no choice-no choice-no
\p{IsAlpha} POSIX character class   choice-no choice-no choice-no choice-no choice-no choice-no choice-no choice-no
Backslash escapes one metacharacter choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
[abc] character class choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
[^abc] negated character class choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
[a-z] character class range choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
\w shorthand for word characters choice-yes ascii ascii choice-yes choice-yes choice-yes choice-no choice-yes choice-no
\s shorthand for whitespace choice-yes ascii ascii ascii choice-yes ascii choice-no choice-yes choice-no
\D, \W and \S shorthand negated character classes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-no choice-yes choice-no
. (dot; any character except line break)   choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
^ (start of string/line)   choice-yes choice-yes choice-yes choice-yes choice-no choice-yes choice-yes choice-yes
$ (end of string/line)   choice-yes choice-yes choice-yes choice-yes choice-no choice-yes choice-yes choice-yes
\b (at the beginning or end of a word) choice-yes ascii choice-yes choice-no choice-yes choice-no choice-no choice-yes choice-no
\B (NOT at the beginning or end of a word)   ascii choice-yes choice-no choice-yes choice-no choice-no choice-yes choice-no
* (0 or more) choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes choice-yes
\1 through \9 (backreferences) choice-yes choice-yes choice-yes choice-yes choice-yes choice-no choice-no choice-yes choice-yes
Backreferences to non-existent groups are an error   choice-yes choice-yes choice-yes choice-yes n/a n/a choice-yes choice-yes
Backreferences to failed groups also fail   choice-yes choice-yes choice-yes choice-yes n/a n/a choice-yes choice-yes
[:alpha:] POSIX character class   ascii choice-no choice-no choice-yes choice-no choice-yes choice-yes choice-yes
Character class is a single token   choice-yes choice-no choice-yes n/a n/a n/a n/a n/a
# starts a comment   choice-yes choice-yes choice-no n/a n/a n/a n/a n/a
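
As an illustration of mapping Perl syntax onto a tool, the GNU BRE column above shows that alternation, grouping and the +, ? and {n,m} quantifiers are available there only in backslash-escaped form. The following is a minimal sketch, in Python for brevity, of a translator for those operators. The function name perl_to_gnu_bre is hypothetical and not part of Foswiki. The sketch assumes the pattern has already been restricted to features GNU BRE supports: lazy quantifiers, (?...) groups and shorthands such as \d are not handled and would need separate treatment.

```python
# Perl operators that GNU BRE only treats as operators when they are
# backslash-escaped (see the GNU BRE column of the table above).
BRE_ESCAPED_OPERATORS = set("|+?(){}")


def perl_to_gnu_bre(pattern: str) -> str:
    """Translate basic Perl-style operators into GNU BRE syntax (hypothetical helper)."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= n:
                raise ValueError("pattern ends with a dangling backslash")
            nxt = pattern[i + 1]
            # In Perl "\+" is a literal plus; in GNU BRE the bare "+" is the literal.
            out.append(nxt if nxt in BRE_ESCAPED_OPERATORS else ch + nxt)
            i += 2
        elif ch == "[":
            # Copy a character class through unchanged.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member of the class
            while j < n and pattern[j] != "]":
                if pattern[j] == "\\":
                    # BRE does not honour backslash escapes inside a class.
                    raise ValueError("backslash inside a character class")
                j += 1
            if j >= n:
                raise ValueError("unterminated character class")
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch in BRE_ESCAPED_OPERATORS:
            # A bare Perl operator becomes its escaped GNU BRE form.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, perl_to_gnu_bre(r"(foo|bar)+") returns \(foo\|bar\)\+, while a Perl literal such as a\+b becomes a+b. Rejecting backslashes inside character classes is a deliberately cautious choice, since the table lists that escape as unsupported in BRE.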

In the event that an external tool supports regular expression syntax that is not compatible with Perl, the calling code must defuse any such feature so that the tool does not act on it. This may result in some loss of functionality, but is necessary to avoid confusing users.
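
One way to picture defusing is with the GNU word-boundary operators \< and \>. GNU regular expressions treat them as anchors, whereas in Perl a backslash before < or > simply yields the literal character. The sketch below, again in Python and with a hypothetical function name defuse_for_gnu, removes the backslash so that a GNU tool sees the literal a Perl user intended. It covers only these two escapes and is an assumption about how defusing could be done, not a description of Foswiki's code.

```python
# GNU-only escapes that Perl reads as plain literals. Only "\<" and "\>"
# are covered here; other tool-specific syntax would need its own entry.
GNU_ONLY_ESCAPES = {"<", ">"}


def defuse_for_gnu(pattern: str) -> str:
    """Neutralise GNU word-boundary escapes in a Perl-style pattern (hypothetical helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Drop the backslash so the GNU tool matches a literal "<" or ">".
            # Every other escape pair, including "\\", is copied unchanged.
            out.append(nxt if nxt in GNU_ONLY_ESCAPES else ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Consuming escapes in pairs means an escaped backslash followed by < is left alone, since there the backslash itself is the literal being matched.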
Topic revision: r1 - 07 Apr 2010, CrawfordCurrie
 