<!--
   * Set Y = %ICON{choice-yes}%
   * Set N = %ICON{choice-no}%
-->
---+ Strategy for Regular Expression Support

Foswiki strives to support the rich Perl regular expression syntax wherever regular expressions are required. However, because Foswiki has to interface with third-party tools and libraries, it is not always possible to support every feature of Perl regular expressions everywhere. Any developer who implements an interface to such a third-party tool must make every effort to map all the functionality of Perl regular expressions to the tool.

It will not always be possible to support everything, so the following table lists the features of regular expressions that are *required* to be available. The features are chosen from those described in http://www.regular-expressions.info/refflavors.html, which compares the regular expression support provided in several important environments. The table also documents the level of support for Perl regular expressions in a number of popular implementations.

| *Perl Regex Feature* | *Required* | *PCRE* | *Java* | *XPath* | *GNU ERE* | *XML* | *POSIX ERE* | *GNU BRE* | *POSIX BRE* |
| Backslash escapes one metacharacter | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| \Q...\E escapes a string of metacharacters | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \x00 through \xFF (ASCII character) | %Y% | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \n (LF), \r (CR) and \t (tab) | %Y% | %Y% | %Y% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \f (form feed) and \v (vtab) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \a (bell) and \e (escape) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \cA through \cZ (control character) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \ca through \cz (control character) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| [abc] character class | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| [^abc] negated character class | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| [a-z] character class range | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| Hyphen in [\d-z] is a literal | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| Backslash escapes one character class metacharacter | | %Y% | %Y% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \Q...\E escapes a string of character class metacharacters | | %Y% | Java 6 | %N% | %N% | %N% | %N% | %N% | %N% |
| \d shorthand for digits | %Y% | ascii | ascii | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \w shorthand for word characters | %Y% | ascii | ascii | %Y% | %Y% | %Y% | %N% | %Y% | %N% |
| \s shorthand for whitespace | %Y% | ascii | ascii | ascii | %Y% | ascii | %N% | %Y% | %N% |
| \D, \W and \S shorthand negated character classes | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %N% | %Y% | %N% |
| [\b] backspace | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| . (dot; any character except line break) | | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| ^ (start of string/line) | | %Y% | %Y% | %Y% | %Y% | %N% | %Y% | %Y% | %Y% |
| $ (end of string/line) | | %Y% | %Y% | %Y% | %Y% | %N% | %Y% | %Y% | %Y% |
| \A (start of string) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \Z (end of string, before final line break) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \z (end of string) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \b (at the beginning or end of a word) | %Y% | ascii | %Y% | %N% | %Y% | %N% | %N% | %Y% | %N% |
| \B (NOT at the beginning or end of a word) | | ascii | %Y% | %N% | %Y% | %N% | %N% | %Y% | %N% |
| &#124; (alternation) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\&#124;</code> | %N% |
| ? (0 or 1) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\?</code> | %N% |
| * (0 or more) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% |
| + (1 or more) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\+</code> | %N% |
| {n} (exactly n) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\{n\}</code> | <code>\{n\}</code> |
| {n,m} (between n and m) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\{n,m\}</code> | <code>\{n,m\}</code> |
| {n,} (n or more) | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\{n,\}</code> | <code>\{n,\}</code> |
| ? after any of the above quantifiers to make it "lazy" | | %Y% | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% |
| (regex) (numbered capturing group) | | %Y% | %Y% | %Y% | %Y% | %Y% | %Y% | <code>\( \)</code> | <code>\( \)</code> |
| (?:regex) (non-capturing group) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \1 through \9 (backreferences) | %Y% | %Y% | %Y% | %Y% | %Y% | %N% | %N% | %Y% | %Y% |
| \10 through \99 (backreferences) | | %Y% | %Y% | %Y% | %N% | n/a | n/a | %N% | %N% |
| Forward references \1 through \9 | | %Y% | %Y% | %N% | %N% | n/a | n/a | %N% | %N% |
| Nested references \1 through \9 | | %Y% | %Y% | %N% | %N% | n/a | n/a | %N% | %N% |
| Backreferences to non-existent groups are an error | | %Y% | %Y% | %Y% | %Y% | n/a | n/a | %Y% | %Y% |
| Backreferences to failed groups also fail | | %Y% | %Y% | %Y% | %Y% | n/a | n/a | %Y% | %Y% |
| (?i) (case insensitive) | | %Y% | %Y% | flag | %N% | %N% | %N% | %N% | %N% |
| (?s) (dot matches newlines) | | %Y% | %Y% | flag | %N% | %N% | %N% | %N% | %N% |
| (?m) (^ and $ match at line breaks) | | %Y% | %Y% | flag | %N% | %N% | %N% | %N% | %N% |
| (?x) (free-spacing mode) | | %Y% | %Y% | flag | %N% | %N% | %N% | %N% | %N% |
| (?-ismxn) (turn off mode modifiers) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?ismxn:group) (mode modifiers local to group) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?>regex) (atomic group) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?=regex) (positive lookahead) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?!regex) (negative lookahead) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?<=text) (fixed length positive lookbehind) | | %Y% | finite length | %N% | %N% | %N% | %N% | %N% | %N% |
| (?<!text) (fixed length negative lookbehind) | | %Y% | finite length | %N% | %N% | %N% | %N% | %N% | %N% |
| \G (start of match attempt) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?(?=regex)then&#124;else) (using any lookaround) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?(1)then&#124;else) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| (?#comment) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| Free-spacing syntax supported | | %Y% | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% |
| Character class is a single token | | %Y% | %N% | %Y% | n/a | n/a | n/a | n/a | n/a |
| # starts a comment | | %Y% | %Y% | %N% | n/a | n/a | n/a | n/a | n/a |
| \X (Unicode grapheme) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \x{0} through \x{FFFF} (Unicode character) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \pL through \pC (Unicode properties) | | %Y% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{L} through \p{C} (Unicode properties) | | %Y% | %Y% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \p{Lu} through \p{Cn} (Unicode property) | | %Y% | %Y% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \p{L&} and \p{Letter&} (equivalent of [\p{Lu}\p{Ll}\p{Lt}] Unicode properties) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsL} through \p{IsC} (Unicode properties) | | %N% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsLu} through \p{IsCn} (Unicode property) | | %N% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{Letter} through \p{Other} (Unicode properties) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{Lowercase_Letter} through \p{Not_Assigned} (Unicode property) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsLetter} through \p{IsOther} (Unicode properties) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsLowercase_Letter} through \p{IsNot_Assigned} (Unicode property) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{Arabic} through \p{Yi} (Unicode script) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsArabic} through \p{IsYi} (Unicode script) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{BasicLatin} through \p{Specials} (Unicode block) | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{InBasicLatin} through \p{InSpecials} (Unicode block) | | %N% | %Y% | %N% | %N% | %N% | %N% | %N% | %N% |
| \p{IsBasicLatin} through \p{IsSpecials} (Unicode block) | | %N% | %N% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| Part between {} in all of the above is case insensitive | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| Spaces, hyphens and underscores allowed in all long names listed above (e.g. BasicLatin can be written as Basic-Latin or Basic_Latin or Basic Latin) | | %N% | Java 5 | %N% | %N% | %N% | %N% | %N% | %N% |
| \P (negated variants of all \p as listed above) | | %Y% | %Y% | %Y% | %N% | %Y% | %N% | %N% | %N% |
| \p{^...} (negated variants of all \p{...} as listed above) | | %Y% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |
| [:alpha:] POSIX character class | | ascii | %N% | %N% | %Y% | %N% | %Y% | %Y% | %Y% |
| \p{IsAlpha} POSIX character class | | %N% | %N% | %N% | %N% | %N% | %N% | %N% | %N% |

If an external tool supports regular expression syntax that is *not* compatible with Perl, the calling code *must* defuse the feature that is not Perl compatible. This may result in some loss of functionality, but is necessary to avoid confusing users.
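
As an illustration only, here is one possible reading of "defusing", sketched in Python rather than taken from Foswiki code. It assumes a tool that speaks GNU BRE and follows the GNU BRE column of the table: =+=, =?=, =|=, =(=, =)=, ={= and =}= swap their escaped and unescaped meanings, so a Perl =\+= (literal plus) must become a bare =+= or the tool would read it as a quantifier. The helper name =perl_to_gnu_bre=, the set names and the choice to reject unsupported constructs with an exception are all hypothetical. Character classes that contain a backslash, =(?...)= groups and lazy quantifiers are rejected rather than translated, and possessive quantifiers are not detected.

```python
# Metacharacters whose escaped and unescaped meanings are swapped between
# Perl and GNU BRE, following the GNU BRE column of the table above.
SWAPPED = set("+?|(){}")
# Alphanumeric escapes the table lists as available in GNU BRE
# (\D is left out here because \d is listed as unavailable).
ALLOWED_ESCAPES = set("wWsSbB123456789")
# Characters that can end a quantifier; a following ? would make it lazy.
QUANTIFIER_ENDS = set("*+?}")


def perl_to_gnu_bre(pattern: str) -> str:
    """Hypothetical helper: rewrite a Perl-style regex for a GNU BRE tool."""
    out = []
    i = 0
    prev_quantifier = False
    while i < len(pattern):
        ch = pattern[i]
        is_quantifier = False
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("trailing backslash")
            nxt = pattern[i + 1]
            if nxt.isalnum() and nxt not in ALLOWED_ESCAPES:
                raise ValueError(f"\\{nxt} is not available in GNU BRE")
            # Perl \+ is a literal plus; in GNU BRE the literal is a bare +.
            out.append(nxt if nxt in SWAPPED else "\\" + nxt)
            i += 2
        elif ch == "[":
            # Find the closing bracket; a leading ^ and a leading ] are
            # part of the class, not its end.
            j = i + 1
            if pattern[j:j + 1] == "^":
                j += 1
            if pattern[j:j + 1] == "]":
                j += 1
            end = pattern.find("]", j)
            if end == -1:
                raise ValueError("unterminated character class")
            cls = pattern[i:end + 1]
            if "\\" in cls:
                raise ValueError("backslash inside a character class")
            out.append(cls)
            i = end + 1
        elif pattern.startswith("(?", i):
            raise ValueError("(?...) groups are not available in GNU BRE")
        elif ch == "?" and prev_quantifier:
            raise ValueError("lazy quantifiers are not available in GNU BRE")
        else:
            # Perl + is a quantifier; GNU BRE spells that quantifier \+.
            out.append("\\" + ch if ch in SWAPPED else ch)
            is_quantifier = ch in QUANTIFIER_ENDS
            i += 1
        prev_quantifier = is_quantifier
    return "".join(out)
```

With this sketch, =perl_to_gnu_bre(r"a+(b|c)\+")= returns =a\+\(b\|c\)+=, while =(?:ab)=, =a+?= and =\d+= raise =ValueError=. Rejecting an unsupported feature loses functionality, which is the trade-off the paragraph above accepts in order to avoid silently changing what the user's expression means.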
Topic revision: r1 - 07 Apr 2010, CrawfordCurrie