This question about Topic Markup Language and applications: Answered

Formatted Search with regular expressions

I have problems with regular expressions.

I need a search for all topics starting with "AbteilungsPortal", and I want to show the first line (the heading) of each topic instead of a fixed number of characters, as the summary does.

The heading starts with + and ends with a CR/LF.




In the search string:
  • First ^ anchors the search at the beginning of the topic (or if we had multiple="on", it would anchor to beginning of the line)
  • [^a-zA-Z0-9] matches non-alphanumeric characters
  • The following * means match the non-alphanumeric characters zero or more times
  • Then AbteilungsPortal must follow that pattern.
In the pattern string:
  • ^ anchors the match to the beginning
  • ( begins the pattern to be extracted
    • [^a-zA-Z0-9] matches non-alphanumerics
    • Following * means match the non-alphanumerics zero or more times
    • ? makes the match "non-greedy" (in combination with * - match zero or more times until the first occurrence of AbteilungsPortal)
    • .*? : . means "any character", * means "zero or more times", ? means "non-greedy"
  • ) finishes the pattern to be extracted
  • .* finishes the regex. In Foswiki, we must always finish $pattern() in this way
Result shown on this topic:

Searched: ^[^a-zA-Z0-9]*AbteilungsPortal
Number of topics: 1
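
The behaviour of the search expression can be tried outside Foswiki. The following is a minimal Python sketch, assuming Python's re module behaves like the regex engine used here for these simple constructs. The sample texts and the helper matches_search are hypothetical and only serve as illustration.

```python
import re

# The search expression quoted above.
SEARCH = re.compile(r"^[^a-zA-Z0-9]*AbteilungsPortal")

def matches_search(text: str) -> bool:
    """True if text starts with optional non-alphanumerics followed by AbteilungsPortal."""
    return SEARCH.search(text) is not None

# Hypothetical sample texts (not taken from a real wiki).
samples = {
    "---+ AbteilungsPortal KA-1\nBody text": True,  # markup is skipped by [^a-zA-Z0-9]*
    "AbteilungsPortal without markup": True,        # zero non-alphanumerics also matches
    "See AbteilungsPortal further down": False,     # letters before the word break the anchor
    "Intro\n---+ AbteilungsPortal": False,          # ^ anchors only at the very start
}

for text, expected in samples.items():
    assert matches_search(text) is expected

# With re.MULTILINE, ^ also anchors at the start of each line. This is only a
# rough analogue of line-based matching, not a statement about multiple="on".
assert re.search(SEARCH.pattern, "Intro\n---+ AbteilungsPortal", re.MULTILINE)
```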

-- PaulHarvey - 04 Apr 2010

Dear Paul,

I think I did not explain my problem very well.

The company I work for has a lot of departments (German: Abteilung).

Each department will have a portal topic. The topic names are AbteilungsPortalA, AbteilungsPortalB, AbteilungsPortalC, and so on. Each portal topic starts with a heading, for example "---+!! KA-1 Machinery construction".

My own try:
%SEARCH{"AbteilungsPortal" scope="topic" nonoise="on" format="[[$topic]] $pattern(.*?---\+!!*([\n\r]+).*)"}%
I search for topics whose names contain "AbteilungsPortal", and I want to show the first line of each topic found.
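
In the attempt above, the parentheses wrap only [\n\r]+, so at best the line break itself would be extracted, and the line break is required directly after the exclamation marks. The following is a minimal Python sketch of that effect. It assumes that $pattern() needs the whole expression to match the topic text and that "." also matches line breaks. The helper extract is a hypothetical stand-in, not Foswiki code, and the variant pattern is only one possible rewrite.

```python
import re

def extract(pattern: str, text: str) -> str:
    """Hypothetical stand-in for $pattern(): full-text match, return group 1."""
    m = re.fullmatch(pattern, text, re.DOTALL)
    return m.group(1) if m else ""

topic_text = "---+!! KA-1 Machinery construction\nSome body text\n"

# Original attempt: "!!*" must be followed immediately by the line break,
# and only the line break is inside the parentheses.
attempt = r".*?---\+!!*([\n\r]+).*"
print(repr(extract(attempt, topic_text)))

# Variant: skip the markup and blanks, then capture up to the line break.
variant = r".*?---\+!!*[ \t]*([^\n\r]*).*"
print(repr(extract(variant, topic_text)))
```

With the sample text, the attempt does not match at all, while the variant returns the heading text without the markup.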



I got it...mostly. (It helps to read the manual carefully!)
   format="[[$topic]] $pattern(.*?([:blank:].*?([\n\r]+)).*)"

New problems:

The topic I'm looking for ("AbteilungsPortalElt") starts with a heading1:
---+!! ELT-Abteilung
---++ Internal documents

The search result is:

AbteilungsPortalElt LT-Abteilung

The 'E' is suppressed! If the text starts with 'A' or 'K', it is shown correctly.

I also tried to use the extracted string as a link, but this did not work.


I think it is a current bug that [:classes:] are not recognised by $pattern(), and I don't know if it is easy to fix (we don't want to prevent future non-grep search algorithms - Development.NormaliseRegexSyntax and Development.AddMatchOperatorToQueryLanguage have some background).

Anyway, $pattern() is treating the [:blank:] literally: it matches the characters :, b, l, a, n, k. I would suggest using \s instead, but I seem to recall that $pattern() doesn't handle that notation either; I could be wrong though. That is why I wrote a pattern to match non-alphanumeric characters: [^a-zA-Z0-9]
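
The missing 'E' is consistent with this literal reading if the match also ignores case. The following is a minimal Python sketch; Python's re module likewise reads [:blank:] as a plain set of the characters :, b, l, a, n, k. The helper extract is a hypothetical stand-in for $pattern(), and the full-text, case-insensitive, dot-matches-newline behaviour is an assumption, not something confirmed in this thread.

```python
import re

def extract(pattern: str, text: str) -> str:
    # Hypothetical stand-in for $pattern(). Assumptions: the whole text must
    # match, "." matches line breaks, and matching ignores case.
    m = re.fullmatch(pattern, text, re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else ""

# The pattern from the format string above; [:blank:] is NOT a POSIX class here.
pattern = r".*?([:blank:].*?([\n\r]+)).*"

elt = "---+!! ELT-Abteilung\n---++ Internal documents\n"
ka = "---+!! KA-1 Machinery construction\nBody\n"

print(extract(pattern, elt))  # LT-Abteilung
print(extract(pattern, ka))   # KA-1 Machinery construction
```

Under these assumptions, the first character from the set in "ELT-Abteilung" is the 'L', so the extracted text starts there. A heading starting with 'A' or 'K' begins with a character from the set, which would explain why those are shown completely.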

-- PaulHarvey - 06 Apr 2010

Try writing the class as [[:blank:]] - classes have to be within double square brackets. I'm not sure about the rest of the regex.

-- GeorgeClark - 07 Apr 2010

Dear Paul,

Your idea with [^a-zA-Z0-9] is good. Now I get the results I want!
  • Note you may also want to try George's suggestion that character classes are written [[:blank:]] instead of [:blank:]. This would be better, especially because a-zA-Z0-9 does not cover accented characters, so [^a-zA-Z0-9] treats them as non-alphanumeric.

I made some attempts in the Sandbox. If you (or somebody else) have time, please have a look: SandboxAndreas
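
The effect on accented characters can be shown with a minimal Python sketch. Python's re module has no [[:alnum:]]-style classes, so the Unicode-aware \W is used here as a stand-in; the heading text is hypothetical.

```python
import re

heading = "---+!! Änderungsdienst\nBody\n"  # hypothetical heading with an umlaut

# ASCII-only class: "Ä" is not in a-zA-Z0-9, so it is skipped with the markup.
ascii_only = re.match(r"[^a-zA-Z0-9]*([^\n\r]*)", heading).group(1)

# Unicode-aware stand-in: \W* stops at the first word character, here "Ä".
unicode_aware = re.match(r"\W*([^\n\r]*)", heading).group(1)

print(ascii_only)     # nderungsdienst
print(unicode_aware)  # Änderungsdienst
```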

-- AndreasEllguth - 07 Apr 2010

Thank you for the very clear questions you wrote.

I have moved them into the Support web, because they are a nice series of questions that could be useful to other users. I hope you don't mind.


-- PaulHarvey - 07 Apr 2010

QuestionForm

Subject Topic Markup Language and applications
Version Foswiki 1.0.9
Status Answered
Topic revision: r9 - 07 Apr 2010, PaulHarvey
