Item1072: Grep (forking query search) involving spaces and quotes fails on Windows

Priority: Normal
Current State: Needs Developer
Released In: n/a
Target Release: major
Applies To: Engine
Component: PlatformWindows
Branches:
Reported By: CrawfordCurrie
Waiting For:
Last Change By: CrawfordCurrie
It is easy to write a query search that, when processed, creates a Windows command line that doesn't work with Forking search. The quoting rules on Windows are rather weird; the cmd.exe help (cmd /?) says:
  • If all of the following conditions are met, then quote (") characters on the command line are preserved:
    • no /S switch
    • exactly two quote characters
    • no special characters between the two quote characters, where special is one of: &<>()@^|
    • there are one or more whitespace characters between the two quote characters
    • the string between the two quote characters is the name of an executable file.
Right now we use quotemeta to protect the characters in an RE generated by the query processor. This doesn't work with the above rules because it escapes, but does not replace, the "special characters" (& becomes \&, | becomes \|, etc.).
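
To illustrate the point, here is a minimal sketch of what quotemeta does to a pattern fragment. The fragment and the variable names are made up for illustration; the set of special characters is the one quoted from the rules above.

```perl
use strict;
use warnings;

# Characters that the quoting rules above treat as special.
my $cmd_special = qr/[&<>()\@^|]/;

# Hypothetical pattern fragment, standing in for query processor output.
my $fragment = 'fish & chips|peas';

# quotemeta puts a backslash in front of every non-word character...
my $escaped = quotemeta($fragment);

# ...but the special characters themselves are still present in the string.
my @left = $escaped =~ /($cmd_special)/g;

print "$escaped\n";
print "still special: @left\n";
```

The backslashes are meaningful to grep, but they do not hide & or | from the quoting rules, which is why escaping alone does not help here.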

A solution would be to use character codes or character classes in the grep command line. Unfortunately, GNU grep has no support for character codes and only limited support for character classes.

The ideal solution would be to use exec, but Windows just doesn't want to play. The workaround is to use Native or PurePerl searching.

-- CrawfordCurrie - 13 Feb 2009

Confirming this so Sven Dowideit notices it, since he's been looking at Windblows.

-- CrawfordCurrie - 05 Jun 2010

There's probably a CPAN module we can use, but atm things are pretty dead on forking search and Windows.

-- SvenDowideit - 17 Jun 2010

I had made this change, from:
    if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' ) {
        #try to escape the ^ ad "" for native windows grep and apache
        $searchString =~ s/\[\^/[^^/g;
        $searchString =~ s/"/""/g;
    }
To this:
    if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' ) {
        #try to escape the ^ and "" for native windows grep and apache
        $searchString =~ s/\[\^/[^^/g;
        $searchString =~ s#\\#\\\\#g;
        $searchString =~ s#"#\\"#g;
        $searchString = q(") . $searchString . q(");
    }
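
As a rough illustration of what the revised block does to a search string, the same substitutions can be wrapped in a small helper and run on a sample. The helper name escape_for_windows_grep and the sample string are hypothetical; only the four statements inside come from the change above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the revised escaping shown above.
sub escape_for_windows_grep {
    my ($searchString) = @_;
    $searchString =~ s/\[\^/[^^/g;    # [^ becomes [^^
    $searchString =~ s#\\#\\\\#g;     # every backslash is doubled
    $searchString =~ s#"#\\"#g;       # " becomes \"
    return q(") . $searchString . q(");    # wrap the result in quotes
}

# Sample with quotes, a negated character class and an escaped dot.
print escape_for_windows_grep('say "hi" [^a]\.txt'), "\n";
```

Printing the result for a few real queries is a cheap way to see what the command line will contain before perl, the shell and the C runtime get to it.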

The revised version mostly works. One issue was that I needed to change regexes from \. to [.], because although doubling up escape characters worked in the general case, it did not work in this one. By this time I had come to the conclusion that a lot more work was required to be sure of all the interpolations done on the command line by perl, cmd.exe and the C runtime before grep even looks at its arguments (I had not yet encountered the escapes mentioned by Crawford above). To be really sure, I was considering compiling the grep sources with extra logic to dump out all the args somewhere, plus the original command line before it is processed by the C runtime. A possibly better alternative would be to write the regex to a separate file and pass that file to grep.
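
The pattern-file idea could look something like the sketch below. It is untested against Foswiki itself: the helper name grep_args_with_pattern_file is hypothetical, and the -E and -l flags are only an assumption about what the search would pass. The -f option (read patterns from a file) is a standard grep option.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write the pattern to a temporary file and return
# the file name followed by an argument list that uses grep -f.
sub grep_args_with_pattern_file {
    my ( $pattern, @files ) = @_;

    # The temporary file is removed automatically when the program exits.
    my ( $fh, $patternFile ) =
      tempfile( 'grepXXXXXX', TMPDIR => 1, UNLINK => 1 );
    print {$fh} $pattern, "\n";
    close($fh) or die "Cannot close $patternFile: $!";

    # The pattern itself never appears on the command line.
    return ( $patternFile, 'grep', '-E', '-l', '-f', $patternFile, @files );
}

my ( $file, @args ) =
  grep_args_with_pattern_file( 'fish & "chips"', 'Web/TopicOne.txt' );
print join( ' ', @args ), "\n";
```

Note that grep treats each line of the file as a separate pattern, so this only suits patterns without embedded newlines, and file names containing spaces would still need care on the command line.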

However, my fix above was certainly good enough to test performance. Because of the 8192-byte limit on the command line, I could only achieve about 100 topics per grep, which was much slower than PurePerl. To make any of this worthwhile you also need to increase the number of topics that can be passed; there is apparently a threshold where grep will outperform PurePerl. One way would be to change directory to .../data and then simply pass Web/Topic rather than c:/Program Files/Foswiki/Foswiki_production/data/Web/Topic. This would then probably allow 200 topics or so, and we may then gain performance benefits. However, it's not clear how much.
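
The effect of shorter paths on batch size can be estimated with a sketch like the one below. The helper batch_files, the 500 made-up topic names and the allowance of 100 characters for the fixed part of the command are all assumptions for illustration; only the 8192 limit and the two path styles come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical helper: split @files into batches so that the fixed part
# of the command plus the file names (and one separating space each)
# stays within $limit characters.
sub batch_files {
    my ( $limit, $fixedLength, @files ) = @_;
    my @batches = ( [] );
    my $used    = $fixedLength;
    foreach my $file (@files) {
        my $cost = length($file) + 1;    # +1 for the separating space
        if ( @{ $batches[-1] } && $used + $cost > $limit ) {
            push @batches, [];           # start a new command line
            $used = $fixedLength;
        }
        push @{ $batches[-1] }, $file;
        $used += $cost;
    }
    return grep { @$_ } @batches;        # drop the empty batch if no files
}

# Compare absolute paths with paths relative to the data directory.
my @topics   = map { "Topic$_.txt" } 1 .. 500;
my $dataDir  = 'c:/Program Files/Foswiki/Foswiki_production/data';
my @absolute = map { "$dataDir/Web/$_" } @topics;
my @relative = map { "Web/$_" } @topics;

printf "absolute: %d batches, relative: %d batches\n",
  scalar( batch_files( 8192, 100, @absolute ) ),
  scalar( batch_files( 8192, 100, @relative ) );
```

Fewer batches means fewer grep processes to fork, which is where any performance gain would have to come from.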

Alas, grep does not allow you to pass a file listing all the files you want to search. You could of course create a different version of grep that allows this, but why bother? NativeSearch makes more sense at that point.

Talking of NativeSearch, I thought it was worth mentioning that Strawberry Perl made it easy to install, as Strawberry comes with a complete set of tools. I was able to change to the tools/nativesearch directory and use dmake, which worked first time. I grabbed PCRE library binaries off the web (2007 versions) to go along with this and I was good to go. As these PCRE libraries are a few years old, I plan to recompile the latest from source, partly to get the latest version and also to test whether the Strawberry tools are up to it. If the Strawberry tools work, it would be something to bear in mind for Windows users.

I think Sven meant this module http://search.cpan.org/~dsb/Argv/

However, as I say, the grep issue is not just about basic functionality. If performance is still worse than PurePerl, then why bother with it? Would it even be worth considering dropping grep from Foswiki on Windows? It would be one less dependency to worry about.

-- JulianLevens - 17 Jun 2010

Not sure why this is waiting for Sven. He is drowning in Waiting for items.

Julian opens a good question. Why bother with grep search on Windows if it is so bad?

-- KennethLavrsen - 31 Jul 2010

Bumping it to me is no good; I tried to fix this before, and failed, so I wrote NativeSearch. Sven thought he could do better. My approach to handling this would be to document "don't use grep on Windows", which is something anyone can do.

Removed both Sven and myself from the "Waiting for" list.

-- CrawfordCurrie - 14 Aug 2010


Summary Grep (forking query search) involving spaces and quotes fails on Windows
ReportedBy CrawfordCurrie
Codebase 1.0.9, trunk
SVN Range Foswiki-1.0.0, Thu, 08 Jan 2009, build 1878
AppliesTo Engine
Component PlatformWindows
Priority Normal
CurrentState Needs Developer
WaitingFor
Checkins distro:f34be737936b
TargetRelease major
ReleasedIn n/a
CheckinsOnBranches
trunkCheckins
masterCheckins
ItemBranchCheckins
Release02x01Checkins
Release02x00Checkins
Release01x01Checkins
Topic revision: r9 - 24 Mar 2017, CrawfordCurrie