Item2646: Fix search code so that forking also works reliably in Windows

Priority: Normal
Current State: Needs Developer
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: PlatformWindows
Branches:
Reported By: KennethLavrsen
Waiting For:
Last Change By: CrawfordCurrie

METASEARCH and SEARCH inconsistent between PurePerl and Forking

We seem to have a problem with the search code. It does not correctly handle the maximum command-line length available in Windows.

At the moment we recommend that Windows users use PurePerl. But we really should be able to use forking as well.

It may work today depending on the length of the path to the Foswiki installation directory.

This is the text from a previous bug, Item2504, which was resolved with the workaround of recommending pure Perl search for Windows.

I have refactored the information from 2504 below.

-- KennethLavrsen - 16 Jan 2010


Take this simple METASEARCH
<table>
%METASEARCH{type="parent" web="%WEB%" topic="WebHome" format="<tr><td>[[$web.$topic][$topic]]</td></tr>"}% 
</table>

This successfully shows the children of WebHome as delivered with Foswiki 1.0.8 (strictly 1.0.6 upgraded with the 1.0.7 and 1.0.8 upgrades), but not my own topics which have this as their parent.

Using raw=debug to show you one topic (of about 4) that does not get found (I had to remove the % in front of META as it gets removed on save otherwise):

META:TOPICINFO{author="JulianLevens" date="1260800812" format="1.1" version="1.1"}%
META:TOPICPARENT{name="WebHome"}%
---+!! !PCI

Whereas this topic (WebNotify), provided with Foswiki:

META:TOPICINFO{author="ProjectContributor" date="1231502400" format="1.1" version="1"}%
META:TOPICPARENT{name="WebHome"}%

Is found as part of that search.

However, if I switch to PurePerl, then both items are found.

But this search:

%SEARCH{ "*." topic="FileSend*Converter" scope="text" type="regex" nosearch="on" nonoise="on" nototal="off" format="$n()---+++$topic%BR%$percntINCLUDE{\"$topic\" section=\"summary\"}$percnt"}%

Works fine with the Forking algorithm, but pure perl gives me this output on the page:

Could not perform search. Error was: Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE ./ at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 41, line 1. at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 41 Foswiki::Store::SearchAlgorithms::PurePerl::__ANON__('META:TOPICINFO{author="JulianLevens" date="1249485343" forma...') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 47 Foswiki::Store::SearchAlgorithms::PurePerl::search('*.', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)', 'C:/PROGRA~1/Foswiki/Foswiki_1_0_8_pa/data/Main/', undef, 'Main') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/RcsFile.pm line 332 Foswiki::Store::RcsFile::searchInWebContent('Foswiki::Store::RcsLite=HASH(0x1ff31ac)', '*.', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store.pm line 2029 Foswiki::Store::searchInWebContent('Foswiki::Store=HASH(0xf088e4)', '*.', 'Main', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Search.pm line 260 Foswiki::Search::_searchTopics('Foswiki::Search=HASH(0x1780f14)', 'Main', 'text', 'regex', 'HASH(0x1780ef4)', 'ARRAY(0x1f6050c)', 'FileSendBinConverter', 'FileSendCountsConverter', 'FileSendEZTSVConverter', ...) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Search.pm line 680 Foswiki::Search::searchWeb('Foswiki::Search=HASH(0x1780f14)', 'inline', 1, 'topic', 'FileSend*Converter', 'search', '*.', 'basetopic', 'FileSend', ...) 
called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3836 Foswiki::__ANON__() called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 379 eval {...} called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 371 Error::subs::try('CODE(0x17800d4)', 'HASH(0x1780e54)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3845 Foswiki::SEARCH('Foswiki=HASH(0x74fe2c)', 'Foswiki::Attrs=HASH(0x1780d14)', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2872 Foswiki::_expandTagOnTopicRendering('Foswiki=HASH(0x74fe2c)', 'SEARCH', ' "*." topic="FileSend*Converter" scope="text" type="regex" no...', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2777 Foswiki::_processTags('Foswiki=HASH(0x74fe2c)', '---+!! File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'CODE(0xe13954)', 16, 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2694 Foswiki::expandAllTags('Foswiki=HASH(0x74fe2c)', 'SCALAR(0xe14144)', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3022 Foswiki::handleCommonTags('Foswiki=HASH(0x74fe2c)', '---+!! File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'Main', 'FileSend', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI/View.pm line 388 Foswiki::UI::View::_prepare('---+!! 
File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'Foswiki=HASH(0x74fe2c)', 'Main', 'FileSend', 'Foswiki::Meta=HASH(0x1d7d0a4)', 0) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI/View.pm line 368 Foswiki::UI::View::view('Foswiki=HASH(0x74fe2c)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 304 Foswiki::UI::__ANON__() called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 379 eval {...} called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 371 Error::subs::try('CODE(0x936c54)', 'HASH(0x1d845dc)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 391 Foswiki::UI::_execute('Foswiki::Request=HASH(0xec2874)', 'CODE(0xec755c)', 'view', 1) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 275 Foswiki::UI::handleRequest('Foswiki::Request=HASH(0xec2874)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Engine/CGI.pm line 29 Foswiki::Engine::CGI::run('Foswiki::Engine::CGI=HASH(0xd75c04)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/bin/view line 45 


It was found that the search had a typo but there was still an error. This text has been removed for clarity.

The typo was %SEARCH{ "*." should have been %SEARCH{ ".*"
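
A minimal Perl sketch of why the typo only shows up under PurePerl: that algorithm compiles the search string as a Perl regex, and Perl rejects a pattern that starts with a quantifier. The helper name regex_error is hypothetical and is not part of Foswiki.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Foswiki): returns undef if $pattern
# compiles as a Perl regex, otherwise the compile error message.
sub regex_error {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? undef : $@;
}

# Compare the mistyped pattern with the intended one.
for my $pattern ( '*.', '.*' ) {
    my $error = regex_error($pattern);
    print "${pattern}: ", ( defined $error ? "invalid ($error)" : "valid\n" );
}
```

A check like this could be run on a search string before it is handed to either algorithm, so that both report the same error.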


After some testing I found this in the Apache error logs. Crucially, the Sandbox::sysCommand error is not created when I switch to PurePerl searching.

This would suggest that the forking algorithm is not entirely successful calling grep under Windows. Is this a clue?

[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] [Tue Dec 22 04:53:12 2009] CGI.pm: Use of uninitialized value $_ in -d at C:/strawberry/perl/lib/CGI.pm line 4083., referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] [Tue Dec 22 04:53:12 2009] CGI.pm: Use of uninitialized value $_ in -d at C:/strawberry/perl/lib/CGI.pm line 4083., referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8885), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8782), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8897), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8873), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8765), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8899), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8796), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8911), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8887), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8779), referer: http://tw4-wiki/

This certainly gave me an idea. I created a topic called WikiChild1 and that appears even under the forking algorithm. This possibly explains why the apparently standard topics Web... and Wiki... appear whereas mine earlier in the alphabet do not. It suggests the possibility to me that a list of topics to search is passed to grep and when this list is too big the early entries are ignored. Remember I'm running on Windows (more details above) which may be relevant.

-- JulianLevens - 22 Dec 2009

If the problem is that grep cannot take all the topics under Windows because of command line max, then there is no solution other than PurePerl which is not a bad solution I would say.

Then the actions would be

  • Change the configure setting to non expert
  • Document in configure that PurePerl should be used for native Windows (not cygwin) - if I understand the conclusion right
  • Put a note in InstallationGuide about this.

This is what we decided to do in Item2504

-- KennethLavrsen - 05 Jan 2010

Had a chat with Sven and my assumption is wrong.

Here is the IRC log

[02:11] <Lavr> Sven what is your view on http://foswiki.org/Tasks/Item2504? Should Windows always search with PurePerl?
[02:12] <SvenDowideit> no :)
[02:12] <SvenDowideit> my opinion is that we should fix our bugs
[02:13] <SvenDowideit> grep used to work quite well on windows, but somewhere it stopped being as reliable
[02:14] <SvenDowideit> its a bit surprising because i recal running the unit tests on windows last time i had time
[02:14] <Lavr> The submitters argument is that it fails because Windows has a limit on max number of characters on command line. So maybe it will always fail if we pass too long a string to grep?
[02:14] <SvenDowideit> that has always been the case
[02:14] <SvenDowideit> on unix too
[02:14] <SvenDowideit> thats why there is code there attempting to deal with it
[02:15] <Lavr> it seems this is the reporters problem. Try and read his follow up carefully.
[02:15] <SvenDowideit> but the attempt is very simplistic -
[02:15] <SvenDowideit> if i had time, i would have already
[02:15] <SvenDowideit> at this point i'm already stealing time to try the rc
[02:15] <Lavr> That is OK. I just wanted to hear your view and I got it. Thanks
[02:16] * toffe82 has joined #foswiki
[02:16] <SvenDowideit> i did put a comment in the sandbox code wrt how we should recode it iirc
[02:16] <SvenDowideit> atm it chooses a number of topics to add to the command line each time
[02:17] <SvenDowideit> what it should to is calculate based on the length of the data path and the topic names +1space
[02:17] <SvenDowideit> and then compare to the known length of the command buffer
[02:17] <Lavr> Ah so it can fail if people on the site uses very long topic names
[02:18] <SvenDowideit> (also slightly doccoed in the code i think)
[02:18] <SvenDowideit> worse
[02:18] <SvenDowideit> it will fail more the longer the path to the data dir
[02:18] <Lavr> blast
[02:18] <SvenDowideit> ie - c:\ProgramFiles\foswiki\foswiki\data
[02:19] <SvenDowideit> but the maths would not be that hard to impl
[02:19] <Lavr> I'll copy this IRC trail to the report.

-- KennethLavrsen - 08 Jan 2010
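
The calculation Sven describes in the log above could be sketched roughly as follows. This is only an illustration under stated assumptions: batch_by_length and its parameters are hypothetical names, the cost per topic is taken as data path + topic name + ".txt" + one space, and any extra quoting or escaping added later by Sandbox::sysCommand is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: split @$topics into sets so that the estimated
# command line for each set stays within $budget characters.
# $baseLength is the assumed length of the grep command plus the regex.
sub batch_by_length {
    my ( $topics, $dataPath, $baseLength, $budget ) = @_;
    my @sets;
    my @current;
    my $used = $baseLength;
    foreach my $topic (@$topics) {

        # data path + topic name + ".txt" + one separating space
        my $cost = length($dataPath) + length($topic) + length('.txt') + 1;

        # start a new set when this topic would exceed the budget
        if ( @current && $used + $cost > $budget ) {
            push @sets, [@current];
            @current = ();
            $used    = $baseLength;
        }
        push @current, $topic;
        $used += $cost;
    }
    push @sets, [@current] if @current;
    return \@sets;
}

# Example with made-up values: 10 topics of 5 characters each.
my $sets = batch_by_length( [ ('aaaaa') x 10 ], 'C:/d/', 10, 50 );
print scalar(@$sets), " sets\n";
```

Note that a single topic whose cost alone exceeds the budget still ends up in a set of its own, so a real implementation would need to decide how to report that case.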

PurePerl will be OK with documentation updated for other Windows users. I'll try to look at fixing the forking option, but no idea how long that will take, my workload is way too high right now. As I intend to move to Fast CGI eventually, will the NativeSearchContrib be another alternative or does that also depend on grep and have similar issues?

-- JulianLevens - 11 Jan 2010

I use NativeSearchContrib with FastCGI and ModPerl and it works well as far as i can tell, but I never tested it on windows.

-- GilmarSantosJr - 11 Jan 2010

Related? Support.Question380

-- PaulHarvey - 14 Jan 2010

This new item derived from Item2504 has been set to Waiting For JulianLevens and set to Urgent with 1.1.0 scope.

-- KennethLavrsen - 16 Jan 2010

No way is this urgent. NativeSearchContrib (which does not use grep or any other command-line tool) has worked on Windows for a long time now. Flipping back to "Normal" and "Waiting" to see if we can determine what the real problem is here, because it's not clear from the discussion above (several people talking at cross-purposes, AFAICT)

-- CrawfordCurrie - 09 Apr 2010

During my investigations I found that the code already has comments/code pertaining to this:
    # process topics in sets, fix for Codev.ArgumentListIsTooLongForSearch
    my $maxTopicsInSet = 512;    # max number of topics for a grep call
    #TODO: the number is actually dependant on the length of the path to each file
    #SMELL: the following while loop should probably be made by sysCommand, as this is a leaky abstraction.
    ##heck, on pre WinXP its only 2048, post XP its 8192 - http://support.microsoft.com/kb/830473
    $maxTopicsInSet = 128 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );

I also note from http://partmaps.org/era/unix/arg-max.html that a maximum of 512 topics could cause problems on non-Windows boxes, albeit unlikely and only on old systems.

As far as I can see it's not possible to pass, to grep, the files to process via another file.

A quick fix for me: if I change the last line above to:
    $maxTopicsInSet = 64 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );

then I can use the forking algorithm successfully. By adding some prints to STDERR I got a better idea of how the forking search sub calls Sandbox::sysCommand and how sysCommand expands this into the final command line. sysCommand handles this expansion a little differently between OSes, so a one-size-fits-all calculation is not easy. The calculation would also have to be done each time around the loop; e.g. when searching a total of 1000 topics, use 100 topics in the first set, 75 in the next and so on, depending on the lengths of the topic names selected.

Note that I calculated the new size of 64 as reasonable for our set-up, and it probably is for the vast majority of Windows set-ups. Of course, if someone else has a deep directory structure and/or a long web name and/or long topic names and/or a large regex, then that could still break. It's pretty clear that 128 is just too generous.

The general problem with this idea is that it would create a fragile link between sysCommand and the forking search sub. A change to sysCommand could cause the calculation to be too generous and grep to fail.

Pushing this logic into sysCommand would make more sense, but sysCommand was designed, quite reasonably, to be agnostic to the nature of the command passed, by using templates and so on. It would require introducing a special list of params that is expanded into the template last. The idea is that if 1000 characters are already consumed, 7192 are left; sysCommand would then append items from the 'special' param one at a time, stopping just before the total budget is used up. It would also require sysCommand to return the list shortened by the items consumed, and the caller of course to handle that appropriately.

Is this a reasonable approach? Or have I missed something important?

A performance question: at what size will forking become slower than pure-perl due to increased overhead of extra grep calls?

Note: a very simple patch is possible (from 128 to 64) to allow the forking search to work on Windows, but caveats would need to be added to the docs. Could a configure variable be used to set the limit? Indeed, is all the work suggested above overkill?

-- JulianLevens - 16 Apr 2010

Some supporting maths (where appropriate, size includes all escaping prior to passing to grep):

| *Element* | *Size* | *Notes* |
| grep command | 128 | overkill, on my machine it's 54: c:/PROGRA~1/GnuWin32/bin/grep "-E" "-i" "-l" "-H" "--" |
| single regex | 512 | overkill, my largest is around 150 and that's with a number of AND (ie ';') chunks, and aren't these chunks broken down and grepped one by one? |

That leaves 7552 (8192 - 128 - 512)

| *Element* | *Size* | *Notes* |
| Path name | 50 | for a particular set-up this is a fixed size, eg C:\\PROGRA~1\\Foswiki\\Foswiki_1_0_8_pa\\data\\; with some effort (renaming directory and reconfiguring) this could be made quite a bit smaller if necessary |
| Extras | 6 | \\ + .txt |
| Web Name | 12 | these two will vary topic by topic, |
| Topic Name | 38 | but an average length of 50 combined is reasonable |

That's 106 per topic in the search. That suggests a maximum of 71 topics to pass to each grep call (7552 / 106). Turning that on its head and setting the limit to 64 allows an extra 12 chars per average topic name length (7552 / 64 = 118). (Or pessimistically set the limit to 32 to allow for very large topic path names.)

  • In theory this will not cater for all Windows installations
  • In practice this will cater for all Windows installations †

On my set-up with the current Windows limit set to 128 topics my searches were only failing by a few hundred bytes not thousands.
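
The arithmetic above can be reproduced with a few lines of Perl. This is just a worked check of the figures quoted in the tables; the sub names remaining_budget and topics_per_call are hypothetical, and the sizes are the estimates from this comment, not measured values.

```perl
use strict;
use warnings;

# Characters left for file names once the fixed parts are reserved.
sub remaining_budget {
    my ( $limit, @reserved ) = @_;
    $limit -= $_ for @reserved;
    return $limit;
}

# Whole number of topics that fit into $budget at $perTopic characters each.
sub topics_per_call {
    my ( $budget, $perTopic ) = @_;
    return int( $budget / $perTopic );
}

my $budget   = remaining_budget( 8192, 128, 512 );    # grep command, regex
my $perTopic = 50 + 6 + 12 + 38;    # path, extras, web name, topic name

printf "budget=%d perTopic=%d maxTopics=%d\n",
  $budget, $perTopic, topics_per_call( $budget, $perTopic );
printf "chars available per topic at 64 topics: %d\n", int( $budget / 64 );
```

With these inputs the budget is 7552, the per-topic cost is 106, the maximum is 71 topics, and a limit of 64 leaves 118 characters per topic.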

So my 'patch', such as it is, follows: Forking.pm circa line 68:

  -  $maxTopicsInSet = 128 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );
  +  $maxTopicsInSet = 64 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );

Sorry, I'm not au fait with SVN GIT et al just yet.

†: probably

-- JulianLevens - 23 Apr 2010

Hold that thought, a number of my application searches are failing. I doubt this has anything to do with the above, but it suggests another problem with Forking.pm on Windows.

More to follow ...

-- JulianLevens - 27 Apr 2010

I've amended Forking.pm by updating the following block as follows:
    if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' ) {

        # try to escape the ^ and "" for native windows grep and apache
        $searchString =~ s/\[\^/[^^/g;

        # Fix escaping and quoting for Windows
        $searchString =~ s#\\#\\\\#g;
        $searchString =~ s#"#\\"#g;
        $searchString = q(") . $searchString . q(");
    }

My searches are now largely correct, with one fly in the ointment being the need to convert '\.' within searches to [.] (and I cannot be sure that there are no further special cases with '\').
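
To see what the substitutions in the amended block do to a search string, here is a small stand-alone sketch. The wrapper name quote_for_windows_grep and the two sample inputs are made up for illustration; only the three substitutions and the final quoting come from the block itself.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitutions from the amended block.
sub quote_for_windows_grep {
    my ($searchString) = @_;
    $searchString =~ s/\[\^/[^^/g;    # [^ becomes [^^
    $searchString =~ s#\\#\\\\#g;     # each backslash is doubled
    $searchString =~ s#"#\\"#g;       # each double quote gets a backslash
    return q(") . $searchString . q(");
}

# Sample inputs (assumptions): a quoted value and a pattern using '\.'
print quote_for_windows_grep($_), "\n" for ( 'name="WebHome"', '[^a]\.' );
```

The second sample shows the doubled backslash in front of the dot, which may be connected to the '\.' special case; writing the pattern as [.] avoids the backslash altogether.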

However, I've also done some rough timing (in seconds) just using stopwatch on the browser as follows.

| *Application* | *Forking 64* | *Forking 100* | *!PurePerl* |
| List company applications | 53 | 21 | 12 |
| List company teams | 65 | 48 | 15 |

I feel that the difference in timing is sufficient to be considered significant, even though this performance testing has not been exhaustive.

Forking 64/100 refers to the number of files allowed per grep. It suggests the possibility that performance could indeed improve considerably if 512 files per set were possible as in Unix/Linux.
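
As a rough illustration of why the set size matters, the number of grep processes needed for one search pass can be counted as below. The topic count of 2000 is an assumption chosen only for the example, the sub name grep_calls is hypothetical, and the sketch says nothing about how long each call takes.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Number of grep calls needed to cover $topicCount topics when each
# call may receive at most $maxTopicsInSet file names.
sub grep_calls {
    my ( $topicCount, $maxTopicsInSet ) = @_;
    return ceil( $topicCount / $maxTopicsInSet );
}

my $topicCount = 2000;    # assumed web size, for illustration only
printf "%3d topics per set: %d grep calls\n", $_, grep_calls( $topicCount, $_ )
  for ( 64, 100, 512 );
```

For 2000 topics this gives 32, 20 and 4 calls respectively, so the per-process start-up cost is paid several times more often with the smaller sets.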

I am left to recommend PurePerl for Windows and rely on:
  • Future foswiki caching to improve performance
  • Possibly NativeSearchContrib
  • Enhance grep on Windows to allow all the files to be searched to be specified in one go, by writing them to a separate file, but ...

-- JulianLevens - 28 May 2010

We should probably commit the patch, even if it's incomplete, as it's an improvement.

-- SvenDowideit - 08 Jul 2010

In a depressing bit of parallel evolution, I made $maxTopicsInSet dependent on some maths in Item9134, but I'm adding the quoting fixes to see how things improve.

mmm, it might not come to anything tho - it looks like trunk forking on windows is pretty unhappy.

-- SvenDowideit - 15 Jul 2010

ItemTemplate

Summary Fix search code so that forking also works reliably in Windows
ReportedBy KennethLavrsen
Codebase 1.0.8, trunk
SVN Range
AppliesTo Engine
Component PlatformWindows
Priority Normal
CurrentState Needs Developer
WaitingFor
Checkins distro:99d0017aa28f
TargetRelease n/a
ReleasedIn n/a
CheckinsOnBranches
trunkCheckins
masterCheckins
ItemBranchCheckins
Release01x01Checkins
Topic revision: r12 - 20 Jun 2015, CrawfordCurrie