
Item13660: The SEARCH returns mixed NFC/NFD on OS X (causing problem for example in the TreePlugin)

Priority: Normal
Current State: Duplicate
Released In: 2.1.0
Target Release: minor
Applies To: Engine
Component: I18N, SEARCH, TreePlugin, Unicode
Branches:
Reported By: JozefMojzis
Waiting For: JozefMojzis
Last Change By: GeorgeClark

The problem

The following SEARCH returns mixed NFD/NFC strings on OS X.
%SEARCH{search=".*" web="Sandbox" format="   * $topic $parent" scope="topic" regex="on" nosearch="on" nototal="on" noempty="on"}%
Here $topic is an NFD-normalised string, while $parent is NFC.

Because the TreePlugin calls %SEARCH internally, it is confused by the mixed results and therefore does not work correctly on OS X.
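
To illustrate why mixed forms break parent/child matching, here is a minimal sketch. It assumes nothing about the TreePlugin internals: the %children_of hash and the same_topic helper are hypothetical names used only for this illustration. It shows that the NFC and NFD spellings of "ŽuŽu" are different strings to Perl until both sides are normalised.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# "ŽuŽu" written with escapes, so the example does not depend on the
# encoding or normalisation of this source file.
my $nfc = "\x{17D}u\x{17D}u";    # precomposed form, 4 code points
my $nfd = NFD($nfc);             # Z + U+030C (combining caron), 6 code points

# Hypothetical parent lookup keyed by topic name (an assumption for the
# illustration, not the TreePlugin's actual data structure).
my %children_of = ( $nfc => ['ChildTopic'] );

# Compare two topic names regardless of their normalisation form.
sub same_topic {
    my ( $left, $right ) = @_;
    return NFC($left) eq NFC($right);
}

printf "lengths: NFC=%d NFD=%d\n", length($nfc), length($nfd);
printf "raw eq: %s\n", ( $nfc eq $nfd ? 'equal' : 'different' );
printf "hash hit with NFD key: %s\n",
  ( exists $children_of{$nfd} ? 'yes' : 'no' );
printf "same_topic: %s\n",
  ( same_topic( $nfc, $nfd ) ? 'equal' : 'different' );
```

The raw comparison and the hash lookup both miss, while the normalised comparison matches. That is consistent with the broken tree shown in the reproduction steps.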

Reproducing the TreePlugin issue (on OS X)

  • create a topic "CaCa" (parent WebHome)
  • create a topic "ZuZu" (parent CaCa)
  • create a topic "ČaČa" (parent WebHome)
  • create a topic "ŽuŽu" (parent ČaČa)
  • use the %TREEVIEW{}% macro
The result (on OS X) will be something like:
   4 WebHome
   4.1 CaCa
   4.1.1 ZuZu
   4.2 ČaČa
   5 ŽuŽu
So the parent/child relationship is shown correctly for the ASCII topics, but not for the Unicode ones.

Easy test for the SEARCH (on OS X)

  • create a few topics with Unicode names in the Sandbox (like ČaČa and its child ŽuŽu)
  • create a topic !SearchTest with the above %SEARCH
  • use
    curl "http://localhost:8080/Sandbox/SearchTest?skin=plain" | od -bc
  • it will show something like the following:
0014360   151 076 040 074 154 151 076 040 132 314 214 165 132 314 214 165
           i   >       <   l   i   >       Z    ̌  **   u   Z    ̌  **   u
0014400   040 074 141 040 150 162 145 146 075 042 057 123 141 156 144 142
               <   a       h   r   e   f   =   "   /   S   a   n   d   b
0014420   157 170 057 045 143 064 045 070 143 141 045 143 064 045 070 143
           o   x   /   %   c   4   %   8   c   a   %   c   4   %   8   c
0014440   141 042 076 304 214 141 304 214 141 074 057 141 076 012 074 057
           a   "   >   Č  **   a   Č  **   a   <   /   a   >  \n   <   /
or, using noautolink:
0010760   154 151 076 040 132 314 214 165 132 314 214 165 040 304 214 141
           l   i   >       Z    ̌  **   u   Z    ̌  **   u       Č  **   a
0011000   304 214 141 012 074 057 154 151 076 074 057 165 154 076 040 012
           Č  **   a  \n  
i.e. it is clearly visible that "ŽuŽu" is NFD encoded while its parent ČaČa is NFC.
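
The same reading of the dump can be checked mechanically. The sketch below is only an illustration (classify is a hypothetical helper name): it takes the octal byte values copied from the od output above, decodes them as UTF-8, and reports which normalisation form the resulting string is in.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: turn octal byte values (as printed by `od -b`)
# into a string and report its normalisation form.
sub classify {
    my (@octal) = @_;
    my $bytes = pack( 'C*', map { oct($_) } @octal );
    my $text = decode( 'UTF-8', $bytes, Encode::FB_CROAK );

    my $is_nfc = $text eq NFC($text);
    my $is_nfd = $text eq NFD($text);
    return 'both' if $is_nfc && $is_nfd;    # e.g. pure ASCII
    return 'NFC'  if $is_nfc;
    return 'NFD'  if $is_nfd;
    return 'neither';
}

# Bytes copied from the dump: the $topic value, then the $parent value.
print 'topic:  ', classify(qw(132 314 214 165 132 314 214 165)), "\n";
print 'parent: ', classify(qw(304 214 141 304 214 141)), "\n";
```

For these bytes the topic classifies as NFD (Z followed by the combining caron U+030C) and the parent as NFC (precomposed U+010C).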

Quick & dirty fix for the TreePlugin

diff --git a/lib/Foswiki/Plugins/TreePlugin.pm b/lib/Foswiki/Plugins/TreePlugin.pm
index 5c30ca5..7829581 100644
--- a/lib/Foswiki/Plugins/TreePlugin.pm
+++ b/lib/Foswiki/Plugins/TreePlugin.pm
@@ -23,6 +23,7 @@ package Foswiki::Plugins::TreePlugin;
 
 use strict;
 use warnings;
+use Unicode::Normalize qw(NFC);
 
 use Foswiki::Func;
 
@@ -530,7 +531,8 @@ sub doSEARCH {
 "%SEARCH{search=\"$searchVal\" web=\"$searchWeb\" format=\"$searchTmpl\" scope=\"$searchScope\" regex=\"on\" nosearch=\"on\" nototal=\"on\" noempty=\"on\" excludetopic=\"$excludetopic\" topic=\"$includetopic\"}%";
     &Foswiki::Func::writeDebug($search) if $debug;
 
-    return Foswiki::Func::expandCommonVariables($search);
+    my $search_result = Foswiki::Func::expandCommonVariables($search);
+    return $Foswiki::UNICODE ? NFC($search_result) : $search_result;
 }
 
 =pod

Of course, the above isn't a correct solution. We need to fix the NFC/NFD problem at its roots - i.e. everywhere we do decode_utf8($string) we should do NFX(decode_utf8($string)), where NFX is NFD or NFC - whichever the core dev team reaches consensus on.

-- JozefMojzis - 01 Sep 2015
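
A minimal sketch of the "NFX(decode_utf8($string))" idea from the comment above, assuming NFC is the form chosen. The helper name decode_nfc is hypothetical and is not part of the Foswiki API; it only shows decoding and normalising in one step, so that callers never see mixed forms.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use Unicode::Normalize qw(NFC);

# Hypothetical helper (the name is an assumption): decode UTF-8 bytes and
# normalise the result to NFC in a single call.
sub decode_nfc {
    my ($bytes) = @_;
    return NFC( decode_utf8($bytes) );
}

# "Žu" as bytes in decomposed form: Z, U+030C (0xCC 0x8C), u
my $from_disk = "Z\xCC\x8Cu";
my $topic     = decode_nfc($from_disk);

# prints "2 code points, first is U+017D"
printf "%d code points, first is U+%04X\n", length($topic), ord($topic);
```

Input that is already NFC passes through unchanged, so such a wrapper could be applied uniformly wherever bytes are decoded.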

Jomo, is this task fixed by the changes made for Item13405?

-- GeorgeClark - 24 Dec 2015

Unfortunately no. The SEARCH still returns a $search_result string like:
Sandbox|Z\x{30c}uZ\x{30c}u|\x{10c}a\x{10c}a|\$outnum [[Sandbox.Z\x{30c}uZ\x{30c}u][Z\x{30c}uZ\x{30c}u]] <br />";
i.e. the parent is returned as NFC - ČaČa (\x{10c}a\x{10c}a) - but the topic name ŽuŽu is NFD (Z\x{30c}uZ\x{30c}u). Applying the above patch (NFC-ing the search result) works.

-- JozefMojzis - 26 Dec 2015

Just for the record: tested on "be33f8f40df93f37d796caac361a23f3a7aa2655".

-- JozefMojzis - 26 Dec 2015

Ahh... it works. NFCNormalizeFilenames was unset. SOOOORRY for the confusion I caused. :( I set it the first time, and everything worked as I reported in Item13405. Later, when you asked about this TreePlugin test, I forgot to set it again.

IMHO, having to set this cfg option manually is a really bad idea. Is configure (Foswiki.spec) really so dumb that it doesn't allow one simple condition as a default? Such as:
 $Foswiki::cfg{NFCNormalizeFilenames} = 1 if $^O =~ /darwin/;

-- JozefMojzis - 30 Dec 2015

I wonder if there is some easy way to test if the file system is NFC or NFD. Rather than an OS test, which would miss remote file system situations.

-- GeorgeClark - 31 Dec 2015

I think the solution is a change in bootstrap. We create a file using an NFC filename in the data directory, and then read it back. If it changes, we are probably on an NFD system, and we can set the normalize flag correctly.

-- GeorgeClark - 31 Dec 2015
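
The create-and-read-back probe described above could look roughly like the sketch below. This is not the code that went into bootstrap: probe_fs_normalization and the "nfcprobe_" file prefix are hypothetical names, and the example runs against a temporary directory rather than the real data directory. It also assumes the directory path itself is plain ASCII.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use File::Temp qw(tempdir);
use Unicode::Normalize qw(NFD);

# Hypothetical helper: create a file with an NFC name in $dir, read the
# directory back and report how the name was stored.
# Returns 'NFC' (name unchanged), 'NFD' (name decomposed) or 'other'.
sub probe_fs_normalization {
    my ($dir) = @_;

    my $prefix   = 'nfcprobe_' . $$ . '_';
    my $nfc_name = $prefix . "\x{10C}a\x{10C}a";    # precomposed "ČaČa"
    my $path     = "$dir/" . encode_utf8($nfc_name);

    open( my $fh, '>', $path ) or die "Cannot create probe file: $!";
    close($fh);

    opendir( my $dh, $dir ) or die "Cannot read $dir: $!";
    my @seen = grep { index( $_, $prefix ) == 0 }
      map { decode_utf8($_) } readdir($dh);
    closedir($dh);
    unlink($path);    # always clean up the probe file

    return 'other' unless @seen == 1;
    return 'NFC' if $seen[0] eq $nfc_name;
    return 'NFD' if $seen[0] eq NFD($nfc_name);
    return 'other';
}

my $form = probe_fs_normalization( tempdir( CLEANUP => 1 ) );
print "file names in this directory come back as: $form\n";
```

Under this sketch, an 'NFD' result would be the signal to enable $Foswiki::cfg{NFCNormalizeFilenames}. Because only one directory is probed, it shares the limitation raised later in the thread: symlinked or remote subdirectories may behave differently.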

If you want to be "politically correct" :) the tests should be done per web and per directory. Imagine a DBI-based storage: the "data" is NFC but /pub could be NFD.

Or, some /data/Some and/or /pub/Another could be symlinked to remote...

So, yes, you're right - the file creation test could help - but it isn't a bulletproof solution either. So, IMHO, the simple "$^O matches darwin" test is enough (at least until someone reports a bug) :).

-- JozefMojzis - 31 Dec 2015

Well, I implemented the test against the data directory. I think trying to probe every directory under the data and pub trees is excessive. And if installed on an OS X system, then the simple probe of data should be sufficient. I'd hope.

-- GeorgeClark - 31 Dec 2015

Setting this task to duplicate. It's fixed in Item13405.

-- GeorgeClark - 31 Dec 2015
 

  • this file is called ČáŘý - uploaded from OS X's NFD filesystem:
    Error: (3) can't find %c4%8c%c3%a1%c5%98%c3%bd.png in Tasks

Codebase: 2.0.3, 2.0.2, 2.0.1, 2.0.0, trunk
| Attachment | Size | Date | Who | Comment |
| ČáŘý.png | 3 K | 13 Dec 2015 - 18:08 | JozefMojzis | this file is called ČáŘý - uploaded from OS X's NFD filesystem |
Topic revision: r10 - 31 Dec 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere.