You are here: Foswiki>Tasks Web>Item9889 (05 Jul 2015, GeorgeClark)Edit Attach

Item9889: viewfile script fails for attachments on topics with names containing special characters

Priority: Normal
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: FoswikiUIViewfile, I18N
Reported By: PascalSchuppli
Waiting For:
Last Change By: GeorgeClark

The bug

The bug manifests itself when you view attachments through the viewfile script, which is the case for example when you have a RewriteRule like the one below in place to protect attachments from public access:

RewriteRule ^wiki/pub/([^\/]+)/([^\/]+)/([^\/]+)$ cgi-bin/viewfile/$1/$2?filename=$3 [L,PT]

(Note: I also use the german UTF-8 site charset and specify upper and lower national characters manually in the configure script, but I believe this bug will also appear with other charsets, as long as a special character gets URL-encoded.)

To reproduce it:

  1. Create A topic that contains special characters in its name, e.g. german umlauts ("Kursübersicht").
  2. Attach a simple text file to this topic.
  3. Then try to view the attachment with the viewfile script (or by clicking on it if you have a RewriteRule like the one above...)
  4. The viewing fails with an "Attachment ? does not exist" exception.

Explanation and bugfix

Because the URL to the attachment includes the topic name, which contains a special character, the URL and thus path_info (from which the topic name is taken in lib/Foswiki/UI/ will contain an URL-encoded special character. However, lib/Foswiki/UI/ does not seem to url-decode the topic name before untainting and validating it. The call to isValidTopicName in then returns 0 because the topic string contains %, which are filtered out by Foswikis default NameFilter. So in lib/Foswiki/UI/, $topic remains undefined. If you change the following line

# The next element on the path has to be the topic name
$topic = Foswiki::Sandbox::untaint(

to this:

# The next element on the path has to be the topic name
my $decodedTopicName = Foswiki::urlDecode(shift(@path));
#$decodedTopicName = $session->UTF82SiteCharSet($decodedTopicName);
$topic = Foswiki::Sandbox::untaint(

then things work as they should with the special character topic names. If you uncomment the UTF8SiteCharSet-Line, then normal topic names break, but it might be needed in some cases (it's done for the filename and I'm not all too sure what's going on in it...)

BTW, while the filename is being treated for url-encoded characters, the web name also doesn't get treated, so the same bug would probably appear if you had a web with special characters in its name.

Would someone please confirm this and create a bug fix that's more than just my quick hack smile

-- PascalSchuppli - 25 Oct 2010

I would hesitate to just implement this fix.

I have tested with what we believe will be 1.1.1 and I have the following settings that should work well with all latin languages setup with an iso character set

  • $Foswiki::cfg{UseLocale} = 1;
  • $Foswiki::cfg{Site}{Locale} = 'en_US.ISO-8859-1';
  • $Foswiki::cfg{Site}{LocaleRegexes} = 1;
  • $Foswiki::cfg{UpperNational} = '';
  • $Foswiki::cfg{LowerNational} = '';
  • $Foswiki::cfg{Site}{CharSet} = 'iso-8859-1';

You may want to use en_US.ISO-8859-15 instead and the language could be another instead of en_US but that should mean little to how Foswiki works in practical life.

And then I can create topics and attach attachments with Danish letters like æøå or german letters like ü, ö, ä.

We know that UTF8 is not well supported yet and we are looking for someone who are experts in this area. Mainly for cyrillic and Chinese/Japanese. We have not seen any interest from arab or hebrew and it is impossible to develop these things when you have no clue how the language works character/right to left etc.

Pascal if you are setting your site up for German and you have not yet had it in production at all, then my advice is - use iso character set iso-8859-15 and forget UTF8 for now

-- KennethLavrsen - 25 Oct 2010

Hmmm, well, how about you test if my hacky bugfix breaks the code with a normal (iso-8859-1) charset and if it doesn't, implement it? Because this IS a bug in the code and it needs fixing; you can either fix it now or have someone rediscover it five years from now when UTF-8 gets moved up from experimental status to production. It took me three hours to figure out why attachments on topics with umlauts wouldn't work, so that's three hours of work someone else won't have to do in the future if you fix it now that you know what causes the problem.

I know UTF-8 is still experimental, but I've been using it for quite a while in production wikis (telling people not to use Umlauts in topic names as a workaround and finally getting fed up with this and looking for the reason this doesn't work today) and it's been working better and better over the last two years (to the point where even the WYSIWYG-Editor now works; it didn't use to). I could report a few other bugs that have to do with UTF-8, but none of them are problematic enough to be show stoppers. This one is seriously annoying. Please fix it now.

BTW, I'm surprised that things work fine with iso-8859-1, I expected this bug to be solely related to the failure to url-decode the topic name before passing it to the topic name validation, which should break iso-8859-1 as well. Strange that it doesn't.

-- PascalSchuppli - 25 Oct 2010

Deferring to 1.2 and the UTF-8 project.

-- GeorgeClark - 13 Dec 2011

Jan, can you or maybe Crawford validate the fix that's suggested in this task? The fix seems reasonable to me, but I'm not all that savvy on utf8.

-- GeorgeClark - 24 Jun 2014

The analysis is wrong, and the fix too.

The CGI specification RFC 3875 section 4.1.5 says "Unlike a URI path, the PATH_INFO is not URL-encoded". It is clear from this that the PATH_INFO environment variable is not URL encoded in CGI scripts. doesn't re-encode, so path_info() is not url-encoded either. So it should not be necessary for scripts to URL decode path_info and indeed, neither nor do this.

RFC 3875 section 4.1.5 further says "treatment of non US-ASCII characters in the path is system-defined", which is where a lot of people get confused. So what happens when a URL containing 'illegal' is written into the browser and sent to the server? Well, the baseline definition of a URL only allows for a subset of 7-bit ASCII to pass unencoded, so the browser automatically encodes any funnies that do not meet this specification. RFC 3986 states that these funnies must first be converted to UTF8 before encoding. The CGI specification sees to it that these bytes are url-unencoded, so by the time the string hits the script, it is a UTF-8 encoded byte string (just like any data string in Foswiki when the {Site}{CharSet} is 'utf-8').

So, what is happening in the OP? Well, viewfile does it's own path_info analysis, which mirrors that done in in one important respect viz. it does not re-encode the topic and web names into the site charset (though strangely enough, it handles the filename parameter). Both the web and the topic names need to be run through Foswiki::UTF82SiteCharset in, but neither should be URL-decoded first.

I have not tested any of this (other than by reading code); it is just my understanding of how it all works.

-- CrawfordCurrie - 24 Jun 2014

Based on Crawford's analysis I'm pushing this off to 2.0. We need unit tests as well.

-- GeorgeClark - 23 Dec 2014

Just tested on 1.2.0 (unicode) and it works.

-- Main.CrawfordCurrie - 24 Jun 2015 - 15:39

ItemTemplate edit

Summary viewfile script fails for attachments on topics with names containing special characters
ReportedBy PascalSchuppli
Codebase 1.1.0
SVN Range
AppliesTo Engine
Component FoswikiUIViewfile, I18N
Priority Normal
CurrentState Closed
TargetRelease major
ReleasedIn 2.0.0
Topic revision: r13 - 05 Jul 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy