You are here: Foswiki>Tasks Web>Item10825 (05 Jul 2015, GeorgeClark)Edit Attach

Item10825: Using save in a rest call destroys character encoding.

pencil
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Applies To: Engine
Component: FoswikiUISave
Branches: trunk
Reported By: MichaelDaum
Waiting For:
Last Change By: GeorgeClark
Given you POST a form to the save handler asynchronously, e.g. via jquery.form's ajaxSubmit. Then this content will be utf8 encoded no matter what the charset of the html page is. That's standard by the specs. That's different to a normal POST which will use the same charset as indicated by the html page it comes from.

Foswiki's Request object however silently assumes that anything it receives is encoded the way it has said while rendering the html page.

So no surprise chars get mangled during a rest call of save.

Next problem is that the call itself does not specify any charset in the http headers part of the Content-Type header. That's a no-go unfortunately.

From what I have tested the only http header that differentiates a normal POST from an ajax POST is the X-Requested-With header, which is set to XMLHttpRequest in case of a rest call ... which also indicates that the content is encoded in utf8.

Here's the patch to trunk's Request.pm that tries to fix this:

Index: lib/Foswiki/Request.pm
===================================================================
--- lib/Foswiki/Request.pm      (revision 11812)
+++ lib/Foswiki/Request.pm      (working copy)
@@ -407,11 +407,24 @@
 
     my ( $key, @value ) = rearrange( [ 'NAME', [qw(VALUE VALUES)] ], @p );
 
+    my $requestedWith = $this->header("X-Requested-With") || '';
+
     return @{ $this->{param_list} } unless defined $key;
     if ( defined $value[0] ) {
         push @{ $this->{param_list} }, $key
           unless exists $this->{param}{$key};
-        $this->{param}{$key} = ref $value[0] eq 'ARRAY' ? $value[0] : [@value];
+
+        if ($requestedWith eq 'XMLHttpRequest') {
+          # transform to site charset
+          if (ref $value[0] eq 'ARRAY') {
+            $this->{param}{$key} = [map(toSiteCharSet($_), @{$value[0]})];
+          } else {
+            $this->{param}{$key} = [map(toSiteCharSet($_), @value)];
+          }
+        } else {
+          $this->{param}{$key} = ref $value[0] eq 'ARRAY' ? $value[0] : [@value];
+        }
+
     }
     if ( defined $this->{param}{$key} ) {
         return wantarray
@@ -423,6 +436,28 @@
     }
 }
 
+sub toSiteCharSet {
+    my $encoded = shift;
+
+    return $encoded
+      if ( !defined $Foswiki::cfg{Site}{CharSet}
+        || $Foswiki::cfg{Site}{CharSet} =~ m/^utf-?8$/i );
+
+    require Encode;
+    import Encode;
+    my $encoding = Encode::resolve_alias( $Foswiki::cfg{Site}{CharSet} );
+    if ( not $encoding ) {
+        print STDERR "'Conversion from $Foswiki::cfg{Site}{CharSet} not supported, or name not recognised - check perldoc Encode::Supported\n";
+        return $encoded;
+    }
+    else {
+
+        # converts to {Site}{CharSet}, generating HTML NCR's when needed
+        my $octets = Encode::decode( 'utf-8', $encoded );
+        return Encode::encode( $encoding, $octets, &Encode::FB_HTMLCREF() );
+    }
+}
+
 =begin TML

Any comments?

-- MichaelDaum - 01 Jun 2011

This needs to work correctly for any request, not just XHR.

So please test both X-Requested-With as well as Content-Type.

Lack of Content-Type in a request which contains body data, such as POST, leaves us guessing about the nature of the data, and should be considered a very easily fixed bug of the client which built the request (unless the data is so alien that it can't be described with a MIME type or charset).

-- PaulHarvey - 01 Jun 2011

Please don't leave the "Waiting for" field empty when asking for feedback. I assume you meant to ask Micha, so I put his name in.

-- CrawfordCurrie - 08 Jun 2011

It was me leaving WaitingFor empty. That's by purpose to express the concept "anybody who feels inclined to comment / can somebody cross-check and leave his 2cent / This is sensitive stuff / I just don't want break it without a second opinion".

From what I can tell: it saved my life on a client's intranet using some custom ajax code, similar to the async "Save+Continue" in NatEditPlugin.

-- MichaelDaum - 08 Jun 2011

Just one minor comment: as Paul wrote, it might make sense to encode more than just XHR, and some stuff which might not be in UTF-8. Si maybe just extend the code so that the toSiteCharset function takes an encoding as argument, and passes utf-8 for an XHR. Then it can easily be extended to get the encoding from the HTTP header, and re-encode it there.

Removing anybody from the WaitingFor field as plague doesn't like it. Micha, if you want other people to comment, ask them, or put them in the field.

-- OlivierRaginel - 25 Jun 2011

Michael. There is a developer desire to have me release a 1.1.4 within the next very few days. Is this one you want to fix the next day or so or do we have to carry it over to 1.1.5?

I agree it is an important bug

-- KennethLavrsen - 26 Jun 2011

I have been promising to help "finish" this task but got distracted on tackling deeper UTF-8 issues on trunk. I don't have time in the next few days to help do any more for 1.1.4.

At minimum we should just apply Michael's fix as-is for 1.1.4. XHR are probably the vast majority of rest script requests anyway. So in that case we should open a new urgent task to cover non-XHR requests for 1.1.5.

But it would be nice if somebody has time to add code and test non-XHR requests for 1.1.4. I just can't do it this week.

-- PaulHarvey - 26 Jun 2011

first up do not duplicate existing functions

there's already a Foswiki::I18N::toSiteCharSet. if it needs some care, fix it.

  • Nother version is part of the WysiwygPlugin, called RESTParameter2SiteCharSet() ... so there already is redundancy. And they are causing trouble as they are applied multiple times to the same string by foswiki. -- MichaelDaum - 27 Jul 2011

secondly, I'd be tempted to set a foswiki context for it, and thus test for that context - that way plugins or whatever also get to know about it.

-- SvenDowideit - 01 Jul 2011

these charset conversion functions desperately need a new home

-- MichaelDaum - 01 Jul 2011

Warning: some plugins implementing REST handlers decode url params to the site's charset on their own. If both - the core and the plugins - do so, things go wrong badly. See Item10995.

For what it's worth, there is quite an array of plugins that I reviewed that all implement some kind of encoding/decoding similar but not 100% equivalent to the method above, most notably also found in WysiwygPlugin::Handlers.pm, called RESTParameters2CharSet().

-- MichaelDaum - 23 Jul 2011

Which plugins?

-- CrawfordCurrie - 26 Sep 2011

The above patch is wrong. Basically we can't fix it until all of Foswiki decodes to internal string representation ... nothing we want to do on foswiki-1.1.x as it affects lots of code that needs fixing. So downgrading this bug item to normal not to block 1.1.4 anymore.

-- MichaelDaum - 18 Oct 2011

Concur, but I really want this "done" for 1.2/2.0 (and I will help with the work), so I'm making it urgent again against ReleaseTarget minor.

-- PaulHarvey - 19 Oct 2011

Item11242 marked as duplicate of this item.

-- MichaelDaum - 09 Nov 2011

Also: Item11932

-- MichaelDaum - 08 Jun 2012

Latest NatEditPlugin now has got a REST handler for save+continue which is a thin wrapper around the normal save to perform the parameter transcoding.

WysiwygPlugin basically does the same for its REST handlers.

So this all isn't really party time.

-- MichaelDaum - 10 Jul 2012

So why not use the one in WysiwygPlugin?

-- SvenDowideit - 11 Jul 2012

Cus I've got wikis with no WysiwygPlugin installed. And gawd it would be better to remove it from NatEditPlugin and WysiwygPlugin and make it all work without jumping thru burning hoops. Right now I am more driven by urgent requests of users not being able to save&continue.

-- MichaelDaum - 11 Jul 2012

See also ImprovedRESTSupport

-- CrawfordCurrie - 27 Mar 2013

Discussed at FoswikiCamp2014, and agreed that documenting save script as inappropriate for REST (unless the site is Unicode) is appropriate.

-- CrawfordCurrie - 13 Mar 2014

 

ItemTemplate edit

Summary Using save in a rest call destroys character encoding.
ReportedBy MichaelDaum
Codebase 1.1.3, trunk
SVN Range
AppliesTo Engine
Component FoswikiUISave
Priority Urgent
CurrentState Closed
WaitingFor
Checkins distro:1f0f70c49c89 distro:f55b50520be7
TargetRelease major
ReleasedIn 2.0.0
CheckinsOnBranches trunk
trunkCheckins distro:1f0f70c49c89 distro:f55b50520be7
masterCheckins
ItemBranchCheckins
Release01x01Checkins
Topic revision: r27 - 05 Jul 2015, GeorgeClark
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy