cross
New Foswiki release 2.1.6 is available with important security fixes.
Sourceforge foswiki email lists being discontinued. Subscribe to the new Foswiki announce and discuss lists at MailingLists

Feature Proposal: Make Foswiki::Request responsible for parsing the query path and identifying the Web, Topic & Attachment.

Motivation

Currently Foswiki.pm parses apart the query path, so it uniformly applies to all scripts. However there are some cases where the default of Web/Subweb/Topic doesn't apply:
  • rest path uses rest/<Subject>/</Verb>
  • viewfile adds the Attachment name to the path
  • JsonRpcContrib uses jsonrpc/<Namespace>/</Method>

We need a way to parse the query path differently based upon the type of request.

Description and Documentation

  1. Add an internal parse function to the Foswiki::Request. This will be based somewhat on the parser implemented in Foswiki::Address, and in Foswiki.pm
  2. Add methods to get the results:
    • web() -> returns the Web/Subweb name
    • topic() -> returns the topic name
  3. Modify Foswiki::Contrib::JsonRpcContrib::Request to subclass Foswiki::Request. Overrides the web() and topic() methods, to return the web and topic names extracted from the JSON request.
  4. Extend the SwitchBoard definition to specify the Request object name.
  5. Modify Foswiki.pm to use the request methods to recover the web and topic names.

I think that it would also be appropriate to create a Foswiki::Request::Attachment which implements an attachment() method to recover the filename from the query path, and/or the filename query param. Switchboard entries for viewfile (and xsendfile) would use the Attachment type request.

For Rest, we could also implement a Foswiki::Request::Rest which properly assigns web/topic from the query params, or topic query param.

Examples

Revised switchboard entry for JsonRpcContrib:
    $Foswiki::cfg{SwitchBoard}{jsonrpc} = {
        package      => 'Foswiki::Contrib::JsonRpcContrib',
        request      => 'Foswiki::Contrib::JsonRpcContrib::Request',
        function     => 'dispatch',
        context      => { jsonrpc => 1 },
    };

Impact

See also Tasks.Item13969 for discussion of the issues in the current implementation.

%WHATDOESITAFFECT%
edit

Implementation

This moves Foswiki::Request away from compatibility with CPAN:CGI::Request.

-- Contributors: GeorgeClark - 15 Mar 2016

Discussion

+1

-- MichaelDaum - 15 Mar 2016

I'm all for this.

I would strongly suggest to place the parsing into Foswiki::Address::Parser which would become a common place for addresses to be parsed. This could then be reused by the Query code and elsewhere — separation of concerns and all that.

-- JulianLevens - 15 Mar 2016 - 09:11

A word of warning; are you proposing to validate the web and topic parsed out in the request? If not, you need to consider different names for the functions, as you are otherwise implying semantics to URL components that are not necessarily true.

It might help if this feature proposal specified exactly how the URL will be broken down and the component parts identified.

-- Main.CrawfordCurrie - 15 Mar 2016 - 09:32

The Foswiki::Address parser attempted to guess the web, subweb, topic and attachment names universally based upon the presence of (/) and (.) delimiters, and was quite unsuccessful at times due to the ambiguity of our syntax. By subclassing the request object based upon the type of request (action), it can know whether or not an attachment name is expected,

I think I see where you are going with the semantics, and think the functions might be better called requestedWeb and requestedTopic. I'm wondering if we need isValid functions of some sort, so that the caller can know if the requested value is valid, present but invalid, or missing.

The parser would accept the following:
  • Default behaviour (called by Foswiki::Request)
    • Accept Web/Subweb/Topic, Web/Subweb.Topic Web/Subweb(/notopic)
    • Avoid calling store for existence checks unless required to resolve ambiguity.
    • Break ambiguity as:
      • Trailing (/) forces the last component to be a Web name
      • (.) as last separator forces last component to be a Topic name
      • If last component is ambiguously Web or Topic:
        • If exists as a topic name, return as a topic
        • If no topic exists, and web exists, return as a web name, topic name missing
        • No web exists and no topic exists, return as topic name.
    • Validate each component per our isValid web or topic name regular expressions
    • Fail on encountering the first illegal component. ie. /Sandbox/badweb/Anotherweb/Topic would return nothing. It would not just omit "badweb" and return Sandbox/Anotherweb...

And here I need some guidance. Should the request "filter out" bad characters, or always return the requested value, but with a flag showing the data as illegal/invalid. I can see the value of filtering very early, so that requestedWeb() cannot return "Evil<script>..Web. Ah, maybe this clarifies the function names:
  • validatedWeb() returns a fully validated web name, or nothing
  • requestedWeb() returns the raw request, which should be treated with care
  • ...

Separate from the parser, the Foswiki::Request
    • topic query param always overrides the path.
      • If web component specified, then path completely ignored topic=Someweb.SomeTopic
      • If only topic component specified, then web parsed from the path topic=SomeTopic
      • If web is missing from both topic query param and path, then use the query param defaultweb
    • It will not return a default value if it cannot resolve to a topic.

When subclassed as an "Attachment request, all of the above applies except:
    • Use the common parser with a flag to indicate a filename is expected.
    • Provides a function requestedFilename()
    • Last component can possibly be a filename. (required if filename query param is not provided)
    • dot (.) cannot be used as the Web.Topic delimiter, since dot is a legal filename character.

When subclassed as a "Rest request"
    • Use common parser with flag saying to ignore the query path.
    • Provides functions requestedSubject() and requestedVerb()
    • Only returns values for requestedWeb() and requestedTopic() when topic and/or defaultweb query params are provided.

The JSON request would not use the parser at all. The web and topic would be taken from the JSON data. -- GeorgeClark - 15 Mar 2016

First of all, I'm totally for this proposal. This is how it should have been from the beginning. I would only propose a bit different approach to the implementation with a touch of Moo (aha, what else would be expected? wink ) features in mind. Same approach could be implemented within the classical Perl paradigm too.

The main demand to the Foswiki::Request object is to remain as abstract as possible. A decent user must not care of the particular request type he is dealing with unless it is really necessary for him to know.

  • Foswiki::Request is not a fully functional parser but a container class. Actual parser object is stored on Foswiki::Request::type attribute.
  • Foswiki::Request methods like xxxWeb(), xxxTopic(), xxxAttachment(), etc are referring to corresponding methods of the object in type. With Moo it's implemented using delegation; the classical approach would require wrappers.
  • Actual request parsers are doing Foswiki::Request::Parser role. The role itself provides basic parsing functionality sufficient for the view action. In classical approach parsers derive from from the basic Foswiki::Request::Parser.
  • JSON does the role too but only for the purpose of support of Moo delegation. Could stand on its own in the classical model.
  • Actual parser is determined using the SwitchBoard config key as mentioned above.

Withing this model Foswiki::Request extracts only the action/script part of the URL (<URL prefix>/<action>/<path>). Using SwitchBoard it determines the class responsible for this kind of action and creates an instance of this class in the type attribute. That's all it does. The rest is the parser's job. Delegation/method wrapping keeps the implementation details hidden from the end user while direct access using the type attribute provides full and flexible access to action-specfic functionality.

A note on filtering. Within this model a user doesn't have access to an unchecked unpurified data. If he needs it he has to ask the parser directly using a method with unsafe prefix. It would be a job for a non-lazy determined maniac to use $app->request->type->unsafeRequestedWeb instead of $app->request->web. wink

-- VadimBelman - 15 Mar 2016

George, on validation. IMHO processing should never filter out "bad" characters. What it can do is decide whether the string is "good" i.e. if it meets the requirements for a web/topic name. It it fails to meet these requirements (Foswiki::isValidWebName() fails) then it is not a valid web/topic name, and can't be made into one by filtering. So if you use a backtick in the "web" part of the URL, requestedWeb should return undef - it couldn't be determined.

-- Main.CrawfordCurrie - 16 Mar 2016 - 12:11

See also ContinueCanonicalSCRIPTURLDev. Should the Foswiki::Request also have a utility function which reverses the effect of the parser. Foswiki::Request::Parser::parse() vsl Foswiki::Request::Parser::construct(). Rather than having action based exceptions embedded in the SRIPTURL macro, SCRIPTURL would also examine the SwitchBoard for the action, and use the complementary function of the request type to generate the url for the new request.

-- GeorgeClark - 20 Mar 2016

As I noted on IRC – I like the idea of same object taking care of both parsing and forming query paths. No matter what is hidden inside – for an end user it would be $app->req.

What I don't like as a pretending to be a perfectionist wink is that Parser and construct() doesn't fit into single expression. I would prefer a Request class to have both parse() and construct() methods. They may hand over their job to some underlying modules – that's implementation details nobody would really care about. Predetermination of these two methods would also simplify the SwitchBoard entries by dumping the function key away. Another advantage is increased code readability because anybody studying some Foswiki::NewContrib::Request class would now exactly what these two methods does without diggin' around seeking for corresponding SwitchBoard entry. Well, yes, comments, TML documentation – I know they should be around. Still, they better focus on implementation details we need to know about. Yet, it would simplify access to these methods in cases when one cannot be sure about the exact kind of the request object. Referring to $app->req->parse($path) is faster than usage a mediator which would have to find a correct method before calling it.

This is my suggestion. I would very much insist on it for a simple reason that I would like to have the new Request class soon because the new Foswiki::App will not have any kind of web(), topic(), etc. methods.

PS. Hopefully I didn't make any systems mistake here. Lack of spare time doesn't give me a chance to recheck the code. Therefore I completely rely on my memory and it's not the best one in the world. smile

-- VadimBelman - 21 Mar 2016

The issue is that "construct" could be for any action, while Request parse is tailored for the specific action requested by the browser. Because the "type of request" comes from the macro in the construct process, we can't just use the current request object.

-- GeorgeClark - 22 Mar 2016

With relation to parse I would put it this way:

  • Making it public doesn't harm.
  • Even though we don't see a scenario which would need parsing a wikipath doesn't mean such scenario doesn't exists. Besides if I eventually will implement the new plugins model too extending their role beyond current limits then such scenarios would be even more likely to appear.

Yet, I see no point of making something private unless it's really-really local to a class and senseless outside of the class context. As soon as parsing is used outside of the Request anyway it's not private by definition.

And then again: it's no harm for it to be public. smile

BTW, this made me think about the parse() outcome. Obviously it shall set the Request attributes (like web, topic, for sure). But in addition to this it would useful have it return key/value pair suitable for passing to new() constructor.

-- VadimBelman - 22 Mar 2016

Okay, so parse() will be public
  • input (optional) a string to be parsed. If not provided then operates on the request object.
  • output - returns a hash - name value pairs of the information appropriate to the class. Web, Topic, Attachment ???
  • Sets the object attributes (web, topic, etc.) but ONLY when called without a string. We should not have the request object itself modified external to the request.

For naming, Foswiki::Request (base class, stub methods to be implemented by the request types.
  • Foswiki::Request::WebTopic Handles our standard actions, view, edit, ... (any request that takes /Web/Topic in the path with topic= and defaultweb= url param
  • Foswiki::Request::Attachment The attachment view requests, viewfile, xsendfile
  • Foswiki::Request::Rest Parses Subject/Verb, topic= and defaultweb= query param.
  • Foswiki::Request::JsonRPC Parses the NameSpace/Method, and extracts web, topic, etc. from the json request.

-- GeorgeClark - 22 Mar 2016

BTW, take into account that the base data structure the App is dealing with on startup is env hashref which is either a ref to %ENV or PSGI environment hash which is pretty much similar to CGI environment passed in %ENV. This hashref will be passed to the new() constructor with env key but is also accessible on $Foswiki::app->env.

I'm not sure if it is necessary to have a dedicated Foswiki::Request::WebTopic. To me as an end-user it doesn't have anything specific that is not a part of any other request type. Other three requests are only adding something to the base request type or change its behavior in some aspects. Therefore I would have Foswiki::Request as a full-fledged class, not a virtual or role. Three others would simply inherit from it.

parse() must never set object attributes. As a matter of fact it could be a static method of Foswiki::Request unless it somehow depends on the running environment besides of the requested URL. As a static method parse() might be called within BUILDARGS() method. It's return could be injected into constructor parameters hash to be used as initial values of corresponding object attributes. Sounds complicated but in fact it is as simple as:

has web => (is=>'ro',);
has topic => (is=>'ro',);

around BUILDARGS => sub {
    my $orig = shift;
    my $class = shift;
    my %params = @_;
    my %request_hash = parse($params{env}{PATH_INFO} // '') if defined $params{env};
    return $orig->($class, %request_hash, @_);
};

Here web and/or topic would be initialized using corresponding keys from parse() return unless there is/are user-defined ones. This is why %request_hash precedes @_ – to make the override possible.

Within the classic perl OO model the %request_hash could simply be inserted into the hashref argument of bless.

I'm basing this proposal on the assumption that request object is immutable; i.e. it depends solely on incoming request and doesn't change over Foswiki::App lifetime. Would for any reason we need to work with different request then a new object has to be created and fed with another init hash:

my $req2 = Foswiki::Request->new( Foswiki::Request::parse($path_str) );

-- VadimBelman - 22 Mar 2016

I just have checked the first commit to the new Request branch. It is not much to add to what we discussed on IRC. I would still vote for static parse() method. The only reason for it to be an object method is fetching of query path from the object attribute; but that could be done from outside of the method. $path parameter has to be mandatory and this is all we need from it. If we come to the moment (and perhaps we will come to it) where inheriting of the parse() would be needed then it could be either done using Moo's around keyword (works nicely with static methods – look at BUILDARGS inheriting in my branch); or using SUPER:: which does the job too.

There was also a concern raised on the chat about 4 different sources of info about the action. I could be mistaken but shouldn't there be two sources only: URL and command line? One way or another but it makes its way into %ENV from URL no matter what environment are we started with. But even if my assumption is incorrect and an important nuance has been missed it doesn't change the fact that any of these sources could be processed before any other processing is done making it not a big deal to supply the parse() with a path string.

Second note is about the error key in return hash. I think it's not needed. URL is a sensitive entity in many respect including security. Trying to guess what the user meant is a way to create problems of many kinds. Yet in it's current form the errors array is useful for human eyes only. Throwing an exception would be the way to go. In the new model that would be Foswiki::Exception::URLParse which, depending on wether the action is known and what is it, may trigger several different scenarios:

  1. Action is not known – set it to oops and send back the error page.
  2. Action is a RPC call – send back a error code.
  3. Action is REST – not sure, but so far oops looks like the most correct reaction.
  4. Otherwise set action to view, URL to the default and system message to a kind of 'Malformed URL' message – actually, what the text attribute of the exception is set to.

-- VadimBelman - 27 Mar 2016

This has been implemented in the Item14033 branch:
  • Foswiki::Request
  • Foswiki::Request::Attachment
  • Foswiki::Request::Rest (Maybe should be named REST for consistency)
  • Foswiki::Request::JSON
There are a few issues in particular with JSON that need resolution.
  • The JSON Request object was originally created to model after the Foswiki (and CGI) request objects. It duplicates some methods from Foswiki::Request, however some are in conflict:
    • param(). This is a read/write method. It is used by Engine to populate the Request object from the environment, used elsewhere to read from the Request, and used by the JSON Request to read the JSON object parameters. As long as it's truly read-only for the JSON data, we can probably coexist, and pass through any write operations to the SUPER::param(). However it might be better to implement a jsonparam() method. Another inconsistency, is that param() without arguments returns the list of URL parameters, while the JSON request separately implements params() to list the parameters.
  • method(). Returns the HTTP method (GET / POST) at the Request level, but the json method at the Request::JSON level. This is probably okay to overlap, the JSON Request constructor can verify that the correct HTTP method is being used

-- GeorgeClark - 08 May 2016
 
Topic revision: r18 - 15 Mar 2017, GeorgeClark - This page was cached on 22 Jun 2018 - 06:55.

The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy