Item1403: Use of uninitialized value at lib/Foswiki.pm

pencil
Priority: Normal
Current State: Closed
Released In: 1.0.5
Target Release: patch
Applies To: Engine
Component:
Branches:
Reported By: JoenioCosta
Waiting For:
Last Change By: KennethLavrsen

lib/Foswiki.pm

Index: core/lib/Foswiki.pm
===================================================================
--- core/lib/Foswiki.pm   (revisão 3356)
+++ core/lib/Foswiki.pm   (cópia de trabalho)
@@ -1779,8 +1779,9 @@
         if ($cgiQuery) {
             my $agent = $cgiQuery->user_agent();
             if ($agent) {
-                $agent =~ m/([\w]+)/;
-                $extra .= ' ' . $1;
+                if($agent =~ m/([\w]+)/) {
+                   $extra .= ' ' . $1;
+                }
             }
         }
     }

-- JoenioCosta

Thanks for the report and fix.

Checked into SVN

-- KennethLavrsen - 02 Apr 2009

As I raised on IRC, for me, this only adresses half the problem there (not to mention we could use only one if, which would be me being picky).

Issue is that the user agent will be cut to its first word. I'm not sure it's by design, and even less sure it's the intended behavior, and the one a user might expect.

I would personally fix it that way:

Index: core/lib/Foswiki.pm
===================================================================
--- core/lib/Foswiki.pm   (revisão 3356)
+++ core/lib/Foswiki.pm   (cópia de trabalho)
@@ -1779,8 +1779,9 @@
         if ($cgiQuery) {
             my $agent = $cgiQuery->user_agent();
-             if ($agent) {
-                $agent =~ m/([\w]+)/;
-                $extra .= ' ' . $1;
+             if($agent && $agent =~ /^(.+)$/) {
+                $extra .= ' ' . $1; # Untaint it
             }
         }
     }

-- OlivierRaginel - 08 Apr 2009

I agree with Babar's improved fix. Joenio's fix addressed the symptom, but not the root problem.

-- CrawfordCurrie - 08 Apr 2009

Sorry, I had put a \S in the pattern above. This is just as bad as \w, as it won't match anything that has spaces, which is for me the root cause of the issue.

-- OlivierRaginel - 08 Apr 2009

The original issue was that when a client does not have any agent name - you ended up with warnings about uninitialized value.

THAT problem was correctly fixed by Joenio Costa and that is what I checked in.

If there is an additional problem then it is not yet described in this bug report.

The original code took the first full word in the agent string and appended it. The suggested patch above puts the entire string. You certainly do not want to put the entire long long loooong agent string in the log entry. The log format is designed to put a single word there.

I cannot support Oliviers patch as it will make the logs impossible to read. If someone needs the entire agent string look in the Apache log.

Another thing is: "What is the value of always seeing Mozilla?"

All browsers start with Mozilla/# where # is a version. So the value of this string is close to zero.

If we are to change the logging we should try and do something more intelligent. But I would not want to put too much CPU power into "giving the lamb a fifth leg". In fact I would not do anything unless someone comes up with a brilliant idea what to put in the column instead of Mozilla or a blank which is what you always get today.

So what is the big problem with "fixing the symptom and not the root cause?????"

My view would be to remove the agent string from this log column and reserve it only for the other messages we also show in that column. Like the email address of the person that registers.

-- KennethLavrsen - 13 Apr 2009

As alluded to by Kenneth, the change would alter the output format of the logs, and as such, even if deemed useful (which I agree it may not be), is probably more appropriate for a minor release rather than a patch release.

-- IsaacLin - 13 Apr 2009

Whatever.

-- OlivierRaginel - 13 Apr 2009

I imagine the original reason for putting the agent string in the log was to make it susceptible to analysis. Since one-word agent strings are rarely useful (most browsers pretend to be something else at some point in their devious lives) the full string is required to have any hope of decoding the real user agent. So I still support Babar's change. However since that same information is available from the server logs, I do wonder why it was ever added to this log.

-- CrawfordCurrie - 13 Apr 2009

For the record, I ended up to the same conclusion as Crawford, and as I have way less history of "our" project than you guys, I let you decide what's best.

I just think we're hiding things under the carpet.

And also, I do support Kenneth raising it, and incorporating Joenio's changes. I just didn't felt it should end there, that's all, so I wouldn't have marked this as "closed", even though, as Kenneth notes, it fixes the bug. Or maybe create a new Task stating that this user-agent string in the logs should be reviewed / enhanced / explained, ....

-- OlivierRaginel - 13 Apr 2009

The current feature uses that column for multiple things. And one is adding the first word in the agent string.

For the Foswiki log to be useful it must add something the Apache log does not do.

The Apache log is typically difficult to read by the human eye because it contains too much information. Each line is a kilometer long and all the JS and graphics and CSS files are also listed. The Foswiki log is brief, only contains the page views, and adds some extra information from the inside like revisions of topics, reprev etc. This is what makes it useful.

The Agent string is only added when the user is guest. My guess is that the one that coded it, wanted the information to quickly see what was loading the server. And the typical thing that loads a server is either a site sucking tool or a search bot. I believe many site sucking bots either hide themselves as a real browser or say they are Mozilla. But those should be blocked by the Apache config if you follow the advice from ApacheConfigGenerator.

Some of the search bots have their name as first word. msnbot as an example. Google presents itself as Mozilla but has the Googlebot later in the string.

The reason I am against showing the whole agent string is that it drowns the other use of the field. The log becomes difficult to read for a human. If the purpose of the Agent string is to show which search agent.

Maybe we should just simply make a little regex with a list of the top 10 search agent strings represented by the single word that describes it (msnbot, Googlebot ...) and let the matched word appear. Mozilla will be one and shown if nothing else matched. By defining the regex right this should work without too much overhead. I can try and experiment a little and make a proposal that makes the agent a single word but with a little more value than today.

A compromize could be to match a single word first. And if no match put the entire agent string. This will make the log format readable in 99% of the log and a little less readable where some site sucker has operated. But then this may be OK as this is then the information you are most likely looking for.

Let me play a little.

I will open a new enhancement report when I have something and keep this one in Waiting for Release.

-- KennethLavrsen - 14 Apr 2009

ItemTemplate edit

Summary Use of uninitialized value at lib/Foswiki.pm
ReportedBy JoenioCosta
Codebase
SVN Range Foswiki-1.0.0, Thu, 08 Jan 2009, build 1878
AppliesTo Engine
Component
Priority Normal
CurrentState Closed
WaitingFor
Checkins distro:c55d5a8d6b1a distro:b0584ff65c9a
TargetRelease patch
ReleasedIn 1.0.5
Topic revision: r14 - 25 Apr 2009, KennethLavrsen
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy