Pound-1.5: incorrect of URL matching
"Yoshinori TAKESAKO" <y_takesako(at)dreamarts.co.jp>
2003-10-21 15:13:44
Hi, 

There are some problems in Pound-1.5.

When Pound was upgraded from 1.4 to 1.5, 
global directives for URL matching such as 'CSsegment', 
'CSparameter', 'CSqid', 'CSqval' and 'CSfragment' were added.

If you access the following URL via Pound-1.5, an error occurs.

  e.g.
  http://www.example.com/cgi-bin/printenv.cgi?
  http://www.example.com/cgi-bin/printenv.cgi?a=1&

  It returns the following error message.

    "501 Not Implemented"
    This method may not be used. 

I think this is a bug in check_URL() in "http.c".
So I made a patch and a test program and put them on my site.

  http://namazu.org/~takesako/pound/errata.html

Please check and try it.

Best regards,

--
   Namazu Project - Search engine software of Japanese language.
     Yoshinori TAKESAKO <y_takesako(at)dreamarts.co.jp>

Re: Pound-1.5: incorrect of URL matching
Robert Segall <roseg(at)apsis.ch>
2003-10-21 17:11:04
On Tuesday 21 October 2003 15:13, Yoshinori TAKESAKO wrote:[...]

Sorry, but this is not a bug - at most a problem and possibly a feature. The 
URLs you give as examples are not legal according to the RFCs (see for 
example RFC2616), thus Pound correctly rejects them.

I would welcome additional opinions on the desirability of URL checking - I 
have had other messages about people experiencing difficulties with them.[...]

Re: Pound-1.5: incorrect of URL matching
"Yoshinori TAKESAKO" <y_takesako(at)dreamarts.co.jp>
2003-10-21 18:18:02
Thank you for replying so quickly.

At Tue, 21 Oct 2003 17:11:04 +0200
 Robert Segall <roseg(at)apsis.ch> wrote:[...]

I am sorry if I caused any bad feelings. 

In my understanding, RFC2396 is the most trusted definition 
of URI Generic Syntax.

-----------------------------------------------------------------------

A. Collected BNF for URI

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
      absoluteURI   = scheme ":" ( hier_part | opaque_part )
      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

      hier_part     = ( net_path | abs_path ) [ "?" query ]
      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

      net_path      = "//" authority [ abs_path ]
      abs_path      = "/"  path_segments
      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

      authority     = server | reg_name

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

      server        = [ [ userinfo "@" ] hostport ]
      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

      path          = [ abs_path | opaque_part ]
      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar
      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

      query         = *uric

      fragment      = *uric

      uric          = reserved | unreserved | escaped
      reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                      "$" | ","
      unreserved    = alphanum | mark
      mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                      "(" | ")"

      escaped       = "%" hex hex
      hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                              "a" | "b" | "c" | "d" | "e" | "f"

      alphanum      = alpha | digit
      alpha         = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

-----------------------------------------------------------------------

According to this definition, 'query' and 'fragment' may each be 
an empty character sequence. Or is my reading possibly wrong?
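
As a rough illustration of the point above, the rule "query = *uric" can 
be written as a regular expression. This is only a sketch in Python, not 
the check_URL() code from Pound's "http.c"; the names URIC, QUERY_RE and 
is_valid_query are made up for this example.

```python
import re

# uric = reserved | unreserved | escaped   (RFC 2396, Appendix A)
# reserved:   ; / ? : @ & = + $ ,
# unreserved: alphanum plus the marks - _ . ! ~ * ' ( )
# escaped:    "%" hex hex
URIC = r"(?:[;/?:@&=+$,A-Za-z0-9_.!~*'()-]|%[0-9A-Fa-f]{2})"

# query = *uric, i.e. zero or more uric characters
QUERY_RE = re.compile(rf"{URIC}*")


def is_valid_query(query: str) -> bool:
    """Return True if the text after '?' matches the 'query' rule."""
    return QUERY_RE.fullmatch(query) is not None


# The two example URLs from the first message have these query parts:
print(is_valid_query(""))      # query of ".../printenv.cgi?"
print(is_valid_query("a=1&"))  # query of ".../printenv.cgi?a=1&"
```

Under this reading both queries are accepted, while a query containing 
a raw space or a malformed escape such as "%zz" is rejected.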

--
  Yoshinori TAKESAKO <y_takesako(at)dreamarts.co.jp>

Re: Pound-1.5: incorrect of URL matching
Robert Segall <roseg(at)apsis.ch>
2003-10-22 10:23:00
On Tuesday 21 October 2003 18:18, Yoshinori TAKESAKO wrote:[...]

I assure you: no bad feelings at all. This list is for discussing Pound and 
that is what we are doing. You are welcome to raise any issue you want, and 
we welcome bug reports - it is the only way Pound will improve.
[...]

I re-read 2396 and indeed they specify <n>* to mean "n or more repetitions of 
the following entity, with n defaulting to 0". Under this reading your 
interpretation is correct and we'll fix this issue in the near future - 
announcement to be made on the list.[...]
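
The repetition rule quoted above can be illustrated with the two digit 
rules from the BNF, "port = *digit" and the "1*digit" used in 
IPv4address. This is a small Python sketch; the helper name at_least is 
hypothetical and is not part of Pound.

```python
import re


def at_least(n: int, element: str) -> re.Pattern:
    """Translate the BNF form <n>*element into a regex: n or more repetitions."""
    return re.compile(rf"(?:{element}){{{n},}}")


DIGIT = "[0-9]"

port = at_least(0, DIGIT)         # port = *digit  (n defaults to 0)
one_or_more = at_least(1, DIGIT)  # 1*digit, as used in IPv4address

print(port.fullmatch("") is not None)         # empty string against *digit
print(one_or_more.fullmatch("") is not None)  # empty string against 1*digit
```

With n defaulting to 0, the empty string matches "*digit" but not 
"1*digit"; the same reasoning applies to "query = *uric".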

Re: Pound-1.5: incorrect of URL matching
"Yoshinori TAKESAKO" <y_takesako(at)dreamarts.co.jp>
2003-10-22 13:13:47
At Wed, 22 Oct 2003 10:23:00 +0200
 Robert Segall <roseg(at)apsis.ch> wrote:[...]

Thanks for your reply. I'm glad. 
[...]

Thank you very much in advance for your kind cooperation.

For your information, here is my simple patch.
http://namazu.org/~takesako/pound/errata.html

And, I have a suggestion.

- I'd like to be able to enable or disable the URL matching. -

The reason is this: 
Pound-1.5 attempts to filter out illegal request URLs.
But some web browsers and servers may exchange illegal 
request URLs that do not conform to the RFCs.

This means that Pound may now be unable to handle URLs 
that could previously be exchanged between the front-end 
browser and the back-end server.

We can say that this is a feature of Pound, but some people 
will say that it is a problem with Pound.

So it should be possible to disable the URL matching in Pound. 
Disabling it gives a small speed-up, but security becomes weaker.
I think this is something the Pound user should be able 
to choose (enable or disable the URL matching).

I think simple is best.
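
The suggestion could look roughly like the following. This is only a 
Python sketch of the idea: the check_url switch, the accept_request 
function and the deliberately strict STRICT_URL pattern are all 
hypothetical and do not correspond to any existing Pound directive or 
to the code in "http.c".

```python
import re

# Hypothetical strict pattern: a "?" must be followed by a non-empty query,
# which is roughly the behaviour reported for Pound-1.5 in this thread.
STRICT_URL = re.compile(r"/[A-Za-z0-9_./-]*(?:\?[A-Za-z0-9=&]+)?")


def accept_request(url: str, check_url: bool = True) -> bool:
    """Decide whether to forward a request to the back end.

    check_url=True  : apply URL matching and reject non-matching URLs.
    check_url=False : skip the matching and pass every URL through.
    """
    if not check_url:
        return True
    return STRICT_URL.fullmatch(url) is not None


url = "/cgi-bin/printenv.cgi?"
print(accept_request(url))                   # matching enabled
print(accept_request(url, check_url=False))  # matching disabled
```

With the matching enabled, the example URL is rejected; with it 
disabled, the URL passes through as it did before the check existed, 
and the back-end server must cope with whatever it receives.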

--
  Namazu Project - Search engine software in Japanese.
    Yoshinori TAKESAKO <y_takesako(at)dreamarts.co.jp>
