Proposal (Was: When are and aren't two URLs the same?)

Thomas Broyer t.broyer at gmail.com
Sat Apr 22 17:52:54 UTC 2006


2006/4/21, Johannes Ernst <jernst+lists.danga.com at netmesh.us>:
> Well, speaking just about our code at NetMesh, we currently would
> have two entries in our Yadis cache for URLs
>      http://foo.com/a%20b
> and
>      http://foo.com/a+b
> and chances are that if you brought those two URLs to the same
> Relying Party based on our code, they would create separate
> "accounts" in the database. I consider that a bug ... because there
> is no practical way that
>      http://foo.com/a%20b
> and
>      http://foo.com/a+b
> could produce different web pages when entered into a browser.

I would consider it a bug to treat them as the same URL, because "+" is
not equivalent to a space per RFC3986 [1].

Correct me if I'm wrong, but they're only equivalent in the query part
of a URI when following application/x-www-form-urlencoded [2] style,
such as HTML forms using the GET method, and never in the path part of
the URI, where a "+" is always left as-is [*] or encoded as "%2B".

[*] because the "+" sign has no delimiting role in the "http" or
"https" schemes; per RFC3986, section 2.2, §4 (which says that "If a
reserved character is found in a URI component and no delimiting role
is known for that character, then it must be interpreted as
representing the data octet corresponding to that character's encoding
in US-ASCII.") and RFC2616 (which assigns no delimiting role to "+").
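
To make the path/query distinction concrete, here is a minimal sketch in
Python using only the standard urllib.parse module. The helper name
comparison_key is hypothetical (it is not taken from the NetMesh code or
from any Yadis library), and the normalization is deliberately
simplified: it lowercases the scheme and authority, percent-decodes the
path, and decodes the query form-style.

```python
from urllib.parse import urlsplit, unquote, parse_qsl


def comparison_key(url):
    """Hypothetical helper: build a key for comparing two http(s) URLs.

    Simplified on purpose: decoding the whole path also turns "%2F"
    into "/", which a stricter comparison would keep distinct.
    """
    parts = urlsplit(url)
    # Path: plain percent-decoding. "%20" becomes a space, "+" stays "+".
    path = unquote(parts.path)
    # Query: application/x-www-form-urlencoded decoding, where both
    # "+" and "%20" become a space.
    query = tuple(parse_qsl(parts.query, keep_blank_values=True))
    return (parts.scheme.lower(), parts.netloc.lower(), path, query)


# In the path, "+" and "%20" are different data.
print(comparison_key("http://foo.com/a%20b") == comparison_key("http://foo.com/a+b"))
# In the path, "+" and "%2B" are the same data.
print(comparison_key("http://foo.com/a+b") == comparison_key("http://foo.com/a%2Bb"))
# In a form-style query, "+" and "%20" are the same data.
print(comparison_key("http://foo.com/?q=a+b") == comparison_key("http://foo.com/?q=a%20b"))
```

Under these assumptions the first comparison is false and the other two
are true, which matches the reading of RFC3986 given above.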

[1] http://www.gbiv.com/protocols/uri/rfc/rfc3986.html
[2] http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1

--
Thomas Broyer