Proposal (Was: When are and aren't two URLs the same?)

Keith Howe nezroy at gmail.com
Fri Apr 21 21:57:26 UTC 2006


On 4/21/06, Johannes Ernst <jernst+lists.danga.com at netmesh.us> wrote:
> Maybe this rule shouldn't actually be in the list, or maybe it needs
> to be put differently. I'm just trying to express that
>     http://charlie/foo
> does not necessarily equal
>     http://charlie/foo
> because to be able to tell, we need to know the DNS context.
>
> Anybody have an idea how to say that better? It could be we simply
> say: DNS names in Yadis URLs must always be fully qualified.

Perhaps the algorithm itself should merely be an example
implementation of a more formal rule set. Then the purpose and intent
of each transformation can be made clear in the rule itself. Something
like this (mapping a rule to each of the steps in the algorithm):

1. DNS names in Yadis identity URLs must be fully qualified.
Reasoning: prevents ambiguity when using relative (unqualified) domain
names, whose meaning depends on the DNS context; in these cases the
concerned parties must agree on how to resolve their identities to
qualified names. Also keeps identity URLs properly unique.

2. Internationalized URLs (IRIs) are equivalent to their URI form.
Reasoning: preserves the semantic meaning of an identity URL.

3. Secure and insecure versions of the same protocol in an identity
URL are considered identical.
Reasoning: many sites use secure and non-secure URLs interchangeably.

4. An implicit default port is identical to an explicit default port.
Reasoning: it is not always in the user's control to decide where and
when explicit default port mappings may be applied to or stripped from
URLs. See also [i].

5. The host component is case-insensitive.
Reasoning: See [i].

6 & 7. An escaped character is equivalent to its unescaped counterpart.
Reasoning: See [i].

I couldn't really come up with a suitable rule for 8, and I'm not sure
it's necessary anyway, but feel free to ad-lib here :)
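
To make rules 1 through 4 a bit more concrete, here is a rough Python
sketch, not a proposed reference implementation. The function name
identity_key, the DEFAULT_PORTS table, and the "host must contain a dot"
test for rule 1 are all assumptions of mine for illustration; Yadis
does not define any of them. It returns a comparison key rather than a
URL, so two identity URLs are treated as the same when their keys are
equal.

```python
from urllib.parse import urlsplit, quote

# Assumption: only http and https identity URLs are considered here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def identity_key(url: str) -> tuple:
    """Return a hypothetical comparison key covering rules 1-4."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")

    host = parts.hostname or ""
    # Rule 1 (rough approximation): a host without any dot is treated
    # as not fully qualified and rejected.
    if "." not in host.rstrip("."):
        raise ValueError(f"host is not fully qualified: {host!r}")

    # Rule 2: map the IRI to its URI form. The host goes through the
    # built-in IDNA codec; non-ASCII path characters become UTF-8
    # percent-escapes. Existing escapes ("%") are left untouched.
    ascii_host = host.encode("idna").decode("ascii")
    ascii_path = quote(parts.path or "/", safe="/%:@!$&'()*+,;=~-._")

    # Rules 3 and 4: the scheme is dropped from the key, and a port is
    # kept only when it is not the default for the original scheme.
    port = parts.port
    if port == DEFAULT_PORTS[scheme]:
        port = None

    return (ascii_host, port, ascii_path)
```

With this sketch, http://example.com:80/foo and https://example.com/foo
produce the same key, while http://charlie/foo is rejected because the
host is not fully qualified.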

[i] I didn't notice this before when I was looking at RFC 2616, but
it has a section specifically for comparing URIs. Granted, Yadis is
not obliged to follow these rules at all, but I also feel it would be
beneficial not to deviate from these rules either, as they probably
represent "expected" behavior in the minds of many users and
implementors alike.

Quoted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.3 --

3.2.3 URI Comparison
When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:

      - A port that is empty or not given is equivalent to the default
        port for that URI-reference;
      - Comparisons of host names MUST be case-insensitive;
      - Comparisons of scheme names MUST be case-insensitive;
      - An empty abs_path is equivalent to an abs_path of "/".

Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

For example, the following three URIs are equivalent:

      http://abc.com:80/~smith/home.html
      http://ABC.com/%7Esmith/home.html
      http://ABC.com:/%7esmith/home.html
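
The comparison quoted above can be sketched in Python roughly as
follows. This is only my reading of section 3.2.3, not code from the
RFC: the names rfc2616_key and uris_match are made up, and I am
assuming that the characters safe to unescape are exactly the RFC 2396
"unreserved" set (alphanumerics plus - _ . ! ~ * ' ( )).

```python
import re
import string
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: RFC 2396 "unreserved" characters are the ones whose
# %XX form is equivalent to the literal character.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def _unescape_unreserved(text: str) -> str:
    """Decode %XX only for unreserved characters; uppercase other escapes."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, text)


def rfc2616_key(uri: str) -> tuple:
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()            # scheme is case-insensitive
    host = (parts.hostname or "").lower()    # host is case-insensitive
    port = parts.port                        # None when empty or absent
    if port is None:
        port = DEFAULT_PORTS.get(scheme)     # fall back to the default port
    path = _unescape_unreserved(parts.path or "/")  # empty path equals "/"
    query = _unescape_unreserved(parts.query)
    return (scheme, host, port, path, query)


def uris_match(a: str, b: str) -> bool:
    return rfc2616_key(a) == rfc2616_key(b)
```

Under these assumptions the three example URIs above all compare as
equal, while http://abc.com/a%2Fb and http://abc.com/a/b do not,
because "/" is a reserved character and its escape is left alone.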

