Proposal (Was: When are and aren't two URLs the same?)

Keith Howe nezroy at
Fri Apr 21 21:57:26 UTC 2006

On 4/21/06, Johannes Ernst wrote:
> Maybe this rule shouldn't actually be in the list, or maybe it needs
> to be put differently. I'm just trying to express that
>     http://charlie/foo
> does not necessarily equal
>     http://charlie/foo
> because to be able to tell, we need to know the DNS context.
> Anybody have an idea how to say that better? It could be we simply
> say: DNS names in Yadis URLs must always be fully qualified.

Perhaps the algorithm itself should merely be an example
implementation of a more formal rule-set. Then the purpose and intent
of each transformation can be made clear in the rule itself. Something
like this (mapping a rule to each of the steps in the algorithm):

1. Yadis identity URLs must be fully qualified.
Reasoning: prevents ambiguity when using relative domains; the
concerned parties must come to a common agreement to resolve their
identities to qualified names in these cases. Also keeps identity URLs
properly unique.

2. Internationalized URLs (IRIs) are equivalent to their URI form.
Reasoning: preserves the semantic meaning of an identity URL.

3. Secure and insecure versions of the same protocol in an identity
URL are considered identical.
Reasoning: many sites use secure and non-secure URLs interchangeably.

4. An implicit default port is identical to an explicit default port.
Reasoning: it is not always in the user's control to decide where and
when explicit default port mappings may be applied to or stripped from
URLs. See also [i].

5. The host component is case-insensitive.
Reasoning: See [i].

6&7. An escaped character is equivalent to its unescaped counterpart.
Reasoning: See [i].
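
As a rough illustration, rules 1 and 3 through 7 could be sketched in
Python along these lines. The function name canonical_identity, the
"host must contain a dot" test for rule 1, and the use of the RFC 3986
unreserved set to decide which escapes get decoded are all assumptions
made for this example, not part of any Yadis text. Rule 2 (IRIs) is
left out.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Default ports for the two schemes treated as interchangeable (rules 3, 4).
DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: characters whose escaped and unescaped forms are treated as
# equivalent (rules 6&7). This is the RFC 3986 "unreserved" set.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _normalize_escapes(text):
    """Decode escapes of unreserved characters; upper-case the rest."""
    def repl(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return _ESCAPE.sub(repl, text)


def canonical_identity(url):
    """Hypothetical helper: reduce an identity URL to one comparable form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError("only http and https are handled in this sketch")

    # Rule 5: urlsplit's hostname attribute is already lower-cased.
    host = parts.hostname or ""

    # Rule 1 (crude heuristic, an assumption): a bare name such as
    # "charlie" depends on the DNS context, so reject it.
    if "." not in host:
        raise ValueError("host is not fully qualified: %r" % host)

    # Rule 4: drop a port that matches the default for the given scheme.
    port = parts.port
    if port is None or port == DEFAULT_PORTS[scheme]:
        netloc = host
    else:
        netloc = "%s:%d" % (host, port)

    # Rules 6&7: make escaped and unescaped spellings compare equal.
    path = _normalize_escapes(parts.path)
    query = _normalize_escapes(parts.query)

    # Rule 3: https and http collapse to a single scheme for comparison.
    return urlunsplit(("http", netloc, path, query, parts.fragment))
```

Two identity URLs would then count as the same when their canonical
forms are equal, for instance:

    canonical_identity("HTTPS://Example.COM:443/%7Ejoe") ==
        canonical_identity("http://example.com/~joe")

Collapsing everything to "http" is only one way to express rule 3;
the result is meant for comparison, not for fetching.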

I couldn't really come up with a suitable rule for 8, and I'm not sure
it's necessary anyway, but feel free to ad-lib here :)

[i] I didn't notice this before when I was looking at RFC 2616, but
it has a section specifically on comparing URIs. Granted, Yadis is
not obliged to follow these rules at all, but I feel it would be
beneficial not to deviate from them, as they probably represent
"expected" behavior in the minds of many users and implementors
alike.

Quoted from --

3.2.3 URI Comparison
When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:

      - A port that is empty or not given is equivalent to the default
        port for that URI-reference;
      - Comparisons of host names MUST be case-insensitive;
      - Comparisons of scheme names MUST be case-insensitive;
      - An empty abs_path is equivalent to an abs_path of "/".

Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.

For example, the following three URIs are equivalent:
