Proposal (Was: When are and aren't two URLs the same?)
nezroy at gmail.com
Fri Apr 21 21:57:26 UTC 2006
On 4/21/06, Johannes Ernst <jernst+lists.danga.com at netmesh.us> wrote:
> Maybe this rule shouldn't actually be in the list, or maybe it needs
> to be put differently. I'm just trying to express that
> does not necessarily equal
> because to be able to tell, we need to know the DNS context.
> Anybody have an idea how to say that better? It could be we simply
> say: DNS names in Yadis URLs must always be fully qualified.
Perhaps the algorithm itself should merely be an example
implementation of a more formal rule-set. Then the purpose and intent
of each transformation can be made clear in the rule itself. Something
like this (mapping a rule to each of the steps in the algorithm):
1. Yadis identity URLs must be fully qualified.
Reasoning: prevents ambiguity when using relative domains; the
concerned parties must come to a common agreement to resolve their
identities to qualified names in these cases. Also keeps identity URLs
2. Internationalized URLs (IRIs) are equivalent to their URI form.
Reasoning: preserves the semantic meaning of an identity URL.
3. Secure and insecure versions of the same protocol (e.g. https and
http) in an identity URL are considered identical.
Reasoning: many sites use secure and non-secure URLs interchangeably.
4. An implicit default port is identical to an explicit default port.
Reasoning: it is not always in the user's control to decide where and
when explicit default port mappings may be applied to or stripped from
URLs. See also [i].
5. The host component is case-insensitive.
Reasoning: See [i].
6&7. An escaped character is equivalent to its unescaped counterpart.
Reasoning: See [i].
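To make rules 2 through 7 a bit more concrete, here is a minimal Python sketch of a normalizer. It assumes identity URLs are plain http/https URLs with no user:password@ part. The name `normalize_identity_url` and the default-port table are hypothetical, not part of any Yadis spec. Rule 1 is left out because, as the quoted message says, whether a name is fully qualified depends on DNS context and can't be decided from the string alone. Rules 6 & 7 are read here as "unescape characters that never need escaping" (RFC 3986's unreserved set), which is just one possible interpretation.

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Hypothetical default-port table; only http/https are handled.
DEFAULT_PORTS = {"http": 80, "https": 443}
# RFC 3986 "unreserved" characters: escaping them never changes meaning.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
HEXDIGITS = frozenset("0123456789abcdefABCDEF")
# Characters left alone when percent-encoding an IRI component.
SAFE = "/?:@!$&'()*+,;=%~"


def _decode_unreserved(component: str) -> str:
    """Rules 6 & 7: replace %XX with the literal char if it is unreserved."""
    out, i = [], 0
    while i < len(component):
        esc = component[i:i + 3]
        if (len(esc) == 3 and esc[0] == "%"
                and esc[1] in HEXDIGITS and esc[2] in HEXDIGITS):
            ch = chr(int(esc[1:], 16))
            # Other escapes are kept, with upper-case hex for stable output.
            out.append(ch if ch in UNRESERVED else esc.upper())
            i += 3
        else:
            out.append(component[i])
            i += 1
    return "".join(out)


def normalize_identity_url(url: str) -> str:
    """Hypothetical normalizer for rules 2-7 (rule 1 is not checked)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if "@" in parts.netloc:
        raise ValueError("userinfo is not handled by this sketch")
    if not parts.hostname:
        raise ValueError("missing host")

    # Rules 2 and 5: IDNA-encode an internationalized host and lower-case it.
    host = parts.hostname.encode("idna").decode("ascii").lower()

    # Rule 4: drop the port if it is the default for the original scheme.
    port = parts.port
    if port is not None and port != DEFAULT_PORTS[scheme]:
        host = f"{host}:{port}"

    # Rule 3 (assumption: fold https into http; the reverse works too).
    scheme = "http"

    # Rules 2, 6 & 7: UTF-8 percent-encode non-ASCII text, then unescape
    # whatever did not need escaping in the first place.
    path, query, fragment = (
        _decode_unreserved(quote(c, safe=SAFE))
        for c in (parts.path, parts.query, parts.fragment)
    )
    # Borrowed from the RFC 2616 list in [i]: an empty path means "/".
    return urlunsplit((scheme, host, path or "/", query, fragment))
```

Two identity URLs would then be compared by normalizing both and checking the results for string equality.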
I couldn't really come up with a suitable rule for 8, and I'm not sure
it's necessary anyway, but feel free to ad-lib here :)
[i] I didn't notice this before when I was looking at RFC2616, but
they have a section specifically for comparing URIs. Granted, Yadis is
not obliged to follow these rules at all, but I also feel it would be
beneficial not to deviate from these rules either, as they probably
represent "expected" behavior in the minds of many users and
Quoted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2.3 --
3.2.3 URI Comparison
When comparing two URIs to decide if they match or not, a client
SHOULD use a case-sensitive octet-by-octet comparison of the entire
URIs, with these exceptions:
- A port that is empty or not given is equivalent to the default
port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396) are equivalent to their ""%" HEX HEX" encoding.
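For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html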
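As a rough illustration of how the RFC 2616 comparison rules quoted above might be applied, here is a Python sketch. It builds a comparison key from each URI and compares the keys. Assumptions: the names `rfc2616_equal` and `_key` are made up; the default-port table covers only http and https; and "characters other than those in the reserved and unsafe sets" is read as RFC 2396's "unreserved" set (alphanumerics plus `-_.!~*'()`). Everything else is compared case-sensitively, as the RFC says.

```python
from urllib.parse import urlsplit

# Hypothetical default-port table for the schemes this sketch knows about.
DEFAULT_PORTS = {"http": 80, "https": 443}
# RFC 2396 "unreserved" = alphanum | mark: neither reserved nor unsafe.
UNRESERVED_2396 = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()"
)
HEXDIGITS = frozenset("0123456789abcdefABCDEF")


def _unescape_unreserved(s: str) -> str:
    """Treat %XX as equal to the literal char when the char is unreserved."""
    out, i = [], 0
    while i < len(s):
        esc = s[i:i + 3]
        if (len(esc) == 3 and esc[0] == "%"
                and esc[1] in HEXDIGITS and esc[2] in HEXDIGITS
                and chr(int(esc[1:], 16)) in UNRESERVED_2396):
            out.append(chr(int(esc[1:], 16)))
            i += 3
        else:
            out.append(s[i])  # everything else stays octet-for-octet
            i += 1
    return "".join(out)


def _key(uri: str) -> tuple:
    p = urlsplit(uri)  # urlsplit already lower-cases the scheme
    userinfo = p.netloc.rpartition("@")[0]  # compared case-sensitively
    host = (p.hostname or "").lower()  # host names: case-insensitive
    # An empty or missing port ("host:" or "host") means the default port.
    port = p.port if p.port is not None else DEFAULT_PORTS.get(p.scheme)
    path = p.path or "/"  # an empty abs_path is equivalent to "/"
    return (p.scheme, userinfo, host, port,
            _unescape_unreserved(path),
            _unescape_unreserved(p.query),
            _unescape_unreserved(p.fragment))


def rfc2616_equal(a: str, b: str) -> bool:
    return _key(a) == _key(b)
```

Note that, unlike rule 3 in the proposal, this comparison treats http and https URLs as different, since the RFC only relaxes case, default ports, the empty path and escaping.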