Proposal (Was: When are and aren't two URLs the same?)

Martin Atkins mart at degeneration.co.uk
Tue Apr 25 07:33:12 UTC 2006


Johannes Ernst wrote:
> On Apr 22, 2006, at 18:41, Martin Atkins wrote:
> 
>> * Do the canonicalizations from RFC2616 in order that the URL is
>> actually retrievable.
> 
> Which canonicalizations are you referring to? Section ...? I can only 
> find things that don't seem to deal with URLs.

To be honest, I don't know. Someone posted a chunk of it in another part
of this thread and I just snagged the RFC number. I didn't even know
what that RFC was until a few moments ago. :)

>> * Do a character-by-character string comparison for equality.
> 
> This is not currently implemented practice, however. The particular
> problem that (re-)triggered my thinking on this was that certain relying
> party code happened to ask for URL http://mylid.net:80/jernst (by
> specifying the port 80 in the HTTP Host header) when I had entered
> http://mylid.net/jernst at the relying party. Our code did exactly what
> you suggest -- character-by-character string comparison -- and said "no
> identity known by this name".
> 
> Now the argument can be made that this trivial transformation --
> dropping redundant port 80 -- should be made by the identity host, but
> unless we all agree what are and aren't allowed, encouraged, ...
> transformations, that way lies interoperability hell. Note that just
> delegating the responsibility to the identity host did not work in this
> case, because the relying party did not pass through the URL character
> by character, considering it to be perfectly within its limits to
> insert port 80 (and not unreasonably so).

I don't disagree that we need to say something about this in the spec.
My main concern was against going overboard with extra rules that don't
exist anywhere else. The rules that already exist for HTTP are what I
was attempting to ape, though clearly I missed one in this respect.

I would expect — obviously incorrectly — an HTTP server to automatically
strip off that :80 from the Host: header, since it's quite clearly
redundant. Was it your CGI script that was failing, or was your web
server (Apache?) simply failing to match the right VirtualHost? I'd be
concerned if the latter is the case. If the former is the case, then the
blame is on CGI for not handling this kind of thing for you, but really
that'd be up to your CGI library of choice.

So yes, treating a URL with an explicit :80 the same as one with no port
specified is another rule that we need to write down. The no-port case
should be the canonical form, of course.
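
As a rough sketch of that rule (not spec text, just an illustration): drop
the port when it is the default for the scheme, leave everything else
untouched, and then do the character-by-character comparison. The function
names and the DEFAULT_PORTS table are made up for this example, and the
https/443 entry is my own assumption, since only :80 has come up in this
thread.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports; only http/80 was discussed in the
# thread, https/443 is an assumption added for illustration.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_identity_url(url):
    """Return url with a redundant default port removed (no-port form)."""
    parts = urlsplit(url)
    netloc = parts.netloc
    # parts.port is None when no port is given; only strip an explicit
    # port that matches the default for this scheme.
    if parts.port is not None and parts.port == DEFAULT_PORTS.get(parts.scheme):
        netloc = netloc.rsplit(":", 1)[0]
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


def same_identity(a, b):
    """Compare two identity URLs character by character after canonicalizing."""
    return canonical_identity_url(a) == canonical_identity_url(b)


if __name__ == "__main__":
    print(canonical_identity_url("http://mylid.net:80/jernst"))
    print(same_identity("http://mylid.net:80/jernst", "http://mylid.net/jernst"))
```

This deliberately does nothing else (no case folding of the path, no
trailing-slash handling), in keeping with not piling on rules that don't
exist elsewhere. A non-default port such as :8080 is left alone, so those
URLs still compare as different.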


