Proposal (Was: When are and aren't two URLs the same?)
Martin Atkins
mart at degeneration.co.uk
Tue Apr 25 07:33:12 UTC 2006
Johannes Ernst wrote:
> On Apr 22, 2006, at 18:41, Martin Atkins wrote:
>
>> * Do the canonicalizations from RFC2616 in order that the URL is
>> actually retrievable.
>
> Which canonicalizations are you referring to? Section ...? I can only
> find things that don't seem to deal with URLs.
To be honest, I don't know. Someone posted a chunk of it in another part
of this thread and I just snagged the RFC number. I didn't even know
what that RFC was until a few moments ago. :)
>> * Do a character-by-character string comparison for equality.
>
> This is not currently implemented practice, however. The particular
> problem that (re-)triggered my thinking on this was that certain relying
> party code happened to ask for URL http://mylid.net:80/jernst (by
> specifying the port 80 in the HTTP Host header) when I had entered
> http://mylid.net/jernst at the relying party. Our code did exactly what
> you suggest -- character-by-character string comparison -- and said "no
> identity known by this name".
>
> Now the argument can be made that this trivial transformation --
> dropping redundant port 80 -- should be made by the identity host, but
> unless we all agree on which transformations are and aren't allowed,
> encouraged, etc., that way lies interoperability hell. Note that just
> delegating the responsibility to the identity host did not work in this
> case, because the relying party did not pass through the URL character
> by character, considering it to be perfectly within its limits to
> insert port 80 (and not unreasonably so).
I don't disagree that we need to say something about this in the spec.
My main concern was against going overboard with extra rules that don't
exist anywhere else. The rules that already exist for HTTP are what I
was attempting to ape, though clearly I missed one in this respect.
I would expect (obviously incorrectly) an HTTP server to automatically
strip off that :80 from the Host: header, since it's quite clearly
redundant. Was it your CGI script that was failing, or was your web
server (Apache?) simply failing to match the right VirtualHost? I'd be
concerned if the latter is the case. If the former is the case, then the
blame is on CGI for not handling this kind of thing for you, but really
that'd be up to your CGI library of choice.
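As a rough illustration of the stripping I'd have expected the server or CGI library to do, here is a minimal Python sketch. The function name `normalize_host_header` is made up for this example and isn't taken from Apache or any real CGI library; it assumes plain HTTP, where 80 is the default port.

```python
def normalize_host_header(host: str, default_port: int = 80) -> str:
    """Drop a redundant default port from an HTTP Host header value.

    Hypothetical helper: assumes plain HTTP, where port 80 is the default.
    """
    # Split on the last colon so a bracketed IPv6 literal like "[::1]:80"
    # keeps its internal colons.
    name, sep, port = host.rpartition(":")
    # Only strip when a colon was present and the suffix is exactly the default port.
    if sep and port == str(default_port):
        return name
    return host
```

With that in place, "mylid.net:80" and "mylid.net" would hit the same VirtualHost or script logic, while a genuinely different port such as "mylid.net:8080" is left alone.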
So yes, acting the same when :80 is present as when no port is specified
is another rule that we need to write down. The no-port case should be
the canonical form, of course.
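A minimal sketch of that rule in Python, using the standard library's `urllib.parse`. The names `canonicalize` and `same_identity` are made up for illustration, and the only normalization assumed is the http/port 80 one discussed here; everything else is still compared character by character, in keeping with not going overboard on extra rules.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url: str) -> str:
    """Rewrite an explicit :80 on an http URL to the no-port form.

    Assumption: this is the only canonicalization applied; scheme/host case,
    trailing slashes, etc. are deliberately left untouched.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    if parts.scheme == "http" and parts.port == 80:
        # The netloc ends in the port here, so dropping everything after the
        # last colon removes just the port (userinfo and IPv6 brackets survive).
        host_only = parts.netloc.rpartition(":")[0]
        return urlunsplit(parts._replace(netloc=host_only))
    # Return other URLs verbatim rather than round-tripping them.
    return url


def same_identity(a: str, b: str) -> bool:
    """Character-by-character comparison after canonicalization."""
    return canonicalize(a) == canonicalize(b)
```

For example, `same_identity("http://mylid.net:80/jernst", "http://mylid.net/jernst")` is true, while the path is still compared exactly. URLs that don't need rewriting are returned as-is because a split/unsplit round trip can itself alter some URLs (an empty trailing `?`, for instance, may be dropped).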