URL canonicalization

Dan Libby danda at videntity.org
Wed Sep 14 15:32:35 PDT 2005

Michael 'hacker' Krelin wrote:

>On Wed, Sep 14, 2005 at 03:53:28PM -0600, Dan Libby wrote:
>I'm afraid you're overdoing it. You should not lowercase anything past
You're right.  I mistakenly thought the OpenID library was lowercasing.
At present, it's not even lowercasing the hostname.

>>"https://sally.people.com/" be treated as a separate identity from
>>"http://sally.people.com/"?    Or should the protocol be ignored?
>Clearly https://sally.people.com/ can have content different from
>http://sally.people.com/ so you can't just ignore it.
Fair enough.  Makes things simpler anyway.

Still, I wonder about things like %20 and + in the URL.  Both can decode
to the same character (a space) when unescaped.
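
To make the %20 vs. + point concrete, here is a rough Python sketch.  It is
not taken from the OpenID library; the helper name decodes_equal is made up
for illustration.  It assumes the form-encoding rule where "+" stands for a
space, which normally applies to query strings only, not to paths.

```python
from urllib.parse import unquote, unquote_plus

def decodes_equal(a, b, plus_is_space=True):
    """Return True if two escaped strings decode to the same text.

    plus_is_space=True applies the form-encoding rule ("+" means space),
    which is an assumption that only makes sense for query strings.
    """
    decode = unquote_plus if plus_is_space else unquote
    return decode(a) == decode(b)

# Under the query-string rule the two spellings are the same value.
print(decodes_equal("sally%20smith", "sally+smith"))         # True
# Under plain percent-decoding, "+" stays a literal plus sign.
print(decodes_equal("sally%20smith", "sally+smith", False))  # False
```

So whether the two spellings are "the same" depends on which part of the URL
they appear in, which is something the spec would have to state.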

Or a URL like "http://people.com/users//sally", where the "//" should be
converted to a single '/' by the webserver.
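
A minimal sketch of that "//" collapsing, again in Python and with a made-up
function name (collapse_slashes).  It assumes only the path should be
touched; whether a given webserver actually treats "//" and "/" as the same
resource is server-specific, so this is illustrative only.

```python
import re
from urllib.parse import urlsplit, urlunsplit

def collapse_slashes(url):
    """Collapse runs of "/" in the path only; leave everything else alone."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))

print(collapse_slashes("http://people.com/users//sally"))
# http://people.com/users/sally
```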

So that seems like 2 points for the spec to address:
 1) lower-casing of domain names
 2) normalization of query string
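
Roughly what those two points might look like together, as a Python sketch
under my own assumptions (the name canonicalize and the choice of %20 as the
canonical spelling of a space are not from any spec or library).  The scheme
is kept, so http and https stay separate identities, and the path is left
case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote

def canonicalize(url):
    """Lower-case the host and re-encode the query string consistently."""
    parts = urlsplit(url)
    # Lower-case only host[:port]; any userinfo before "@" is left as-is.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    # Decode then re-encode, so "+" and "%20" both come out as "%20".
    # Caveat: a bare key with no "=" comes back as "key=".
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(pairs, quote_via=quote)
    return urlunsplit((parts.scheme, netloc, parts.path, query, parts.fragment))

print(canonicalize("http://Sally.People.com/Users?q=a+b"))
# http://sally.people.com/Users?q=a%20b
```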

Any others?

Dan Libby

