URL canonicalization

Mark Rafn dagon at dagon.net
Wed Sep 14 16:18:09 PDT 2005


On Wed, 14 Sep 2005, Dan Libby wrote:

> Hi, in my database, I need to uniquely keep track of visitors that are
> logging in via remote OpenID servers.  The best key available is their
> identity url.  But that leaves me with a question about how exactly to
> canonicalize it, that the spec does not clearly address.

> "Note that the user can leave off "http://" and the trailing "/". A
> consumer must canonicalize the URL, following redirects and noting the
> final URL. The final, canonicalized URL is the user's identity URL."

The spec is unclear on what you should consider to be an identity, by 
design, I suspect.  There can be multiple claimed identities that all map 
to a single canonical identity URL, and multiple canonical identity URLs 
that have the same delegate identity URL.  Only the delegate identity URL 
is actually confirmed by the server.

So, do you choose to treat dagon.net and dagon.net/index.html (which 
canonicalize to different URLs, but have the same content and delegate to 
the same URL) as the same or different?  They have the same delegate URL 
now, but that could change later, or one or both could stop delegating 
and authenticate as different canonical URLs that happen to have the same 
content.

I'd argue that you should treat different claimed identity strings as 
unique individuals, even if they currently have the same delegate, or 
canonicalize to the same identity URL.

> Okay, so case-insensitivity is fairly obvious. I'm already lower-casing
> everything.

You WILL fail if you do that.  Domain names are case-insensitive, so you 
can pick whatever case you like for the host, but the path component of a 
URL is case-sensitive.
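
To illustrate the point about case, here is a minimal Python sketch that 
lower-cases only the scheme and host and leaves the path alone.  The 
function name lowercase_host_only is made up for this example, and it 
assumes the URL carries no user:password@ part (which would also be 
case-sensitive).

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host_only(url: str) -> str:
    """Lower-case the scheme and host of a URL, leaving the rest untouched.

    Hypothetical helper for illustration only. Assumes the URL has no
    userinfo (user:password@) in front of the host.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),   # scheme is case-insensitive
        parts.netloc.lower(),   # host is case-insensitive
        parts.path,             # path is case-sensitive: do not touch
        parts.query,
        parts.fragment,
    ))


print(lowercase_host_only("HTTP://Dagon.NET/Index.html"))
```

Lower-casing the whole string instead would turn /Index.html into 
/index.html, which a server is free to treat as a different resource.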

> But what about http vs https?    For example, should
> "https://sally.people.com/" be treated as a separate identity from
> "http://sally.people.com/"?    Or should the protocol be ignored?

They can someday have different content, even if they don't today, or 
even if one redirects to the other today.  You'd best treat them as 
different.

> I suppose the issue can be broadened to: the spec is a bit vague about
> canonicalization of identity URLs.  Can we get clarification?

The spec is clear about canonicalization of URLs - you must add 
http:// and trailing / if needed, and you must follow redirects.
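
As a rough sketch of those steps in Python (not taken from the spec; the 
function names are invented here, and "trailing / if needed" is assumed 
to mean "use / when the path is empty"):

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def add_scheme_and_slash(claimed: str) -> str:
    """Add a missing "http://" and a missing trailing "/" to user input.

    Assumption: a trailing slash is only "needed" when the path is empty,
    so "dagon.net/index.html" is left without one.
    """
    claimed = claimed.strip()
    if not claimed.lower().startswith(("http://", "https://")):
        claimed = "http://" + claimed
    parts = urlsplit(claimed)
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL and return the final URL after any HTTP redirects.

    Needs network access; urlopen follows redirects by default.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


print(add_scheme_and_slash("dagon.net"))
print(add_scheme_and_slash("dagon.net/index.html"))
```

A consumer would call follow_redirects() on the result of 
add_scheme_and_slash() to get the canonical identity URL.  Note that 
neither step touches the scheme the user actually typed, so the http and 
https forms stay distinct.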

It is NOT clear whether consumers should treat the claimed identity, the 
canonical identity URL, or the delegate identity URL as the key that 
identifies a unique individual.  I'd argue for claimed identity, but 
others may disagree.
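
To make the three candidates concrete, here is one possible shape for a 
consumer-side record, keyed on the claimed identity as I suggest above.  
The class, field names, and the openid.example.com delegate URL are all 
invented for this sketch; a real consumer would fill them in from the 
login flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    claimed: str    # the identity string the user claimed
    canonical: str  # after adding http:// and /, and following redirects
    delegate: str   # the URL the server actually confirms


# Visitor table keyed on the claimed identity string.
visitors = {}


def remember(identity: Identity) -> None:
    """Store a visitor under the claimed identity, not the delegate."""
    visitors[identity.claimed] = identity


# Two claimed identities that share one (hypothetical) delegate URL.
remember(Identity("dagon.net",
                  "http://dagon.net/",
                  "http://openid.example.com/dagon"))
remember(Identity("dagon.net/index.html",
                  "http://dagon.net/index.html",
                  "http://openid.example.com/dagon"))

print(len(visitors))
```

Keyed this way, the two entries stay separate even though they share a 
delegate today, so nothing has to be split apart if they diverge later.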
--
Mark Rafn    dagon at dagon.net    <http://www.dagon.net/>

