Proposal (Was: When are and aren't two URLs the same?)

Johannes Ernst
Fri Apr 21 21:28:58 UTC 2006

On Apr 21, 2006, at 14:00, Jonathan Daugherty wrote:

> # You are right, there is no strong technical reason why it should be
> # so, so you can talk me out of it ;-) but I'd prefer it the way as
> # proposed.
> It makes me feel uncomfortable to apply a long list of transforms to
> identity URLs.  If you concede that there's no technical reason *to*
> do something, you probably shouldn't.

Is susceptibility to phishing a technical reason for you?

As Kim Cameron pointed out so memorably, we must consider the user
part of the identity system in whatever we do. The majority of the
elements in my list of transformations are motivated by that.

>   I'd feel much better about some
> of these if you can outline specific problems solved (i.e. why a given
> transform MUST be applied).  Doing this on a matter of preference and
> style is not a good idea, and I don't buy arguments that involve "most
> people" or "most servers".  I don't want us to open up a can of worms
> with identity canonicalization, but I'd be very glad to hear solid,
> itemized reasoning about why this stuff is necessary.  (Some of them
> feel wrong, and I don't know enough to comment on some others.)

As I said, we don't need to do all of this. Right now, the current  
consensus is something like 0 items on the list (because we haven't  
defined it). My list has 8 items. I'm fine with an agreement on any  
number preferably larger than 0 but it does not need to be 8 ;-)

Maybe, as you have done below, we can look at example URLs and see
whether or not a particular item in my list of transforms is helpful.

> # 4. if the host component refers to a relative name, replace the host
> # component with a fully-qualified DNS name. For example, the URL
> # http://charlie/foo, used within company's intranet,
> # would be converted to
> I think you're really asking for trouble here.  I think you're
> assuming that has the same meaning regardless of
> where it's used.  Correct me if I'm wrong, but this will probably also
> involve a DNS operation, too.

Maybe this rule shouldn't actually be in the list, or maybe it needs  
to be put differently. I'm just trying to express that
does not necessarily equal
because to be able to tell, we need to know the DNS context.

Anybody have an idea how to say that better? It could be we simply  
say: DNS names in Yadis URLs must always be fully qualified.
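
A minimal sketch of what a "must be fully qualified" check could look
like, with no DNS lookup involved. The function name and the heuristic
(a host counts as fully qualified if it has at least two non-empty
labels) are assumptions for illustration, not anything the Yadis
discussion has agreed on. Note that the heuristic also accepts dotted
IP addresses.

```python
from urllib.parse import urlsplit


def has_fully_qualified_host(url: str) -> bool:
    """Return True if the URL's host looks like a fully qualified DNS name.

    Assumption: "fully qualified" is approximated as "at least two
    non-empty dot-separated labels". No DNS operation is performed, so
    this cannot tell whether the name actually resolves.
    """
    host = urlsplit(url).hostname
    if not host:
        return False
    # Ignore a single trailing dot before splitting into labels.
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


# http://charlie/foo is the relative-name example from the thread;
# charlie.example.com is a hypothetical fully qualified counterpart.
print(has_fully_qualified_host("http://charlie/foo"))
print(has_fully_qualified_host("http://charlie.example.com/foo"))
```

With a rule like this, a relative name such as charlie is rejected up
front instead of being rewritten, which avoids needing to know the DNS
context.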

> # 6. all components of the path must be unescaped to the maximum
> # extent possible. For example, if a URL contained %41 as a character,
> # this character needs to be replaced by its unescaped version A.
> This should be done anyway (but only once, of course).

What I'm trying to say is that I believe it is legal to use %41 in
place of any A in any URL. Because of that, we need to say how to
compare URLs because, obviously, a character-by-character comparison
does not work in this case.
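
A minimal sketch of such a comparison, assuming we only decode escapes
of "unreserved" characters (letters, digits, and - . _ ~), which is the
normalization RFC 3986 describes. That is narrower than "unescape to
the maximum extent possible": an escaped slash, for example, is left
escaped because decoding it would change the path structure. The
function names and the example.com URLs are hypothetical.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Characters that RFC 3986 calls "unreserved"; escaping them is never needed.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def _decode_unreserved(match: re.Match) -> str:
    """Decode one %XX escape if it encodes an unreserved character."""
    ch = chr(int(match.group(1), 16))
    # Keep other escapes, but uppercase the hex digits so %2f == %2F.
    return ch if ch in UNRESERVED else match.group(0).upper()


def normalize_path_escapes(url: str) -> str:
    """Return the URL with needless percent-escapes in the path removed."""
    parts = urlsplit(url)
    path = re.sub(r"%([0-9A-Fa-f]{2})", _decode_unreserved, parts.path)
    return urlunsplit(parts._replace(path=path))


def same_url(a: str, b: str) -> bool:
    """Compare two URLs after normalizing escapes in their paths."""
    return normalize_path_escapes(a) == normalize_path_escapes(b)


print(same_url("http://example.com/%41lice", "http://example.com/Alice"))
print(same_url("http://example.com/a%2fb", "http://example.com/a/b"))
```

Each escape is decoded exactly once, in a single pass, so a doubly
escaped sequence such as %2541 is not unwrapped any further.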

> # 7. any spaces in the path are replaced by +.
> This means that "" will be url-unescaped and be
> transformed to "", right?  Doesn't that seem a
> little aggressive?

Well, speaking just about our code at NetMesh, we currently would  
have two entries in our Yadis cache for URLs
and chances are that if you brought those two URLs to the same  
Relying Party based on our code, they would create separate  
"accounts" in the database. I consider that a bug ... because there  
is no practical way that
could produce different web pages when entered into a browser.

What does your code do?

> -- 
>   Jonathan Daugherty
>   JanRain, Inc.

Johannes Ernst
NetMesh Inc.
