Proposal (Was: When are and aren't two URLs the same?)
Jonathan Daugherty
cygnus at janrain.com
Fri Apr 21 21:00:11 UTC 2006
# You are right, there is no strong technical reason why it should be
# so, so you can talk me out of it ;-) but I'd prefer it the way as
# proposed.
I'm uncomfortable applying a long list of transforms to identity
URLs. If you concede that there's no technical reason *to* do
something, you probably shouldn't do it. I'd feel much better about
some of these if you could outline the specific problem each one
solves (i.e. why a given transform MUST be applied). Doing this as a
matter of preference and style is not a good idea, and I don't buy
arguments that involve "most people" or "most servers". I don't want
us to open up a can of worms with identity canonicalization, but I'd
be very glad to hear solid, itemized reasoning about why this stuff is
necessary. (Some of the transforms feel wrong, and I don't know enough
to comment on some of the others.)
# 4. if the host component refers to a relative name, replace the host
# component with a fully-qualified DNS name. For example, the URL
# http://charlie/foo, used within company example.com's intranet,
# would be converted to http://charlie.example.com/foo.
I think you're really asking for trouble here. You seem to be
assuming that charlie.example.com has the same meaning regardless of
where it's used. Correct me if I'm wrong, but this will probably
involve a DNS operation, too.
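
To make the concern concrete, here is a minimal Python sketch of rule 4.
The function name qualify_host and the search_domain parameter are
hypothetical, not part of the proposal, and the sketch skips the DNS
lookup a real implementation would probably need. It only shows that
the result depends on which network the consumer happens to sit in.

```python
from urllib.parse import urlsplit, urlunsplit

def qualify_host(url, search_domain):
    """Append search_domain to a single-label host (illustration only).

    Assumption: a host without a dot is "relative". Userinfo in the
    URL is ignored to keep the sketch short.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or "." in host:
        return url  # already qualified, or no host at all
    netloc = host + "." + search_domain
    if parts.port is not None:
        netloc += ":" + str(parts.port)
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )

# The same identity URL, seen from two different intranets:
inside_com = qualify_host("http://charlie/foo", "example.com")
inside_org = qualify_host("http://charlie/foo", "example.org")
```

Here inside_com is "http://charlie.example.com/foo" while inside_org is
"http://charlie.example.org/foo", so one input yields two different
"canonical" identities.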
# 6. all components of the path must be unescaped to the maximum
# extent possible. For example, if a URL contained %41 as a character,
# this character needs to be replaced by its unescaped version A.
This should be done anyway (but only once, of course).
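
As a small illustration of why "only once" matters, here is a Python
sketch using the standard library's urllib.parse.unquote. The sample
path is made up for the example; it contains an escaped percent sign
(%25) followed by "41".

```python
from urllib.parse import unquote

path = "/a%2541"       # "%25" is an escaped "%", so this means "/a%41"

once = unquote(path)   # one pass: "/a%41" (the literal text "%41")
twice = unquote(once)  # a second pass turns that literal into "A": "/aA"
```

After a second pass the path names a different resource than the one
the owner wrote, which is why the unescaping has to stop after one
round.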
# 7. any spaces in the path are replaced by +.
This means that "http://foo.com/a%20b" will be URL-unescaped and then
transformed to "http://foo.com/a+b", right? Doesn't that seem a
little aggressive?
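
Here is a rough Python sketch of what I understand rules 6 and 7 to do
when applied in sequence to a path. The function name
unescape_then_plus is mine, and the exact order of operations is my
reading of the proposal, not something it spells out.

```python
from urllib.parse import unquote

def unescape_then_plus(path):
    """Rule 6 (unescape once), then rule 7 (spaces become "+")."""
    return unquote(path).replace(" ", "+")

with_space = unescape_then_plus("/a%20b")    # escaped space
with_plus = unescape_then_plus("/a+b")       # literal plus sign
with_esc_plus = unescape_then_plus("/a%2Bb") # escaped plus sign
```

All three inputs come out as "/a+b", so under this reading a path
containing a space and a path containing a plus sign can no longer be
told apart after canonicalization.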
--
Jonathan Daugherty
JanRain, Inc.