proposal for capabilities lookup

Sat Nov 19 10:13:14 PST 2005

Ernst Johannes <jernst+lists.danga.com <at> netmesh.us> writes:

> 
> I'm not entirely following what you are proposing. Comments in-line.
> 
> > I think two goals are most important for adoption:
> > * Those using their own site as an identity URL must be able to add
> > YADIS support without writing any code or greatly disturbing their
> > existing site. (The HTML indirection case.) This implies a regular
> > GET, as you mentioned.
> 
> So we agree on this one.
> 
> > * In *that same request* consumers must somehow indicate that it's
> > really capabilities that they are interested in, so that larger
> > identity hosts like LiveJournal can remove the layer of indirection
> > and just return the capabilities directly, without having to generate
> > a rather costly journal page that's just going to be discarded. This
> > is the purpose I was bending "Accept" for.
> 
> That's why I was talking about the "local convention". Or do you mean  
> a global convention?
> 
> The idea of returning a new URL from which the capabilities can be  
> retrieved is modeled directly after XML: XML files also typically  
> specify their schema through a separate URL, rather than in-lining it  
> (which they could, but which is undesirable for a variety of reasons).
> 
> This compromise is usually acceptable because DTDs change very  
> slowly, while XML files change relatively quickly: which is why XML  
> processors all have this funny local cache of retrieved DTDs.
> 
> > It seems to me that the path of least disruption is just to use a
> > custom request header in place of Accept in my proposal. However,
> > this has the drawback that it's nonstandard and thus proxy servers
> > are less likely to be able to cope with its presence causing a
> > different response, even if it appears in the "Vary" header field.
> 
> Note that my proposal does not require such a custom request header,  
> it only uses a non-standard response header.
> 
> > I seem to remember that a very similar conclusion was reached when we
> > were discussing this for OpenID, which is why OpenID only supports the
> > HTML indirection case despite its inherent inefficiency.
> 
> The OpenID case is a little different because the URL points to the  
> OpenID identity server, rather than a capabilities lookup (which does  
> not exist in OpenID). On another level, of course there is a parallel  

A couple of comments, Johannes, not necessarily aimed at you but at the
board here in general.

1. As someone working on plans to support YADIS/OpenID/LID in some of our core 
infrastructure - ping servers, trackback engines, reputation systems, 
tagging/metadata frameworks, etc. - the concern here for us and service 
providers like us isn't *bandwidth* per se. CPU cycles are important to 
conserve, but the downside of multiple fetches is really tied to *time*
delays (each additional fetch costs another network round trip), rather than
the bandwidth an additional step would consume. Not saying
bandwidth shouldn't be a concern -- hold on, yes I am. Bandwidth is something 
we should happily trade off for better adoption and flexibility. 

Identity systems do not fail because they are bandwidth hogs.

2. One of the profound advantages of OpenID is what I've come to call 
the "ISO" -- "in spite of" -- factor. OpenID has a high "in spite of" 
advantage; users can bootstrap their own URLs with little or no help from the
service provider of the actual URL. As long as I have control over the HTML 
that is being served, I don't need anything from my service provider to play, 
just a way to edit the <head> section of my HTML page.

This is such a small, humble-looking feature that it's *very* easy to
overlook or underweight its importance. 

3. Johannes and I have discussed this, but I think it's worth pointing out.
With standard configurations, a CGI call cannot be served from a static
file. If my request looks like this:

http://idserver.net/mgraves?meta=capabilities

I'm stuck if I don't have control over my Apache environment. Even then,
a module must be bolted on to Apache to support the CGI semantics. Now if
my request is:

http://idserver.net/mgraves/yadis.xml

I now have the ability as a user to "bootstrap" myself: start up Notepad,
write the file as needed, and save it to
http://idserver.net/mgraves/

The interesting thing here is that orienting things toward static file
semantics (GET "yadis.xml" vs. a CGI call) does *not* hinder the service
provider from serving static file requests with dynamic responses. In other 
words, this URL:

http://idserver.net/mgraves/yadis.xml

is *ostensibly* a static file on the server, but with straightforward changes 
to the Apache configuration, this same URL can be served from a database or 
other resource, rather than just feeding back a static file. "yadis.xml" may 
not exist as a discrete file in any user's home directory for a given service
provider. It may just be synthesized on demand and spit back as the 
appropriate file by Apache.

The bottom line here is that the choice of URL semantics is not symmetric. Orienting a
specification around CGI semantics generally *precludes* users from 
bootstrapping themselves without the help, knowledge, or permission of the 
host/webserver. Orienting a specification around static file semantics 
(GET "yadis.xml") does *not* preclude service providers from automating and
virtualizing the serving of "yadis.xml" or any other "static" request.

It is for this reason that I have urged that wherever possible, static file 
semantics be embraced in HTTP requests, as it lets end users work things out 
for themselves independently when needed -- a huge advantage in gaining 
adoption for the spec.

(I realize this can't be extended completely. You can ask for a complete
VCard, for example at "http://idserver.net/mgraves/xvcard.xml", and that can
be served with a user-edited static file, but that's the simplest case. In
the real world, the "view" of a VCard is predicated on who's asking. When you
need to know the identity/credential of the asking party to determine the
proper response, a simple static file name won't suffice as a request. So the
question isn't whether parameterized URLs will be necessary, since they will
be, but whether they will be necessary even for core minimum functionality.
I'm currently thinking that a minimal setup would *not* need parameterized
URLs, which would enable the adoption-friendly "notepad principle" to help
this thing spread.)
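A minimal Python sketch of the provider side of point 3, assuming a host that wants to answer what looks like a static request for /mgraves/yadis.xml from its own data store rather than from a file on disk. The `CAPABILITIES` table, the `urn:example:...` service identifiers and the `<capabilities>` XML element are hypothetical placeholders, not the YADIS document format. The query string is ignored, just as a plain static file server would ignore it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit
from xml.sax.saxutils import escape, quoteattr

# Hypothetical per-user store; a real provider would query its database here.
CAPABILITIES = {
    "mgraves": ["urn:example:openid", "urn:example:lid"],
}


def render_capabilities(user, services):
    """Build a placeholder capabilities document (not the YADIS schema)."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f"<capabilities user={quoteattr(user)}>"]
    for svc in services:
        lines.append(f"  <service>{escape(svc)}</service>")
    lines.append("</capabilities>")
    return "\n".join(lines)


def resolve(raw_path):
    """Map a request path to (status, content_type, body).

    /<user>/yadis.xml looks like a static file, but the body is built on
    demand; no yadis.xml file needs to exist on disk.
    """
    # Drop any query string, as a static file server would.
    parts = urlsplit(raw_path).path.strip("/").split("/")
    if len(parts) == 2 and parts[1] == "yadis.xml" and parts[0] in CAPABILITIES:
        user = parts[0]
        return 200, "application/xml", render_capabilities(user, CAPABILITIES[user])
    return 404, "text/plain", "not found\n"


class YadisHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, ctype, body = resolve(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the sketch locally, then GET http://127.0.0.1:<port>/mgraves/yadis.xml."""
    HTTPServer(("127.0.0.1", port), YadisHandler).serve_forever()
```

A user who instead saves a hand-written yadis.xml with Notepad gets the same URL from any plain static host. That is the asymmetry: the static-looking URL admits both implementations, while the `?meta=capabilities` form admits only the dynamic one.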

4. There are some interesting suggestions here about the "ideal" approach to
this problem, such as HTTP OPTIONS. Anything that is going to require
extensions to the conventional behavior of existing web servers is kryptonite
for adoption, however. There are lots of priorities represented on this list,
but for my part, and for the direction I'd like to see the infrastructure
evolve, the priority is just one thing: adoption. Anything that slows adoption
(such as changes to Apache) should be included grudgingly if at all. "The
great is the enemy of the good" comes to mind here.

5. URL squatting. While Microsoft made its typical blunders in the "Favicon"
introduction (unregistered MIME types, etc.), I don't think that the Wiki is
correct in saying that favicon.ico is an example of what the W3C folks complain
about as "URI squatting". Also, I don't think I'd even agree that it works 
against the architecture principles of the Web. If it does, then the 
conventional use of "index.html" does too, and it seems we've learned to live 
with that. :-)

If you're OK with "index.html" being a *convention* for serving HTTP requests 
on a directory, then I can't see any problem with "yadis.xml" becoming an 
analogous convention to "index.html".  Again, this is straightforward advocacy 
of "convention over configuration", but I think that in this case, it's 
crucial for minimizing the amount of accommodation by users and hosts to
achieve maximum adoption in the shortest possible timeframe.
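Treating "yadis.xml" as a convention like "index.html" means a consumer can derive the capabilities URL from the identity URL alone. A minimal Python sketch, assuming the convention is simply "append yadis.xml to the identity URL, treated as a directory"; the helper name `capabilities_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical convention: the capabilities document lives at "yadis.xml"
# directly under the identity URL, analogous to index.html for a directory.
CONVENTIONAL_NAME = "yadis.xml"


def capabilities_url(identity_url: str) -> str:
    """Derive the conventional capabilities URL for an identity URL.

    Any query or fragment on the identity URL is dropped (an assumption).
    """
    scheme, netloc, path, _query, _fragment = urlsplit(identity_url)
    # Treat the identity URL as a directory: ensure the path ends in "/"
    # so the file name is appended rather than replacing the last segment.
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((scheme, netloc, path + CONVENTIONAL_NAME, "", ""))
```

The trailing-slash normalization matters: a naive `urljoin("http://idserver.net/mgraves", "yadis.xml")` yields `http://idserver.net/yadis.xml`, because without the slash the last path segment is treated as a file name and replaced.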

6. Something to consider: Maybe Brad's right and the HTML <head> section is 
all that should ever be changed to configure a URL for being used as an 
OpenID/YADIS-enabled URL. What if we changed the architecture from one that 
relied on the identity URL to answer a bunch of questions, and simply asked 
the identity URL to do one thing: tell us your authoritative identity server.

If we did this, a given URL would only point to the identity server, and 
rather than querying the identity URL for its capabilities, we would simply 
ask the identity server for anything else we needed. (I know Brad will 
probably read this and think "Duh! This is what I've been saying all 
along!").  Instead of asking for this:

http://superhost.com/mgraves/yadis.xml

The identity consumer would determine the delegated ID server for the URL (a 
la OpenID), and ask that server: "What are the capabilities for 
http://superhost.com/mgraves ?"  

Once that indirection is achieved, the game is completely changed in terms of 
the web server semantics. If we simply scrape some HTML to determine the 
delegated ID server for the URL, and carry out everything else with the ID 
server, we get a) minimal (maybe zero) impact on the hosting provider for the
Identity URL, and b) maximum flexibility in terms of features for the ID server.

The ID server can be written any way that's necessary. It's new machinery, and
has to be developed and deployed somehow anyway. The host server can't be
required to change, or adoption will be severely constrained. Right now I'm
thinking if we just bow to Brad's (and David Recordon's) original instincts,
and make the Identity URL simply a lightweight pointer to a powerful Identity
server (based on open standards defined here), then we escape a lot of the
boxes this discussion currently has us trapped in.
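The two-step flow in point 6 can be sketched in Python: scrape the identity page's <head> for the delegated ID server, then address every further question to that server. Step 1 uses the `<link rel="openid.server" href="...">` element that OpenID 1.x already defines. Step 2's request format (the `query` and `identity` parameters) is purely hypothetical, since nothing in this thread defines it. The parser also assumes the page has an explicit <head> tag.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ServerLinkFinder(HTMLParser):
    """Find <link rel="openid.server" href="..."> inside the page's <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.server = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.server is None:
            attrs = dict(attrs)
            rels = (attrs.get("rel") or "").lower().split()
            if "openid.server" in rels and attrs.get("href"):
                self.server = attrs["href"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_id_server(html):
    """Step 1: scrape the identity page for its delegated ID server (or None)."""
    finder = ServerLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.server


def capabilities_request_url(id_server, identity_url):
    """Step 2: build a request asking the ID server about identity_url.

    The parameter names are hypothetical placeholders.
    """
    sep = "&" if "?" in id_server else "?"
    return id_server + sep + urlencode({"query": "capabilities", "identity": identity_url})
```

Only step 1 touches the identity URL's host, and it needs nothing more than an edited HTML page; everything in step 2 is the ID server's business, which is where point 6 puts the flexibility.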

Sorry for such a long, rambling post. I've been reading along for a while, but
just don't have time to chip in here for the most part.

-Michael Graves