Server losing secrets?

Fri Jun 24 12:39:37 PDT 2005

So here's a case I ran into today, testing my sample OpenID server 
against LiveJournal's consumer (on one of the goathack systems):

1. I gave my identity URL to the consumer.
2. I had a bug in my server code, causing the transaction to fail.
3. I found and fixed the bug, and restarted the server.
4. The server was using only transient storage of secrets.
5. The server was setting the expiration for its secrets one month into 
the future.
6. The consumer still had a cached handle that was no longer usable.
7. The server had no way to tell the consumer that the handle was no 
longer usable.

I quickly realized what was going on, lowered the server's secret 
expiration to two minutes, and moved to the other goathack system.

The problem was that it was too late on the first system.  It already 
had an association with my server, so it wasn't going to try to get a 
new one.  There was no way for my server to signal the consumer that the 
problem was an unrecognized assoc_handle, and that fetching a new 
association would solve the problem.
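
To make the failure concrete, here is a minimal Python sketch of it.  This 
is a toy model, not my actual server code or LiveJournal's consumer: the 
Server and Consumer classes, the "carl" cache key, and the handle format 
are all made up for illustration, and the real protocol messages are 
reduced to plain method calls.

```python
import hashlib
import hmac
import os
import time


class Server:
    """Toy server that keeps association secrets only in memory."""

    def __init__(self, lifetime_seconds):
        self.lifetime = lifetime_seconds
        self.secrets = {}  # assoc_handle -> secret; gone after a restart

    def associate(self):
        handle = os.urandom(8).hex()
        secret = os.urandom(20)
        self.secrets[handle] = secret
        return handle, secret, time.time() + self.lifetime

    def sign(self, assoc_handle, message):
        secret = self.secrets.get(assoc_handle)
        if secret is None:
            return None  # the transaction just fails, with no reason given
        return hmac.new(secret, message, hashlib.sha1).digest()


class Consumer:
    """Toy consumer that caches one association per server until it expires."""

    def __init__(self):
        self.cache = {}  # server name -> (handle, secret, expires_at)

    def handle_for(self, name, server):
        entry = self.cache.get(name)
        if entry is None or entry[2] <= time.time():
            entry = server.associate()
            self.cache[name] = entry
        return entry[0]


consumer = Consumer()
server = Server(lifetime_seconds=30 * 24 * 3600)  # one month
first_handle = consumer.handle_for("carl", server)

# "Restart" the server: a fresh instance with an empty secret store.
server = Server(lifetime_seconds=120)

# The cached association has not expired, so the consumer keeps using it.
stale_handle = consumer.handle_for("carl", server)
signature = server.sign(stale_handle, b"identity assertion")
```

Nothing in this exchange lets the consumer tell a lost secret apart from 
any other failure, so it keeps presenting the stale handle until the 
month runs out.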

While the case I was working on was a bit silly (transient storage, but 
setting the expiration a month in the future?), I can see much more 
reasonable cases where the same problem will arise.  For instance, a 
server setting the expiration a month in the future using a MySQL DB to 
store secrets, and getting that table corrupted by a particularly vicious 
crash involving, say... your colo facility losing power unexpectedly, 
and your hardware doing write caching even though it said it 
wasn't?  (Just to pick a case a few people might be familiar with)  :)

There needs to be a way to recover from something like that in the spec. 
Some mechanism needs to exist by which the server can tell the consumer 
that it didn't recognize the assoc_handle it received, and that the 
consumer should get a new association and try again.
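
As a rough sketch of the kind of recovery I mean (not a proposal for the 
actual wire format), here is one way it could look in Python.  Everything 
here is hypothetical: the UnknownHandle signal stands in for whatever 
response field or error code the spec would define, and the Server and 
Consumer classes are toy stand-ins for the real endpoints.

```python
import hashlib
import hmac
import os


class UnknownHandle(Exception):
    """Hypothetical signal: the server does not recognize the assoc_handle."""


class Server:
    """Toy server with an in-memory secret store."""

    def __init__(self):
        self.secrets = {}  # assoc_handle -> secret

    def associate(self):
        handle = os.urandom(8).hex()
        self.secrets[handle] = os.urandom(20)
        return handle, self.secrets[handle]

    def sign(self, assoc_handle, message):
        if assoc_handle not in self.secrets:
            # Tell the consumer exactly what went wrong.
            raise UnknownHandle(assoc_handle)
        secret = self.secrets[assoc_handle]
        return hmac.new(secret, message, hashlib.sha1).digest()


class Consumer:
    """Toy consumer that re-associates once when its handle is rejected."""

    def __init__(self):
        self.cache = {}  # server name -> (handle, secret)

    def verify(self, name, server, message):
        for _ in range(2):  # original attempt plus one retry
            if name not in self.cache:
                self.cache[name] = server.associate()
            handle, secret = self.cache[name]
            try:
                signature = server.sign(handle, message)
            except UnknownHandle:
                del self.cache[name]  # drop the stale association
                continue
            expected = hmac.new(secret, message, hashlib.sha1).digest()
            return hmac.compare_digest(signature, expected)
        return False


consumer = Consumer()
old_server = Server()
first_ok = consumer.verify("carl", old_server, b"identity assertion")
old_handle = consumer.cache["carl"][0]

new_server = Server()  # simulated restart: all secrets lost
recovered = consumer.verify("carl", new_server, b"identity assertion")
```

The retry is capped at one so that a server that rejects every handle 
can't send the consumer into a loop.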

How should that be specified?

Carl