Few queries on atomicity of requests

Thu Jun 19 06:27:37 UTC 2008

On Jun 18, 2008, at 23:02, Daniel wrote:

> I have a couple other ideas I'll share in case anyone likes them...
>
> 1.) Let the application decide when an object is "too stale."
>
> Currently memcached is set up so an expired element is never available,
> even though it's still in memory.
>
> Perhaps another way memcached could work is to report back, with the
> data, the age of the data.  Then the app can decide if that is too old
> or not, based on its needs, and refresh as necessary.

	This is one of the processes that was described already.

	Adding further metadata to the cache and changing the protocol to
return it isn't really an option at this point, but it's easy to store
the age alongside the data in your application.
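
	A minimal sketch of that idea, assuming the application wraps each
value with its own timestamp.  FakeCache is a hypothetical in-memory
stand-in for a memcached client, and put_with_age / get_if_fresh are
illustrative names, not part of any client library.

```python
import time


class FakeCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def put_with_age(cache, key, value, now=None):
    # Store the write time next to the value, since the cache will not report it.
    stored_at = time.time() if now is None else now
    cache.set(key, {"stored_at": stored_at, "value": value})


def get_if_fresh(cache, key, max_age, now=None):
    # Returns (value, is_fresh); (None, False) on a miss.
    entry = cache.get(key)
    if entry is None:
        return None, False
    now = time.time() if now is None else now
    return entry["value"], (now - entry["stored_at"]) <= max_age
```

	The caller can serve a stale value while it refreshes, or treat it as
a miss; that policy stays in the application.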

> 2.) Rather than having a dog pile, you could set a magic "I'm getting
> that" which is written to the cache on a miss. (best if it's even part
> of the original get request actually) Other processes, rather than
> jumping to the database just wait in a loop with some random timeouts
> calling memcached repeatedly until the data is available.

	You can do that today with a derived key.
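
	Roughly, and only as a sketch: the first process to miss claims a
derived key with an add (which in memcached succeeds only if the key is
absent), and everyone else polls with random sleeps.  FakeCache is a
hypothetical in-memory stand-in for a client, and the ":fetching"
suffix, retry count and sleep bound are arbitrary assumptions.

```python
import random
import time


class FakeCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Like memcached "add": store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def fetch(cache, key, load, retries=20, max_sleep=0.05):
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = key + ":fetching"  # derived key marking "I'm getting that"
    if cache.add(lock_key, 1):
        try:
            value = load()  # only the claimant goes to the database
            cache.set(key, value)
            return value
        finally:
            cache.delete(lock_key)
    for _ in range(retries):
        # Someone else is loading; wait a random interval and look again.
        time.sleep(random.uniform(0, max_sleep))
        value = cache.get(key)
        if value is not None:
            return value
    return load()  # gave up waiting; fall back to the source
```

	Against a real server you would also give the derived key a short
expiry so a crashed process can't hold it forever.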

> Now, for the big question I've been sitting on for months...
>
> Has anyone worked out a system using memcached that guarantees
> memcached has the most recent data? Essentially, tying memcached to
> the database so memcached is never allowed to contain any stale data?

	Yes, many people have done things like this.

	Well, "guarantee" is a difficult thing to claim because you don't get
2PC (two-phase commit) or anything like it, but I used to have an
application that would push cache updates through as part of DB
updates.  I'd actually push the cache updates through *before* the DB
writes because the DB writes were async and conflicts were resolvable.

	If not that, then you can at least have a post-transaction cache
replacement (cache_fu in Rails supports this out of the box, and I've
built similar things for Java a few times).
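
	A sketch of the post-transaction variant, assuming a SQLite database
and a plain dict standing in for the cache.  The users table and the
update_name helper are made up for illustration; this is not how
cache_fu itself is implemented.

```python
import sqlite3


def update_name(db, cache, user_id, name):
    # "with db" commits on success and rolls back if the block raises.
    with db:
        db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    # Reached only after a successful commit, so the cache never sees
    # a value the database rejected.
    cache[f"user:{user_id}"] = name
```

	There is still a window between the commit and the cache write where
readers see the old value, which is why "guarantee" is too strong.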

> I've looked at some solutions involving a timestamp with every record,
> a revision code, database row locking, etc.  I think I've determined
> that it can be made to work with the data itself, by disabling caching
> when multiple writes are being processed, however I was hoping to find
> out if anyone's actually made it work.

	Why would you disable caching just because something's writing?   
There's always a last write.

	Having a version column (I used to call it a ``write token'' or  
something like that) ensures that you are writing against the correct  
data.
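
	As a sketch of the version-column check, assuming SQLite and a
made-up docs table with an integer version column (write_if_current is
an illustrative name):

```python
import sqlite3


def write_if_current(db, row_id, new_body, expected_version):
    # The UPDATE matches only if nobody has bumped the version since we read it.
    with db:
        cur = db.execute(
            "UPDATE docs SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, row_id, expected_version),
        )
    # One row changed means our write won; zero means we were working from stale data.
    return cur.rowcount == 1
```

	A False return is the cue to re-read the row and retry or merge.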

	One option to ensure correctness is to always read from the DB when
doing a write and do a three-way merge between the state you were
originally in, the state you were trying to push, and the state of the
records in the DB currently.  It all depends on what you're doing.
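
	A field-by-field sketch of such a merge, assuming records are flat
dicts and that None means "field absent".  Keeping the DB value on a
conflict is just one possible policy, chosen here for illustration.

```python
def three_way_merge(base, mine, theirs):
    """base: state I started from; mine: state I want to push;
    theirs: state currently in the DB."""
    merged, conflicts = {}, []
    for field in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(field), mine.get(field), theirs.get(field)
        if m == t or t == b:
            value = m  # we agree, or the DB side did not change it
        elif m == b:
            value = t  # only the DB side changed it
        else:
            conflicts.append(field)  # both changed it, differently
            value = t  # keep the DB value and let the caller decide
        if value is not None:
            merged[field] = value
    return merged, conflicts
```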

> From what I understand, a system like this can only work if every
> application that accesses data does its part, but I haven't seen any
> proven examples, and it seems to be a highly complex interface that
> would require some real amazing programming magic.


	I built a lock server I do similar things with.  You can create
cross-machine locks to mutually exclude operations that need to be
serialized across multiple systems (e.g. I use it for async jobs that
perform search index updates and propagation).  It's not meant to be
hugely fast, so I wouldn't do it for every single row, but I haven't
found a need for such a thing yet.

	Most of this is just the tired ramblings of someone guessing at
requirements, though.  Once there are particular constraints for an
application, the mechanism to ensure correctness becomes clearer.

-- 
Dustin Sallings