Few queries on atomicity of requests
dustin at spy.net
Thu Jun 19 06:27:37 UTC 2008
On Jun 18, 2008, at 23:02, Daniel wrote:
> I have a couple other ideas I'll share in case anyone likes them...
> 1.) Let the application decide when an object is "too stale."
> Currently memcached is set up so an expired element is never available,
> even though it's still in memory.
> Perhaps another way memcached could work is to report back, with the
> data, the age of the data. Then the app can decide if that is too old
> or not, based on its needs, and refresh as necessary.
This is one of the processes that was described already.
Adding further metadata to the cache and changing the protocol to
return it isn't really an option at this point, but it's easy to add
to the data in your application.
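A minimal sketch of keeping the age alongside the data in the application, with no protocol change. The envelope layout and the names `wrap`, `unwrap` and `max_age` are assumptions made for illustration, not part of memcached or any client library:

```python
import time


def wrap(value, now=None):
    """Build the entry to store in the cache: the value plus its write time."""
    stored_at = time.time() if now is None else now
    return {"stored_at": stored_at, "value": value}


def unwrap(entry, max_age, now=None):
    """Return (value, is_fresh) so the caller decides what "too stale" means."""
    now = time.time() if now is None else now
    age = now - entry["stored_at"]
    return entry["value"], age <= max_age
```

The cache expiry can then be set much longer than `max_age`, so a stale value is still available to serve while one process refreshes it.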
> 2.) Rather than having a dog pile, you could set a magic "I'm getting
> that" which is written to the cache on a miss. (best if it's even part
> of the original get request actually) Other processes, rather than
> jumping to the database just wait in a loop with some random timeouts
> calling memcached repeatedly until the data is available.
You can do that today with a derived key.
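One way this could look, as a sketch only: the "I'm getting that" marker lives under a key derived from the data key and is claimed with add, which in memcached stores a value only if the key is not already present. `FakeCache` is a hypothetical in-memory stand-in for a real client, and the `:lock` suffix, `fetch`, `retries` and `delay` are names made up for this example:

```python
import time


class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Same contract as memcached's add: store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)


def fetch(cache, key, load, retries=50, delay=0.01):
    """Return the cached value, letting only one caller run load() on a miss."""
    value = cache.get(key)
    if value is not None:
        return value
    lock_key = key + ":lock"  # derived key; the suffix is an arbitrary choice
    if cache.add(lock_key, 1):
        # We won the race: load the data and publish it for everyone else.
        try:
            value = load()
            cache.set(key, value)
            return value
        finally:
            cache.delete(lock_key)
    # Someone else is loading: poll the cache instead of hitting the database.
    for _ in range(retries):
        time.sleep(delay)
        value = cache.get(key)
        if value is not None:
            return value
    return load()  # gave up waiting; fall back to the source
```

Against a real server the marker should be stored with a short expiry, so that a crashed loader cannot block everyone else indefinitely.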
> Now, for the big question I've been sitting on for months...
> Has anyone worked out a system using memcached that guarantees
> memcached has the most recent data? Essentially, tying memcached to
> the database so memcached is never allowed to contain any stale data?
Yes, many people have done things like this.
Well, guarantee is actually a bit of a difficult thing to say because
it's not like you get 2pc or anything, but I used to have an
application that would push cache updates through as part of DB
updates. I'd actually push the cache updates through *before* the DB
writes because the DB writes were async and conflicts were resolvable.
If not that, then you can at least have a post-transaction cache
replacement (cache_fu in rails supports this out of the box, and I've
built similar things for java a few times).
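A rough sketch of post-transaction cache replacement, assuming a SQLite database and a plain dict standing in for the cache. The `users` table, the `user:<id>` key format and `update_user_name` are hypothetical, and this is not how cache_fu itself is implemented:

```python
import sqlite3


def update_user_name(db, cache, user_id, name):
    """Write to the DB first; replace the cache entry only after the commit."""
    # Used as a context manager, a sqlite3 connection commits when the block
    # succeeds and rolls back when it raises.
    with db:
        db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    # Reached only if the transaction committed.
    cache[f"user:{user_id}"] = name
```

If the update fails, the exception propagates before the cache is touched, so the cache never gets ahead of the database. It can still lag briefly between the commit and the replacement.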
> I've looked at some solutions involving a timestamp with every
> record, a revision code, database row locking, etc. I think I've
> determined it can be made to work with the data itself, by disabling
> caching when multiple writes are being processed, however I was hoping
> to find out if anyone's actually made it work.
Why would you disable caching just because something's writing?
There's always a last write.
Having a version column (I used to call it a ``write token'' or
something like that) ensures that you are writing against the correct
version.
One option to ensure correctness is to always read from the DB when
doing a write and do a three way merge between the state you were
originally in, the state you were trying to push and the state of the
records in the DB currently. It all depends on what you're doing.
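As an illustration of the three-way merge idea, here is a field-by-field sketch over plain dicts. The record shape, the name `three_way_merge` and the policy of keeping the DB value on a conflict are all assumptions; a real application would choose its own conflict handling:

```python
def three_way_merge(base, mine, theirs):
    """Merge one record field by field.

    base   -- the state originally read
    mine   -- the state being pushed
    theirs -- the state currently in the DB
    Missing fields are treated as None. Returns (merged, conflicts).
    """
    merged = {}
    conflicts = []
    for field in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(field), mine.get(field), theirs.get(field)
        if m == t or m == b:
            # Both sides agree, or this writer did not change the field.
            merged[field] = t
        elif t == b:
            # Only this writer changed the field.
            merged[field] = m
        else:
            # Both changed it differently: keep the DB value and report it.
            conflicts.append(field)
            merged[field] = t
    return merged, conflicts
```

The caller can write `merged` back when `conflicts` is empty, and otherwise retry or surface the conflicting fields.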
> From what I understand, a system like this can only work if every
> application that accesses data does its part, but I haven't seen any
> proven examples, and it seems to be a highly complex interface that
> would require some real amazing programming magic.
I built a lock server I do similar things with. You can create cross
machine locks to mutually exclude operations that need to be
serialized across multiple systems (e.g. I use it for async jobs that
perform search index update and propagation). It's not meant to be
hugely fast, so I wouldn't do it for every single row, but I haven't
found a need for such a thing yet.
Most of this is just tired ramblings of someone guessing at
requirements, though. Once there are particular constraints for an
application, the mechanism to ensure correctness becomes more clear.