Atomic replace and append operations
timo at tzc.com
Fri Mar 18 12:48:12 PST 2005
I'll third that one. I've suggested it before as well.
Brad answered my question a while ago with how he grabs the list of
comments on livejournal. He answered that he has a list of comment ids
in a memcache entry. He then grabs the individual ones one at a time
based on the contents of the first. When a new comment is added, it gets
added to the list of comment ids. I asked how to avoid the race
condition, but never got an answer to that. An append operation would
Gregory Block wrote:
> I'd second these additions - I've been waiting with a similar kind of
> need for some time, and would be very happy to see this added.
> On 18 Mar 2005, at 13:55, Richard Cameron wrote:
>> I'm interested in implementing two extra storage commands "areplace"
>> and "append" for memcached. The former would behave exactly like
>> "replace", but would return the object's value before it performs the
>> replace operation; the latter would simply string append the new
>> value onto the old one.
>> The motivation is not purely to add extra bloat to memcached but
>> rather to provide the minimal extra operations I'd need to be able to
>> keep track of dependency trees in the cached data.
>> For instance, if I want to cache objects A and B (the results of two
>> SQL query, say) which depends on some "thing" in the database (like a
>> particular user U's account), I'd like to be able keep track of my
>> dependencies like this:
>> U -> A,B
>> Memcached itself seems like a good choice to store this list. So,
>> when user U does something to his account, I'll want to delete the
>> newly defunct objects from the cache. I think the best I can do at
>> the moment is:
>> my_deps := [get U]
>> foreach dep $my_deps:
>> [delete $dep]
>> [set U ""]
>> Also, if I perform another cacheable operation C which depends on U,
>> I'd like to be able to add this dependency to my list. The existing
>> protocol lets me say:
>> old_deps := "get U"
>> new_deps := append(old_deps, C)
>> [set U $new_deps]
>> The trouble, of course, is that there are race conditions in both
>> cases where multi-threaded code could execute in a way which would
>> corrupt the dependency list.
>> On a single machine, I can simply guard the critical section in the
>> client code with a mutex, but this won't work on a distributed server
>> farm. It would be possible to use memcached's existing "incr" and
>> "decr" operations to produce a "global" mutex, but this is a) not
>> terribly efficient, and b) dangerous if one server incr's the
>> semaphore on entry to the critical section but crashes before it
>> exists - this would break things such that no machine could
>> subsequently update the dependency and we'd be stuck in that state
>> for good.
>> So, my proposed solution is to implement an "areplace" (atomic
>> replace) operation which looks like this:
>> my_deps = [areplace U ""]
>> and and append operation which looks like this:
>> [append U " C"]
>> I could then keep most of the bloat in the client library, not the
>> server. This approach also has the advantage of being able to track
>> dependencies over multiple memcached instances.
>> This is probably bending memcached's intended use slightly, but I
>> think it still adheres to the basic principle that it's a cache: all
>> the data can be regenerated from the database if required. The
>> advantage is that I can immediately tell when data should be marked
>> as being invalid, which means I can use much longer cache times
>> which, in turn, means that the cached items in the "long tail"
>> (object which get accessed infrequently but are expensive to compute)
>> can have much higher hit rates.
>> Seem at all sensible? I'd be interested in any comments before I go
>> away and code this up.
More information about the memcached