Atomic replace and append operations
Richard Cameron
camster at citeulike.org
Fri Mar 18 05:55:47 PST 2005
I'm interested in implementing two extra storage commands "areplace"
and "append" for memcached. The former would behave exactly like
"replace", but would return the object's value before it performs the
replace operation; the latter would simply string append the new value
onto the old one.
The motivation is not purely to add extra bloat to memcached but rather
to provide the minimal extra operations I'd need to be able to keep
track of dependency trees in the cached data.
For instance, if I want to cache objects A and B (the results of two
SQL queries, say) which depend on some "thing" in the database (like a
particular user U's account), I'd like to be able to keep track of my
dependencies like this:
U -> A,B
Memcached itself seems like a good choice to store this list. So, when
user U does something to his account, I'll want to delete the newly
defunct objects from the cache. I think the best I can do at the moment
is:
my_deps := [get U]
foreach dep $my_deps:
    [delete $dep]
[set U ""]
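To make that concrete, here's roughly what the invalidation step would
look like from a Python client (a python-memcached-style API, with the
dependency list stored as a space-separated string of keys - both of
those details are just illustrative):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def invalidate_dependents(user_key):
    # Fetch the space-separated list of cache keys that depend on user_key.
    deps = mc.get(user_key) or ""
    for dep in deps.split():
        mc.delete(dep)      # drop each newly-defunct cached object
    mc.set(user_key, "")    # reset the dependency list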
Also, if I perform another cacheable operation C which depends on U,
I'd like to be able to add this dependency to my list. The existing
protocol lets me say:
old_deps := [get U]
new_deps := append(old_deps, C)
[set U $new_deps]
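In the same style, adding the dependency is a plain read-modify-write
(reusing the mc client from the sketch above; the helper name is mine):

def add_dependency(user_key, new_dep):
    # Non-atomic: fetch the current list, append, and store it back.
    deps = mc.get(user_key) or ""
    mc.set(user_key, deps + " " + new_dep)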
The trouble, of course, is that there are race conditions in both cases
where multi-threaded code could execute in a way which would corrupt
the dependency list.
On a single machine, I can simply guard the critical section in the
client code with a mutex, but this won't work on a distributed server
farm. It would be possible to use memcached's existing "incr" and
"decr" operations to produce a "global" mutex, but this is a) not
terribly efficient, and b) dangerous if one server incr's the semaphore
on entry to the critical section but crashes before it exits - this
would break things such that no machine could subsequently update the
dependency list, and we'd be stuck in that state for good.
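Just to illustrate the failure mode, a rough sketch of that incr/decr
"mutex" might look like this (again reusing the mc client from above;
the lock key convention is made up):

import time

def with_global_lock(lock_key, critical_section):
    mc.add(lock_key, "0")        # make sure the counter exists
    # Spin until we're the caller who moved the counter from 0 to 1.
    while mc.incr(lock_key) != 1:
        mc.decr(lock_key)        # undo our increment and retry
        time.sleep(0.01)
    try:
        critical_section()
    finally:
        mc.decr(lock_key)        # a crash before this line leaves the counter
                                 # stuck at 1, and nobody can take the lock again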
So, my proposed solution is to implement an "areplace" (atomic replace)
operation which looks like this:
my_deps := [areplace U ""]
and an "append" operation which looks like this:
[append U " C"]
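With those two commands, the client-side helpers above collapse to
something like this (the areplace/append client methods are
hypothetical, mirroring the proposed protocol commands):

def invalidate_dependents(user_key):
    # Atomically swap in an empty list and get the old value back.
    deps = mc.areplace(user_key, "") or ""   # proposed command
    for dep in deps.split():
        mc.delete(dep)

def add_dependency(user_key, new_dep):
    # Server-side string append, so there's no read-modify-write race.
    mc.append(user_key, " " + new_dep)       # proposed command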
I could then keep most of the bloat in the client library, not the
server. This approach also has the advantage of being able to track
dependencies across multiple memcached instances.
This is probably bending memcached's intended use slightly, but I think
it still adheres to the basic principle that it's a cache: all the data
can be regenerated from the database if required. The advantage is that
I can immediately tell when data should be marked as invalid, which
means I can use much longer cache times, which in turn means that the
cached items in the "long tail" (objects which get accessed
infrequently but are expensive to compute) can have much higher hit
rates.
Seem at all sensible? I'd be interested in any comments before I go
away and code this up.
Richard.