Atomic replace and append operations
Richard Cameron
camster at citeulike.org
Fri Mar 18 05:55:47 PST 2005
I'm interested in implementing two extra storage commands "areplace"
and "append" for memcached. The former would behave exactly like
"replace", but would return the object's value before it performs the
replace operation; the latter would simply string append the new value
onto the old one.
The motivation is not purely to add extra bloat to memcached but rather
to provide the minimal extra operations I'd need to be able to keep
track of dependency trees in the cached data.
For instance, if I want to cache objects A and B (the results of two
SQL queries, say) which depend on some "thing" in the database (like a
particular user U's account), I'd like to be able to keep track of my
dependencies like this:
U -> A,B
Memcached itself seems like a good choice to store this list. So, when
user U does something to his account, I'll want to delete the newly
defunct objects from the cache. I think the best I can do at the moment
is:
my_deps := [get U]
foreach dep $my_deps:
    [delete $dep]
[set U ""]
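To make that concrete, here's roughly what the invalidation step would
look like from a Python client (a python-memcached-style API, with the
dependency list stored as a space-separated string of keys - both of
those details are just illustrative):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def invalidate_dependents(user_key):
    # Fetch the space-separated list of cache keys that depend on user_key.
    deps = mc.get(user_key) or ""
    for dep in deps.split():
        mc.delete(dep)      # drop each newly-defunct cached object
    mc.set(user_key, "")    # reset the dependency list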
Also, if I perform another cacheable operation C which depends on U,
I'd like to be able to add this dependency to my list. The existing
protocol lets me say:
old_deps := [get U]
new_deps := append(old_deps, C)
[set U $new_deps]
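In the same style, adding the dependency is a plain read-modify-write
(reusing the mc client from the sketch above; the helper name is mine):

def add_dependency(user_key, new_dep):
    # Non-atomic: fetch the current list, append, and store it back.
    deps = mc.get(user_key) or ""
    mc.set(user_key, deps + " " + new_dep)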
The trouble, of course, is that there are race conditions in both cases
where multi-threaded code could execute in a way which would corrupt
the dependency list.
On a single machine, I can simply guard the critical section in the
client code with a mutex, but this won't work on a distributed server
farm. It would be possible to use memcached's existing "incr" and
"decr" operations to produce a "global" mutex, but this is a) not
terribly efficient, and b) dangerous if one server incr's the semaphore
on entry to the critical section but crashes before it exits - this
would break things such that no machine could subsequently update the
dependency list, and we'd be stuck in that state for good.
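Just to illustrate the failure mode, a rough sketch of that incr/decr
"mutex" might look like this (again reusing the mc client from above;
the lock key convention is made up):

import time

def with_global_lock(lock_key, critical_section):
    mc.add(lock_key, "0")        # make sure the counter exists
    # Spin until we're the caller who moved the counter from 0 to 1.
    while mc.incr(lock_key) != 1:
        mc.decr(lock_key)        # undo our increment and retry
        time.sleep(0.01)
    try:
        critical_section()
    finally:
        mc.decr(lock_key)        # a crash before this line leaves the counter
                                 # stuck at 1, and nobody can take the lock again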
So, my proposed solution is to implement an "areplace" (atomic replace)
operation which looks like this:
my_deps := [areplace U ""]
and an "append" operation which looks like this:
[append U " C"]
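With those two commands, the client-side helpers above collapse to
something like this (the areplace/append client methods are
hypothetical, mirroring the proposed protocol commands):

def invalidate_dependents(user_key):
    # Atomically swap in an empty list and get the old value back.
    deps = mc.areplace(user_key, "") or ""   # proposed command
    for dep in deps.split():
        mc.delete(dep)

def add_dependency(user_key, new_dep):
    # Server-side string append, so there's no read-modify-write race.
    mc.append(user_key, " " + new_dep)       # proposed command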
I could then keep most of the bloat in the client library, not the
server. This approach also has the advantage of being able to track
dependencies across multiple memcached instances.
This is probably bending memcached's intended use slightly, but I think
it still adheres to the basic principle that it's a cache: all the data
can be regenerated from the database if required. The advantage is that
I can immediately tell when data should be marked as invalid, which
means I can use much longer cache times, which in turn means that the
cached items in the "long tail" (objects which get accessed
infrequently but are expensive to compute) can have much higher hit
rates.
Seem at all sensible? I'd be interested in any comments before I go
away and code this up.
Richard.