Atomic replace and append operations

Brad Fitzpatrick brad at danga.com
Fri Mar 18 13:54:00 PST 2005


We use the database for locking, since we have to have a handle on it
anyway.

I'm down for an atomic append operation, but I want to add virtual buckets
and the trackers first, as they're kind of a prereq.  Apps coded to just
append all the time could get in trouble if there's a network blip.  With
the trackers, it'd be detected and wiped.

- Brad


On Fri, 18 Mar 2005, Timo Ewalds wrote:

> I'll third that one. I've suggested it before as well.
>
> Brad answered my question a while ago about how he grabs the list of
> comments on LiveJournal. He answered that he keeps a list of comment ids
> in a memcache entry, then grabs the individual comments one at a time
> based on the contents of that list. When a new comment is added, its id
> gets appended to the list. I asked how to avoid the race condition, but
> never got an answer to that. An append operation would fix it.
>
> Timo
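[The read-modify-write cycle Timo describes can be sketched as follows. This
is a minimal illustration in Python with a plain dict standing in for a
memcached client — the thread names no client library, so `mc_get`/`mc_set`
and the key names are purely illustrative.]

```python
# Demonstrate the lost-update race: two request handlers each read the
# comment-id list, append their new id, and write the whole list back.

cache = {}  # stand-in for memcached

def mc_get(key):
    return cache.get(key)

def mc_set(key, value):
    cache[key] = value

mc_set("comments:post1", "1,2,3")

# Handlers A and B run concurrently; both read before either writes.
a = mc_get("comments:post1")         # A reads "1,2,3"
b = mc_get("comments:post1")         # B reads "1,2,3" as well
mc_set("comments:post1", a + ",11")  # A writes "1,2,3,11"
mc_set("comments:post1", b + ",12")  # B overwrites A: comment 11 is lost

assert mc_get("comments:post1") == "1,2,3,12"  # the lost update
```

A server-side append collapses the read-modify-write into one operation, so
the window between the two handlers' reads and writes disappears.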
>
> Gregory Block wrote:
>
> > I'd second these additions - I've been waiting with a similar kind of
> > need for some time, and would be very happy to see this added.
> >
> > On 18 Mar 2005, at 13:55, Richard Cameron wrote:
> >
> >>
> >> I'm interested in implementing two extra storage commands "areplace"
> >> and "append" for memcached. The former would behave exactly like
> >> "replace", but would return the object's value before it performs the
> >> replace operation; the latter would simply string append the new
> >> value onto the old one.
> >>
> >> The motivation is not purely to add extra bloat to memcached but
> >> rather to provide the minimal extra operations I'd need to be able to
> >> keep track of dependency trees in the cached data.
> >>
> >> For instance, if I want to cache objects A and B (the results of two
> >> SQL queries, say) which depend on some "thing" in the database (like a
> >> particular user U's account), I'd like to be able to keep track of my
> >> dependencies like this:
> >>
> >> U -> A,B
> >>
> >> Memcached itself seems like a good choice to store this list. So,
> >> when user U does something to his account, I'll want to delete the
> >> newly defunct objects from the cache. I think the best I can do at
> >> the moment is:
> >>
> >> my_deps := [get U]
> >> foreach dep $my_deps:
> >>     [delete $dep]
> >> [set U ""]
> >>
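[The invalidation loop above can be sketched like this — again Python with a
dict standing in for the memcached client; the helper name `invalidate` and
the stored values are illustrative, not from the thread.]

```python
# Walk U's dependency list, delete each dependent cached object, then reset
# the list -- the [get U] / [delete $dep] / [set U ""] sequence above.

cache = {
    "U": "A,B",                    # dependency list for user U
    "A": "cached SQL result A",
    "B": "cached SQL result B",
}

def invalidate(user_key):
    deps = cache.get(user_key, "")
    for dep in filter(None, deps.split(",")):
        cache.pop(dep, None)       # [delete $dep]
    cache[user_key] = ""           # [set U ""]

invalidate("U")
assert "A" not in cache and "B" not in cache
assert cache["U"] == ""
```

The race Richard goes on to describe sits between the initial get of U and
the final set: another client can add a dependency in that window and have
it silently wiped.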
> >> Also, if I perform another cacheable operation C which depends on U,
> >> I'd like to be able to add this dependency to my list. The existing
> >> protocol lets me say:
> >>
> >> old_deps := "get U"
> >> new_deps := append(old_deps, C)
> >> [set U $new_deps]
> >>
> >> The trouble, of course, is that there are race conditions in both
> >> cases where multi-threaded code could execute in a way which would
> >> corrupt the dependency list.
> >>
> >> On a single machine, I can simply guard the critical section in the
> >> client code with a mutex, but this won't work on a distributed server
> >> farm. It would be possible to use memcached's existing "incr" and
> >> "decr" operations to produce a "global" mutex, but this is a) not
> >> terribly efficient, and b) dangerous if one server incr's the
> >> semaphore on entry to the critical section but crashes before it
> >> exits - this would break things such that no machine could
> >> subsequently update the dependency and we'd be stuck in that state
> >> for good.
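[The incr/decr "global mutex" and its failure mode can be sketched as
follows, once more against a dict stand-in; `try_enter` and the lock key are
illustrative names. Note that because plain incr/decr carry no expiry, a
crashed holder wedges the lock permanently, which is exactly Richard's
objection (b).]

```python
# A counter-based mutex: the section is free only for the client that
# raises the counter from 0 to 1; everyone else backs off.

cache = {}

def incr(key):
    cache[key] = cache.get(key, 0) + 1
    return cache[key]

def decr(key):
    cache[key] = max(cache.get(key, 0) - 1, 0)
    return cache[key]

def try_enter(key):
    if incr(key) == 1:   # we took it from 0 to 1: lock acquired
        return True
    decr(key)            # someone else holds it: undo our incr
    return False

assert try_enter("lock:U") is True    # first client gets the lock
assert try_enter("lock:U") is False   # all other clients are shut out
# If the holder crashes here without calling decr("lock:U"), the counter
# stays at 1 forever and no client can ever enter the section again.
```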
> >>
> >> So, my proposed solution is to implement an "areplace" (atomic
> >> replace) operation which looks like this:
> >>
> >> my_deps = [areplace U ""]
> >>
> >> and an append operation which looks like this:
> >>
> >> [append U " C"]
> >>
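[Modeled on the stand-in dict, the proposed pair would behave roughly as
below. The point of making them server-side commands is that each is a
single protocol round-trip, so no other client can interleave between the
read and the write; the single-threaded functions here only illustrate the
intended semantics, and the `mc_` names are invented for the sketch.]

```python
# Proposed semantics: areplace swaps in a new value and returns the old one
# in a single step; append string-appends onto the existing value.

cache = {}

def mc_areplace(key, new_value):
    old = cache.get(key)
    cache[key] = new_value
    return old

def mc_append(key, suffix):
    cache[key] = cache.get(key, "") + suffix

cache["U"] = "A,B"
mc_append("U", " C")           # [append U " C"] -- record new dependency C
assert cache["U"] == "A,B C"

deps = mc_areplace("U", "")    # [areplace U ""] -- take the list, reset it
assert deps == "A,B C"
assert cache["U"] == ""
```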
> >> I could then keep most of the bloat in the client library, not the
> >> server. This approach also has the advantage of being able to track
> >> dependencies over multiple memcached instances.
> >>
> >> This is probably bending memcached's intended use slightly, but I
> >> think it still adheres to the basic principle that it's a cache: all
> >> the data can be regenerated from the database if required. The
> >> advantage is that I can immediately tell when data should be marked
> >> as being invalid, which means I can use much longer cache times
> >> which, in turn, means that the cached items in the "long tail"
> >> (objects which get accessed infrequently but are expensive to compute)
> >> can have much higher hit rates.
> >>
> >> Seem at all sensible? I'd be interested in any comments before I go
> >> away and code this up.
> >>
> >> Richard.
> >>
> >
> >
> >
>
>


More information about the memcached mailing list