namespace bids

Thu Feb 9 19:31:39 UTC 2006

You can fake the ability to mass-delete chunks of your cache with some
client-side trickery, assuming that you are willing to accept a bit of
cache bloat each time you do a delete.

Let's say we have a "namespace" called some.namespace.

Now, track "revisions" of this namespace in rev.some.namespace; this is
simply an integer which starts its life at zero.

When inserting into or querying memcache, first get the value of this
key. Assuming its value is 5, the *actual* key prefix we use in the
request is "some.namespace.5"; e.g. "some.namespace.5.userid".
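Here is a minimal Python sketch of that key construction. FakeMemcache is a hypothetical in-memory stand-in for a real memcached client that models only get and set. namespaced_key is an illustrative helper name and does not come from any client library.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like a memcached miss, an absent key yields nothing (None here).
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def namespaced_key(mc, namespace, key):
    """Prefix `key` with the namespace's current revision number."""
    rev = mc.get("rev." + namespace)
    if rev is None:
        raise LookupError("revision counter for %r is not initialized" % namespace)
    return "%s.%s.%s" % (namespace, rev, key)


mc = FakeMemcache()
mc.set("rev.some.namespace", 5)
real_key = namespaced_key(mc, "some.namespace", "userid")
print(real_key)  # some.namespace.5.userid
```

Every get or set therefore costs one extra round trip to read the revision counter.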

Now a "delete" on that entire namespace can be done atomically by using
memcached's atomic increment command to increment rev.some.namespace
in-place. Any future gets/sets will go to some.namespace.6.userid. The
leftover stale data under some.namespace.5.* will be ignored and
eventually expire out of the cache.
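The flush can be sketched under the same assumptions. FakeMemcache is again a hypothetical stand-in, modelling memcached's get, set and incr commands. A real server performs incr atomically; this single-threaded fake has no need to. ns_get, ns_set and flush_namespace are illustrative names.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        # memcached's incr fails (NOT_FOUND) on a missing key; modeled as None.
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


def ns_get(mc, namespace, key):
    rev = mc.get("rev." + namespace)
    return mc.get("%s.%s.%s" % (namespace, rev, key))


def ns_set(mc, namespace, key, value):
    rev = mc.get("rev." + namespace)
    return mc.set("%s.%s.%s" % (namespace, rev, key), value)


def flush_namespace(mc, namespace):
    """'Delete' every key in the namespace by bumping its revision."""
    return mc.incr("rev." + namespace)


mc = FakeMemcache()
mc.set("rev.some.namespace", 5)
ns_set(mc, "some.namespace", "userid", "alice")
print(ns_get(mc, "some.namespace", "userid"))  # alice
print(flush_namespace(mc, "some.namespace"))   # 6
print(ns_get(mc, "some.namespace", "userid"))  # None
# The stale entry is still stored; it is simply never looked up again.
print(mc.get("some.namespace.5.userid"))       # alice
```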

There is a possible race condition in this scenario, of course:

Client A fetches rev.some.namespace = 5
Client B flushes some.namespace by incrementing the counter to 6
Client A performs some operation on the old revision

In the set case this is mostly harmless since Client A will just set a
memcache key which will be ignored until it expires. In the get case
client A gets some stale data. Whether the split-second chance of
receiving stale data there matters depends on your application.
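The interleaving above can be replayed deterministically against a hypothetical single-process FakeMemcache. The point is only the order of operations. Client A's read and write both go to revision 5 after client B has already moved the namespace to 6.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


mc = FakeMemcache()
mc.set("rev.some.namespace", 5)
mc.set("some.namespace.5.userid", "old value")

# Client A fetches the revision.
rev_a = mc.get("rev.some.namespace")
# Client B flushes the namespace (revision becomes 6).
mc.incr("rev.some.namespace")
# Client A carries on with revision 5.
stale = mc.get("some.namespace.%d.userid" % rev_a)
mc.set("some.namespace.%d.userid" % rev_a, "new value")

print(stale)  # old value  (get case: stale data)
current_rev = mc.get("rev.some.namespace")
# Set case: A's write went to revision 5, so readers on 6 never see it.
print(mc.get("some.namespace.%d.userid" % current_rev))  # None
```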

Of course, if you delete too frequently you'll end up with lots of dead
data hanging around waiting to expire. If you delete *really* frequently
you might find cases where a stale revision zero is still hanging about
when the integer overflows. I should hope that an application isn't
going to be flushing its cache too often, though, else there's little
point in having the cache in the first place.

The final caveat is that (as far as I remember) the increment command
will fail if the key does not already exist. This can be mitigated by
having your application, on startup, use the "set unless there's already
a value" command (add) to set all of the revision counters to zero. If another
client has already done this then all you've done is wasted a bit of
startup time.
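Here is a sketch of that startup step. The hypothetical FakeMemcache mirrors the protocol's add and incr semantics: add stores only if the key is absent, and incr fails on a missing key. NAMESPACES and init_revisions are illustrative names.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value):
        # memcached's add stores the value only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        # incr fails (NOT_FOUND) on a missing key; modeled as None.
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]


# Hypothetical list of namespaces the application knows about.
NAMESPACES = ["some.namespace", "fred"]


def init_revisions(mc, namespaces):
    """Create each revision counter at 0 unless another client already did."""
    for ns in namespaces:
        mc.add("rev." + ns, 0)


mc = FakeMemcache()
print(mc.incr("rev.fred"))      # None: counter missing, incr fails
init_revisions(mc, NAMESPACES)  # first client starts up
print(mc.incr("rev.fred"))      # 1
init_revisions(mc, NAMESPACES)  # second client starts up; add is a no-op
print(mc.get("rev.fred"))       # 1
```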

A much simpler memcache change would be to allow a client to selectively
execute several consecutive operations with no interruptions from other
clients. Then you can safely handle namespaces in a client wrapper
library and not bloat memcached for people who don't need this
functionality.

(and no, I don't have time to implement that either! Maybe someone else
does, though.)

Cahill, Earl wrote:
> For some time we have had the same memcached servers handling several
> different nodes of information.  As we have seen memcached handle our
> loads, we have cached more, and memcached has yet to miss a beat.  We
> currently have millions of items cached, though yes, we could probably
> expire some of those.
> 
> Recently we added a new 'node' of stuff, call it fred.  Well, it turns
> out fred changes a bit more than the rest, and sometimes we would like
> to wipe everything that has anything to do with fred.  Problem is the
> key names in fred are rather dynamic, combining several types of data,
> one of which is an ordered list.  That makes it kind of hard to know all
> of fred's keys.  There are potentially a few hundred thousand such keys.
> It also doesn't appear to be possible to walk everything currently in a
> memcached instance for sufficiently large slabs.  We don't want to
> restart the whole instance, as we would lose several million confs and
> have to slam our netapp to rewarm the cache.  TTLs are fine, but it
> would be nice to just wipe the whole node.
> 
> Seems like namespaces are what we want here, and I think we'd be willing
> to pay for it.  As a pre-empt, the Perl API does not really have namespace
> support.  In my reading, all it does is prepend your 'namespace'
> to keys, which as far as I can tell does little for our problem.
> 
> So my questions are, who can we pay to develop namespace support, and
> how much would it cost?  We would want all the code contributed back to
> trunk, and we would like to work off of trunk and not some branch or the
> like.  We would like to be able to handle lots of namespaces, i.e.
> adding a namespace shouldn't be much more taxing than adding a normal
> key.
> 
> Maybe just a question for brad, but if namespace support just happened,
> in a good way, and all you had to do was perhaps merge a few things, would
> you be ok with it?
> 
> I think namespaces could help improve what I think is a pretty great
> product.  I don't want to start a flame war over other potential uses,
> but I think there are many.
>