Adaptive Replacement Cache

Tue Nov 30 00:51:07 PST 2004

On Sat, Nov 27, 2004 at 11:08:22AM -0800, Brad Fitzpatrick wrote:
> Might be nice to make memcached support configurable cache policies.
> Currently it's just LRU, but this would be a good option:
> 
> http://www.almaden.ibm.com/StorageSystems/autonomic_storage/ARC/index.shtml

I read one of their papers and looked into this a bit.

I get the impression that ARC is especially nice because it has low
overhead.  They contrast it with LRU-2, claiming its two limitations are
one twiddle parameter and a priority queue, causing a "severe overhead"
of O(log n).  memcached uses so little CPU it probably wouldn't hurt to
spend a bit on either better cache policies or better memory efficiency,
and it may be worth investigating exactly which algorithm would offer
the best payoffs.

As for implementing ARC itself, it appears you need to be able to keep
around metadata after data is dropped:  "In addition to the $c$ pages in
the cache, the algorithm ARC remembers $c$ recently evicted pages."
(This extra data is used for adaptativity:  roughly, if you still have
the metadata but not the data for something you're looking up, it
indicates you ought to have kept the data around for longer and lets you
adjust the policy.)
Currently it seems that the memcached "item" struct contains not only
the item pointers and flags, but also the key and the data.  From my
casual glance at the code, some more major surgery may be required to
allow a cache policy like ARC.

But perhaps some changes would be useful anyway-- maybe you could keep a
pool of fixed-size item structures (you'd have to limit the key length
to make them fixed-size, then) and use the slabs purely for item data.
This would also allow small data (for example, numbers accessible by
incr and decr -- where the data is fewer bytes than the 32 bytes the
item structure itself uses) to occupy smaller slab classes.

But I'm just guessing.  Brad, have you collected+published any stats on
the distribution of LiveJournal memcached data?  It'd be neat to collect
a trace or two of just the commands sent to the server... such a log
would allow measuring things like min/avg/max key length or common key
sizes and more importantly allow rapid testing of a new cache policy
without any memcached hacking.

(On the subject of algorithms, this caught my eye recently:
  Efficient Hashing with Lookups in Two Memory Accesses
  http://www.cs.berkeley.edu/~adib/bats/abstracts.html#panigrahy
Hashing isn't a bottleneck at all yet, though.)

-- 
Evan Martin
martine at danga.com
http://neugierig.org