IDEA: Hierarchy of caches for high performance AND high
capacity.
Perrin Harkins
perrin at elem.com
Wed Nov 1 15:29:04 UTC 2006
Kevin Burton wrote:
> I believe this could be done through a "hierarchy of caches" with a
> local in-process cache used to buffer the normal memcached and a
> disk-based memcached backed by berkeley DB providing large capacity.
Any time you get into using multi-level caches, you take big hits in
coding complexity and performance. The complexity comes from the need
to update several caches when anything changes, handle failures at
multiple levels, and deal with the increased chance of race conditions
while writing to multiple caches. The performance hit is from needing
to check multiple caches and the network overhead involved in talking to
two remote caches plus a database (where your permanent data is stored).
You can try to make the cache checks asynchronous, but that adds more
complexity still.
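To make the complexity concrete, here is a minimal sketch of the read/write path such a hierarchy implies. The caches and database are modeled as plain dicts, and all names are illustrative, not from any real memcached client:

```python
# Hypothetical two-level read-through cache. In practice l1 would be an
# in-process cache, l2 a memcached client, and db the permanent store.

def cache_get(key, l1, l2, db):
    # Every miss at one level costs another check at the next level,
    # which is where the extra lookup/network overhead comes from.
    if key in l1:
        return l1[key]
    if key in l2:
        l1[key] = l2[key]   # repopulate the faster cache on the way back
        return l2[key]
    value = db[key]         # final fallback: the permanent store
    l2[key] = value
    l1[key] = value
    return value

def cache_set(key, value, l1, l2, db):
    # Writes must touch every level; a failure partway through leaves
    # the levels inconsistent -- the race/failure handling the post
    # warns about lives here.
    db[key] = value
    l2[key] = value
    l1[key] = value
```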
> Long story short a local cache can be 4-8x faster than normal memcached.
That sounds about right for a shared memory cache. A local in-process
cache (in Perl) would be at least 10 times faster than Memcached. That
still isn't enough of a win to make it worth doing unless you have a
small set of very hot data that you always need.
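The "small hot data" case can be sketched as a tiny in-process cache sitting in front of the remote cache. This is an assumed illustration, not a real memcached API; the remote cache is stubbed as a dict and the counter just shows how round trips are avoided:

```python
from functools import lru_cache

REMOTE = {"site_config": "theme=dark"}   # stand-in for memcached
calls = {"remote": 0}                    # counts simulated round trips

@lru_cache(maxsize=32)                   # deliberately small: hottest keys only
def hot_get(key):
    # On a local miss, fall through to the (stubbed) remote cache.
    calls["remote"] += 1
    return REMOTE[key]
```

Repeated lookups of the same hot key then hit the in-process cache and never touch the network, which is where the ~10x figure comes from.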
> I might also note that since it's now backed by a persistent disk
> backend one could use Memcached as a large distributed hashtable
> (http://labs.google.com/papers/bigtable.html) similar to Bigtable
> (http://glinden.blogspot.com/2005/09/googles-bigtable.html).
BigTable isn't really a distributed hash. It provides a complex data
access API and is heavily oriented towards redundancy and failover.
It's a closer cousin to MySQL Cluster than to Memcached.
- Perrin