IDEA: Hierarchy of caches for high performance AND high capacity.

Kevin Burton burton at tailrank.com
Tue Oct 31 22:00:46 UTC 2006


(NOTE: I posted this to my blog as well:
http://www.feedblog.org/2006/10/idea_hierarchy_.html )

This is an idea I've been kicking around for a while and wanted some
feedback on.  Memcached does an amazing job as it is, but there's always room
for improvement.

Two areas where memcached could be improved are local acceleration and large
capacity support (terabyte range).

I believe this could be done through a "hierarchy of caches", with a local
in-process cache used to buffer the normal memcached and a disk-based
memcached backed by Berkeley DB providing large capacity.

The infrastructure would look like this:

<code>
in-process memcached ->  normal memcached -> disk memcached
</code>
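
To make the read path concrete, here's a minimal sketch in Java of how a
get() might walk the tiers.  The TieredCache class and CacheTier interface
are hypothetical names I'm using for illustration, not an existing API:

<code>
import java.util.Map;

// Sketch of a tiered read path: L1 (in-process map), L2 (normal
// memcached cluster), L3 (disk-backed memcached).
public class TieredCache {

    private final Map<String, Object> l1;  // in-process, stores object references
    private final CacheTier l2;            // normal memcached cluster
    private final CacheTier l3;            // disk-backed memcached

    public TieredCache(Map<String, Object> l1, CacheTier l2, CacheTier l3) {
        this.l1 = l1;
        this.l2 = l2;
        this.l3 = l3;
    }

    public Object get(String key) {
        Object value = l1.get(key);                // no network, no deserialization
        if (value != null) return value;

        value = l2.get(key);                       // one network round trip
        if (value == null) {
            value = l3.get(key);                   // disk seek on a remote node
            if (value != null) l2.set(key, value); // repopulate the middle tier
        }
        if (value != null) l1.put(key, value);     // promote into the local tier
        return value;
    }
}

// Hypothetical tier interface; a real client would expose more operations.
interface CacheTier {
    Object get(String key);
    void set(String key, Object value);
}
</code>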

The in-process memcached would not be configured to access a larger
memcached cluster.  Clients would NOT use the network to get() objects, and
it would only take up a small amount of memory on the local machine.
Objects would not be serialized before they're stored, so this should act
like an ultra-fast LRU cache, as if it were a native caching system within
the VM.

Since it's local, it should be MUCH faster than the current memcached.

Here are some benchmarks of APC vs. memcached:

http://www.mysqlperformanceblog.com/2006/09/27/apc-or-memcached/
http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/

Long story short, a local cache can be 4-8x faster than a normal memcached.

The local in-process cache would be available on every node within this
cluster and act as an L1 cache for ultra-fast access to a small number of
objects.

I'm not sure all languages would support this type of cache because it would
require access to and storage of object pointers. I believe you can do this
with Java by hacking JNI pointers directly but I'm not 100% certain.

This cache would be configured to buffer a normal memcached cluster.  We're
all familiar with this type of behavior so I won't explain this any further.

The third component in the hierarchy is a distributed memcached daemon which
uses Berkeley DB (or another type of persistent hashtable) for storage
instead of the normal slab allocator.  While this might seem like blasphemy
to a number of you, I think it could be useful to a number of people with
ultra-large caching requirements (hundreds of gigs) who can't afford the
required memory.

There's already a prototype implementation of this in Tugela Cache:

http://meta.wikimedia.org/wiki/Tugelacache
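
As a rough sketch of what the storage layer could look like, here's a
persistent get/set built on the Berkeley DB Java Edition API.  This is just
an illustration of the idea, not how Tugela Cache or any memcached daemon is
actually implemented:

<code>
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

import java.io.File;

// Sketch: replace the in-memory slab allocator with a persistent hashtable.
public class BdbStore {

    private final Environment env;
    private final Database db;

    public BdbStore(File dir) throws DatabaseException {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        env = new Environment(dir, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        db = env.openDatabase(null, "cache", dbConfig);
    }

    public void set(byte[] key, byte[] value) throws DatabaseException {
        db.put(null, new DatabaseEntry(key), new DatabaseEntry(value));
    }

    public byte[] get(byte[] key) throws DatabaseException {
        DatabaseEntry data = new DatabaseEntry();
        OperationStatus status =
            db.get(null, new DatabaseEntry(key), data, LockMode.DEFAULT);
        return status == OperationStatus.SUCCESS ? data.getData() : null;
    }

    public void close() throws DatabaseException {
        db.close();
        env.close();
    }
}
</code>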

For optimal performance the memcached driver would have to do parallel and
concurrent getMulti requests so that each disk in the system can seek at the
same time.  There are a number of memcached implementations (including the
Java impl, which I've contributed to) that fetch serially.  Since memcached
is amazingly fast this really hasn't shown up in any of my benchmarks, but
it would really hinder a disk-backed memcached.
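
Here's a rough sketch of what a parallel getMulti could look like, assuming
keys have already been partitioned by server and a hypothetical per-node
MemcachedNode interface (real drivers differ):

<code>
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: issue one getMulti per server in parallel so that the disks
// behind each daemon can seek concurrently instead of one at a time.
public class ParallelGetMulti {

    // Hypothetical per-node client; a real driver hashes keys to nodes itself.
    interface MemcachedNode {
        Map<String, Object> getMulti(List<String> keys);
    }

    private final ExecutorService pool = Executors.newCachedThreadPool();

    public Map<String, Object> getMulti(Map<MemcachedNode, List<String>> keysByNode)
            throws Exception {
        List<Future<Map<String, Object>>> futures =
            new ArrayList<Future<Map<String, Object>>>();

        for (final Map.Entry<MemcachedNode, List<String>> entry : keysByNode.entrySet()) {
            futures.add(pool.submit(new Callable<Map<String, Object>>() {
                public Map<String, Object> call() {
                    return entry.getKey().getMulti(entry.getValue());
                }
            }));
        }

        Map<String, Object> results = new HashMap<String, Object>();
        for (Future<Map<String, Object>> future : futures) {
            results.putAll(future.get());  // block until each node responds
        }
        return results;
    }
}
</code>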

This would provide ultra high capacity and since the disk seeks are
distributed over a large number of disks you can just add spindles to the
equation to get higher throughput.

This system would also NOT suffer from disk hotspots since memcached and the
local in-memory memcached would buffer the disk backend.

From a non-theoretical perspective, the local cache could be skipped or
replaced with a native LRU cache.  These are problematic though due to
memory fragmentation and garbage collection issues.  I use a local LRU cache
for Java, but if I configure it to store too many objects I can run out of
memory.  It also won't be able to reach the 85% capacity we're seeing with
the new memcached patches.
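
For reference, a minimal LRU along these lines can be built on Java's
LinkedHashMap.  This is just a sketch, and it illustrates the problem above:
it bounds the number of entries, not the memory they consume, so oversized
values can still exhaust the heap:

<code>
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: entry-count-bounded LRU cache.  Not synchronized, and it limits
// the *number* of objects rather than their total size in memory.
public class LruCache<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    public LruCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU eviction order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
</code>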

I might also note that since it's now backed by a persistent disk backend,
one could use memcached as a large distributed hashtable
(http://labs.google.com/papers/bigtable.html) similar to Bigtable
(http://glinden.blogspot.com/2005/09/googles-bigtable.html).

Thoughts?  I think this idea could be very compelling.

-- 
Founder/CEO Tailrank.com
Location: San Francisco, CA
AIM/YIM: sfburtonator
Skype: burtonator
Blog: feedblog.org
Cell: 415-637-8078