Implementing Memcached on EC2

Fri Jan 5 23:51:55 UTC 2007

> I brought this question up to Brad (actually, asking more about the 
> auth factor), and he brought up some good points.
>
> > But:  how are nodes currently discovering the available
> > memcached servers? And don't you have rehashing issues if they're
> > coming and going?  Or are you using consistent hashing on the
> > client side?  Or just local single node caching?  In which case,
> > what's wrong with 127.0.0.1?
> > So yes, auth solves a bit, but I'm curious how you plan to make
> > this work reliably when nodes are coming and going.

I think a lot of what we require with our implementation of Memcache on 
EC2 goes beyond the scope of what Memcache was designed for and what 
other people need. So, rather than proposing a way to modify Memcache, 
I'd like to bounce some ideas off of you guys on how we can design a 
scalable, fault-tolerant solution that meets our needs. We don't 
really want to go about reimplementing Memcache, so we'd like to work 
with the strengths of Memcache while at the same time addressing the 
weaknesses.

Problems:
1. There is no way to ensure that all Memcache clients have the same 
server list.
2. There is no way to ensure that all Memcache clients have the same 
network connectivity to the Memcache instances.
3. Without (1) and (2), there is no way to guarantee(*) that all 
clients have an identical view of the cached data.

(*) or as close to a guarantee (within milliseconds) as we can 
realistically get.

Result: Data inconsistency. The Memcache architecture is susceptible to 
inconsistent cache hits if not all Memcache clients share identical 
server lists and identical Memcache instance connectivity.
A Memcache instance can be unreachable on the network for any subset 
of the clients accessing it and then come back online at some later 
point. If this happens, the data stored under the keys on that 
instance is potentially inconsistent with the data on the Memcache 
instance(s) that took over.
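
To make the problem concrete, here is a minimal sketch in Python. It 
assumes clients place keys with a naive hash-modulo scheme; real client 
libraries may hash differently, and the addresses, key names, and 
function names below are made up for illustration. It counts how many 
keys two clients would send to different servers when one of them has 
lost sight of a single instance.

```python
import hashlib

def pick_server(key, servers):
    """Naive modulo placement (an assumption, not any particular client's
    algorithm): hash the key and index into the client's server list."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

def disagreements(keys, view_a, view_b):
    """Return the keys that two clients with different views would read
    from or write to different Memcache instances."""
    return [k for k in keys if pick_server(k, view_a) != pick_server(k, view_b)]

# Hypothetical addresses. Client A sees all three instances, while
# client B cannot reach the third one and has dropped it from its list.
full_view = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]
partial_view = ["10.0.0.1:11211", "10.0.0.2:11211"]

keys = ["user:%d" % i for i in range(1000)]
split = disagreements(keys, full_view, partial_view)
print(len(split), "of", len(keys), "keys map to different servers")
```

Every key in that list can be written through one client and read 
stale (or missed) through the other, which is the inconsistency 
described above.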

Solution: Implement a layer on top of Memcache that functions as a 
distributed proxy network which maintains persistent connections to all 
Memcache instances and all proxy instances. Each proxy coordinates with 
all other Memcache proxies to always maintain identical server lists. If 
any proxy gets disconnected from any Memcache instance, it broadcasts 
this to all other proxies. Any time a Memcache instance joins a proxy's 
list of Memcache servers, all keys are flushed on that Memcache 
instance. Memcache clients only connect to Memcache proxies.
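
A rough in-memory sketch of the membership rules, in Python, may help 
when poking holes in this. Nothing here touches the network: 
FakeMemcache and Proxy are hypothetical names, the "broadcast" is just 
a loop over peer objects, and assumption (1) below (proxies can always 
reach each other) is taken for granted.

```python
class FakeMemcache:
    """Stand-in for a Memcache instance: a dict, not a real server."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def flush_all(self):
        self.data.clear()


class Proxy:
    """Hypothetical proxy that keeps its server list in step with peers."""
    def __init__(self, name):
        self.name = name
        self.peers = []
        self.servers = {}  # instance name -> FakeMemcache

    def connect_peers(self, proxies):
        self.peers = [p for p in proxies if p is not self]

    def server_list(self):
        return sorted(self.servers)

    def on_disconnect(self, server_name):
        """This proxy lost an instance: drop it here and on every peer."""
        self._remove(server_name)
        for peer in self.peers:
            peer._remove(server_name)

    def on_join(self, server):
        """An instance joins: flush it first, then add it everywhere."""
        server.flush_all()
        self._add(server)
        for peer in self.peers:
            peer._add(server)

    def _add(self, server):
        self.servers[server.name] = server

    def _remove(self, server_name):
        self.servers.pop(server_name, None)


proxies = [Proxy("p1"), Proxy("p2"), Proxy("p3")]
for p in proxies:
    p.connect_peers(proxies)

mc_a, mc_b = FakeMemcache("mc-a"), FakeMemcache("mc-b")
proxies[0].on_join(mc_a)
proxies[0].on_join(mc_b)

mc_b.data["session:42"] = "possibly stale"   # data written before the outage
proxies[1].on_disconnect("mc-b")             # only p2 notices the outage
after_disconnect = [p.server_list() for p in proxies]

proxies[2].on_join(mc_b)                     # mc-b comes back via p3
after_rejoin = [p.server_list() for p in proxies]
```

In this toy run every proxy drops mc-b as soon as one proxy loses it, 
and mc-b is empty when it returns, so no client can read the stale 
value. The sketch deliberately ignores ordering and races between 
concurrent broadcasts, which is where I'd expect the real difficulty.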

Assumptions:
1. All proxies can connect to all other proxies. In other words, 
connectivity is not an issue for proxies.
2. If a proxy cannot connect to another proxy, it assumes the proxy is 
dead for everyone (clients and proxies).
3. A method exists for proxies to discover new Memcache instances.

I'd appreciate hearing any constructive criticism of this approach: 
its weaknesses, which I'm sure exist, and possible ways to resolve 
them. Alternatively, any other suggestions for overcoming these 
problems are welcome.

Best Regards,

Erik Osterman