Transparent failover and restore?

Greg Whalin gwhalin at meetup.com
Mon Dec 20 12:28:04 PST 2004


Josh Berkus wrote:
> Greg,
> 
> 
>>My original point was that at least some of the stability we have
>>seen with the memcache server is due in part to its simplistic design
>>(server hashing handled by clients, lack of server side state
>>management).  As software complexity grows, the chance for bugs,
>>instability and errors grows with it.  I like production services to
>>be simple, modular, and stable.
> 
> 
> Which is a good reason to make the redundancy optional, but not to
> oppose it.   It's also a good focus for the redundancy; keep it as
> simple as possible.
> 
> From my perspective (I use memcached as an accessory to a database, so
> it only holds quickly restorable data), the problem with taking a server
> offline is not primarily one of re-building the cache data, but rather
> one of propagating the server lists.   To give you an example:
> 
> For a proposed project, we have 5 PostgreSQL database servers, each one
> running a 250MB instance of Memcached to hold session management
> information.   These servers are replicated by Slony, allowing us to
> take them offline one at a time to upgrade them to PostgreSQL 8.0
> (Slony supports this).   Our pooling component, C-JDBC, dynamically
> recognizes which servers are not responding and queries the remaining 4
> servers while the 5th is down.
> 
> But memcached doesn't.  In fact, due to the way hash keys are handled,
> not only do we have to propagate new server lists to each one of 8
> webservers, but the entire cache is invalidated and needs to be rebuilt
> from the database for all 4 machines.   Then when the 5th machine is
> back online, we have to rebuild the entire cache again.
> 
> This is silly.  If C-JDBC can dynamically keep track of servers in the
> pool with minor overhead, why can't we make a pooling daemon for
> memcached?  Then we could simply copy the cache on the 5th machine to a
> new cache on the 4th machine, and tell the server daemon to redirect
> while we upgrade the 5th machine.  Each cache machine can run a server
> daemon with minimal resources, which can be easily updated, in batch mode,
> via a network command.
> 
> If people who don't need such pooling don't want the overhead, then
> they can continue to "direct write" to the cache the way it works now.
> 
> --Josh Berkus
> 
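
Just to spell out why the invalidation Josh describes happens: clients
pick a server with something like hash(key) mod N, so changing N remaps
most keys.  A rough sketch in Python (the helper names are mine, not
from any particular client library):

import hashlib

def server_for(key, servers):
    # typical client-side selection: hash the key, mod by list size
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

five = ["cache1", "cache2", "cache3", "cache4", "cache5"]
four = five[:-1]                      # the 5th machine taken offline

keys = ["session:%d" % i for i in range(10000)]
moved = sum(server_for(k, five) != server_for(k, four) for k in keys)
print("%.0f%% of keys remapped" % (100.0 * moved / len(keys)))
# prints roughly 80% -- only keys that happen to land on the same
# server under both lists survive; everything else is a miss and a
# trip back to the database

So it is not a bug so much as a property of the hashing scheme; any fix
has to put the server list behind some layer of indirection.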

So it sounds like you are talking about a separate daemon that acts sort
of as a proxy?  This I would not have a problem with, as it fits nicely
into my modular way of running things.
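
Something like this would do for a first cut -- a local daemon that
owns the server list, so the webservers only ever talk to a fixed
address and membership changes happen in one place.  Toy sketch only
(get-only, blocking, one connection at a time; the addresses and port
are made up):

import socket
import zlib

# the one copy of the server list; a real daemon would accept updates
# to this over a network command, as Josh suggests
BACKENDS = [("10.0.0.1", 11211), ("10.0.0.2", 11211),
            ("10.0.0.3", 11211), ("10.0.0.4", 11211)]

def pick(key):
    # same mod-N selection as before, but now done in one place
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]

def serve(port=11311):
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(5)
    while True:
        client, _ = srv.accept()
        request = client.recv(4096)        # e.g. b"get session:42\r\n"
        parts = request.split()
        if len(parts) >= 2 and parts[0] == b"get":
            backend = socket.create_connection(pick(parts[1]))
            backend.sendall(request)
            # relay the reply up to the terminating END marker (a real
            # proxy would parse the VALUE headers instead of scanning)
            reply = b""
            while not reply.endswith(b"END\r\n"):
                chunk = backend.recv(4096)
                if not chunk:
                    break
                reply += chunk
            client.sendall(reply)
            backend.close()
        client.close()

The webservers would all point at the proxy and never need their lists
touched; taking the 5th machine out becomes one update pushed to the
proxies instead of a change on all 8 webservers, and the copy-and-redirect
step Josh describes could live in the same daemon.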

