Transparent failover and restore?

Fri Dec 17 16:41:39 PST 2004

Hi guys,

On Fri, 2004-12-17 at 19:33 -0500, Larry Leszczynski wrote:
> Hi Gregory -
> 
> On Sat, 18 Dec 2004, Gregory Block wrote:
> 
> > On 17 Dec 2004, at 21:09, Brad Fitzpatrick wrote:
> > > Guys,
> > >
> > > If you want to use memcached as a data store instead of a /cache/, then
> > > use MySQL Cluster:
> > >
> > >    http://www.mysql.com/cluster/
> > >
> > > It was designed for that, doing the whole redundant storage and
> > > two-phase
> > > commit thing, while memcached was designed to be a cache.
> > >
> >
> > No, I think the point is quite simple, actually;
> >
> >   - we cache things because the cost of generating them is too high to
> > do in bulk
> >   - on large-scale systems, or heavily used systems, the cost of losing
> > a server can bring down the system
> >
> >
> > The problem can be mitigated, with additional runtime overhead by
> > clients, with some work to ensure that there's better distribution of
> > single items of information within the cache network.  Moving each
> > information onto two servers in the network immediately makes the
> > entire system less likely to fall over dead; it also reduces the amount
> > of "actual" free space in the cluster you add when you add a node, but
> > that's just efficiency losses.  People with RAID will already be
> > familiar with that kind of logic.
> 
> Nobody would argue those points, but I think the point of Brad's mail was
> that there's no need to reinvent the wheel since MySQL cluster satisfies
> all those needs.

MySQL Cluster doesn't always satisfy those needs: for many
applications you have far too much data to keep in an in-memory
database as your primary storage (cost-effectively, anyway).  It is not
easy to use MySQL Cluster as a cache above a slower disk-based system.
If your entire data set fits in memory across your cluster, then great,
use MySQL Cluster.

However, there is also a need for redundancy in a cache at times: even
though you "can" regenerate from your slower storage (a disk-based
DB), it may be something you want to avoid if at all possible because of
the performance hit.

I think this is something that does not belong in memcached itself, as
it isn't really a feature of the cache; it belongs on the client side,
so that those who wish to distribute items redundantly across separate
servers may do so.  This is relatively trivial to implement; imagine the
following:

The client performs a hash to determine which server to put the key on
(just like it does now) and puts the key/value pair into that server's
cache.  The client then performs a second hash that is guaranteed to
select a different server, and puts the key/value pair into that
server's cache as well.
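
As a rough sketch of the idea in Python (not taken from any real client
library): the function names, the crc32-based hashes and the "replica:"
prefix are my own assumptions, and plain dicts stand in for real
memcached connections.

```python
import zlib

def pick_servers(key, servers):
    """Return (primary, secondary) for key; the two are always different."""
    if len(servers) < 2:
        raise ValueError("need at least two servers for redundancy")
    data = key.encode("utf-8")
    # First hash: pick the primary server, as a client would do today.
    primary = zlib.crc32(data) % len(servers)
    # Second hash: pick an offset in 1..n-1, so the result can never
    # land back on the primary.  (The prefix is an arbitrary assumption.)
    offset = 1 + zlib.crc32(b"replica:" + data) % (len(servers) - 1)
    secondary = (primary + offset) % len(servers)
    return servers[primary], servers[secondary]

def redundant_set(key, value, servers, stores):
    """Store key/value on both servers.

    `stores` maps server name -> dict and is only a stand-in for real
    memcached connections.
    """
    for server in pick_servers(key, servers):
        stores[server][key] = value
```

Note that the modulo in this sketch reshuffles keys whenever the server
list changes, so it shows only the "two different servers" part.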

The only requirement left is to ensure that as servers are
added/removed or go up/down, the hashing always comes up with the same
servers for a given key.  There are a number of ways to do this, which I
won't get into here.  You obviously add both extra cache memory usage
and extra network bandwidth usage by doing this, but in some cases it
may be worth it.
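
For illustration only, one well-known way to get that stability is
rendezvous (highest-random-weight) hashing.  This is just one option and
not necessarily the one intended above; the function names and the use
of md5 are assumptions.  Each server gets a score per key, and the two
highest-scoring servers hold the key:

```python
import hashlib

def score(server, key):
    """Pseudo-random but repeatable weight for a (server, key) pair."""
    digest = hashlib.md5((server + "|" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick_two(key, servers):
    """Return the two servers with the highest scores for this key."""
    if len(servers) < 2:
        raise ValueError("need at least two servers for redundancy")
    ranked = sorted(servers, key=lambda s: score(s, key), reverse=True)
    return ranked[0], ranked[1]
```

With this scheme, removing a server that does not hold a given key
leaves that key's pair unchanged, and the result does not depend on the
order of the server list.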

When you do gets, you simply try one server; if it is down, you try
the next, and you report a miss only after both fail.
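
A minimal sketch of that read path in Python.  ServerDown, redundant_get
and the fetch callable are hypothetical names, not part of any real
memcached client; fetch(server, key) is assumed to return the value,
return None on a miss, or raise ServerDown if the server is unreachable.

```python
class ServerDown(Exception):
    """Raised by the stand-in fetch function when a server is unreachable."""

def redundant_get(key, replicas, fetch):
    """Try each replica in order; report a miss (None) only if all fail."""
    for server in replicas:
        try:
            value = fetch(server, key)
        except ServerDown:
            # This server is down: fall through to the next replica.
            continue
        if value is not None:
            return value
        # Assumption: a miss on a live server also falls through, since
        # a server that has just restarted comes back empty.
    return None
```

Here `replicas` is the pair of servers the two hashes selected for the
key, in the order they should be tried.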

> 
> 
> Larry
> 
-- 
John A. McCaskey
Software Development Engineer
Klir Technologies, Inc.
johnm at klir.com
206.902.2027