memcached replication
hirose31 at t3.rim.or.jp
Wed Sep 5 15:41:43 UTC 2007
The primary purposes of my async replication are:
1. redundancy
2. preventing a burst of access to persistent storage (e.g. an RDBMS)
   just after fail-over
From the client's standpoint, there appears to be a single memcached.
The backend, however, consists of two memcacheds: a master and a slave.
If the master mcd goes down, the slave becomes the new master and
already holds the {key/value} data. Clients are not aware of the
fail-over at all.
//
= details =
== normal operation ==
             +--------+
             | client |
             +--------+
                 |
         +-------+
         |
         v
 IP(A)  IP(V)         IP(B)
  +--------+        +--------+
  | mcd-A  |        | mcd-B  |
  |(master)|<-VRRP->|(slave )|
  +--------+        +--------+
       |              ^
       |              |
       +--------------+
          async repl
- IP(A): mcd-A node's IP address
- IP(B): mcd-B node's IP address
- IP(V): the master's Virtual IP Address (VIP), which floats between
         the nodes via VRRP
- VRRP : Virtual Router Redundancy Protocol
         keepalived <http://www.keepalived.org> implements a VRRP stack
* clients access IP(V).
* clients do not access IP(A) or IP(B) directly.
== if master goes down ==
             +--------+
             | client |
             +--------+
                 |
                 +--------+
                          |
                          v
 IP(A)             IP(V) IP(B)
  +- - - - +        +--------+
  | mcd-A  |        | mcd-B  |
  | (DOWN )|        |(master)|
  +- - - - +        +--------+
* VRRP detects that mcd-A is down and moves IP(V) to mcd-B (the new master)
* the client is not aware of the fail-over; it seamlessly keeps accessing IP(V)
//
= replication lag =
As shown above, clients only access the master node, so replication lag
does not matter.
I've changed the replication sequence since my previous email.
replication sequence:
1. client sends a SET command to the master
2. master stores the data in its own area
3. master sends the key/value to the slave using the memcached protocol
4. master does not wait to receive "STORED" from the slave (asynchronous)
5. master returns "STORED" to the client
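
The five steps above can be sketched in-process as follows. This is only
an illustration, not the actual patch: the Master and Slave classes, the
queue and the worker thread are hypothetical stand-ins for mcd-A, mcd-B
and the replication connection between them.

```python
import queue
import threading


class Slave:
    """Hypothetical stand-in for mcd-B: a plain dict."""

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        self.store[key] = value
        return "STORED"


class Master:
    """Hypothetical stand-in for mcd-A with an async replication queue."""

    def __init__(self, slave):
        self.store = {}
        self.slave = slave
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._replicate, daemon=True)
        worker.start()

    def _replicate(self):
        # Background sender: forwards queued updates to the slave.
        while True:
            key, value = self.pending.get()
            self.slave.set(key, value)  # the slave's reply is ignored
            self.pending.task_done()

    def set(self, key, value):
        self.store[key] = value          # step 2: store in own area
        self.pending.put((key, value))   # step 3: hand off to the sender
        return "STORED"                  # steps 4-5: reply without waiting
```

The master replies as soon as the update is queued, so a slow or dead
slave never delays the client; the cost is a short window in which the
slave has not yet seen the newest value.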
There is no get-back-old-data problem even if the master goes down
between steps 2 and 4, because the client never receives the "STORED"
response from the master and will therefore retry after a while.
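
The client-side retry mentioned above might look like the sketch below.
It assumes the memcached text protocol ("set <key> <flags> <exptime>
<bytes>" followed by the data, answered with "STORED"). The `send`
callable, the attempt count and the delay are hypothetical; `send`
stands for whatever code writes the request to IP(V) and returns the
raw reply.

```python
import time


def build_set(key, value, flags=0, exptime=0):
    """Encode a memcached text-protocol "set" request as bytes."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def set_with_retry(send, key, value, attempts=3, delay=0.5):
    """Resend the request until the server answers STORED.

    `send` is a hypothetical callable: it takes the request bytes,
    talks to IP(V), and returns the raw reply. It is expected to raise
    OSError when the connection fails.
    """
    request = build_set(key, value)
    for attempt in range(attempts):
        try:
            if send(request) == b"STORED\r\n":
                return True
        except OSError:
            # The master may have died mid-request; VRRP will move IP(V).
            pass
        if attempt < attempts - 1:
            time.sleep(delay)  # give the fail-over time to complete
    return False
```

Because the client always talks to IP(V), the retry lands on whichever
node currently holds the VIP, without the client knowing which one.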
= scalability =
If we need scalability, I think we can balance access across several
{master/slave} sets by key, using
* a client library (like Cache::Memcached)
or
* a reverse proxy (like Perlbal)
or
* an L7 load balancer (like L7SW <http://www.linux-l7sw.org>)
             +--------+
             | client |
             +--------+
                 |
                 |
   <<balancing with client API >>
   <<  or reverse proxy        >>
   <<  or L7 load balancer     >>
                 |
                 |
         +-------+--------+
         |                |
         v                v
IP(A)  IP(V1)    IP(M)  IP(V2)
    +--------+       +--------+
    | mcd-A  |       | mcd-M  |
    |(master)|       |(master)|
    +--------+       +--------+
        ^    ^           ^    ^
        |    |           |    |
      repl  VRRP       repl  VRRP
        |    |           |    |
IP(B)   v    v   IP(N)   v    v
    +--------+       +--------+
    | mcd-B  |       | mcd-N  |
    |(slave )|       |(slave )|
    +--------+       +--------+
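
Balancing by key over the VIPs, as in the diagram above, could be
sketched like this on the client side. The addresses are made-up
documentation addresses, and the CRC32-modulo scheme is just one simple
choice for illustration; it is not a claim about how any particular
client library or proxy picks a server.

```python
import zlib

# Hypothetical VIPs, one per {master/slave} set: IP(V1) and IP(V2).
VIPS = ["192.0.2.10:11211", "192.0.2.20:11211"]


def pick_vip(key, vips=VIPS):
    """Map a key to one VIP so the same key always hits the same set."""
    h = zlib.crc32(key.encode("utf-8"))
    return vips[h % len(vips)]
```

Each key maps to exactly one VIP, so fail-over inside a set stays
invisible to the balancing layer: it only ever sees IP(V1) and IP(V2).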
thanks for reading!
in "Re: memcached replication"
<46DE648E.1060404 at everyone-here.is-a-geek.com>
at Wed, 05 Sep 2007 18:10:54 +1000,
brenton at everyone-here.is-a-geek.com wrote:
> Marcus Bointon wrote:
> > On 5 Sep 2007, at 06:22, Dustin Sallings wrote:
> >
> >> Doesn't this mean that you will sometimes write a value to a
> >> cache, and then later read a value back and get something other than
> >> the latest value you wrote? Getting known stale values seems worse
> >> than not getting something from the cache.
> >
> > I was thinking that. It might be better to deny the existence of the key
> > until replication is complete, or make the write synchronous from the
> > client's point of view.
>
> I don't think you could know to deny the existence of the key if the
> replication is asyncronous. Unless you are talking about a case where a
> large key/value has started being copied, but not completed. But I think
> this wouldn't be the common case.
>
> > ...
> >
> > Interestingly, though it doesn't get much press, MySQL replication
> > suffers the same problem - there is no transactional integrity between
> > replicated nodes - if you write to a master, then immediately read from
> > a slave, you may not get back what you just wrote. The fix there is not
> > to do it and use a unified front end like sequoia instead.
>
> I wouldn't say it doesn't get much press, it's a known limitation with
> the asyncronous replication (replication lag), and there are ways around
> it (cluster is the "best" option for mysql)
>
> I think the problem Dustin was describing though, was writing to the
> "secondary" cache, then that value being overwritten by the primary
> being replicated with older information.
>
> Which can only really be overcome (I think, if anyone knows better tell
> me :P) by only writing to the master. This means the application, or a
> proxy for it, must be aware of the master/slave situation. But your
> right this doesn't solve the lag problem
>
> >
> > Marcus
>
> Also, the original poster (Masaaki ?) mentioned it was not "not
> scalability, or high performance" but redundancy and fail-over. Which
> means it would only be used in extreme cases, and you could probably
> forgive the cache misses (dependant on application of course)
>
> --
> Brenton Alker
>
>
>
>
--
HIROSE, Masaaki