memcached replication

hirose31 at t3.rim.or.jp
Wed Sep 5 15:41:43 UTC 2007


The primary purposes of my async replication are:

  1. redundancy

  2. prevent burst access to persistent storage (e.g. RDBMS) just
     after fail-over

From the client's standpoint, there appears to be a single memcached.
But the backend actually consists of two memcacheds - a master and a
slave. If the master mcd goes down, the slave becomes the new master,
keeping the {key/value} data. Clients are not aware of the fail-over
at all.

                                  //

= details =

== regularly ==

            +--------+
            | client |
            +--------+
                 |
         +-------+
         |
         v
IP(A)  IP(V)                   IP(B)
    +--------+        +--------+
    | mcd-A  |        | mcd-B  |
    |(master)|<-VRRP->|(slave )|
    +--------+        +--------+
          |              ^
          |              |
          +--------------+
             async repl

  - IP(A): mcd-A node's IP address
  - IP(B): mcd-B node's IP address
  - IP(V): the master's Virtual IP Address (VIP), which floats between
           nodes via VRRP
  - VRRP : Virtual Router Redundancy Protocol
           keepalived <http://www.keepalived.org> implements a VRRP stack

  * clients access IP(V).
  * clients never access IP(A) or IP(B) directly.
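For reference, the VIP part could be expressed with a minimal
keepalived configuration along these lines (interface name, addresses,
router id, and priorities are all assumptions for illustration, not
from an actual deployment):

```
vrrp_instance VI_MCD {
    state MASTER            # BACKUP on mcd-B
    interface eth0
    virtual_router_id 51
    priority 100            # lower (e.g. 50) on mcd-B
    advert_int 1
    virtual_ipaddress {
        192.0.2.100         # IP(V), the address clients use
    }
}
```

When mcd-A stops advertising, mcd-B's keepalived promotes itself and
takes over 192.0.2.100, which is exactly the fail-over shown below.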


== if master goes down ==

            +--------+
            | client |
            +--------+
                 |
                 +--------+
                          |
                          v
IP(A)                    IP(V) IP(B)
    +- - - - +        +--------+
    | mcd-A  |        | mcd-B  |
    | (DOWN )|        |(master)|
    +- - - - +        +--------+

  * VRRP detects that mcd-A is down and moves IP(V) to mcd-B (the new master)
  * clients are not aware of the fail-over; they seamlessly keep
    accessing IP(V)

                                  //

= replication lag =

As shown above, clients only ever access the master node, so
replication lag does not matter.

I've changed the replication sequence since my previous email.

  replication sequence:
    1. client sends a SET command to the master
    2. master stores the item in its own area
    3. master sends the key/value to the slave via the memcached protocol
    4. master does not wait for "STORED" from the slave (asynchronous)
    5. master returns "STORED" to the client
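For step 3, the master can reuse the ordinary memcached text protocol.
A hedged sketch of what one forwarded SET could look like on the wire
(flags and exptime defaulted to 0 purely for illustration):

```python
def build_set(key, value, flags=0, exptime=0):
    # memcached text protocol storage command:
    #   set <key> <flags> <exptime> <bytes>\r\n
    #   <data block>\r\n
    data = value.encode()
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode()
    return header + data + b"\r\n"

build_set("foo", "bar")  # b'set foo 0 0 3\r\nbar\r\n'
```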

There is no get-back-old-data problem even if the master goes down
between steps 2 and 4, because the client never receives a "STORED"
response from the master, so the client will retry after a while.
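The five steps above can be sketched in Python. This is only an
illustrative model using in-memory dicts and a background thread - the
real implementation lives inside memcached itself:

```python
import queue
import threading

class Master:
    """Models the master's SET path from the sequence above."""
    def __init__(self, slave_store):
        self.store = {}                    # the master's own area
        self.repl_queue = queue.Queue()    # async channel to the slave
        self.slave_store = slave_store
        # the background thread drains the queue; the SET path never waits on it
        threading.Thread(target=self._replicate, daemon=True).start()

    def set(self, key, value):
        self.store[key] = value            # step 2: store locally
        self.repl_queue.put((key, value))  # step 3: hand off to the slave
        # step 4: do NOT wait for the slave's "STORED"
        return "STORED"                    # step 5: reply to the client

    def _replicate(self):
        while True:
            key, value = self.repl_queue.get()
            self.slave_store[key] = value  # slave applies the update
            self.repl_queue.task_done()

slave = {}
master = Master(slave)
reply = master.set("foo", "bar")
master.repl_queue.join()  # demo only: wait until the slave has caught up
```

Because step 5 never waits on step 3, the client-visible latency of a
SET is the same as against a single, unreplicated memcached.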


= scalability =

If we need scalability, I think we can balance access across a few
{master/slave} sets by key, using
  * client library (like Cache::Memcached)
or
  * reverse proxy (like PerlBal)
or
  * L7 load balance (like L7SW <http://www.linux-l7sw.org>)

             +--------+
             | client |
             +--------+
                 |
                 |
  <<balancing with client API      >>
  <<            or reverse proxy   >>
  <<            or L7 load balancer>>
                 |
                 |
         +-------+--------+
         |                |
         v                v
IP(A)  IP(V1)     IP(M)  IP(V2) 
    +--------+        +--------+  
    | mcd-A  |        | mcd-M  |  
    |(master)|        |(master)|
    +--------+        +--------+  
      ^    ^            ^    ^  
      |    |            |    |  
     repl VRRP         repl VRRP
      |    |            |    |  
IP(B) v    v      IP(N) v    v  
    +--------+        +--------+
    | mcd-B  |        | mcd-N  |
    |(slave )|        |(slave )|
    +--------+        +--------+
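As a rough sketch of the client-library option, keys could be mapped
to VIPs with a hash-modulo scheme in the spirit of Cache::Memcached's
key hashing (the addresses below are made-up examples):

```python
import zlib

# IP(V1) and IP(V2) from the diagram above - example addresses, assumed
VIPS = ["192.0.2.10:11211", "192.0.2.20:11211"]

def pick_vip(key):
    # hash the key and map it onto one {master/slave} pair's VIP;
    # Cache::Memcached uses a similar CRC32-based scheme in Perl
    return VIPS[zlib.crc32(key.encode()) % len(VIPS)]
```

Every client using the same hash sees the same key-to-VIP mapping, so
each key still has exactly one master, and the fail-over behaviour
within each pair is unchanged.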


thanks for reading!



in "Re: memcached replication"
   <46DE648E.1060404 at everyone-here.is-a-geek.com>
at Wed, 05 Sep 2007 18:10:54 +1000,
   brenton at everyone-here.is-a-geek.com wrote:

> Marcus Bointon wrote:
>  > On 5 Sep 2007, at 06:22, Dustin Sallings wrote:
>  >
>  >>     Doesn't this mean that you will sometimes write a value to a
>  >> cache, and then later read a value back and get something other than
>  >> the latest value you wrote?  Getting known stale values seems worse
>  >> than not getting something from the cache.
>  >
>  > I was thinking that. It might be better to deny the existence of the key
>  > until replication is complete, or make the write synchronous from the
>  > client's point of view.
> 
> I don't think you could know to deny the existence of the key if the
> replication is asyncronous. Unless you are talking about a case where a
> large key/value has started being copied, but not completed. But I think
> this wouldn't be the common case.
> 
>  > ...
>  >
>  > Interestingly, though it doesn't get much press, MySQL replication
>  > suffers the same problem - there is no transactional integrity between
>  > replicated nodes - if you write to a master, then immediately read from
>  > a slave, you may not get back what you just wrote. The fix there is not
>  > to do it and use a unified front end like sequoia instead.
> 
> I wouldn't say it doesn't get much press, it's a known limitation with
> the asyncronous replication (replication lag), and there are ways around
> it (cluster is the "best" option for mysql)
> 
> I think the problem Dustin was describing though, was writing to the
> "secondary" cache, then that value being overwritten by the primary
> being replicated with older information.
> 
> Which can only really be overcome (I think, if anyone knows better tell
> me :P) by only writing to the master. This means the application, or a
> proxy for it, must be aware of the master/slave situation. But your
> right this doesn't solve the lag problem
> 
>  >
>  > Marcus
> 
> Also, the original poster (Masaaki ?) mentioned it was not "not
> scalability, or high performance" but redundancy and fail-over. Which
> means it would only be used in extreme cases, and you could probably
> forgive the cache misses (dependant on application of course)
> 
> --
> Brenton Alker
> 
-- 
HIROSE, Masaaki

