Asymmetric Failure + Best Practices

Tue Jun 24 03:45:05 UTC 2008

On Mon, Jun 23, 2008 at 15:56:10 -0700, Jason McGuirk wrote:
> Consider an asymmetric failure where only one of the servers (say,
> AppA) is unable to route to MCB. This is substantially less likely than
> a categorical failure, but could happen for numerous reasons: a
> misconfigured firewall, a hosed router, what have you. In this
> scenario, AppA makes the determination that MCB has failed and will
> remap the keyspace to MCA. AppB, meanwhile, is able to route MCB
> successfully and will continue placing Keys into BOTH MCA and MCB,
> and the chance for both servers to read stale data is inevitable,
> since invalidations will consistently map to different nodes.

That is the reason why Cache::Memcached::Fast doesn't implement the
'rehash' feature of Cache::Memcached: different clients may view
server failures differently (and failures themselves may be
transient).  However, any automatic failure notification is subject to
some delay, during which inconsistent access may occur (plus
notification may fail itself for the same reasons).  So C::M::F simply
tries to reconnect to the failed server forever, and leaves it to the
user to restart the server, or to atomically restart all clients with
a different list of available servers.  After all, most, if not all,
memcached setups are local, i.e. both clients and servers are
controlled by the same authority.  Automatic
notifications/negotiations would only complicate application code
unnecessarily.
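
The difference can be illustrated with a small simulation.  This is
only a sketch under stated assumptions, not the code of either Perl
module: the Client class, the sum-of-bytes hash, and the in-memory
dictionaries standing in for MCA and MCB are all hypothetical.  AppA
cannot reach MCB, AppB can; the same scenario is run once with rehash
on failure and once without.

```python
SERVERS = ["MCA", "MCB"]


def pick_server(key, servers):
    # Toy hash (sum of bytes modulo server count); an assumed stand-in
    # for whatever hash a real client uses.
    return servers[sum(key.encode()) % len(servers)]


class Client:
    """Hypothetical client with a per-client view of reachable servers."""

    def __init__(self, stores, unreachable=(), rehash=False):
        self.stores = stores                  # shared dicts acting as servers
        self.unreachable = set(unreachable)   # servers this client cannot route to
        self.rehash = rehash                  # remap keys of failed servers?

    def _target(self, key):
        server = pick_server(key, SERVERS)
        if server not in self.unreachable:
            return server
        if not self.rehash:
            return None                       # no rehash: the operation just fails
        alive = [s for s in SERVERS if s not in self.unreachable]
        return pick_server(key, alive) if alive else None

    def set(self, key, value):
        target = self._target(key)
        if target is None:
            return False
        self.stores[target][key] = value
        return True

    def get(self, key):
        target = self._target(key)
        return None if target is None else self.stores[target].get(key)


def run(rehash):
    stores = {name: {} for name in SERVERS}
    app_a = Client(stores, unreachable={"MCB"}, rehash=rehash)
    app_b = Client(stores)
    # Pick a key that normally lives on MCB.
    key = next(k for k in (f"k{i}" for i in range(10))
               if pick_server(k, SERVERS) == "MCB")
    app_b.set(key, "v1")            # AppB caches the value on MCB
    a_stored = app_a.set(key, "v2")  # AppA tries to update the same key
    return a_stored, app_a.get(key), app_b.get(key), stores


rehash_result = run(rehash=True)
no_rehash_result = run(rehash=False)
print("rehash:   ", rehash_result)
print("no rehash:", no_rehash_result)
```

With rehash enabled, AppA silently stores its update on MCA while AppB
keeps reading the old value from MCB, so the two clients see different
data for the same key.  Without rehash, AppA's set fails visibly and
its get is a miss, and no second copy of the key is created on MCA.
Note that AppB can still read an outdated value from MCB in that case;
the sketch only shows that the client does not add a divergent copy on
top of the failure.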

-- 
   Tomash Brechko