Failure, Data Integrity and the PECL Extension

Kenner Stross kenner at
Fri May 18 16:59:28 UTC 2007

I am using the PECL PHP extension for memcached access, and am
confused/concerned about data integrity in the case of a failure. I have
already found some discussions on this list regarding this issue, but I
don't see how those solutions hold up in a multi-server environment.

What I've found so far is basically this: disable automatic failover;
use a failure callback to catch the failure, and in that callback set
the server status to off and stop any further retrying (a retry
interval of -1); and lastly, implement an external service monitor that
can detect the problem, flush the cache, and then mark the server as
available again. That way, you can be sure all stale entries are flushed
before the server rejoins the pool of active servers.
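
To make that sequence concrete, here is a minimal sketch of it as a toy
model in plain Python. It does not use the PECL extension or any real
memcached client; the names CacheServer, Client, on_failure and
monitor_restore are hypothetical and exist only for illustration, and
the timed_out flag simply simulates a failed request.

```python
class CacheServer:
    """Toy stand-in for one memcached instance (not a real client)."""

    def __init__(self):
        self.data = {}

    def flush(self):
        self.data.clear()


class Client:
    """One web server's view of the pool, with failover disabled."""

    def __init__(self, servers):
        self.servers = servers   # name -> CacheServer
        self.offline = set()     # servers this client refuses to use

    def on_failure(self, name):
        # Plays the role of the failure callback: mark the server off
        # and never retry it until something re-enables it.
        self.offline.add(name)

    def set(self, name, key, value, timed_out=False):
        if name in self.offline:
            return False         # no failover: the write is skipped
        if timed_out:            # simulated failure
            self.on_failure(name)
            return False
        self.servers[name].data[key] = value
        return True

    def get(self, name, key):
        if name in self.offline:
            return None          # treated as a cache miss
        return self.servers[name].data.get(key)


def monitor_restore(client, name):
    """External monitor: flush first, then mark the server available."""
    client.servers[name].flush()
    client.offline.discard(name)


m3 = CacheServer()
web = Client({"m3": m3})
web.set("m3", "user:1", "v1")
web.set("m3", "user:1", "v2", timed_out=True)  # the update is lost
during_outage = web.get("m3", "user:1")        # None: stale "v1" not served
monitor_restore(web, "m3")
after_restore = web.get("m3", "user:1")        # None: entry was flushed
```

With a single client this behaves as intended: the stale "v1" entry is
never served, because the server stays off until it has been flushed.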

That's fine for one client accessing the cache server. But I don't see
how it guarantees integrity in a multi-client environment. In
particular, I don't see how it works when the failure is only temporary,
caused by a heavy load that made the response too sluggish. Hopefully
I'm just overlooking the obvious and one of you will straighten me out.

Let's imagine a simple 3-machine setup (m1 - m3), where each machine is
acting as both a web server and a memcached server.

m1 web --> attempts a write to the m3 cache, but it fails due to extreme
load. m1 marks m3 as failed and offline (in the callback routine).

m2 web --> accesses the m3 cache successfully (no load problem on m2, so
no failure). m2 doesn't see that m1 took m3 offline.

m2 is now using invalid cache data (it's missing m1's activity) but
doesn't realize it. An external service monitor may or may not notice
this brief, intermittent problem, but even if it does, that doesn't help
m2 avoid the m3 cache once m1 has experienced an m3 cache failure.
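
The scenario can be reproduced with a small simulation. This is only a
sketch in plain Python under the assumption that each web server keeps
its own private record of which cache servers it has marked offline; the
key "profile:42", the write/read helpers and the timed_out flag are
hypothetical and do not correspond to any real memcached API.

```python
# Shared cache living on m3; it already holds an older value.
m3_cache = {"profile:42": "old"}

# Per-client failure state. Nothing propagates it between web servers.
offline = {"m1": set(), "m2": set()}


def write(client, key, value, timed_out=False):
    """Write to m3 unless this client has marked it offline."""
    if "m3" in offline[client]:
        return False
    if timed_out:                   # simulated failure under heavy load
        offline[client].add("m3")   # what the failure callback would do
        return False
    m3_cache[key] = value
    return True


def read(client, key):
    """Read from m3 unless this client has marked it offline."""
    if "m3" in offline[client]:
        return None                 # cache miss: caller falls back to the DB
    return m3_cache.get(key)


# m1's update times out, so it never reaches m3 and m1 stops using m3.
write("m1", "profile:42", "new", timed_out=True)

m1_view = read("m1", "profile:42")  # None: m1 avoids m3
m2_view = read("m2", "profile:42")  # "old": m2 still trusts m3
```

m1 correctly bypasses m3, but m2 keeps reading the value that m1 tried
and failed to replace, which is exactly the inconsistency described
above.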

I'm sure I must be missing something. Your help is greatly appreciated.