Restarting the MemcacheD Cluster
timeless
time at digg.com
Sat Apr 15 06:03:43 UTC 2006
Philip Neustrom wrote:
>Why would a rolling restart cause cache corruption? You mentioned
>that you had cache corruption to begin with (in your application?)
>Maybe this is why you see it on a rolling restart?
>
>
(At least with the PHP MemcacheD class we use, from PHP.net):
Data is stored on nodes via a hash of the key. If a node is down, the
data is stored on a different node instead. So if a single server goes
down, comes back up, and then goes down again, the cluster is almost
guaranteed to hold stale data (in a high-volume environment, this is as
close to guaranteed as to make no difference). Example:
1. ServerA goes down,
2. ItemA is requested, hash now points at ServerB, which doesn't have
the item cached,
3. ItemA is retrieved from the database and cached onto ServerB,
4. ServerA comes up again,
5. ItemA is modified in the database,
6. ItemA is requested, hash now points at ServerA, which doesn't have
the item cached,
7. ItemA is retrieved from the database and cached on ServerA,
8. ServerA goes down AGAIN (pesky server),
9. ItemA is requested, hash now points at ServerB, which still has the
copy it cached in step 3,
10. ItemA is retrieved FROM CACHE from ServerB, and that copy is OLDER
than ItemA in the database.
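The ten steps above can be replayed in a small simulation. This is only
a sketch in Python, not the PHP class we actually use: the Cluster
class, the read() helper, and the "try the next live server" failover
rule are all made up for illustration. It also assumes that a restarted
daemon comes back empty and that nothing deletes the old copy from the
fallback server when the database changes.

```python
import zlib


class Cluster:
    """Hypothetical stand-in for a client-side server pool with failover."""

    def __init__(self, names):
        self.names = list(names)
        self.caches = {name: {} for name in self.names}
        self.up = {name: True for name in self.names}

    def node_for(self, key):
        # Hash the key to a starting server, then walk forward to the
        # first server that is up (assumed failover rule).
        start = zlib.crc32(key.encode()) % len(self.names)
        for offset in range(len(self.names)):
            name = self.names[(start + offset) % len(self.names)]
            if self.up[name]:
                return name
        raise RuntimeError("no live servers")

    def stop(self, name):
        self.up[name] = False
        self.caches[name].clear()  # in-memory cache is lost on shutdown

    def start(self, name):
        self.up[name] = True


def read(cluster, database, key):
    """Read-through: on a miss, load from the database and cache it."""
    cache = cluster.caches[cluster.node_for(key)]
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def replay_scenario():
    database = {"ItemA": "v1"}
    cluster = Cluster(["ServerA", "ServerB"])
    # Whichever server the key hashes to plays the part of "ServerA".
    primary = cluster.node_for("ItemA")

    cluster.stop(primary)                     # step 1
    first = read(cluster, database, "ItemA")  # steps 2-3: cached on fallback
    cluster.start(primary)                    # step 4
    database["ItemA"] = "v2"                  # step 5: no cache invalidation
    second = read(cluster, database, "ItemA") # steps 6-7: cached on primary
    cluster.stop(primary)                     # step 8
    third = read(cluster, database, "ItemA")  # steps 9-10: old fallback copy
    return first, second, third, database["ItemA"]


if __name__ == "__main__":
    print(replay_scenario())
```

The third read comes back with the old value even though the database
and the second read both had the new one, which is the stale read from
step 10.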
At this point, there is no way for the code to know that the cached
ItemA is older than what's in the database. A rolling restart of every
memcached daemon in the cluster, done while data is actively being
written to the daemons, puts each server through exactly this
down-and-back-up cycle, so it will cause flaky cache results if any
server subsequently goes down again.
At least... that's what I think.
--
timeless