Restarting the MemcacheD Cluster

Sat Apr 15 06:03:43 UTC 2006

Philip Neustrom wrote:

>Why would a rolling restart cause cache corruption?  You mentioned
>that you had cache corruption to begin with (in your application?) 
>Maybe this is why you see it on a rolling restart?

(At least with the PHP MemcacheD class we use, from PHP.net):

Data is assigned to nodes by hashing the key. If a node is down, the 
client stores that node's data on a different node instead. So if a 
single server goes down, comes back up, and then goes down again, the 
cluster is almost guaranteed to end up holding stale data (in a 
high-volume environment, the odds are so close to certain as to make 
no difference). Example:

   1. ServerA goes down,
   2. ItemA is requested, hash now points at ServerB, which doesn't have
      the item cached,
   3. ItemA is retrieved from the database and cached onto ServerB,
   4. ServerA comes up again,
   5. ItemA is modified in the database,
   6. ItemA is requested, hash now points at ServerA, which doesn't have
      the item cached,
   7. ItemA is retrieved from the database and cached on ServerA,
   8. ServerA goes down AGAIN (pesky server),
   9. ItemA is requested, hash now points at ServerB again, which still
      has the copy cached in step 3,
  10. ItemA is retrieved FROM CACHE from ServerB, and that copy is OLDER
      than ItemA in the database.

At this point, there is no way for the code to know that the cached 
ItemA is older than what's in the database. A rolling restart of every 
memcached daemon in the cluster, done while data is actively being 
written to the daemons, puts each server through exactly this 
down-and-up cycle, so it will give flaky cache results if any server 
subsequently goes down again.
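
To make the ten-step example above concrete, here is a rough sketch in
Python that replays it. This is a toy model, not the PHP class itself:
the Cluster class, the CRC32-modulo hashing over live nodes, and the
names replay, home and fallback are all assumptions made up for
illustration. It also assumes a restarted daemon comes back empty and
that writes to the database do not touch the cache.

```python
import zlib


class Cluster:
    """Toy client that rehashes keys onto whichever nodes are up."""

    def __init__(self, names):
        self.names = list(names)
        self.up = {name: True for name in self.names}
        self.cache = {name: {} for name in self.names}

    def node_for(self, key):
        # Assumed scheme: CRC32 of the key, modulo the number of live nodes.
        live = [name for name in self.names if self.up[name]]
        return live[zlib.crc32(key.encode()) % len(live)]

    def stop(self, name):
        self.up[name] = False
        self.cache[name].clear()  # contents live in RAM and are lost

    def start(self, name):
        self.up[name] = True

    def get(self, key, database):
        # Read-through: on a miss, load from the database and cache it.
        node = self.node_for(key)
        if key not in self.cache[node]:
            self.cache[node][key] = database[key]
        return self.cache[node][key]


def replay():
    database = {"ItemA": "v1"}
    cluster = Cluster(["ServerA", "ServerB"])

    # "home" plays the role of ServerA in the steps, "fallback" of ServerB.
    home = cluster.node_for("ItemA")
    fallback = next(n for n in cluster.names if n != home)

    cluster.stop(home)                      # step 1
    cluster.get("ItemA", database)          # steps 2-3: v1 cached on fallback
    cluster.start(home)                     # step 4
    database["ItemA"] = "v2"                # step 5
    fresh = cluster.get("ItemA", database)  # steps 6-7: v2 cached on home
    cluster.stop(home)                      # step 8
    stale = cluster.get("ItemA", database)  # steps 9-10: hit on fallback
    return fresh, stale, database["ItemA"]


if __name__ == "__main__":
    fresh, stale, current = replay()
    print("after first restart: ", fresh)
    print("after second outage: ", stale)
    print("database holds:      ", current)
```

In this model the read after the second outage returns "v1" while the
database holds "v2", and nothing in get() can tell the difference,
because the fallback node reports an ordinary cache hit.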

At least... that's what I think.

--
timeless