Distributed Network Syncing

Mon Aug 8 18:19:29 PDT 2005

I have seen a problem like this arise when this sequence of events happens:

- 2+ servers are up and in use properly, exactly as described below
- 1 server fails; all data that would go to that server is distributed to
the rest as expected, some of it with long expiration times
- the dead server comes back up; the data that was distributed stays where
it was and is never updated, while the same keys are re-added and updated
on the now-live server
- the same server dies fairly shortly after coming back up (i.e. within
the period covered by the long expiration times)
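The failure sequence above can be reproduced with a small simulation. This is a minimal sketch, not any real client: each "server" is a Python dict, and the placement rule (CRC32 of the key modulo the server count), the failover rule (rehash over the live servers only), and the names Client and user:42 are all hypothetical. Expiration is not modeled. A long expiry is represented by the fallback copy simply never being removed.

```python
import zlib


class Client:
    """Hypothetical memcached-style client over in-process dict "servers"."""

    def __init__(self, names):
        self.names = list(names)
        self.servers = {n: {} for n in self.names}
        self.alive = set(self.names)

    def _pick(self, key):
        h = zlib.crc32(key.encode())
        home = self.names[h % len(self.names)]
        if home in self.alive:
            return home
        # Assumed failover rule: rehash over the live servers only.
        live = [n for n in self.names if n in self.alive]
        return live[h % len(live)]

    def set(self, key, value):
        self.servers[self._pick(key)][key] = value

    def get(self, key):
        return self.servers[self._pick(key)].get(key)


c = Client(["a", "b", "c"])
key = "user:42"
home = c._pick(key)

c.alive.discard(home)       # 1. the key's home server fails
c.set(key, "v1")            #    value goes to a fallback server instead
fallback = c._pick(key)

c.alive.add(home)           # 2. home server comes back up, empty
c.servers[home].clear()
c.set(key, "v2")            #    updates now go to the home server only

c.alive.discard(home)       # 3. home server dies again
print(c.get(key))           # v1: the stale copy left on the fallback
```

The final read returns "v1", the value written during the first outage, even though "v2" was set last. The fallback copy was never updated or deleted while the home server was up.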

Now the data with long expiration times that was distributed to the
other servers during the first outage is potentially out of date, as it
wasn't updated or deleted from the cache when the server came back up,
and the client starts reading it again once the server dies a second time.
I'm not sure how to detect this.

For a while, the solution I used was to keep one server as a spare.
Any time a server went down, I manually swapped the spare in for it in
the config, and when the down server came back up I swapped it back.
This had the minor benefit of being slightly faster (the client only
hashes once), and the larger benefit of not keeping old data around, as
I'd flush the data on the spare once it was taken out of rotation. If
this could be automated, that'd be much better.
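The spare-server swap could be automated along these lines. This is a minimal sketch under stated assumptions: SpareSwapClient, mark_down and mark_up are hypothetical names, servers are modeled as dicts, and only one spare is supported. How a client learns that a server is down or back up is left out. Because the spare takes over the dead server's slot, the slot count never changes, so keys are hashed once and keys on healthy servers stay put. Flushing the spare when it leaves rotation is what prevents the stale read.

```python
import zlib


class SpareSwapClient:
    """Hypothetical client that swaps a single spare in for a dead server."""

    def __init__(self, primaries, spare):
        self.slots = list(primaries)          # hash slot -> server name
        self.spare = spare
        self.servers = {n: {} for n in self.slots + [spare]}
        self.swapped_out = None               # primary the spare replaces

    def _pick(self, key):
        # One hash over a fixed number of slots: healthy servers keep
        # their keys when the spare is swapped in or out.
        return self.slots[zlib.crc32(key.encode()) % len(self.slots)]

    def mark_down(self, name):
        if self.swapped_out is not None:
            raise RuntimeError("spare already in use")
        self.slots[self.slots.index(name)] = self.spare
        self.swapped_out = name

    def mark_up(self, name):
        if name != self.swapped_out:
            return
        self.slots[self.slots.index(self.spare)] = name
        self.swapped_out = None
        # Flush the spare as it leaves rotation so none of its data
        # can be served the next time it is swapped in.
        self.servers[self.spare].clear()

    def set(self, key, value):
        self.servers[self._pick(key)][key] = value

    def get(self, key):
        return self.servers[self._pick(key)].get(key)


c = SpareSwapClient(["a", "b", "c"], spare="s")
key = "user:42"
home = c._pick(key)

c.mark_down(home)        # first outage: spare takes the slot
c.set(key, "v1")         # written to the spare
c.mark_up(home)          # home back in its slot, spare flushed
c.set(key, "v2")         # written to home
c.mark_down(home)        # second outage: spare swapped in again
print(c.get(key))        # None: a cache miss instead of stale "v1"
```

Run through the same sequence as before, the second outage now produces a cache miss rather than stale data, and a miss is something a client that falls back to the database can recover from.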

Timo

Jason Coene wrote:

>Additionally, regarding the following:
>
>"This bad data seems to propagate across the network until I have to bring
>all 3 memcache servers down, then start one, hit the correct webpages to
>prime my data, and then start the other two memcache servers back up."
>
>First, one memcached server has no way of affecting the others, so what
>you're experiencing is certainly a result of the logic in your software.
>
>Second, it sounds like you're relying upon Memcached as a database, where if
>it fails to return the data you need the whole operation comes grinding to a
>halt.  This is bad practice.  Try the following process:
>
>- Try memcached for data you need
>	- If succeeds, use cache data.
>	- If fails, hit database and store result in cache.
>- The next client will get data from the cache.
>
>Of course, it's never this simple to implement, but this is the conventional
>logic among many memcached users.
>
>Implemented this way, you'll always be able to deliver data as long as
>your database server is up.  A memcached server can fail, and your software
>will simply get the data from the database instead.  It works very well.
>
>Regards,
>
>Jason
>
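The check-the-cache-then-the-database flow Jason describes can be sketched as below. This is an illustration under assumptions: FlakyCache is a hypothetical stand-in for a memcached client (a dict that can be switched off), and fetch and load_from_db are hypothetical names, not any real library's API. An unreachable cache is treated exactly like a miss, which is what lets a failed memcached server fall back to the database.

```python
from typing import Callable, Dict, Optional


class FlakyCache:
    """Hypothetical stand-in for a memcached client that can go down."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.up = True

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            return None                  # unreachable server counts as a miss
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        if self.up:
            self.data[key] = value       # writes are skipped while down


def fetch(key: str, cache: FlakyCache, load_from_db: Callable[[str], str]) -> str:
    cached = cache.get(key)
    if cached is not None:
        return cached                    # hit: use the cached data
    value = load_from_db(key)            # miss or cache down: use the database
    cache.set(key, value)                # store it so the next client hits
    return value


db_calls = []


def load_from_db(key: str) -> str:
    db_calls.append(key)
    return "row for " + key


cache = FlakyCache()
fetch("user:42", cache, load_from_db)    # miss: goes to the database
fetch("user:42", cache, load_from_db)    # hit: served from the cache
cache.up = False
fetch("user:42", cache, load_from_db)    # cache down: database again
print(len(db_calls))                     # 2
```

The database is read twice: once for the initial miss and once while the cache is down. The middle call is served from the cache.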
>>-----Original Message-----
>>From: memcached-bounces at lists.danga.com [mailto:memcached-bounces at lists.danga.com] On Behalf Of Ivan Krstic
>>Sent: Monday, August 08, 2005 5:21 PM
>>To: jesse at blastro.com
>>Cc: memcached at lists.danga.com; Casey Charvet
>>Subject: Re: Distributed Network Syncing
>>
>>Jesse Brede wrote:
>>
>>>My question is:  How does memcache sync data entries in a distributed
>>>network?
>>>
>>It doesn't. It's not a database.
>>
>>
>>>Is there something I need to do to resync the servers correctly
>>>when one has gone down?
>>>
>>The servers don't sync - the client decides, based on hashing, on which
>>server to store a particular key/value pair.
>>
>>-IK
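Ivan's point, that the client rather than the servers decides where a key lives, can be shown with a short sketch. The placement function server_for, the CRC32-modulo rule and the host names are assumptions for illustration (11211 is memcached's default port); actual clients have used different hash functions. Two independent clients given the same server list agree on every key's server without the servers communicating, and changing the list moves some keys, leaving whatever was stored under the old mapping where it was.

```python
import zlib


def server_for(key, servers):
    # Hypothetical placement rule: CRC32 of the key modulo the server count.
    return servers[zlib.crc32(key.encode()) % len(servers)]


servers = ["mc1:11211", "mc2:11211", "mc3:11211"]   # hypothetical hosts
keys = ["user:1", "user:2", "user:3", "user:4"]

# Two independent web servers compute placement from the same list;
# the memcached servers never exchange data with each other.
web1 = {k: server_for(k, servers) for k in keys}
web2 = {k: server_for(k, servers) for k in keys}
print(web1 == web2)

# Drop one server from the list: some keys now map elsewhere, and any
# value stored under the old mapping simply stays on the old server.
smaller = servers[:2]
moved = [k for k in keys if server_for(k, smaller) != web1[k]]
print(moved)
```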