Why not let the fs handle this ??
jsd at cluttered.com
Wed Jun 7 18:05:41 UTC 2006
Steven Grimm wrote:
> Jon Drukman wrote:
>> Has this actually worked out well in practice for anybody? I've found
>> that losing one machine (out of about 100) results in so much db
>> thrashing as the keys get repopulated into different places that the
>> site becomes basically unusable until enough of the cache has been
>> regenerated (5-10 minutes if i'm lucky).
> The problem there is not with lots of memcached servers, but that you're
> letting your client relocate your keys when a server goes down. Our
> approach is just to let the missing machine cause cache misses, but the
> others continue to serve up their usual data. (A missing memcached never
> stays missing for long since our operations team monitors the servers
> and fixes any broken ones pretty quickly.)
If a machine is actually crashed/down (happens once in a while), as
opposed to the machine still being there but with memcached having
crashed (never happened to me yet), then connections to the downed
box take 2 minutes before they give up. This basically renders the site
inoperative, since lots of pages can hang for 2 minutes (or more) before
returning, which quickly fills up all the Apache connection slots.
So removing a downed box from the memcached server pool list promptly is
essential - which means all keys get reshuffled.
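To illustrate why shrinking the pool reshuffles keys: with the simple modulo-style key mapping common in clients of that era (a sketch, not any particular client's actual code), dropping one server changes the divisor, so most keys land on a different box and miss:

```python
import zlib

def server_for(key, servers):
    # Map a key to a server by hashing it, then taking the hash
    # modulo the current pool size (simple modulo distribution).
    h = zlib.crc32(key.encode())
    return servers[h % len(servers)]

pool = ["mc1", "mc2", "mc3", "mc4"]
smaller = ["mc1", "mc2", "mc3"]  # mc4 removed from the pool list

keys = ["user:%d" % i for i in range(10000)]
moved = sum(server_for(k, pool) != server_for(k, smaller) for k in keys)
print("%.0f%% of keys now map to a different server" % (100.0 * moved / len(keys)))
```

With modulo mapping, roughly three quarters of the keys change servers when a 4-node pool shrinks to 3, which is why the whole cache effectively has to be regenerated. Consistent hashing schemes exist precisely to limit this, moving only the departed server's share of keys.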
Ideally there would be a way to specify a 1-second timeout on the memcached
connection, which would get around this, but as far as I'm aware the 2-minute
thing is an OS limitation. Would love to be proved wrong on that :)
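For what it's worth, at the raw socket level the connect can be bounded well below the OS default: a per-socket timeout makes the connect attempt fail fast instead of waiting out the kernel's TCP retry schedule. A minimal sketch (hypothetical helper, not any memcached client's API):

```python
import socket

def connect_with_timeout(host, port, timeout=1.0):
    # Give up after `timeout` seconds instead of waiting for the
    # OS default TCP connect timeout (often a couple of minutes).
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.settimeout(timeout)  # also bound subsequent reads/writes
        return sock
    except OSError:
        # Treat the box as down and let the caller fall through
        # to a cache miss rather than hanging the page.
        return None
```

Whether a given client library exposes this knob is another question, but the 2-minute wait is a default, not a hard OS limit.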