System architecture best practices - memcached + webfarms
jan at hitflip.de
Mon Jul 9 12:06:03 UTC 2007
We've had situations where the death of a memcached server caused the
database to overload, since that server held a significant part of
the cache. With dozens or hundreds of web slaves requesting the same
missing data at once, the cache doesn't refill properly: the database
is so busy that no process ever finishes regenerating an entry.
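To put rough numbers on this, here is a back-of-the-envelope sketch in
Python. It is not something we ran in production. It assumes keys are
spread evenly over the servers and that only the keys on the dead server
are lost (which depends on how the client remaps keys); the function name
and the example figures are made up for illustration.

```python
def estimate_db_load(servers, request_rate, hit_ratio):
    """Estimate database queries/sec before and after one cache server dies.

    Assumptions (illustrative only): keys are evenly distributed, and only
    the keys that lived on the dead server turn into misses.
    """
    if servers < 1:
        raise ValueError("need at least one cache server")
    # Misses that reach the database while all servers are healthy.
    before = request_rate * (1.0 - hit_ratio)
    # Share of former cache hits that lived on the dead server.
    lost_fraction = 1.0 / servers
    after = before + request_rate * hit_ratio * lost_fraction
    return before, after


if __name__ == "__main__":
    # Hypothetical site: 10000 requests/sec, 95% cache hit ratio.
    for n in (2, 4, 8):
        before, after = estimate_db_load(n, 10000.0, 0.95)
        print(f"{n} servers: {before:.0f} -> {after:.0f} db queries/sec")
```

The fewer servers there are, the larger the share of hits that turns into
database queries at once, and that is before counting the many web slaves
that request the same missing key at the same time.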
What we did:
- write scripts to warm up the cache before the site goes live again,
to prevent the first visitors from killing everything again. These were
dumb in the beginning (simply crawling the pages in the background while
the site is offline) but got smarter: they now warm up the cache with
the data that has the most impact, i.e. data that is expensive to get
and often requested.
- divide up the data and the number of servers in such a way that the
database can handle the death of one memcached instance (and of course
the regeneration of its cache keys on another system) without db
problems. In tests we found a sweet spot at about 6-8 machines running
memcached instances (with cache sizes larger than 2 GB). Having one of
these fail will not compromise the database. Below 4 machines running
memcached, the failure of one becomes critical.
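The "smarter" warm-up idea from the first point can be sketched roughly
like this. It is only an illustration, not our actual scripts: the
KeyStats record, the impact score (request rate times regeneration time)
and the plain dict standing in for memcached are all assumptions.

```python
from typing import Callable, Dict, List, NamedTuple


class KeyStats(NamedTuple):
    key: str
    requests_per_min: float  # how often the key is read
    regen_seconds: float     # how long the database needs to rebuild it


def warmup_order(stats: List[KeyStats]) -> List[str]:
    """Return keys sorted by impact (request rate x rebuild cost), highest first."""
    ranked = sorted(
        stats,
        key=lambda s: s.requests_per_min * s.regen_seconds,
        reverse=True,
    )
    return [s.key for s in ranked]


def warm_cache(
    stats: List[KeyStats],
    load: Callable[[str], object],
    cache: Dict[str, object],
    limit: int,
) -> List[str]:
    """Fill the cache with the `limit` highest-impact keys, one at a time.

    `load` is a hypothetical function that fetches a value from the
    database; `cache` is a dict standing in for the memcached client.
    """
    warmed = []
    for key in warmup_order(stats)[:limit]:
        if key not in cache:  # skip keys that are already cached
            cache[key] = load(key)
            warmed.append(key)
    return warmed
```

Loading the keys one after another, while the site is still offline,
keeps the database load under control, and the limit lets you stop once
the most painful entries are back in the cache.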