linuxjournal article on memcached

Fri, 16 Jul 2004 11:13:50 -0700 (PDT)

On Wed, 14 Jul 2004 johanl@darserman.com wrote:

> > At 02:56 2004-07-14, Brad Fitzpatrick wrote:
> > Let me know if you have any questions after reading the article, or find
> > other things that are now out-of-date.
>
> Nice article, it should spawn interest.
>
> I'm curious about what failure/failover strategies are used wrt memcached.
> The article mentions that briefly, but could you please expand on it?
>
> What about handling added/removed memcached servers? How and how often do
> you propagate and synchronize the configuration changes to the clients?

With the Perl client you can choose either to re-hash keys from dead
servers onto other servers, or to just treat dead servers as cache misses.

With enough servers, the hit-rate penalty of treating a dead server as
cache misses isn't that bad, and it is the safer choice if your
application isn't ready for this case:

  store "foo=bar1" on server A
  A gets partitioned from the network
  client 1 tries to set "foo=bar2" on A, but A's unreachable, so it
     sets "foo=bar2" on server B instead
  A merges back into the network
  client 2 requests key "foo" from server A, gets old value "bar1"

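To make that sequence concrete, here is a minimal Python sketch that
replays it against in-memory stand-ins. The Server class, the CRC32 hash
and the "walk to the next live server" re-hash rule are all assumptions
for illustration, not what the Perl client actually does.

```python
import zlib


class Server:
    """In-memory stand-in for one memcached instance (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reachable = True


def pick_server(key, servers):
    """Hash the key to a server; if it is dead, re-hash to the next live one."""
    start = zlib.crc32(key.encode("utf-8")) % len(servers)
    for step in range(len(servers)):
        candidate = servers[(start + step) % len(servers)]
        if candidate.reachable:
            return candidate
    raise RuntimeError("no reachable servers")


def replay_partition():
    servers = [Server("s0"), Server("s1")]
    home = pick_server("foo", servers)        # plays the role of server A
    home.data["foo"] = "bar1"                 # store "foo=bar1" on A
    home.reachable = False                    # A gets partitioned
    fallback = pick_server("foo", servers)    # plays the role of server B
    fallback.data["foo"] = "bar2"             # client 1's set lands on B
    home.reachable = True                     # A merges back
    reader = pick_server("foo", servers)      # client 2 hashes to A again
    return reader.data["foo"], fallback.data["foo"]
```

replay_partition() returns ("bar1", "bar2"): client 2 reads the stale
value from A while the newer value sits unused on B.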
We have a plan to fix that.  In a nutshell, there will be 8096 or so
"virtual buckets" and when a server starts up, it handles no buckets at
all.

When sending a key, the client must say:

   key: foo
   value: bar3
   bucket:  387  (I hashed string "foo" to bucket 387)

Then, the server will respond with either "OK" or "NOT_OWNER".
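A rough Python sketch of that exchange follows. The bucket count is the
"8096 or so" figure from above, and the CRC32 hash, the BucketServer
class and its method names are assumptions made up for the example; the
real hash function and server code are not specified here.

```python
import zlib

NUM_BUCKETS = 8096  # "8096 or so" from the message; the exact count is an assumption


def bucket_for(key):
    """Map a key to a virtual bucket (CRC32 is an assumed hash choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


class BucketServer:
    """Hypothetical server that only accepts keys for buckets it owns."""

    def __init__(self):
        self.owned = set()   # a freshly started server handles no buckets
        self.data = {}

    def assign(self, bucket):
        self.owned.add(bucket)

    def set(self, key, value, bucket):
        if bucket not in self.owned:
            return "NOT_OWNER"
        self.data[key] = value
        return "OK"
```

A new BucketServer answers NOT_OWNER to every set until some bucket has
been assigned to it.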

In the NOT_OWNER case, the memcached client contacts one of the memcached
bucket manager processes.  (These are written in Perl: low-traffic, no
fancy memory allocation needed.)

All the memcached bucket managers do is listen for reports from clients
when a client couldn't delete/set something somewhere, and keep track of
what buckets are "dirty".  They also store the master configuration about
what servers own what buckets.

The first time a client sends a request to a server and gets NOT_OWNER,
that triggers the client to contact the bucket manager, get the updated
list, and retry the request.
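The refresh-and-retry step might look something like the Python sketch
below. Everything in it (class names, get_map, the single retry, the
refreshes counter) is hypothetical and only meant to show the control
flow, not a planned API.

```python
import zlib

NUM_BUCKETS = 8096  # assumed bucket count, per "8096 or so"


def bucket_for(key):
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


class Server:
    def __init__(self):
        self.owned = set()
        self.data = {}

    def set(self, key, value, bucket):
        if bucket not in self.owned:
            return "NOT_OWNER"
        self.data[key] = value
        return "OK"


class BucketManager:
    """Holds the master map of bucket number -> server name."""

    def __init__(self, owners):
        self.owners = dict(owners)

    def get_map(self):
        return dict(self.owners)


class Client:
    def __init__(self, servers, manager):
        self.servers = servers                  # server name -> Server
        self.manager = manager
        self.bucket_map = manager.get_map()     # cached copy, may go stale
        self.refreshes = 0

    def set(self, key, value):
        bucket = bucket_for(key)
        for attempt in range(2):                # first try plus one retry
            owner = self.bucket_map.get(bucket)
            if owner is not None:
                if self.servers[owner].set(key, value, bucket) == "OK":
                    return True
            if attempt == 0:
                # NOT_OWNER (or no known owner): fetch the updated list.
                self.bucket_map = self.manager.get_map()
                self.refreshes += 1
        return False
```

The client only talks to the manager after a NOT_OWNER, so in the common
case the manager sees no traffic at all.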

The same happens when a server is down.

So say the application NEEDS to do a delete, otherwise the cache would be
out of sync with the database.  But that memcached server's network is
unplugged, so the delete can't happen.  Then, the client tells one of the
bucket managers, "Yo, I couldn't fix up bucket 387, so next time you see
it, wipe the entire bucket, and assign that bucket to another host and
give me the list of who owns that bucket now."
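As a sketch of the bookkeeping on the manager side, in Python: the
report_failed and server_seen names, the per-bucket storage on the
server, and the way a new host is picked are all assumptions for
illustration. It also assumes at least two hosts are configured.

```python
class Server:
    """Hypothetical server that groups its keys by bucket number."""

    def __init__(self, name):
        self.name = name
        self.buckets = {}            # bucket number -> {key: value}

    def wipe_bucket(self, bucket):
        self.buckets.pop(bucket, None)


class BucketManager:
    def __init__(self, owners, hosts):
        self.owners = dict(owners)   # bucket number -> host name
        self.hosts = list(hosts)
        self.dirty = {}              # host name -> buckets to wipe on return

    def report_failed(self, bucket):
        """A client could not set/delete in this bucket on its current owner."""
        old = self.owners[bucket]
        self.dirty.setdefault(old, set()).add(bucket)
        candidates = [h for h in self.hosts if h != old]
        if not candidates:
            raise RuntimeError("no other host to take the bucket")
        # Arbitrary but deterministic choice of the new owner (assumption).
        self.owners[bucket] = candidates[bucket % len(candidates)]
        return self.owners[bucket]

    def server_seen(self, server):
        """Called when a host is reachable again: wipe its dirty buckets."""
        for bucket in self.dirty.pop(server.name, set()):
            server.wipe_bucket(bucket)
```

The point of wiping the whole bucket is that the manager never has to
know which individual keys went stale, only which buckets did.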

So that's the plan.  The changes to the core server are minimal.  The
worst part is the protocol will need an extra field for each operation,
which is the client's hashing value.  But that can just go at the end.
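For illustration only, this is roughly what a "set" could look like with
the bucket tacked onto the end of the existing text-protocol command
line ("set <key> <flags> <exptime> <bytes>").  The trailing field and
both helper functions are hypothetical; nothing here is a finalized
wire format.

```python
def build_set(key, value, bucket, flags=0, exptime=0):
    """Build a set command with the (hypothetical) bucket as the last field."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)} {bucket}\r\n"
    return header.encode("utf-8") + data + b"\r\n"


def parse_set_header(line):
    """Parse a set command line; the bucket is optional so old clients still parse."""
    parts = line.decode("utf-8").rstrip("\r\n").split(" ")
    command, key, flags, exptime, nbytes = parts[:5]
    bucket = int(parts[5]) if len(parts) > 5 else None
    return {
        "command": command,
        "key": key,
        "flags": int(flags),
        "exptime": int(exptime),
        "bytes": int(nbytes),
        "bucket": bucket,
    }
```

Putting the new field last means a parser can treat it as optional, which
is why appending it is less painful than inserting it mid-line.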

- Brad