Better server selection client code?

Wed Oct 18 03:37:35 UTC 2006

Earlier this summer Leon Brocard posted this message:

----
"I've been toying with idea of using consistent hashes with memcached, 
but I don't really have enough servers for it to be worth it.

    http://www8.org/w8-papers/2a-webserver/caching/paper2.html

Leon"
----

That got me thinking about the effects of adding a memcached server to a 
fleet of servers.  Basically the "typical" server selection code in a 
client just uses modulo into a "bucket" array to find a server.  For 
example this code from get_sock in the perl client:

    my $host = $self->{'buckets'}->[$hv % $self->{'bucketcount'}];

The result is that if you add an additional server to your list of 
servers then the client is essentially going to move every key to a 
different server; effectively a flush_all.

I decided to try out an implementation of the above referenced idea. 
I'm working in ruby using this memcached client 
(http://dev.robotcoop.com/Libraries/memcache-client/index.html).  The 
implementation is really simple:

   def get_server_for_key(key)
     # Easy enough if there is only one server.
     return @servers.first if @servers.length == 1

     #  Get the hash value for the key and map it onto the unit
     #  circle.
     key_unit_circle = (key.hash & 0xffffffff).to_f / 0xffffffff

     #  Figure out which server handles this key.
     server_num = (@buckets.size * key_unit_circle).ceil - 1

     #  Fetch the server from our "buckets"
     server = nil
     @buckets.nitems.times do |idx|
       server = @buckets[(server_num + idx) % @buckets.nitems]
       break if server.alive?
     end
     raise MemCacheError, "No servers available" unless server

     server
   end

A couple of notes...

   1.  In the www8.org paper you really only need to understand section 
2.2 to understand the algorithm.

   2.  I kept the block labeled "Fetch the server from our 'buckets'" 
really only out of convention as that's what all of the other clients 
do.  However, this strategy is prone to cache consistencies of a given 
server in the fleet is coming in and out of service.  In some 
applications, it may be better just to fail caching on a segment of the 
cache key space than to use the above code and have an inconsistent cache.

Using this algorithm (if the code is right ;) allows the addition of a 
new server to only cause the relocation of 1/totalServers keys which 
makes scaling your cache fleet far less disruptive.

What do you guys think?

Andy