Better server selection client code?
Andrew Harbick
aharbick at aharbick.com
Wed Oct 18 03:37:35 UTC 2006
Earlier this summer Leon Brocard posted this message:
----
"I've been toying with idea of using consistent hashes with memcached,
but I don't really have enough servers for it to be worth it.
http://www8.org/w8-papers/2a-webserver/caching/paper2.html
Leon"
----
That got me thinking about the effects of adding a memcached server to a
fleet of servers. Basically the "typical" server selection code in a
client just uses modulo into a "bucket" array to find a server. For
example this code from get_sock in the perl client:
my $host = $self->{'buckets'}->[$hv % $self->{'bucketcount'}];
The result is that if you add an additional server to your list of
servers then the client is essentially going to move every key to a
different server; effectively a flush_all.
I decided to try out an implementation of the above referenced idea.
I'm working in ruby using this memcached client
(http://dev.robotcoop.com/Libraries/memcache-client/index.html). The
implementation is really simple:
def get_server_for_key(key)
# Easy enough if there is only one server.
return @servers.first if @servers.length == 1
# Get the hash value for the key and map it onto the unit
# circle.
key_unit_circle = (key.hash & 0xffffffff).to_f / 0xffffffff
# Figure out which server handles this key.
server_num = (@buckets.size * key_unit_circle).ceil - 1
# Fetch the server from our "buckets"
server = nil
@buckets.nitems.times do |idx|
server = @buckets[(server_num + idx) % @buckets.nitems]
break if server.alive?
end
raise MemCacheError, "No servers available" unless server
server
end
A couple of notes...
1. In the www8.org paper you really only need to understand section
2.2 to understand the algorithm.
2. I kept the block labeled "Fetch the server from our 'buckets'"
really only out of convention as that's what all of the other clients
do. However, this strategy is prone to cache consistencies of a given
server in the fleet is coming in and out of service. In some
applications, it may be better just to fail caching on a segment of the
cache key space than to use the above code and have an inconsistent cache.
Using this algorithm (if the code is right ;) allows the addition of a
new server to only cause the relocation of 1/totalServers keys which
makes scaling your cache fleet far less disruptive.
What do you guys think?
Andy
More information about the memcached
mailing list