Better server selection client code?

Steven Grimm sgrimm at facebook.com
Mon Nov 6 18:32:59 UTC 2006


Jeetendra Mirchandani wrote:
> From the paper, section 2.2:
> <snip>
> For technical reasons detailed in [6], it is quite important to make a
> small number of copies of each cache point--that is, to map several
> copies of each cache to different ``random'' points on the unit circle.
> This produces a more uniform distribution of URLs to caches.
> </snip>
>   

That's "copies of each cache point," where a cache point in this context 
is "memcached server at 10.1.1.1 port 11211." I didn't read that as a 
reference to replicating the *contents* of the caches, which would 
certainly reduce your effective cache size. (Later on in the paper they 
talk about replicating "hot" cache entries, but that's a larger 
memcached issue that doesn't have to be addressed as part of this 
discussion.)

What the paper describes is exactly what Brad's client does, and it 
should distribute load at least as well as the hashing-with-modulo 
scheme of the old client, for any number of servers, with no negative 
impact on cache capacity. It does not require the number of servers to 
be a power of anything in particular.
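
For the curious, here is a minimal sketch of that scheme in Python.
This is not Brad's actual client code; the MD5 hash and the server
addresses are just illustrative assumptions. Each server is hashed to
several points on the circle, and a key is served by the first cache
point clockwise from the key's own position:

import bisect
import hashlib

class HashRing:
    """Map servers to many 'copies' of cache points on a hash circle."""

    def __init__(self, servers, copies=100):
        self.ring = {}  # position on the circle -> server address
        for server in servers:
            for i in range(copies):
                self.ring[self._hash("%s-%d" % (server, i))] = server
        self.points = sorted(self.ring)

    @staticmethod
    def _hash(s):
        # First 4 bytes of MD5 as a 32-bit position on the circle.
        return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

    def get_server(self, key):
        # Walk clockwise from the key's position to the next cache
        # point, wrapping around at the top of the circle.
        idx = bisect.bisect(self.points, self._hash(key))
        return self.ring[self.points[idx % len(self.points)]]

ring = HashRing(["10.1.1.1:11211", "10.1.1.2:11211", "10.1.1.3:11211"])
print(ring.get_server("user:1234"))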

The sample implementation, ironically, only works well for *small* 
numbers of servers, not large numbers, because the ratio of possible 
positions on the circle to the number of copies of each cache point is 
too low. But that's why it's just a sample; it's trivial to increase 
the number of positions as needed.
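
As a rough illustration of that ratio effect (reusing the HashRing
sketch above; the server list and key set are made up), raising the
number of copies per server narrows the gap between the most- and
least-loaded servers:

from collections import Counter

servers = ["10.0.0.%d:11211" % i for i in range(1, 51)]
for copies in (1, 10, 100):
    ring = HashRing(servers, copies=copies)
    counts = Counter(ring.get_server("key-%d" % k) for k in range(100000))
    # The spread between the busiest and idlest server shrinks as the
    # number of copies (and thus positions on the circle) grows.
    print(copies, min(counts.values()), max(counts.values()))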

Try it! It works.

> I had also suggested a deterministic way of picking a point, based on
> the largest available free arc on the unit circle. But as Andrew
> added, this will distribute load uniformly only when the number of
> servers is a power of 2.
> (http://lists.danga.com/pipermail/memcached/2006-October/002940.html )
>   

One property of doing it that way (which, granted, is true of the 
current memcached client code too) is that if a client adds server X, 
then server Y, then server Z, those servers will end up in totally 
different positions on the circle than if it adds server Z, then 
server X, then server Y. With the "positions based on a hash of the 
server address" algorithm in Brad's sample implementation, those two 
cases result in identical positions around the unit circle; the 
distribution of servers around the circle is independent of the order 
in which they were added.
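
With the hash-based placement in the sketch above, that
order-independence is easy to check (server names hypothetical):

a = HashRing(["X:11211", "Y:11211", "Z:11211"])
b = HashRing(["Z:11211", "X:11211", "Y:11211"])
assert a.points == b.points                           # same positions
assert all(a.ring[p] == b.ring[p] for p in a.points)  # same owners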

-Steve
