Better server selection client code?
Steven Grimm
sgrimm at facebook.com
Mon Nov 6 18:32:59 UTC 2006
Jeetendra Mirchandani wrote:
> From the paper, section 2.2:
> <snip>
> For technical reasons detailed in [6], it is quite important to make a
> small number of copies of each cache point--that is, to map several
> copies of each cache to different "random" points on the unit circle.
> This produces a more uniform distribution of URLs to caches.
> </snip>
>
That's "copies of each cache point," where a cache point in this context
is "memcached server at 10.1.1.1 port 11211." I didn't read that as a
reference to replicating the *contents* of the caches, which would
certainly reduce your effective cache size. (Later on in the paper they
talk about replicating "hot" cache entries, but that's a larger
memcached issue that doesn't have to be addressed as part of this
discussion.)
What the paper describes is exactly what Brad's client does, and it
should distribute load at least as well as the hashing-with-modulo
scheme of the old client, for any number of servers, with no negative
impact on cache capacity. It does not require the number of servers to
be a power of anything in particular.
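For concreteness, here's a minimal sketch of that scheme in Python. It
is not Brad's actual code; the class name, the use of MD5, and the
default copy count are all illustrative choices of mine:

    import bisect
    import hashlib

    class Continuum:
        """Consistent hash ring: each server maps to several points."""

        def __init__(self, servers, copies=100):
            self._points = []  # sorted list of (position, server)
            for server in servers:
                for i in range(copies):
                    # The position depends only on the server address
                    # and the copy index, never on insertion order.
                    h = hashlib.md5(f"{server}#{i}".encode()).digest()
                    self._points.append(
                        (int.from_bytes(h[:4], "big"), server))
            self._points.sort()

        def server_for(self, key):
            h = hashlib.md5(key.encode()).digest()
            pos = int.from_bytes(h[:4], "big")
            # Walk clockwise to the first point at or after the key's
            # position, wrapping around to the start of the circle.
            i = bisect.bisect(self._points, (pos,))
            return self._points[i % len(self._points)][1]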
The sample implementation, ironically, only works well for *small*
numbers of servers, not large ones, because the ratio of possible
positions on the circle to the number of copies of each cache point is
too low. But that's why it's just a sample; it's trivial to increase
the number of positions as needed.
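With the sketch above you can watch that ratio matter directly; the
server addresses here are made up, and the exact counts will vary with
the hash function:

    from collections import Counter

    servers = [f"10.1.1.{n}:11211" for n in range(1, 11)]
    ring = Continuum(servers, copies=100)
    counts = Counter(ring.server_for(f"key{i}") for i in range(100000))
    print(counts.most_common())
    # With 100 copies per server the counts come out roughly equal;
    # rebuild the ring with copies=3 and the spread gets much worse.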
Try it! It works.
> I too had suggested a deterministic way of picking a point, based on
> the largest available free arc on the unit circle. But as Andrew
> added, this will distribute load uniformly only when the number of
> servers is a power of 2.
> (http://lists.danga.com/pipermail/memcached/2006-October/002940.html )
>
One property of doing it that way (which, granted, is true of the
current memcached client code too) is that if one of my clients adds
server X, then server Y, then server Z, the servers will end up in
totally different positions on the circle than if it adds server Z,
then server X, then server Y. With the "positions based on a hash of
the server address" algorithm in Brad's sample implementation, those
two cases result in identical positions around the unit circle; the
distribution of servers around the circle is independent of the order
in which they were added.
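That property is easy to check with the sketch from earlier in this
message (addresses hypothetical; the comparison peeks at the ring's
internal point list):

    servers = ["10.1.1.1:11211", "10.1.1.2:11211", "10.1.1.3:11211"]
    a = Continuum(servers)                  # added X, Y, Z
    b = Continuum(list(reversed(servers)))  # added Z, Y, X
    assert a._points == b._points  # identical positions either way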
-Steve