python-memcached server selection code (long)
Robert Szefler
robert.szefler at sensisoft.com
Thu Oct 25 13:28:49 UTC 2007
Brian Beuning napisał(a):
> I do not see "sequence of identical selections" being
> a problem. It is like saying you flip a coin 10 times
> and getting 5 heads in a row means the coin is broken.
>
Except that, with the current algorithm, you are guaranteed, for a given
number of trials and certain keys to always get the single one server.
Which is not a slight statistical hazard, but a prescribed catastrophe.
> I did not get your numbers. Why not do a test with N
> random keys? If about N/2 go to the 2 servers then it
> is OK. No matter how many happen "in a row".
>
The methodology of the test does not matter that much. It was tested
with keys of the form "1", "2", ... I am pretty sure that using random
keys from a certain more "random" distribution would result with same
effect.
> If your app does 1000 key lookups while 1 of 2 hosts are
> down, you would expect about 500 to fail (given the current
> memcached availability model). What does it matter if
> 500 in a row fail, or every other one fails?
>
The key issue here is determinism. If a random key in a random situation
is innaccessible one time in a 1000 requests, there is no problem OK.
(Well there is a problem, but let's pretend that we can shove it under
the carpet). On the other hand, if the situation is deterministic and
the "guilty" key (our one in a thousand, but a specific one) happens to
permanently cause the home page of a web site to load 5 times slower
because it was caching a result of a heavy SQL query... well, talk about
hazards.
> With any hashing algorithm, it is always possible your
> application uses keys that cause the hash to be unfair.
>
This is not a matter of being fair. In fact, the algorithm is fair and,
from a certain point of view, it is too fair. A solution to my problem
would be to guarantee that potentially every server gets tried in a
round and at the same time the query is random and fair in a sense.
Which brings us to the random permutation solution I proposed.
More information about the memcached
mailing list