python-memcached server selection code (long)

Robert Szefler robert.szefler at sensisoft.com
Thu Oct 25 13:28:49 UTC 2007


Brian Beuning napisał(a):
> I do not see "sequence of identical selections" being
> a problem.  It is like saying you flip a coin 10 times
> and getting 5 heads in a row means the coin is broken.
>   
Except that, with the current algorithm, you are guaranteed, for a given 
number of trials and certain keys to always get the single one server. 
Which is not a slight statistical hazard, but a prescribed catastrophe.
> I did not get your numbers.  Why not do a test with N
> random keys?  If about N/2 go to the 2 servers then it
> is OK.  No matter how many happen "in a row".
>   
The methodology of the test does not matter that much. It was tested 
with keys of the form "1", "2", ... I am pretty sure that using random 
keys from a certain more "random" distribution would result with same 
effect.
> If your app does 1000 key lookups while 1 of 2 hosts are
> down, you would expect about 500 to fail (given the current
> memcached availability model).  What does it matter if
> 500 in a row fail, or every other one fails?
>   
The key issue here is determinism. If a random key in a random situation 
is innaccessible one time in a 1000 requests, there is no problem OK. 
(Well there is a problem, but let's pretend that we can shove it under 
the carpet). On the other hand, if the situation is deterministic and 
the "guilty" key (our one in a thousand, but a specific one) happens to 
permanently cause the home page of a web site to load 5 times slower 
because it was caching a result of a heavy SQL query... well, talk about 
hazards.
> With any hashing algorithm, it is always possible your
> application uses keys that cause the hash to be unfair.
>   
This is not a matter of being fair. In fact, the algorithm is fair and, 
from a certain point of view, it is too fair. A solution to my problem 
would be to guarantee that potentially every server gets tried in a 
round and at the same time the query is random and fair in a sense. 
Which brings us to the random permutation solution I proposed.


More information about the memcached mailing list