ketama: question about re-distribution of items after server addition/removal

Josh Snyder josh at
Fri Oct 26 16:27:23 UTC 2007


I'm new to this list, so let me first thank you all for memcache.

I have question and a code suggestion for ketama. (I trust that ketama
discussion is still appropriate on this list?)

The question is about the re-distribution of items when a server is
added/removed. As I understood it, the primary benefit of consistent
hashing is that, insofar as possible, items should not move from server
to server when servers come and go. I decided to confirm this behavior
with ketama, and found that, in fact, many items appear to move

Here is a quick-and-dirty test - just append to or run after
make test is complete:

echo "Checking to make sure removing or adding a server doesn't disturb

echo "  Setting up"
# set up downed server
head -n `wc -l ../ketama.servers | awk '{print $1-1}'` ../ketama.servers
> ../
DOWNEDSERVER=`tail -n 1 ../ketama.servers | awk '{print $1}'`
# set up new server
cp ../ketama.servers ../
echo -e "${NEWSERVER}\t300" >> ../

echo "  Running on original set of servers to get baseline"
touch -m ../ketama.servers # make sure we're working with the current
./ketama_test ../ketama.servers > original_mapping

echo "  Running on original set of servers minus one"
./ketama_test ../ > minus_one_mapping

echo -n "  Analyzing results..."
MINUS_ONE_MOVED_ITEMS=`diff -y --suppress-common-lines original_mapping
minus_one_mapping | grep -c -v "${DOWNEDSERVER}"`

  echo "OK no items disturbed"
  echo "WARNING ${MINUS_ONE_MOVED_ITEMS} items disturbed"

echo "  Running on original set of servers plus one"
./ketama_test ../ > plus_one_mapping

echo -n "  Analyzing results..."
PLUS_ONE_MOVED_ITEMS=`diff -y --suppress-common-lines original_mapping
plus_one_mapping | grep -c -v "${NEWSERVER}"`

if [ ${PLUS_ONE_MOVED_ITEMS} = 0 ]
  echo "OK no items disturbed"
  echo "WARNING ${PLUS_ONE_MOVED_ITEMS} items disturbed"

echo "  Cleaning up"
rm -rf ../ ../
original_mapping minus_one_mapping plus_one_mapping


What this does is simulate the removal or addition of a server and
compare the allocation of items to servers across the two server sets,
looking for mismatches, after removing items that were _forced_ to move
because they were on the server that went down or migrated to the server
that came up.

Running this test, 106615 out of 1000000 (approx 10%) items moved
unnecessarily when a server went down, and 40536 out of 1000000 (approx
4%) items moved unnecessarily when a new server came online.

I believe that I understand why this happens. Right now, the number of
points assigned per server is calculated to keep the mean
points-per-server near 160. This means that removing a server whose
weight (memory) is above or below the mean will cause the number of
points per server for other servers to change.

For example, if there are three servers A, B, and C, with respective
weights 100, 200, and 300, then the servers will get 80, 160, and 240
points respectively. If you remove server C, then server A will now get
106 points and server B will get 212 points.

When the number of points per server changes, then new points are being
added or removed from the continuum, which can cause items near those
points to be moved to different servers.

This will happen the most when there are relatively few servers and when
the server being added/removed is particularly above/below the mean.

Perhaps some level of unnecessary item movement is acceptable, and
perhaps the current levels fall within this threshold. If not, one
simple method for preventing such item movement is to simply fix the
number of points per server, perhaps using the provided weight directly
rather than re-calculating it based on what other servers are in the
mix. This will prevent any unnecessary item movement, but the downside
is that the user of ketama must understand what an acceptable range of
weights is, and what the impact of changing them is (namely, item

Are there perhaps other, better, methods for preventing item movement?

My code suggestion is just a very minor cleanup in ketama_get_server
that (I believe) leaves functionality completely unchanged. Proposed


Index: ketama.c
--- ketama.c    (revision 228)
+++ ketama.c    (working copy)
@@ -254,7 +254,7 @@
     unsigned int h = ketama_hashi( key );
     int highp = cont->numpoints;
     mcs (*mcsarr)[cont->numpoints] = cont->array;
-    int maxp = highp, lowp = 0, midp;
+    int lowp = 0, midp;
     unsigned int midval, midval1;
     // divide and conquer array search to find server with next biggest
@@ -262,13 +262,9 @@
     while ( 1 )
         midp = (int)( ( lowp+highp ) / 2 );
-        if ( midp == maxp )
-        {
-            if ( midp == cont->numpoints )
-                midp = 1; // if at the end, roll back to zeroth
+        if ( midp == cont->numpoints )
+            return &( (*mcsarr)[0] ); // if at the end, roll back to
-            return &( (*mcsarr)[midp-1] );
-        }
         midval = (*mcsarr)[midp].point;
         midval1 = midp == 0 ? 0 : (*mcsarr)[midp-1].point;


The reasoning is that, since maxp gets initialized to cont->numpoints
and never changed thereafter, maxp can be eliminated, the two if
statements can be collapsed, and the return value array position
hard-coded to 0.

I have confirmed that the allocations provided by this modified code are
identical to the original code for the output of ketama_test.


More information about the memcached mailing list