Another happy memcached user

Troy Hakala troy@recipezaar.com
Sat, 26 Jun 2004 15:51:14 -0700


--Apple-Mail-4-461408260
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=US-ASCII;
	format=flowed

We're currently running memcached on Recipezaar.com to store recipe 
data and member data. It's working well so we plan to use it more 
extensively. We used to use a file-based caching system (over NFS) so 
it was trivial to modify our API to use memcached instead. We still use 
our own shared-memory scheme to store some data across processes on 
each web server but may just use memcached for that too.

Technical details:

We're running linux kernel 2.6.6, memcached 1.1.11 and using the PHP 
PECL code with our own PHP API on top of that that does the 
distribution between the memcached servers. We have a 3GB cache 
distributed across 3 machines (1GB each) which is big enough for our 
use right now but will probably be expanding this as we use memcached 
more. :-)

We also use Frank Precissi's Nagios plugin (found here: 
http://lists.danga.com/pipermail/memcached/2003-September/000235.html) 
to monitor the daemons on the servers which works great. I had to 
modify the plugin since it didn't come with the utils.pm that provided 
some support functions and I also added a couple stats (uptime and 
percent of RAM used) to the output so Nagios will display it (I love 
statistics:-). I attached the modified version.

I did some testing of the hash functions to evenly distribute the load 
across the memcached servers. In the PHP and Perl APIs there are two 
different hash functions used:

PHP API by Ryan Gilfether hash function:

sprintf("%u",crc32($key));

The PHPCA hash function (this is a PHP port of the Perl API function):

function hashfunc ($key)
{
    $hash = 0;
    for ($i=0; $i<strlen($key); $i++)
    {
       $hash = $hash*33 + ord($key[$i]);
    }

    return $hash;
}

I didn't get good bucket distributions with either of these, so I use 
this:

(crc32($key)>>16)&0x7fff;

which is then modulo'd with the number of buckets. For example, with a 
randomly-chosen alphabetic key, the sprintf version consistently 
generates a distribution like this (numbers are rounded):

0 10%
1 9%
2 60%
3 9%
4 9%

In my tests, bucket #2 gets far more of the distribution than the other 
buckets regardless of the number of buckets. But with the shift>>16 
bits version I consistently see more evenly distributed results:

0 19%
1 19%
2 20%
3 20%
4 20%

The PHPCA/Perl hash function seems to be worse than the sprintf 
version. YMMV with the keys you use.

And speed-wise, the shift>>16 version is the fastest of the 3 in my 
tests. For example, the shift>>16 function, the sprintf function and 
the PHPCA/Perl function take, when hashing 100,000 keys, 1.04 seconds, 
2.69 seconds and 131.30 (!) seconds, respectively. Anyway, it's food 
for thought I guess.  If anyone's interested in running these tests 
themselves, the PHP code is attached.

Thanks to everyone involved for memcached! :-)



--Apple-Mail-4-461408260
Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream;
	x-unix-mode=0700;
	name="check_memcached"
Content-Disposition: attachment;
	filename=check_memcached

#!/usr/bin/perl

use strict;
use IO::Socket;
use Getopt::Long;
use vars qw($opt_V $opt_h $opt_i $opt_p $PROGNAME);
use lib "/usr/lib/netsaint/plugins";
#use utils qw(%ERRORS &print_revision &support &usage);

sub print_revision {
	print $_[0]." ".$_[1]."\n";
}

sub usage {
	print $_[0];
	die;
}

my %ERRORS=(
	"OK" => 0,
	"WARNING" => 1,
	"CRITICAL" => 2,
	"UNKNOWN" => 3,
);


$PROGNAME = "check_memcached";

sub print_help ();
sub print_usage ();

$ENV{'PATH'}='';
$ENV{'BASH_ENV'}='';
$ENV{'ENV'}='';

if (!@ARGV) {
        print_usage();
        exit;
}

Getopt::Long::Configure('bundling');
GetOptions
        ("V"   => \$opt_V, "version"    => \$opt_V,
         "h"   => \$opt_h, "help"       => \$opt_h,
         "i=s" => \$opt_i, "ip=s"       => \$opt_i,
         "p=s" => \$opt_p, "port=s"     => \$opt_p);

if ($opt_V) {
        print_revision($PROGNAME,'$Revision: 0.1 $');
        exit $ERRORS{'OK'};
}

if ($opt_h) {print_help(); exit $ERRORS{'OK'};}

($opt_i) || usage("IP not specified\n");
my $ip =  $opt_i;

($opt_p) || usage("Port not specified\n");
my $port = $opt_p;

my $memcached = IO::Socket::INET->new( PeerAddr => $ip,
                                       PeerPort => $port,
                                       Proto    => 'tcp');
if (!$memcached) { 
	print "memcached $ip:$port CRITICAL";
	exit $ERRORS{'CRITICAL'}; 
}
                

my %stats;

$memcached->print("stats\r\n");
$memcached->flush;
while (defined( my $line = $memcached->getline)){
        if ($line =~ /STAT/) {
		if ($' =~ /\s+(.+)\s+(\d+)/) {
			$stats{$1}=$2;
		}
	} elsif ($line =~ /END/) {
                print "memcached $ip:$port OK";
		print " Uptime:".($stats{'uptime'})." Space:".int((($stats{'bytes'}/$stats{'limit_maxbytes'})*100))."%\n";
                exit $ERRORS{'OK'};
        } else {
                print "memcached $ip:$port CRITICAL";
                exit $ERRORS{'CRITICAL'};
        }
}

sub print_usage () {
        print "Usage: $PROGNAME -i <ip> -p <port>\n";
}

sub print_help () {
        print_revision($PROGNAME,'$Revision: 0.1 $');
        print "This plugin reports the status of a memcached server.

";
        print_usage();
        print "
-i, --ip=IP
   IP address of host to check
-p, --port=port
   Port that memcached is listening on
";
        #support();
}


--Apple-Mail-4-461408260
Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream;
	x-unix-mode=0700;
	name="hash_distr.php"
Content-Disposition: attachment;
	filename=hash_distr.php

<?
function hashfunc ($key)
{
	$hash = 0;
	for ($i=0; $i<strlen($key); $i++)
	{
		$hash = $hash*33 + ord($key[$i]);
	}
	
	return $hash;
}

define('NUM_BUCKETS',5);
$odds=array_fill(0,NUM_BUCKETS,0);
$alpha="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
for ($i=10000;$i;--$i) {
	$key=str_shuffle($alpha);
	$hv=(crc32($key)>>16)&0x7fff;
	//$hv=sprintf("%u",crc32($key));
	//$hv=hashfunc($key);
	$bucket=$hv%NUM_BUCKETS;
	$odds[$bucket]++;
}
$total=0;
for ($i=0,$j=count($odds);$i<$j;++$i) { $total+=$odds[$i]; }
for ($i=0,$j=count($odds);$i<$j;++$i) {
	$pct[$i]=(int)(($odds[$i]/$total)*100);
}
for ($i=0,$j=count($odds);$i<$j;++$i) {
	print "$i {$pct[$i]}%\n";
}
?>

--Apple-Mail-4-461408260
Content-Transfer-Encoding: 7bit
Content-Type: application/octet-stream;
	x-unix-mode=0700;
	name="hash_speeds.php"
Content-Disposition: attachment;
	filename=hash_speeds.php

<?
function microtime_diff($a, $b) {
	list($a_dec, $a_sec) = explode(" ", $a);
	list($b_dec, $b_sec) = explode(" ", $b);
	return $b_sec - $a_sec + $b_dec - $a_dec;
}
function print_page_speed(&$times) {
	print sprintf("%0.2f",microtime_diff($times[0],$times[count($times)-1]))." seconds\n";
}

$key="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Time the first hash function
$stopwatch[]=microtime();
for ($i=100000;$i;--$i) { $hv=(crc32($key)>>16)&0x7fff; }
$stopwatch[]=microtime();
print_page_speed($stopwatch);

// Time the second hash function
$stopwatch=array();
$stopwatch[]=microtime();
for ($i=100000;$i;--$i) { $hv=sprintf("%u",crc32($key)); }
$stopwatch[]=microtime();
print_page_speed($stopwatch);


function hashfunc ($key)
{
	$hash = 0;
	for ($i=0; $i<strlen($key); $i++)
	{
		$hash = $hash*33 + ord($key[$i]);
	}
	
	return $hash;
}

// Time the third hash function
$stopwatch=array();
$stopwatch[]=microtime();
for ($i=100000;$i;--$i) { $hv=hashfunc($key); }
$stopwatch[]=microtime();
print_page_speed($stopwatch);

?>

--Apple-Mail-4-461408260
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=US-ASCII;
	format=flowed




Troy Hakala
Recipezaar.com

--Apple-Mail-4-461408260--