memcached 1.2.2 core dump

Sat Nov 17 03:52:38 UTC 2007

We're running memcached 1.2.2 with libevent 1.3 on OpenSolaris 11.

We recently changed some of the ways that our application interacts with
memcached, and very suddenly afterward started experiencing core dumps
across our 32 instances of memcached, seemingly at random.

We captured the core files and discovered that they were all generated when
a request for 'stats sizes' was issued by our monitoring processes.

One of the engineers here postulates the following:

SEGFAULT on line 341 of items.c:

    /* build the histogram */
    memset(histogram, 0, (size_t)num_buckets * sizeof(int ));
    for (i = 0; i < LARGEST_ID; i++) {
        item *iter = heads[i];
        while (iter) {
            int ntotal = ITEM_ntotal(iter);
            int bucket = ntotal / 32;
            if ((ntotal % 32) != 0) bucket++;
            if (bucket < num_buckets) histogram[bucket]++;
            iter = iter->next;
        }
    }

That's:

            int ntotal = ITEM_ntotal(iter);

Given the huge number of transactions we're doing, we're probably hitting a
race condition around moving items from one bucket to another. Perhaps a
mutex lock is not being acquired properly.

For the time being we've disabled the 'stats sizes' request from our
monitoring processes to preclude this situation.

I could not find this reported as a known issue in previous messages on this
list, but I am certain that others will end up in this scenario.

I can send the core files or gdb output to anyone interested in addressing
this.

Jeremy LaTrasse
Operations
Twitter