memcached 1.2.2 core dump

Sun Nov 18 07:40:44 UTC 2007

Thanks!

I'll compile from trunk later this week, and try to replicate the  
previous behavior.

Unfortunately, unless I cause the coredump in load testing, I won't be  
able to provide the dumpfiles to anyone, as they contain some private  
user information.

I'll keep you posted with my testing results, including my config  
options etc.

Jeremy

On Nov 17, 2007, at 4:10 PM, dormando wrote:

> Hey,
>
> Good news!
>
> The stats commands aren't thread safe in 1.2.2.
>
> Since r589 in SVN they have been thread safe.
>
> I'd ask if you folks could attempt using trunk on one server and see if
> that fixes your issue. We'll have 1.2.4 out the door soon anyway, but
> trunk could use more testing.
>
> -Dormando
>
> Jeremy LaTrasse wrote:
>> We're running memcached 1.2.2 with libevent 1.3 on OpenSolaris 11.
>>
>> We recently changed some of the ways that our application interacts with
>> memcached, and then very suddenly afterward started experiencing core
>> dumps across our 32 instances of memcached, seemingly arbitrarily.
>>
>> We captured the core files and discovered that they were all generated
>> when a request for 'stats sizes' was issued by our monitoring processes.
>>
>> One of the engineers here postulates the following:
>>
>>
>>    SEGFAULT on line 341 of items.c:
>>
>>        /* build the histogram */
>>        memset(histogram, 0, (size_t)num_buckets * sizeof(int));
>>        for (i = 0; i < LARGEST_ID; i++) {
>>            item *iter = heads[i];
>>            while (iter) {
>>                int ntotal = ITEM_ntotal(iter);
>>                int bucket = ntotal / 32;
>>                if ((ntotal % 32) != 0) bucket++;
>>                if (bucket < num_buckets) histogram[bucket]++;
>>                iter = iter->next;
>>            }
>>        }
>>
>>    That's:
>>
>>                int ntotal = ITEM_ntotal(iter);
>>
>>
>>    Given the huge number of transactions we're doing, we're probably
>>    hitting a race condition around moving items from one bucket to the
>>    other.  Perhaps a mutex lock is not being set properly.
>>
>>
>> For the time being we've disabled the 'stats sizes' request from our
>> monitoring processes to preclude this situation.
>>
>> I could not find this to be a known issue in previous messages on this
>> list, but I am certain that someone will end up in this scenario.
>>
>> I can send the core files or gdb output to anyone interested in
>> addressing this.
>>
>> Jeremy LaTrasse
>> Operations
>> Twitter
>
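
For readers following the thread, the race described above can be illustrated
with a minimal sketch in C. This is not the actual change made in r589, and it
is not memcached's code: the names sketch_item, sketch_heads, sketch_lock,
sketch_link, sketch_bucket_for and sketch_stats_sizes are all hypothetical, and
the single global pthread mutex is an assumption chosen for simplicity. The
idea is only that the histogram walk and any code that relinks items take the
same lock, so the walker never follows a next pointer that another thread is
in the middle of changing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memcached's item: only the fields used here. */
typedef struct sketch_item {
    struct sketch_item *next;
    size_t ntotal;              /* what ITEM_ntotal() would return, in bytes */
} sketch_item;

#define SKETCH_LARGEST_ID   4   /* hypothetical number of item lists */
#define SKETCH_BUCKET_BYTES 32  /* bucket width used by the quoted loop */

static sketch_item *sketch_heads[SKETCH_LARGEST_ID];

/* Assumption: one global lock guards every list head and next pointer. */
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Round a size up to its 32-byte bucket index, equivalent to the quoted
 * "ntotal / 32, plus one if there is a remainder". */
static size_t sketch_bucket_for(size_t ntotal) {
    return (ntotal + SKETCH_BUCKET_BYTES - 1) / SKETCH_BUCKET_BYTES;
}

/* Writer side: push an item onto the head of a list while holding the lock. */
static void sketch_link(int id, sketch_item *it) {
    pthread_mutex_lock(&sketch_lock);
    it->next = sketch_heads[id];
    sketch_heads[id] = it;
    pthread_mutex_unlock(&sketch_lock);
}

/* Reader side: build the size histogram while holding the same lock, so no
 * item can be relinked or unlinked while the walk is in progress. */
static void sketch_stats_sizes(unsigned int *histogram, size_t num_buckets) {
    memset(histogram, 0, num_buckets * sizeof(histogram[0]));
    pthread_mutex_lock(&sketch_lock);
    for (int i = 0; i < SKETCH_LARGEST_ID; i++) {
        const sketch_item *iter;
        for (iter = sketch_heads[i]; iter != NULL; iter = iter->next) {
            size_t bucket = sketch_bucket_for(iter->ntotal);
            if (bucket < num_buckets) histogram[bucket]++;
        }
    }
    pthread_mutex_unlock(&sketch_lock);
}
```

Holding one lock for the whole walk keeps the sketch simple, but it stalls
every writer for as long as the walk takes, which may matter on a busy
instance with many items.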