Busy loop and blocking threads on 1.2.5

Kari Lavikka tuner at bdb.fi
Thu Apr 3 19:10:19 UTC 2008


Hello,

This is probably the same problem I reported last August:
http://www.mail-archive.com/memcached@lists.danga.com/msg00978.html

     |\__/|
     ( oo )    Kari Lavikka - tuner at bdb.fi - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
       ""

On Thu, 3 Apr 2008, dormando wrote:

> Hey,
>
> Thanks for the report!
>
> Is this a problem you can reproduce? If so, can you try 1.2.4? Or even
> 1.2.2? All in threaded mode? Does it happen with 1.2.5 in non-threaded mode?
>
> I can't recall anything else which would have touched that code recently, so 
> it's very odd that it breaks for you. Perhaps you just found the magic 
> combination :)
>
> I'd like to get a hold of the core file if that's not too much trouble.
>
> -Dormando
>
> Matti Savolainen wrote:
>> Hi,
>> 
>> This morning we switched to memcached 1.2.5 from an ancient version,
>> running on 2 machines, both x86_64, with 7000MB allotted to memcached. The
>> servers worked fine until one of them reached about 2.8GB of allocated
>> memory. It went into a busy loop while the other 3 threads waited on a
>> mutex, and it stopped accepting connections. Here's some info on the
>> systems, versions, and GDB output.
>> 
>> Running RHEL with kernel 2.6.18; libevent is 1.4.2-rc, memcached is the 1.2.5 release.
>> Started with:
>> -m 7000 -c 8192 -d -P /var/run/memcached.pid -p 11211 -u memcached
>> 
>> Attached GDB and this was the state it was in:
>> Program terminated with signal 5, Trace/breakpoint trap.
>> #0  0x0000003f5000c888 in __lll_mutex_lock_wait () from 
>> /lib64/libpthread.so.0
>> (gdb) info threads
>>  4 process 2258  0x0000003f5000c888 in __lll_mutex_lock_wait () from 
>> /lib64/libpthread.so.0
>>  3 process 2261  0x0000003f5000c888 in __lll_mutex_lock_wait () from 
>> /lib64/libpthread.so.0
>>  2 process 2260  assoc_find (key=0x1bab05c4 "image_commentstream_76966976", 
>> nkey=28) at assoc.c:506
>> * 1 process 2259  0x0000003f5000c888 in __lll_mutex_lock_wait () from 
>> /lib64/libpthread.so.0
>> (gdb) thread 2
>> [Switching to thread 2 (process 2260)]#0  assoc_find (key=0x1bab05c4 
>> "image_commentstream_76966976", nkey=28) at assoc.c:506
>> 506             it = it->h_next;
>> (gdb) l
>> 501         while (it) {
>> 502             if ((nkey == it->nkey) &&
>> 503                 (memcmp(key, ITEM_key(it), nkey) == 0)) {
>> 504                 return it;
>> 505             }
>> 506             it = it->h_next;
>> 507         }
>> 508         return 0;
>> 509     }
>> 510
>> (gdb) print *it
>> $1 = {next = 0x2aab1571a300, prev = 0x2aaac57ba6f0, h_next = 
>> 0x2aaac57ba688, time = 1211888, exptime = 1380104, nbytes = 8, refcount = 
>> 0,
>>  nsuffix = 6 '\006', it_flags = 1 '\001', slabs_clsid = 1 '\001', nkey = 14 
>> '\016', cas_id = 1643955, end = 0x2aaac57ba6c0}
>> (gdb) print it
>> $2 = (item *) 0x2aaac57ba688
>> (gdb) bt
>> #0  assoc_find (key=0x1bab05c4 "image_commentstream_76966976", nkey=28) at 
>> assoc.c:506
>> #1  0x0000000000407bce in do_item_get_notedeleted (key=0x3f583623 "", 
>> nkey=3830567580, delete_locked=0x11bac5) at items.c:416
>> #2  0x0000000000408f84 in mt_item_get_notedeleted (key=0x1bab05c4 
>> "image_commentstream_76966976", nkey=28, delete_locked=0x0) at thread.c:426
>> #3  0x0000000000403955 in process_get_command (c=0x1b2e58e0, 
>> tokens=0x41400f20, ntokens=1161925, return_cas=false) at memcached.c:1241
>> #4  0x0000000000405c57 in try_read_command (c=0x1b2e58e0) at 
>> memcached.c:1671
>> #5  0x0000000000406679 in event_handler (fd=<value optimized out>, 
>> which=-11620, arg=0x1b2e58e0) at memcached.c:2135
>> #6  0x00002aaaaaac3848 in event_base_loop (base=0x1aad8250, flags=0) at 
>> event.c:366
>> #7  0x0000000000408b34 in worker_libevent (arg=0x1aa9fd40) at thread.c:324
>> #8  0x0000003f500062f7 in start_thread () from /lib64/libpthread.so.0
>> #9  0x0000003f4f4ce85d in clone () from /lib64/libc.so.6
>> 
>> Here's what the other threads are waiting on:
>> (gdb) thread 1
>> [Switching to thread 1 (process 2259)]#0  0x0000003f5000c888 in 
>> __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>> (gdb) bt
>> #0  0x0000003f5000c888 in __lll_mutex_lock_wait () from 
>> /lib64/libpthread.so.0
>> #1  0x0000003f500088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
>> #2  0x0000003f50008343 in pthread_mutex_lock () from /lib64/libpthread.so.0
>> #3  0x0000000000408f76 in mt_item_get_notedeleted (key=0x2aaabb696fe4 
>> "image_61698941", nkey=0, delete_locked=0x3f5000f818) at thread.c:425
>> #4  0x0000000000403955 in process_get_command (c=0x1b2b6470, 
>> tokens=0x409fff20, ntokens=271925180440, return_cas=false) at 
>> memcached.c:1241
>> #5  0x0000000000405c57 in try_read_command (c=0x1b2b6470) at 
>> memcached.c:1671
>> #6  0x0000000000406679 in event_handler (fd=<value optimized out>, which=0, 
>> arg=0x1b2b6470) at memcached.c:2135
>> #7  0x00002aaaaaac3848 in event_base_loop (base=0x1aa9ff50, flags=0) at 
>> event.c:366
>> #8  0x0000000000408b34 in worker_libevent (arg=0x1aa9fc40) at thread.c:324
>> #9  0x0000003f500062f7 in start_thread () from /lib64/libpthread.so.0
>> #10 0x0000003f4f4ce85d in clone () from /lib64/libc.so.6
>> 
>> 
>> As you can see, it->h_next points to the same address as it itself, so the
>> loop spins forever. I made a core file of it, so I can provide additional
>> info as needed.
>> 
>> Cheers
>> -- 
>> Matti Savolainen
>> Software Developer
>> Sulake Dynamoid oy
>> 
>

