Busy loop and blocking threads on 1.2.5
chip
chip-set at mail.ru
Thu Apr 17 17:37:03 UTC 2008
Hello,
We are seeing exactly the same issue on 64-bit Linux with 8 GB of memory
(CentOS 5.1 x86_64):
memcached -m 2048 -c 1024 -d -P /var/run/memcached.pid -p 11211 -u nobody
libevent-1.1a
memcached-1.2.5 or 1.2.4
Kari Lavikka wrote:
>
> Hello,
>
> This is probably the same problem I reported last August:
> http://www.mail-archive.com/memcached@lists.danga.com/msg00978.html
>
> |\__/|
> ( oo ) Kari Lavikka - tuner at bdb.fi - (050) 380 3808
> __ooO( )Ooo_______ _____ ___ _ _ _ _ _ _ _
> ""
>
> On Thu, 3 Apr 2008, dormando wrote:
>
>> Hey,
>>
>> Thanks for the report!
>>
>> Is this a problem you can reproduce? If so, can you try 1.2.4? Or even
>> 1.2.2? All in threaded mode? Does it happen with 1.2.5 in
>> non-threaded mode?
>>
>> I can't recall anything else which would have touched that code
>> recently, so it's very odd that it breaks for you. Perhaps you just
>> found the magic combination :)
>>
>> I'd like to get a hold of the core file if that's not too much trouble.
>>
>> -Dormando
>>
>> Matti Savolainen wrote:
>>> Hi,
>>>
>>> This morning we switched memcached from an ancient version to 1.2.5,
>>> running on two x86_64 machines, each with 7000 MB for memcached. The
>>> servers worked fine until one of them reached about 2.8 GB of
>>> allocated memory. It went into a busy loop while the other three
>>> threads waited on a mutex, and it stopped accepting connections.
>>> Here is some info on the systems, versions, and GDB output.
>>>
>>> Running RHEL with 2.6.18. libevent is 1.4.2-rc, memcached 1.2.5 release.
>>> Started with:
>>> -m 7000 -c 8192 -d -P /var/run/memcached.pid -p 11211 -u memcached
>>>
>>> Attached GDB and this was the state it was in:
>>> Program terminated with signal 5, Trace/breakpoint trap.
>>> #0 0x0000003f5000c888 in __lll_mutex_lock_wait () from
>>> /lib64/libpthread.so.0
>>> (gdb) info threads
>>> 4 process 2258 0x0000003f5000c888 in __lll_mutex_lock_wait () from
>>> /lib64/libpthread.so.0
>>> 3 process 2261 0x0000003f5000c888 in __lll_mutex_lock_wait () from
>>> /lib64/libpthread.so.0
>>> 2 process 2260 assoc_find (key=0x1bab05c4
>>> "image_commentstream_76966976", nkey=28) at assoc.c:506
>>> * 1 process 2259 0x0000003f5000c888 in __lll_mutex_lock_wait () from
>>> /lib64/libpthread.so.0
>>> (gdb) thread 2
>>> [Switching to thread 2 (process 2260)]#0 assoc_find (key=0x1bab05c4
>>> "image_commentstream_76966976", nkey=28) at assoc.c:506
>>> 506 it = it->h_next;
>>> (gdb) l
>>> 501 while (it) {
>>> 502 if ((nkey == it->nkey) &&
>>> 503 (memcmp(key, ITEM_key(it), nkey) == 0)) {
>>> 504 return it;
>>> 505 }
>>> 506 it = it->h_next;
>>> 507 }
>>> 508 return 0;
>>> 509 }
>>> 510
>>> (gdb) print *it
>>> $1 = {next = 0x2aab1571a300, prev = 0x2aaac57ba6f0, h_next =
>>> 0x2aaac57ba688, time = 1211888, exptime = 1380104, nbytes = 8,
>>> refcount = 0,
>>> nsuffix = 6 '\006', it_flags = 1 '\001', slabs_clsid = 1 '\001',
>>> nkey = 14 '\016', cas_id = 1643955, end = 0x2aaac57ba6c0}
>>> (gdb) print it
>>> $2 = (item *) 0x2aaac57ba688
>>> (gdb) bt
>>> #0 assoc_find (key=0x1bab05c4 "image_commentstream_76966976",
>>> nkey=28) at assoc.c:506
>>> #1 0x0000000000407bce in do_item_get_notedeleted (key=0x3f583623 "",
>>> nkey=3830567580, delete_locked=0x11bac5) at items.c:416
>>> #2 0x0000000000408f84 in mt_item_get_notedeleted (key=0x1bab05c4
>>> "image_commentstream_76966976", nkey=28, delete_locked=0x0) at
>>> thread.c:426
>>> #3 0x0000000000403955 in process_get_command (c=0x1b2e58e0,
>>> tokens=0x41400f20, ntokens=1161925, return_cas=false) at
>>> memcached.c:1241
>>> #4 0x0000000000405c57 in try_read_command (c=0x1b2e58e0) at
>>> memcached.c:1671
>>> #5 0x0000000000406679 in event_handler (fd=<value optimized out>,
>>> which=-11620, arg=0x1b2e58e0) at memcached.c:2135
>>> #6 0x00002aaaaaac3848 in event_base_loop (base=0x1aad8250, flags=0)
>>> at event.c:366
>>> #7 0x0000000000408b34 in worker_libevent (arg=0x1aa9fd40) at
>>> thread.c:324
>>> #8 0x0000003f500062f7 in start_thread () from /lib64/libpthread.so.0
>>> #9 0x0000003f4f4ce85d in clone () from /lib64/libc.so.6
>>>
>>> Here's what the other threads are waiting on:
>>> (gdb) thread 1
>>> [Switching to thread 1 (process 2259)]#0 0x0000003f5000c888 in
>>> __lll_mutex_lock_wait () from /lib64/libpthread.so.0
>>> (gdb) bt
>>> #0 0x0000003f5000c888 in __lll_mutex_lock_wait () from
>>> /lib64/libpthread.so.0
>>> #1 0x0000003f500088b5 in _L_mutex_lock_107 () from
>>> /lib64/libpthread.so.0
>>> #2 0x0000003f50008343 in pthread_mutex_lock () from
>>> /lib64/libpthread.so.0
>>> #3 0x0000000000408f76 in mt_item_get_notedeleted (key=0x2aaabb696fe4
>>> "image_61698941", nkey=0, delete_locked=0x3f5000f818) at thread.c:425
>>> #4 0x0000000000403955 in process_get_command (c=0x1b2b6470,
>>> tokens=0x409fff20, ntokens=271925180440, return_cas=false) at
>>> memcached.c:1241
>>> #5 0x0000000000405c57 in try_read_command (c=0x1b2b6470) at
>>> memcached.c:1671
>>> #6 0x0000000000406679 in event_handler (fd=<value optimized out>,
>>> which=0, arg=0x1b2b6470) at memcached.c:2135
>>> #7 0x00002aaaaaac3848 in event_base_loop (base=0x1aa9ff50, flags=0)
>>> at event.c:366
>>> #8 0x0000000000408b34 in worker_libevent (arg=0x1aa9fc40) at
>>> thread.c:324
>>> #9 0x0000003f500062f7 in start_thread () from /lib64/libpthread.so.0
>>> #10 0x0000003f4f4ce85d in clone () from /lib64/libc.so.6
>>>
>>>
>>> As you can see, it->h_next points back to the same address as it,
>>> so the lookup loops forever. I saved a core file, so I can provide
>>> additional info as needed.
>>>
>>> Cheers
>>> --
>>> Matti Savolainen
>>> Software Developer
>>> Sulake Dynamoid oy
>>>
>>