memcached (1.2.5) stuck in infinite loop

Toru Maesaka tmaesaka at gmail.com
Thu Apr 10 00:16:49 UTC 2008


Hey Miguel,

We were discussing this problem yesterday in IRC and its probably
something that will be looked at the hackathon next week.

Do you have a test case for this by any chance? I don't :(

This problem seems to occur in both single and multi thread build and I'm
suspecting that the problem could be in the table expansion logic
though I can't say for sure (well, tracing h_next assignments kinda makes
you suspect it).

I've modified one of our memcached servers to allocate enough buckets
relative to the number of records that we wont exceed so that the expansion
logic wont be triggered (just to see if it'll hang after a week or so
of operation).

Its been about four days and so far it hasn't spewed up.

Dormando: I've been informed that we're spinning in assoc_find as well
last night :(

Regards,
Toru

On Thu, Apr 10, 2008 at 3:30 AM, Miguel DeAvila
<miguel.j.deavila at gmail.com> wrote:
> We're seeing at least one memcache server (out of 12) get stuck in an
>  infinite loop each day.
>
>  The symptom is in assoc_find(...) in assoc.c, in the loop that starts on line 501.
>  The list of items has a loop, so the function never finds what it is looking
>  for, and never exits.
>
>  Here's a gdb trace of a hung server (we're using the single-threaded server),
>
>  [root at memcache13 ~]# gdb /usr/local/bin/memcached 11113
>
>  Attaching to program: /usr/local/bin/memcached, process 11113
>  Reading symbols from /usr/local/lib/libevent-1.3e.so.1...done.
>  Loaded symbols for /usr/local/lib/libevent-1.3e.so.1
>  Reading symbols from /lib64/libc.so.6...done.
>  Loaded symbols for /lib64/libc.so.6
>  Reading symbols from /lib64/libnsl.so.1...done.
>  Loaded symbols for /lib64/libnsl.so.1
>  Reading symbols from /lib64/librt.so.1...done.
>  Loaded symbols for /lib64/librt.so.1
>  Reading symbols from /lib64/libresolv.so.2...done.
>  Loaded symbols for /lib64/libresolv.so.2
>  Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
>  Loaded symbols for /lib64/ld-linux-x86-64.so.2
>  Reading symbols from /lib64/libpthread.so.0...done.
>  [Thread debugging using libthread_db enabled]
>  [New Thread 46912512725488 (LWP 11113)]
>  Loaded symbols for /lib64/libpthread.so.0
>  Reading symbols from /lib64/libnss_files.so.2...done.
>  Loaded symbols for /lib64/libnss_files.so.2
>
>  [ ... Here's the stack ....]
>
>  (gdb) bt
>  #0  0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
>  #1  0x000000000040770e in do_item_get_notedeleted (key=0x640f0ec6 ";18095617;18096235;18096842;sr", nkey=480752147,
>     delete_locked=0x1655e5) at items.c:416
>  #2  0x00000000004031e1 in do_store_item (it=0x2aaaaec12bb0, comm=2) at memcached.c:809
>  #3  0x0000000000406574 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1c48fef0) at memcached.c:785
>  #4  0x00002aaaaaccc6e9 in event_base_loop (base=0x1bf102e0, flags=0) at event.c:331
>  #5  0x0000000000404688 in main (argc=11113, argv=<value optimized out>) at memcached.c:3130
>
>  [ ... The process is stuck in the loop starting on line 501 ]
>
>  0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
>  502             if ((nkey == it->nkey) &&
>  (gdb) list
>  497         } else {
>  498             it = primary_hashtable[hv & hashmask(hashpower)];
>  499         }
>  500
>  501         while (it) {
>  502             if ((nkey == it->nkey) &&
>  503                 (memcmp(key, ITEM_key(it), nkey) == 0)) {
>  504                 return it;
>  505             }
>  506             it = it->h_next;
>  507         }
>  508      return 0;
>  509 }
>
>  [ ... Walk through the loop, watching the 'it' pointer and printing its address at each iteration ... ]
>
>  (gdb) watch it
>  Watchpoint 1: it
>  (gdb) commands 1
>  Type commands for when breakpoint 1 is hit, one per line.
>  End with a line saying just "end".
>  >print it
>  >end
>  (gdb) continue
>  Continuing.
>  Watchpoint 1: it
>
>  [ ... Note the old value, 0x513a3190, it's going to come around again ... ]
>
>  Old value = (item *) 0x513a3190
>  New value = (item *) 0x7de6ce98
>  assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
>  501         while (it) {
>  $7 = (item *) 0x7de6ce98
>  (gdb)
>  Continuing.
>  Watchpoint 1: it
>
>  Old value = (item *) 0x7de6ce98
>  New value = (item *) 0x5c568cd8
>  assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
>  501         while (it) {
>  $8 = (item *) 0x5c568cd8
>  (gdb)
>  Continuing.
>
>  [ ... Here it is ... we're re-visting the item at 0x513a3190. We're stuck! ... ]
>
>  Watchpoint 1: it
>
>  Old value = (item *) 0x5c568cd8
>  New value = (item *) 0x513a3190
>  assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
>  501         while (it) {
>
>  (I also have a core file.)
>
>  I believe this is the same problem mentioned a few times recently on the list,
>
>         http://www.nabble.com/Busy-loop-and-blocking-threads-on-1.2.5-to16470756.html
>         http://www.mail-archive.com/memcached@lists.danga.com/msg00978.html
>
>  Any suggestions?
>
>  thanks,
>
>  Miguel
>


More information about the memcached mailing list