memcached (1.2.5) stuck in infinite loop

Miguel DeAvila miguel.j.deavila at gmail.com
Wed Apr 9 18:30:32 UTC 2008


We're seeing at least one memcached server (out of 12) get stuck in an
infinite loop each day.

The process spins in assoc_find(...) in assoc.c, in the loop that starts on
line 501. The hash chain it walks (linked through h_next) contains a cycle, so
the function never finds the key it is looking for, never reaches the end of
the chain, and never exits.
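
To make the failure mode concrete, here is a minimal sketch in C. It is not memcached's code: demo_item, chain_has_cycle and bounded_find are hypothetical names made up for illustration, and the struct only mirrors the h_next and nkey fields seen in the listing (the key is a plain pointer here rather than ITEM_key(it)). The first function uses Floyd's tortoise-and-hare check to detect a cyclic chain; the second performs the same comparison as the loop in assoc_find but gives up after a fixed number of steps, which is one possible way to turn a hang into a detectable error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for memcached's item struct; only the fields
 * needed to walk a hash chain are included. */
typedef struct demo_item {
    struct demo_item *h_next;  /* next item in the same hash bucket */
    size_t nkey;               /* key length in bytes */
    const char *key;           /* simplified: real code uses ITEM_key(it) */
} demo_item;

/* Floyd's tortoise-and-hare: returns 1 if the chain starting at head
 * loops back on itself, 0 if it ends in NULL. */
static int chain_has_cycle(const demo_item *head) {
    const demo_item *slow = head;
    const demo_item *fast = head;
    while (fast != NULL && fast->h_next != NULL) {
        slow = slow->h_next;          /* advance one step */
        fast = fast->h_next->h_next;  /* advance two steps */
        if (slow == fast) {
            return 1;                 /* pointers met: there is a cycle */
        }
    }
    return 0;                         /* reached NULL: chain is finite */
}

/* Same comparison as the while (it) loop in assoc_find, but it stops
 * after max_steps items so a cyclic chain cannot hang the caller.
 * Returns the matching item, or NULL if not found within the bound. */
static const demo_item *bounded_find(const demo_item *head, const char *key,
                                     size_t nkey, size_t max_steps) {
    const demo_item *it = head;
    size_t steps = 0;
    while (it != NULL && steps < max_steps) {
        if (nkey == it->nkey && memcmp(key, it->key, nkey) == 0) {
            return it;
        }
        it = it->h_next;
        steps++;
    }
    return NULL;
}
```

With three items linked a -> b -> c -> a, chain_has_cycle reports the cycle, and bounded_find returns NULL for an absent key instead of spinning forever. The step bound is an arbitrary choice for the sketch; a real fix would need to find what corrupts the chain in the first place.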

Here's a gdb trace of a hung server (we're running the single-threaded server):

[root at memcache13 ~]# gdb /usr/local/bin/memcached 11113

Attaching to program: /usr/local/bin/memcached, process 11113
Reading symbols from /usr/local/lib/libevent-1.3e.so.1...done.
Loaded symbols for /usr/local/lib/libevent-1.3e.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/librt.so.1...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46912512725488 (LWP 11113)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2

[ ... Here's the stack ....]

(gdb) bt
#0  0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
#1  0x000000000040770e in do_item_get_notedeleted (key=0x640f0ec6 ";18095617;18096235;18096842;sr", nkey=480752147, 
    delete_locked=0x1655e5) at items.c:416
#2  0x00000000004031e1 in do_store_item (it=0x2aaaaec12bb0, comm=2) at memcached.c:809
#3  0x0000000000406574 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1c48fef0) at memcached.c:785
#4  0x00002aaaaaccc6e9 in event_base_loop (base=0x1bf102e0, flags=0) at event.c:331
#5  0x0000000000404688 in main (argc=11113, argv=<value optimized out>) at memcached.c:3130

[ ... The process is stuck in the loop starting on line 501 ] 

0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
502	        if ((nkey == it->nkey) &&
(gdb) list
497	    } else {
498	        it = primary_hashtable[hv & hashmask(hashpower)];
499	    }
500	
501	    while (it) {
502	        if ((nkey == it->nkey) &&
503	            (memcmp(key, ITEM_key(it), nkey) == 0)) {
504	            return it;
505	        }
506	        it = it->h_next;
507	    }
508	    return 0;
509	}

[ ... Walk through the loop, watching the 'it' pointer and printing its address at each iteration ... ]

(gdb) watch it
Watchpoint 1: it
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>print it
>end
(gdb) continue
Continuing.
Watchpoint 1: it

[ ... Note the old value, 0x513a3190, it's going to come around again ... ]

Old value = (item *) 0x513a3190
New value = (item *) 0x7de6ce98
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501	    while (it) {
$7 = (item *) 0x7de6ce98
(gdb) 
Continuing.
Watchpoint 1: it

Old value = (item *) 0x7de6ce98
New value = (item *) 0x5c568cd8
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501	    while (it) {
$8 = (item *) 0x5c568cd8
(gdb) 
Continuing.

[ ... Here it is ... we're revisiting the item at 0x513a3190. We're stuck! ... ]

Watchpoint 1: it

Old value = (item *) 0x5c568cd8
New value = (item *) 0x513a3190
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501	    while (it) {

(I also have a core file.)

I believe this is the same problem mentioned a few times recently on the list:

	http://www.nabble.com/Busy-loop-and-blocking-threads-on-1.2.5-to16470756.html
	http://www.mail-archive.com/memcached@lists.danga.com/msg00978.html

Any suggestions?

thanks,

Miguel

