memcached (1.2.5) stuck in infinite loop
Miguel DeAvila
miguel.j.deavila at gmail.com
Wed Apr 9 18:30:32 UTC 2008
We're seeing at least one memcached server (out of 12) get stuck in an
infinite loop each day.
The symptom is in assoc_find(...) in assoc.c, in the loop that starts on line 501.
The hash chain being walked (the list of items linked through h_next) contains
a cycle, so the function never finds the key it is looking for, never reaches
the end of the list, and never exits.
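For anyone who wants to test a chain for this kind of corruption, here is a minimal sketch of a cycle check using Floyd's two-pointer ("tortoise and hare") technique. The `node` struct and the `chain_has_cycle` name are hypothetical stand-ins, not memcached code; only the `h_next` field name is taken from the assoc.c listing.

```c
#include <stddef.h>

/* Hypothetical stand-in for memcached's item: only the chain link matters. */
typedef struct node {
    struct node *h_next;   /* same field name as in the assoc.c listing */
} node;

/*
 * Returns 1 if the chain starting at head loops back on itself,
 * 0 if it terminates in NULL. Uses constant extra memory.
 */
static int chain_has_cycle(const node *head)
{
    const node *slow = head;   /* advances one link per step */
    const node *fast = head;   /* advances two links per step */

    while (fast != NULL && fast->h_next != NULL) {
        slow = slow->h_next;
        fast = fast->h_next->h_next;
        if (slow == fast) {
            /* fast caught up with slow, which can only happen in a cycle */
            return 1;
        }
    }
    return 0;                  /* fast ran off the end: no cycle */
}
```

In a debug build, a check along these lines could be run on a bucket's head pointer before the lookup loop; that placement is only a suggestion, and it detects the corruption without explaining how the chain got that way.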
Here's a gdb trace of a hung server (we're using the single-threaded server):
[root at memcache13 ~]# gdb /usr/local/bin/memcached 11113
Attaching to program: /usr/local/bin/memcached, process 11113
Reading symbols from /usr/local/lib/libevent-1.3e.so.1...done.
Loaded symbols for /usr/local/lib/libevent-1.3e.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libnsl.so.1...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/librt.so.1...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 46912512725488 (LWP 11113)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
[ ... Here's the stack ....]
(gdb) bt
#0 0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
#1 0x000000000040770e in do_item_get_notedeleted (key=0x640f0ec6 ";18095617;18096235;18096842;sr", nkey=480752147,
delete_locked=0x1655e5) at items.c:416
#2 0x00000000004031e1 in do_store_item (it=0x2aaaaec12bb0, comm=2) at memcached.c:809
#3 0x0000000000406574 in event_handler (fd=<value optimized out>, which=<value optimized out>, arg=0x1c48fef0) at memcached.c:785
#4 0x00002aaaaaccc6e9 in event_base_loop (base=0x1bf102e0, flags=0) at event.c:331
#5 0x0000000000404688 in main (argc=11113, argv=<value optimized out>) at memcached.c:3130
[ ... The process is stuck in the loop starting on line 501 ]
0x0000000000407eee in assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:502
502 if ((nkey == it->nkey) &&
(gdb) list
497 } else {
498 it = primary_hashtable[hv & hashmask(hashpower)];
499 }
500
501 while (it) {
502 if ((nkey == it->nkey) &&
503 (memcmp(key, ITEM_key(it), nkey) == 0)) {
504 return it;
505 }
506 it = it->h_next;
507 }
508 return 0;
509 }
[ ... Walk through the loop, watching the 'it' pointer and printing its address at each iteration ... ]
(gdb) watch it
Watchpoint 1: it
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>print it
>end
(gdb) continue
Continuing.
Watchpoint 1: it
[ ... Note the old value, 0x513a3190, it's going to come around again ... ]
Old value = (item *) 0x513a3190
New value = (item *) 0x7de6ce98
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501 while (it) {
$7 = (item *) 0x7de6ce98
(gdb)
Continuing.
Watchpoint 1: it
Old value = (item *) 0x7de6ce98
New value = (item *) 0x5c568cd8
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501 while (it) {
$8 = (item *) 0x5c568cd8
(gdb)
Continuing.
[ ... Here it is ... we're revisiting the item at 0x513a3190. We're stuck! ... ]
Watchpoint 1: it
Old value = (item *) 0x5c568cd8
New value = (item *) 0x513a3190
assoc_find (key=0x2aaaaec12be8 "017_52956_net.flixster.entity.User%23813936667", nkey=46) at assoc.c:501
501 while (it) {
(I also have a core file.)
I believe this is the same problem mentioned a few times recently on the list:
http://www.nabble.com/Busy-loop-and-blocking-threads-on-1.2.5-to16470756.html
http://www.mail-archive.com/memcached@lists.danga.com/msg00978.html
Any suggestions?
thanks,
Miguel