Memcache 1.2.5 segfaults

Fri Jun 13 15:59:33 UTC 2008

   Well, that's a bug.  I'll try to contrive a test that triggers it,
but in the meantime I'll push a fix up to my master tree when I get
off the train.

-- 
Dustin Sallings (mobile)

On Jun 13, 2008, at 5:55, "Hugo Hallqvist" <hugo at dokad.se> wrote:

> Hi list,
>
> We're using memcached to cache documents in our application, and we've
> run into stability issues: memcached segfaults after running for some
> time.
> We're using memcached version 1.2.5 on Linux, kernel version 2.6.22
> from Ubuntu. We have seen the crashes on three different computers
> running two different kernel versions, so it seems likely that this is
> memcached-related.
>
> Does anyone recognize these problems? Is there any information we can
> add to help troubleshoot the problem?
>
> This is the stacktrace from gdb:
> Core was generated by `/usr/local/bin/memcached -vv -m 1500 -p 11211 -u root -r'.
> Program terminated with signal 6, Aborted.
> #0  0x00002af105d3b765 in raise () from /lib/libc.so.6
> (gdb) bt
> #0  0x00002af105d3b765 in raise () from /lib/libc.so.6
> #1  0x00002af105d3d1c0 in abort () from /lib/libc.so.6
> #2  0x00002af105d7460b in ?? () from /lib/libc.so.6
> #3  0x00002af105d7bb0a in ?? () from /lib/libc.so.6
> #4  0x00002af105d7d73e in ?? () from /lib/libc.so.6
> #5  0x00002af105d7f979 in realloc () from /lib/libc.so.6
> #6  0x0000000000402532 in do_suffix_add_to_freelist (s=0x660590 "  
> 1001\r\n")
>   at memcached.c:596
> #7  0x0000000000402628 in conn_cleanup (c=0x656330) at memcached.c:413
> #8  0x00000000004026f4 in conn_close (c=0x656330) at memcached.c:459
> #9  0x00000000004063fd in event_handler (fd=<value optimized out>, which=2793,
>   arg=0x656330) at memcached.c:2309
> #10 0x00002af105af6f99 in event_base_loop (base=0x613d80, flags=0)
>   at event.c:331
> #11 0x00000000004049bf in main (argc=-1524798896, argv=<value optimized out>)
>   at memcached.c:3130
>
> As the issue seems memory-related, I tried running it through valgrind
> and got the following errors:
> valgrind /usr/local/bin/memcached -vv -c 10 -m 1500 -p 11211 -u root -r
>
> <16 add 19BD1FAA62B46055817FE6FA5E8E9F 2 1800 1532856
> >16 SERVER_ERROR object too large for cache
> ==2889==
> ==2889== Invalid write of size 8
> ==2889==    at 0x402558: do_suffix_add_to_freelist (memcached.c:600)
> ==2889==    by 0x4067DC: event_handler (memcached.c:2274)
> ==2889==    by 0x4E2CF98: event_base_loop (event.c:331)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==  Address 0x40B23C0 is not stack'd, malloc'd or (recently) free'd
> ==2889==
> ==2889== Invalid write of size 8
> ==2889==    at 0x40250E: do_suffix_add_to_freelist (memcached.c:592)
> ==2889==    by 0x4067DC: event_handler (memcached.c:2274)
> ==2889==    by 0x4E2CF98: event_base_loop (event.c:331)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==  Address 0x40B23C8 is not stack'd, malloc'd or (recently) free'd
>
> It doesn't crash at this point, but it does a few searches later:
>
> >27 sending key document:7185484416683036457
> >27 SERVER_ERROR out of memory making CAS suffix
> <16 add 19BD1FAA62B46055817FE6FA5E8E9F 2 1800 1532856
> >16 SERVER_ERROR object too large for cache
> ==2889==
> ==2889== Invalid read of size 8
> ==2889==    at 0x4E2C4D2: event_queue_insert (event.c:892)
> ==2889==    by 0x4E3814C: epoll_dispatch (epoll.c:243)
> ==2889==    by 0x4E2CE60: event_base_loop (event.c:440)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==  Address 0x7203EE0 is 8 bytes after a block of size 24 alloc'd
> ==2889==    at 0x4C21C16: malloc (vg_replace_malloc.c:149)
> ==2889==    by 0x403C86: process_get_command (memcached.c:1274)
> ==2889==    by 0x405CA7: try_read_command (memcached.c:1692)
> ==2889==    by 0x4065BB: event_handler (memcached.c:2135)
> ==2889==    by 0x4E2CF98: event_base_loop (event.c:331)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==
> ==2889== Invalid read of size 8
> ==2889==    at 0x4E2C4DF: event_queue_insert (event.c:892)
> ==2889==    by 0x4E3814C: epoll_dispatch (epoll.c:243)
> ==2889==    by 0x4E2CE60: event_base_loop (event.c:440)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==  Address 0x39DD5CC0 is not stack'd, malloc'd or (recently) free'd
> ==2889==
> ==2889== Process terminating with default action of signal 11 (SIGSEGV): dumping core
> ==2889==  Access not within mapped region at address 0x39DD5CC0
> ==2889==    at 0x4E2C4DF: event_queue_insert (event.c:892)
> ==2889==    by 0x4E3814C: epoll_dispatch (epoll.c:243)
> ==2889==    by 0x4E2CE60: event_base_loop (event.c:440)
> ==2889==    by 0x4049BE: main (memcached.c:3130)
> ==2889==
> ==2889== ERROR SUMMARY: 60070 errors from 7 contexts (suppressed: 16 from 1)
> ==2889== malloc/free: in use at exit: 51,446,265 bytes in 10,329 blocks.
> ==2889== malloc/free: 19,270 allocs, 8,941 frees, 145,234,725 bytes allocated.
> ==2889== For counts of detected errors, rerun with: -v
> ==2889== searching for pointers to 10,329 not-freed blocks.
> ==2889== checked 50,947,848 bytes.
> ==2889==
> ==2889== LEAK SUMMARY:
> ==2889==    definitely lost: 49,989 bytes in 1,639 blocks.
> ==2889==      possibly lost: 0 bytes in 0 blocks.
> ==2889==    still reachable: 51,396,276 bytes in 8,690 blocks.
> ==2889==         suppressed: 0 bytes in 0 blocks.
> ==2889== Rerun with --leak-check=full to see details of leaked memory.
> Segmentation fault
>
> dmesg output:
> Linux version 2.6.22-14-server (buildd at king) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Tue Feb 12 03:10:53 UTC 2008 (Ubuntu 2.6.22-14.52-server)
> ---- snip ----
> [592338.930134] memcached[16615]: segfault at 0000000000000bc1 rip 00002b6ce31060b7 rsp 00007fffc7e48900 error 6
>
> --
> Kind regards,
> Hugo Hallqvist
> Dokad Software AB
> www.dokad.se
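
The traces above show invalid 8-byte writes inside
do_suffix_add_to_freelist and an abort inside realloc called from the
same function. One class of bug that produces this pattern on a 64-bit
build is growing an array of pointers with a size given in slots rather
than bytes, so the array ends up too small and later writes land past
its end. Whether that is the actual defect in 1.2.5 is not confirmed
anywhere in this thread; the sketch below only illustrates a correctly
sized, growable pointer freelist. All names in it (ptr_freelist,
freelist_init, freelist_push, freelist_pop, freelist_destroy) are
hypothetical and are not taken from memcached.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable freelist of pointers; not memcached's code. */
typedef struct {
    char **slots;  /* array of cached pointers */
    size_t used;   /* number of occupied slots */
    size_t total;  /* capacity, counted in slots (not bytes) */
} ptr_freelist;

/* Allocate room for initial_slots pointers. Returns false on failure. */
bool freelist_init(ptr_freelist *fl, size_t initial_slots) {
    if (initial_slots == 0)
        initial_slots = 1;
    fl->used = 0;
    fl->total = 0;
    fl->slots = malloc(initial_slots * sizeof(char *));
    if (fl->slots == NULL)
        return false;
    fl->total = initial_slots;
    return true;
}

/* Store p, doubling the array when it is full.
 * Returns true if p was stored; false means the caller still owns p. */
bool freelist_push(ptr_freelist *fl, char *p) {
    if (fl->used == fl->total) {
        size_t new_total = fl->total * 2;
        /* Reject an uninitialised list and guard against overflow. */
        if (new_total <= fl->total || new_total > SIZE_MAX / sizeof(char *))
            return false;
        /* The size passed to realloc is in BYTES: slots * sizeof(char *).
         * Passing only the slot count would under-allocate. */
        char **grown = realloc(fl->slots, new_total * sizeof(char *));
        if (grown == NULL)
            return false;  /* old array is still valid and unchanged */
        fl->slots = grown;
        fl->total = new_total;
    }
    assert(fl->used < fl->total);
    fl->slots[fl->used++] = p;
    return true;
}

/* Return the most recently stored pointer, or NULL if the list is empty. */
char *freelist_pop(ptr_freelist *fl) {
    return fl->used > 0 ? fl->slots[--fl->used] : NULL;
}

/* Release the array itself; the stored pointers are not freed here. */
void freelist_destroy(ptr_freelist *fl) {
    free(fl->slots);
    fl->slots = NULL;
    fl->used = 0;
    fl->total = 0;
}
```

The line that matters is the realloc call: the new capacity is
multiplied by sizeof(char *) before being passed as the byte count, and
the slot counter is only updated once the reallocation has succeeded.
Running a reproduction under valgrind, as in the report above, is a
reasonable way to check that a change of this kind removes the invalid
writes.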