Memcached crashing on FreeBSD

Anatoly Vorobey mellon at pobox.com
Wed Apr 13 16:05:28 PDT 2005


On Wed, Apr 13, 2005 at 06:42:44PM -0400, Jacob Coby wrote:
> Jason Coene wrote:
> >Well guys, some bad news - "ax" didn't fix the crash!
> >
> >Code I added to main():
> >
> >#ifdef __FreeBSD__
> >    _malloc_options = "ax";
> >#endif
> >
> >The ifdef block is definitely getting picked up, I tested with a printf
> 
> Maybe an assert() needs to be put in slabs_newslab to double check that 
> _malloc_options isn't being reset to the defaults?

Maybe, but I doubt it: judging from the source, _malloc_options cannot
be reset late in the game, because it is only parsed once, at init time.

> >Backtrace from latest crash:
> >
> >(gdb) bt
> >#0  0x280c5d4f in kill () from /lib/libc.so.5
> >#1  0x280ba7f8 in raise () from /lib/libc.so.5
> >#2  0x28132f02 in abort () from /lib/libc.so.5
> >#3  0x2813167e in tcflow () from /lib/libc.so.5
> >#4  0x28131f1b in tcflow () from /lib/libc.so.5
> 
> I'm very naive with the *BSDs, but #3 and #4 confuse me a little bit.  I 
> haven't been able to find a version of malloc() in the FreeBSD CVS that 
> calls tcflow().  The version that I'm looking at calls pubrealloc(), 
> which never calls tcflow().  It'd be nice if the bt showed the params to 
> the libc calls..

Ah, well, this is probably due to the libc.so.5 not being a debug version 
(and why would it be), so it doesn't have debug information for internal 
functions. All gdb has to go on are the addresses of the exported 
functions, and #3 and #4 are just some internal functions used by malloc
that happen to lie closest to tcflow()'s address in the code segment, 
among all the exported addresses gdb has. 
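To make the symbol-resolution point concrete, here is a rough sketch of the
"nearest preceding exported symbol" lookup a debugger falls back on when it
has no debug info. It is a simplified model, not gdb's actual code; the
struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One exported symbol: its name and start address (hypothetical layout). */
struct sym {
    const char *name;
    uintptr_t   addr;
};

/*
 * Return the exported symbol with the highest start address that is still
 * <= pc, or NULL if pc lies below every known symbol. Unexported (static)
 * functions are not in the table, so a pc inside one of them is attributed
 * to whichever exported function happens to precede it.
 */
const struct sym *
nearest_symbol(const struct sym *syms, size_t n, uintptr_t pc)
{
    const struct sym *best = NULL;
    size_t i;

    assert(syms != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (syms[i].addr <= pc &&
            (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];
    }
    return best;
}
```

With a table that holds only exported entry points, two different return
addresses inside two different static functions can both resolve to the same
name, which is how both #3 and #4 end up labelled tcflow().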

In fact, we can reconstruct the flow, using the fact that it's FreeBSD
5.2-RELEASE, provided by Jason. The relevant source file is

http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/stdlib/malloc.c?rev=1.84.2.1&content-type=text/x-cvsweb-markup&only_with_tag=RELENG_5_2

Jason's getting "error: allocation failed", a message that is only emitted
from inside imalloc() in that file. imalloc() calls wrterror(), and
wrterror() calls abort(), so wrterror() is our #3 and imalloc() is our #4.
imalloc() in its turn is called directly from malloc().

But why does imalloc() abort, even though we told it not to? It checks the
internal variable malloc_abort to decide whether to abort. The
variable is initialised from all three kinds of malloc options (symlink,
environment variable and _malloc_options) in malloc_init(). It corresponds
directly to the option 'a':

case 'a': malloc_abort   = 0; break;

But look at what happens right after that:

    /*
     * Sensitive processes, somewhat arbitrarily defined here as setuid,
     * setgid, root and wheel cannot afford to have malloc mistakes.
     */
    if (issetugid() || getuid() == 0 || getgid() == 0)
	    malloc_abort = 1;

Somewhat arbitrarily, and completely undocumented.
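The ordering is the whole problem, so here is a small model of it: options
are parsed first, then the privilege check overrides the result. This is my
own simplification, not the libc code. Only the 'a' flag is modelled,
`initial_abort` stands in for whatever the compiled-in default is, and
`privileged` stands in for the issetugid()/getuid()/getgid() test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the init sequence described above.
 * Returns the value malloc_abort ends up with.
 */
int
effective_malloc_abort(const char *opts, int initial_abort, int privileged)
{
    int malloc_abort = initial_abort;
    const char *p;

    assert(opts != NULL);

    /* Step 1: option parsing. Only 'a' is modelled in this sketch. */
    for (p = opts; *p != '\0'; p++) {
        if (*p == 'a')
            malloc_abort = 0;
        /* every other flag is ignored here */
    }

    /* Step 2: the override for "sensitive" processes runs afterwards. */
    if (privileged)
        malloc_abort = 1;

    return malloc_abort;
}
```

Under this model, "ax" yields 0 for an ordinary user and 1 for root, no
matter what the option string says.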

So, Jason, you need to try running memcached as a non-root user *and* with
the _malloc_options change. If even that fails with the same problem,
we'll be back at square one.
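If it helps while testing, a startup check along these lines would tell you
whether the process falls into the "sensitive" category. Both helpers are
hypothetical and not part of memcached. They cover only the uid/gid part of
the libc test; issetugid() is left out so the sketch also builds off FreeBSD.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Mirrors the getuid() == 0 || getgid() == 0 part of the quoted check. */
int
malloc_abort_would_be_forced(uid_t uid, gid_t gid)
{
    return uid == 0 || gid == 0;
}

/*
 * Print a warning to `out` if the current process would have the 'a'
 * option overridden. Returns 1 if a warning was printed, 0 otherwise.
 */
int
warn_if_sensitive(FILE *out)
{
    assert(out != NULL);

    if (!malloc_abort_would_be_forced(getuid(), getgid()))
        return 0;

    fprintf(out, "warning: uid %ld gid %ld: malloc option 'a' will be "
        "overridden\n", (long)getuid(), (long)getgid());
    return 1;
}
```

Calling warn_if_sensitive(stderr) right after the _malloc_options assignment
in main() would show at a glance whether the test run is meaningful.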

-- 
avva
"There's nothing simply good, nor ill alone" -- John Donne


