memcached crashing

Brad Fitzpatrick brad@danga.com
Tue, 29 Jun 2004 22:06:31 -0700 (PDT)


Looks like you're still using buggy rtsig:


rt_sigtimedwait([IO 34], {si_signo=SIGRT_2, si_code=0x1, si_pid=65, si_uid=13, si_value={int=1, ptr=0x1}}, 0xbfffdc58, 8) = 34
rt_sigtimedwait([IO 34], {si_signo=SIGRT_2, si_code=0x1, si_pid=65, si_uid=4, si_value={int=1, ptr=0x1}}, 0xbfffdc58, 8) = 34
fcntl64(4, F_GETFL)                     = -1 EBADF (Bad file descriptor)
exit_group(0)                           = ?



On Wed, 30 Jun 2004, Jon Valvatne wrote:

> No core file. I've attached the last part of the strace output; my
> mailer wasn't being nice with the wrapping.
>
> Jon
>
> ----------------------------------------------------------------------
> On Tue, 29 Jun 2004 21:37:00 -0700 (PDT)
> Brad Fitzpatrick <brad@danga.com> wrote:
>
> > Scary.
> >
> > Run it with -r to increase core file size, and make sure the user you
> > run it as has permission to write to the directory you start it from.
> > (with -r it won't chdir to /)
> >
> > Then with the core file, we can inspect it with gdb.
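
The -r suggestion concerns the core-file resource limit. If the soft RLIMIT_CORE is 0, which is a common shell default, a crash leaves no core file behind no matter where the process runs. Below is a minimal POSIX sketch of raising that limit from inside a program, the kind of thing such an option does. raise_core_limit is a hypothetical helper, not memcached's source.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit as far as the
 * process is allowed. Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit cur, want;

    if (getrlimit(RLIMIT_CORE, &cur) == -1)
        return -1;

    /* Ask for unlimited first; raising the hard cap needs privilege. */
    want.rlim_cur = RLIM_INFINITY;
    want.rlim_max = RLIM_INFINITY;
    if (setrlimit(RLIMIT_CORE, &want) == 0)
        return 0;

    /* Otherwise lift the soft limit up to the existing hard limit. */
    want.rlim_cur = cur.rlim_max;
    want.rlim_max = cur.rlim_max;
    return setrlimit(RLIMIT_CORE, &want) == 0 ? 0 : -1;
}
```

From a shell, running ulimit -c unlimited before starting the daemon has a similar effect for that session, within the hard limit.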
> >
> > But maybe it's not crashing and just quitting, like the event loop is
> > ending.
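
The crash-versus-quit distinction can also be checked from a parent process. A crash ends the child with a signal such as SIGSEGV. An event loop that simply returns leads to a normal exit, which is what exit_group(0) at the end of the trace above suggests happened here. Below is a POSIX sketch. wait_and_classify and enum end_kind are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>     /* signal numbers such as SIGSEGV, for callers */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* How a child process ended, decoded from its waitpid() status word. */
enum end_kind { ENDED_BY_EXIT, ENDED_BY_SIGNAL, ENDED_UNKNOWN };

/* Hypothetical helper: wait for pid and report whether it quit or was
 * killed. *detail receives the exit status or the terminating signal. */
enum end_kind wait_and_classify(pid_t pid, int *detail)
{
    int status;

    if (waitpid(pid, &status, 0) == -1)
        return ENDED_UNKNOWN;
    if (WIFEXITED(status)) {     /* returned from main() or called exit() */
        *detail = WEXITSTATUS(status);
        return ENDED_BY_EXIT;
    }
    if (WIFSIGNALED(status)) {   /* killed by a signal, e.g. SIGSEGV */
        *detail = WTERMSIG(status);
        return ENDED_BY_SIGNAL;
    }
    return ENDED_UNKNOWN;
}
```

A supervisor would fork, exec memcached in the child, and call wait_and_classify on the child's pid. ENDED_BY_EXIT with status 0 points at the event loop returning rather than at a segfault.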
> >
> > In that case, run it in the foreground but with strace in front of it:
> >
> > strace ./memcached .....
> >
> > Then paste what you see as its final output.
> >
> >
> >
> > On Wed, 30 Jun 2004, Jon Valvatne wrote:
> >
> > > Ok; thanks for the heads-up. I recompiled libevent without rtsig
> > > support, but that doesn't seem to have changed anything at all. Still
> > > random crashes and refused connections.
> > >
> > > Is there any way to get any sort of debug information out of memcached
> > > when it crashes?
> > >
> > > Jon
> > >
> > > ----------------------------------------------------------------------
> > > On Tue, 29 Jun 2004 21:10:53 -0700 (PDT)
> > > Brad Fitzpatrick <brad@danga.com> wrote:
> > >
> > > > Do *not* use libevent's rtsig support.  I thought he removed that
> > > > given how buggy it was.  Three really smart people worked on it for
> > > > quite some time without getting it anywhere near reliable.  It's just
> > > > a crap interface and it was never made to work with libevent.
> > > >
> > > > Use poll if you must, but epoll's really the best.
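
If a libevent built with rtsig is already installed, recompiling may not be necessary. libevent consults environment variables such as EVENT_NORTSIG, EVENT_NOEPOLL and EVENT_NOPOLL when choosing a backend, and it skips any backend that is disabled this way. Treat that as an assumption to check against the libevent version actually linked. Below is a small launcher sketch in C that sets EVENT_NORTSIG and then execs the real program. disable_rtsig and run_without_rtsig are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: the installed libevent skips its rtsig backend when
 * EVENT_NORTSIG is present in the environment. */
int disable_rtsig(void)
{
    return setenv("EVENT_NORTSIG", "1", 1);
}

/* Hypothetical launcher body: set the variable, then replace this process
 * with argv[0] (searched in PATH), e.g. { "./memcached", NULL }.
 * Returns -1 only if something failed; on success execvp() never returns. */
int run_without_rtsig(char *const argv[])
{
    if (disable_rtsig() != 0)
        return -1;
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A main() would just call run_without_rtsig(argv + 1). From a shell, the equivalent is prefixing the usual command line with EVENT_NORTSIG=1.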
> > > >
> > > > - Brad
> > > >
> > > >
> > > > On Wed, 30 Jun 2004, Jon Valvatne wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I've been using memcached to add some caching to a production
> > > > > system to speed things up. Everything worked smoothly on my test
> > > > > box, but I ran into nothing but problems when trying to go live
> > > > > with the changes: Memcached would just die randomly, without any
> > > > > error message whatsoever, within minutes of startup. And even while
> > > > > it was running and accepting some connections, other connections
> > > > > appeared to be randomly refused.
> > > > >
> > > > > The only difference between the test box and the production system
> > > > > is that one is running Fedora Core 2, and the other Redhat 9. Before
> > > > > I try to debug the situation more, I would like to ask: Does anyone
> > > > > here have any experience running memcached with Redhat 9? There's
> > > > > obviously no epoll support, so I compiled the latest libevent with
> > > > > --with-rtsig, and I'm assuming that's what memcached is using. Is
> > > > > this just inherently buggy, or so poor-performing that my system
> > > > > with about a hundred connections and several operations per second
> > > > > will cause the problem I'm seeing?
> > > > >
> > > > > One thing that worried me was the test results when compiling
> > > > > libevent:
> > > > >
> > > > > Running tests:
> > > > > KQUEUE
> > > > > Skipping test
> > > > > POLL
> > > > >  test-eof: OKAY
> > > > >  test-weof: OKAY
> > > > >  test-time: OKAY
> > > > >  regress: FAILED
> > > > > SELECT
> > > > >  test-eof: OKAY
> > > > >  test-weof: OKAY
> > > > >  test-time: OKAY
> > > > >  regress: FAILED
> > > > > RTSIG
> > > > >  test-eof: OKAY
> > > > >  test-weof: OKAY
> > > > >  test-time: OKAY
> > > > >  regress: FAILED
> > > > > EPOLL
> > > > > Skipping test
> > > > >
> > > > > What are these regress tests, and what would cause them to fail?
> > > > >
> > > > > By the way: Is there any way to ask memcached or libevent which
> > > > > polling mechanism is being used?
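
A trace like the one at the top of this message partly answers this, because each backend blocks in a characteristic system call: rt_sigtimedwait for rtsig, epoll_wait for epoll, poll for poll, select (or _newselect on i386) for select, and kevent for kqueue. That is how the rt_sigtimedwait lines give rtsig away. Later libevent releases also print the chosen method to stderr when EVENT_SHOW_METHOD is set and expose it through event_get_method(), but check whether the installed version has either. Below is a rough C heuristic over strace lines. The table and backend_from_strace_line are hypothetical. The rtsig backend may also call poll() at times, so an rt_sigtimedwait line is the more telling one.

```c
#include <stdlib.h>
#include <string.h>

/* Syscall each libevent backend typically blocks in, as strace prints it. */
static const struct { const char *syscall; const char *backend; } waiters[] = {
    { "rt_sigtimedwait(", "rtsig"  },
    { "epoll_wait(",      "epoll"  },
    { "poll(",            "poll"   },
    { "_newselect(",      "select" },  /* name strace may show on i386 */
    { "select(",          "select" },
    { "kevent(",          "kqueue" },
};

/* Hypothetical helper: name the backend whose wait call starts this line,
 * or NULL if the line is some other syscall. */
const char *backend_from_strace_line(const char *line)
{
    for (size_t i = 0; i < sizeof waiters / sizeof waiters[0]; i++)
        if (strncmp(line, waiters[i].syscall, strlen(waiters[i].syscall)) == 0)
            return waiters[i].backend;
    return NULL;
}
```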
> > > > >
> > > > > Thanks in advance,
> > > > >
> > > > > Jon Valvatne