memcached crashing
Brad Fitzpatrick
brad@danga.com
Tue, 29 Jun 2004 22:49:15 -0700 (PDT)
Don't know about poll, but with epoll I can do ten thousand or more
operations per second with 200-300 clients.
Our site's under a DDoS right now, otherwise I'd get you some real
numbers. Those are from what I vaguely remember.
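
For a rough sense of how the two interfaces look from the caller's side,
here is a minimal Python sketch. It is not memcached or libevent code, and
the helper names available_mechanisms and wait_readable are made up for
illustration. It only shows that poll and epoll answer the same question
(which sockets are readable) and says nothing about throughput.

```python
import select
import socket


def available_mechanisms():
    """Return the names of the readiness APIs this Python build exposes."""
    names = ["select"]  # select.select is always present
    for name in ("poll", "epoll", "kqueue"):
        if hasattr(select, name):
            names.append(name)
    return names


def wait_readable(socks, mechanism, timeout=1.0):
    """Return the sockets in socks that are readable, via the named mechanism.

    Anything other than "poll" or "epoll" falls back to select.select.
    """
    by_fd = {s.fileno(): s for s in socks}
    if mechanism == "epoll":
        # A real server would register once and reuse the epoll object;
        # it is created per call here only to keep the sketch short.
        ep = select.epoll()
        try:
            for fd in by_fd:
                ep.register(fd, select.EPOLLIN)
            events = ep.poll(timeout)  # timeout in seconds
        finally:
            ep.close()
        return [by_fd[fd] for fd, _ in events]
    if mechanism == "poll":
        p = select.poll()
        for fd in by_fd:
            p.register(fd, select.POLLIN)
        events = p.poll(int(timeout * 1000))  # timeout in milliseconds
        return [by_fd[fd] for fd, _ in events]
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.send(b"ping")
    for mech in available_mechanisms():
        print(mech, wait_readable([b], mech) == [b])
```

On a box without epoll, available_mechanisms() simply won't list it, and
the poll branch still works.
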
On Wed, 30 Jun 2004, Jon Valvatne wrote:
> Ah. Sorry for the false alarm. I didn't realize I had to recompile
> memcached as well, in order for it to stop using rtsig. It works very
> nicely now, thank you very much for the prompt replies.
>
> By the way: Can you say anything about the performance of poll vs epoll?
> How many simultaneous connections and/or operations per second could I
> get before I should start worrying about finding a way to get epoll to
> work on this box?
>
> Jon
>
> ----------------------------------------------------------------------
> On Tue, 29 Jun 2004 22:06:31 -0700 (PDT)
> Brad Fitzpatrick <brad@danga.com> wrote:
>
> > Looks like you're still using buggy rtsig:
> >
> >
> > rt_sigtimedwait([IO 34], {si_signo=SIGRT_2, si_code=0x1, si_pid=65, si_uid=13, si_value={int=1, ptr=0x1}}, 0xbfffdc58, 8) = 34
> > rt_sigtimedwait([IO 34], {si_signo=SIGRT_2, si_code=0x1, si_pid=65, si_uid=4, si_value={int=1, ptr=0x1}}, 0xbfffdc58, 8) = 34
> > fcntl64(4, F_GETFL) = -1 EBADF (Bad file descriptor)
> > exit_group(0) = ?
> >
> >
> >
> > On Wed, 30 Jun 2004, Jon Valvatne wrote:
> >
> > > No core file. I've attached the last part of the strace output; my
> > > mailer wasn't being nice with the wrapping.
> > >
> > > Jon
> > >
> > > ----------------------------------------------------------------------
> > > On Tue, 29 Jun 2004 21:37:00 -0700 (PDT)
> > > Brad Fitzpatrick <brad@danga.com> wrote:
> > >
> > > > Scary.
> > > >
> > > > Run it with -r to increase core file size, and make sure the user
> > > > you run it as has permission to write to the directory you start
> > > > it from. (with -r it won't chdir to /)
> > > >
> > > > Then with the core file, we can inspect it with gdb.
> > > >
> > > > But maybe it's not crashing and just quitting, like the event loop
> > > > is ending.
> > > >
> > > > In that case, run it in the foreground but with strace in front of
> > > > it:
> > > >
> > > > strace ./memcached .....
> > > >
> > > > Then paste what you see as its final output.
> > > >
> > > >
> > > >
> > > > On Wed, 30 Jun 2004, Jon Valvatne wrote:
> > > >
> > > > > Ok; thanks for the heads-up. I recompiled libevent without rtsig
> > > > > support, but that doesn't seem to have changed anything at all.
> > > > > Still random crashes and refused connections.
> > > > >
> > > > > Is there any way to get any sort of debug information out of
> > > > > memcached when it crashes?
> > > > >
> > > > > Jon
> > > > >
> > > > > ----------------------------------------------------------------------
> > > > > On Tue, 29 Jun 2004 21:10:53 -0700 (PDT)
> > > > > Brad Fitzpatrick <brad@danga.com> wrote:
> > > > >
> > > > > > Do *not* use libevent's rtsig support. I thought he removed
> > > > > > that given how buggy it was. Three really smart people worked
> > > > > > on it for quite some time without getting it anywhere near
> > > > > > reliable. It's just a crap interface and it was never made to
> > > > > > work with libevent.
> > > > > >
> > > > > > Use poll if you must, but epoll's really the best.
> > > > > >
> > > > > > - Brad
> > > > > >
> > > > > >
> > > > > > On Wed, 30 Jun 2004, Jon Valvatne wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I've been using memcached to add some caching to a production
> > > > > > > system to speed things up. Everything worked smoothly on my
> > > > > > > test box, but I ran into nothing but problems when trying to
> > > > > > > go live with the changes: Memcached would just die randomly,
> > > > > > > without any error message whatsoever, within minutes of
> > > > > > > startup. And even while it was running and accepting some
> > > > > > > connections, other connections appeared to be randomly
> > > > > > > refused.
> > > > > > >
> > > > > > > The only difference between the test box and the production
> > > > > > > system is that one is running Fedora Core 2, and the other
> > > > > > > Redhat 9. Before I try to debug the situation more, I would
> > > > > > > like to ask: Does anyone here have any experience running
> > > > > > > memcached with Redhat 9? There's obviously no epoll support,
> > > > > > > so I compiled the latest libevent with --with-rtsig, and I'm
> > > > > > > assuming that's what memcached is using. Is this just
> > > > > > > inherently buggy, or so poor-performing that my system with
> > > > > > > about a hundred connections and several operations per second
> > > > > > > will cause the problem I'm seeing?
> > > > > > >
> > > > > > > One thing that worried me were the test results when
> > > > > > > compiling libevent:
> > > > > > >
> > > > > > > Running tests:
> > > > > > > KQUEUE
> > > > > > > Skipping test
> > > > > > > POLL
> > > > > > > test-eof: OKAY
> > > > > > > test-weof: OKAY
> > > > > > > test-time: OKAY
> > > > > > > regress: FAILED
> > > > > > > SELECT
> > > > > > > test-eof: OKAY
> > > > > > > test-weof: OKAY
> > > > > > > test-time: OKAY
> > > > > > > regress: FAILED
> > > > > > > RTSIG
> > > > > > > test-eof: OKAY
> > > > > > > test-weof: OKAY
> > > > > > > test-time: OKAY
> > > > > > > regress: FAILED
> > > > > > > EPOLL
> > > > > > > Skipping test
> > > > > > >
> > > > > > > What are these regress tests, and what would cause them to
> > > > > > > fail?
> > > > > > >
> > > > > > > By the way: Is there any way to ask memcached or libevent
> > > > > > > which polling mechanism is being used?
> > > > > > >
> > > > > > > Thanks in advance,
> > > > > > >
> > > > > > > Jon Valvatne
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
>
>
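
On the question quoted above about asking libevent which polling mechanism
it picked: libevent has historically checked an EVENT_SHOW_METHOD
environment variable at init time and printed the chosen backend to stderr.
Whether the libevent build discussed in this thread honors it is an
assumption, so treat the following as a sketch rather than a guaranteed
recipe. The helper name run_with_show_method is hypothetical, and the
timeout is there because memcached in the foreground does not exit on its
own.

```python
import os
import subprocess


def run_with_show_method(cmd, timeout=5.0):
    """Run cmd with EVENT_SHOW_METHOD set and return what it wrote to stderr.

    Assumption: the program is linked against a libevent that reports its
    backend when this variable is present in the environment.
    """
    env = dict(os.environ)
    env["EVENT_SHOW_METHOD"] = "1"
    try:
        proc = subprocess.run(
            cmd,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
            text=True,
        )
        return proc.stderr
    except subprocess.TimeoutExpired as exc:
        # The child was killed after the timeout; keep whatever it printed.
        err = exc.stderr or ""
        if isinstance(err, bytes):
            err = err.decode(errors="replace")
        return err


# Example (path to the memcached binary is a placeholder):
# print(run_with_show_method(["./memcached"]))
```

If nothing shows up on stderr, the variable is probably not supported by
that libevent version, and strace output like the trace quoted earlier is
the fallback way to see which syscalls the event loop is making.
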