libevent looping

Davide Libenzi davidel@xmailserver.org
Fri, 5 Sep 2003 08:04:04 -0700 (PDT)


On Fri, 5 Sep 2003, Niels Provos wrote:

> Hi Brad,
>
> sorry for not getting back to you earlier.  I just moved to California
> and started a new job.  At this point, I do not know what is causing
> your problems.  I attached a diff that fixes some known problems with
> epoll.
>
> Davide did any epoll semantics change recently?

The latest big change was the double LT/ET support. It'd be interesting to
see what kind of event in received by epoll_wait() inside libevent (by
dropping a printf of the fetched events). Which kernel are you using ?
Are you using ET or LT epoll ?




> On Thu, Sep 04, 2003 at 07:11:55PM -0700, Brad Fitzpatrick wrote:
> > Niels,
> >
> > A few of us at LiveJournal.com have been bug-hunting a problem with
> > libevent and epoll for a number of weeks now.
> >
> > We have a dozen machines running memcached on libevent.  A couple times
> > per week a process will start eating all available CPU, but continues to
> > function.
> >
> > What we discovered is that libevent's epoll mode isn't handling a (new?)
> > type of event from Linux, and continually processes the same event over
> > and over.  When a new event comes in that libevent can handle (EPOLLIN or
> > EPOLLOUT), it handles it fine, deletes it, and proceeds to be confused
> > over the mystery event.
> >
> > Here you can see libevent being confused for a while, getting a new event,
> > passing it back to our application, and then going back to looping, and so
> > on.
> >
> > (Our app always does a syscall, read or write, after calls into libevent,
> > so this isn't us looping.)
> >
> > Here's a snippet of an strace:
> >
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> > gettimeofday({1062725530, 924268}, NULL) = 0
> > gettimeofday({1062725530, 924392}, NULL) = 0
> > gettimeofday({1062725530, 924515}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> > gettimeofday({1062725530, 924764}, NULL) = 0
> > gettimeofday({1062725530, 924886}, NULL) = 0
> > gettimeofday({1062725530, 924976}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> > gettimeofday({1062725530, 925195}, NULL) = 0
> > gettimeofday({1062725530, 925314}, NULL) = 0
> > gettimeofday({1062725530, 925440}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> > gettimeofday({1062725530, 925690}, NULL) = 0
> > gettimeofday({1062725530, 925814}, NULL) = 0
> > gettimeofday({1062725530, 925937}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> > gettimeofday({1062725530, 926186}, NULL) = 0
> > gettimeofday({1062725530, 926310}, NULL) = 0
> > gettimeofday({1062725530, 926433}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> > gettimeofday({1062725530, 926666}, NULL) = 0
> > gettimeofday({1062725530, 926779}, NULL) = 0
> > gettimeofday({1062725530, 926901}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1aa) = 2
> > gettimeofday({1062725530, 927167}, NULL) = 0
> > read(177, "get userpic.6166503\r\n", 16384) = 21
> > read(177, 0x87a834d, 16363)             = -1 EAGAIN (Resource temporarily unavailable)
> > setsockopt(177, SOL_TCP, TCP_CORK, [1], 4) = 0
> > time(NULL)                              = 1062725530
> > time(NULL)                              = 1062725530
> > write(177, "VALUE userpic.6166503 1 33\r\n", 28) = 28
> > write(177, "\4\4\0041234\4\4\4\10\2\3\0\0\0\n\003100\n\00296\n\006"..., 35) = 35
> > write(177, "END\r\n", 5)                = 5
> > setsockopt(177, SOL_TCP, TCP_CORK, [0], 4) = 0
> > read(177, 0x87a8338, 16384)             = -1 EAGAIN (Resource temporarily unavailable)
> > gettimeofday({1062725530, 928978}, NULL) = 0
> > gettimeofday({1062725530, 929099}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a8) = 1
> > gettimeofday({1062725530, 929360}, NULL) = 0
> > gettimeofday({1062725530, 929486}, NULL) = 0
> > gettimeofday({1062725530, 929690}, NULL) = 0
> > epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a7) = 2
> > gettimeofday({1062725530, 929886}, NULL) = 0
> > read(402, "get sess:1256389:4\r\n", 16384) = 20
> > read(402, 0x84bb3bc, 16364)             = -1 EAGAIN (Resource temporarily unavailable)
> >
> >
> > We suspect libevent needs to be changed to delete events it knows nothing
> > about, if there is indeed something besides EPOLLIN and EPOLLOUT.
> >
> > Any thoughts?
> >
> >
> > Thanks!
> > Brad
>


- Davide