libevent looping

Brad Fitzpatrick brad@danga.com
Thu, 4 Sep 2003 19:34:51 -0700 (PDT)


I forgot to mention:  0x3 is the fd of the server socket, it'd seem, since
daemon() closes and reopens std{out,in,err} to /dev/null.

So maybe Linux 2.4.21 has a new event type for server sockets?
I googled and saw talk of POLLRDHUP or something?  (yeah, I really don't
know...)



On Thu, 4 Sep 2003, Brad Fitzpatrick wrote:

> Niels,
>
> A few of us at LiveJournal.com have been bug-hunting a problem with
> libevent and epoll for a number of weeks now.
>
> We have a dozen machines running memcached on libevent.  A couple times
> per week a process will start eating all available CPU, but continues to
> function.
>
> What we discovered is that libevent's epoll mode isn't handling a (new?)
> type of event from Linux, and continually processes the same event over
> and over.  When a new event comes in that libevent can handle (EPOLLIN or
> EPOLLOUT), it handles it fine, deletes it, and proceeds to be confused
> over the mystery event.
>
> Here you can see libevent being confused for a while, getting a new event,
> passing it back to our application, and then going back to looping, and so
> on.
>
> (Our app always does a syscall, read or write, after calls into libevent,
> so this isn't us looping.)
>
> Here's a snippet of an strace:
>
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> gettimeofday({1062725530, 924268}, NULL) = 0
> gettimeofday({1062725530, 924392}, NULL) = 0
> gettimeofday({1062725530, 924515}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> gettimeofday({1062725530, 924764}, NULL) = 0
> gettimeofday({1062725530, 924886}, NULL) = 0
> gettimeofday({1062725530, 924976}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> gettimeofday({1062725530, 925195}, NULL) = 0
> gettimeofday({1062725530, 925314}, NULL) = 0
> gettimeofday({1062725530, 925440}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> gettimeofday({1062725530, 925690}, NULL) = 0
> gettimeofday({1062725530, 925814}, NULL) = 0
> gettimeofday({1062725530, 925937}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> gettimeofday({1062725530, 926186}, NULL) = 0
> gettimeofday({1062725530, 926310}, NULL) = 0
> gettimeofday({1062725530, 926433}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> gettimeofday({1062725530, 926666}, NULL) = 0
> gettimeofday({1062725530, 926779}, NULL) = 0
> gettimeofday({1062725530, 926901}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1aa) = 2
> gettimeofday({1062725530, 927167}, NULL) = 0
> read(177, "get userpic.6166503\r\n", 16384) = 21
> read(177, 0x87a834d, 16363)             = -1 EAGAIN (Resource temporarily unavailable)
> setsockopt(177, SOL_TCP, TCP_CORK, [1], 4) = 0
> time(NULL)                              = 1062725530
> time(NULL)                              = 1062725530
> write(177, "VALUE userpic.6166503 1 33\r\n", 28) = 28
> write(177, "\4\4\0041234\4\4\4\10\2\3\0\0\0\n\003100\n\00296\n\006"..., 35) = 35
> write(177, "END\r\n", 5)                = 5
> setsockopt(177, SOL_TCP, TCP_CORK, [0], 4) = 0
> read(177, 0x87a8338, 16384)             = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1062725530, 928978}, NULL) = 0
> gettimeofday({1062725530, 929099}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a8) = 1
> gettimeofday({1062725530, 929360}, NULL) = 0
> gettimeofday({1062725530, 929486}, NULL) = 0
> gettimeofday({1062725530, 929690}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a7) = 2
> gettimeofday({1062725530, 929886}, NULL) = 0
> read(402, "get sess:1256389:4\r\n", 16384) = 20
> read(402, 0x84bb3bc, 16364)             = -1 EAGAIN (Resource temporarily unavailable)
>
>
> We suspect libevent needs to be changed to delete events it knows nothing
> about, if there is indeed something besides EPOLLIN and EPOLLOUT.
>
> Any thoughts?
>
>
> Thanks!
> Brad
>
>