libevent looping
Niels Provos
provos@citi.umich.edu
Fri, 5 Sep 2003 02:57:49 -0400
--zhXaljGHf11kAtnf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hi Brad,
Sorry for not getting back to you earlier. I just moved to California
and started a new job. At this point, I do not know what is causing
your problems. I attached a diff that fixes some known problems with
epoll.
Davide, did any epoll semantics change recently?
Niels.
On Thu, Sep 04, 2003 at 07:11:55PM -0700, Brad Fitzpatrick wrote:
> Niels,
>
> A few of us at LiveJournal.com have been bug-hunting a problem with
> libevent and epoll for a number of weeks now.
>
> We have a dozen machines running memcached on libevent. A couple of
> times per week, a process starts eating all available CPU but continues
> to function.
>
> What we discovered is that libevent's epoll mode isn't handling a (new?)
> type of event from Linux, and continually processes the same event over
> and over. When a new event comes in that libevent can handle (EPOLLIN or
> EPOLLOUT), it handles it fine, deletes it, and proceeds to be confused
> over the mystery event.
>
> Here you can see libevent being confused for a while, getting a new event,
> passing it back to our application, and then going back to looping, and so
> on.
>
> (Our app always does a syscall, read or write, after calls into libevent,
> so this isn't us looping.)
>
> Here's a snippet of an strace:
>
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> gettimeofday({1062725530, 924268}, NULL) = 0
> gettimeofday({1062725530, 924392}, NULL) = 0
> gettimeofday({1062725530, 924515}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ad) = 1
> gettimeofday({1062725530, 924764}, NULL) = 0
> gettimeofday({1062725530, 924886}, NULL) = 0
> gettimeofday({1062725530, 924976}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> gettimeofday({1062725530, 925195}, NULL) = 0
> gettimeofday({1062725530, 925314}, NULL) = 0
> gettimeofday({1062725530, 925440}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ac) = 1
> gettimeofday({1062725530, 925690}, NULL) = 0
> gettimeofday({1062725530, 925814}, NULL) = 0
> gettimeofday({1062725530, 925937}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> gettimeofday({1062725530, 926186}, NULL) = 0
> gettimeofday({1062725530, 926310}, NULL) = 0
> gettimeofday({1062725530, 926433}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1ab) = 1
> gettimeofday({1062725530, 926666}, NULL) = 0
> gettimeofday({1062725530, 926779}, NULL) = 0
> gettimeofday({1062725530, 926901}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1aa) = 2
> gettimeofday({1062725530, 927167}, NULL) = 0
> read(177, "get userpic.6166503\r\n", 16384) = 21
> read(177, 0x87a834d, 16363) = -1 EAGAIN (Resource temporarily unavailable)
> setsockopt(177, SOL_TCP, TCP_CORK, [1], 4) = 0
> time(NULL) = 1062725530
> time(NULL) = 1062725530
> write(177, "VALUE userpic.6166503 1 33\r\n", 28) = 28
> write(177, "\4\4\0041234\4\4\4\10\2\3\0\0\0\n\003100\n\00296\n\006"..., 35) = 35
> write(177, "END\r\n", 5) = 5
> setsockopt(177, SOL_TCP, TCP_CORK, [0], 4) = 0
> read(177, 0x87a8338, 16384) = -1 EAGAIN (Resource temporarily unavailable)
> gettimeofday({1062725530, 928978}, NULL) = 0
> gettimeofday({1062725530, 929099}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a8) = 1
> gettimeofday({1062725530, 929360}, NULL) = 0
> gettimeofday({1062725530, 929486}, NULL) = 0
> gettimeofday({1062725530, 929690}, NULL) = 0
> epoll_wait(0x3, 0x80accb8, 0x2000, 0x1a7) = 2
> gettimeofday({1062725530, 929886}, NULL) = 0
> read(402, "get sess:1256389:4\r\n", 16384) = 20
> read(402, 0x84bb3bc, 16364) = -1 EAGAIN (Resource temporarily unavailable)
>
>
> We suspect libevent needs to be changed to delete events it knows nothing
> about, if there is indeed something besides EPOLLIN and EPOLLOUT.
>
> Any thoughts?
>
>
> Thanks!
> Brad
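
One possibility consistent with the strace above, though not confirmed
anywhere in this thread: epoll_wait() reports EPOLLERR and EPOLLHUP even
when they were never requested, and with level-triggered epoll it keeps
reporting them until the descriptor is removed or closed. A loop that only
looks at EPOLLIN and EPOLLOUT would then wake up again and again without
dispatching anything. The following is a minimal sketch, not libevent's
actual code, of mapping those conditions onto read/write interest so the
application's next read() or write() sees the error and can close the
descriptor. The SKETCH_EV_* constants and sketch_epoll_to_ev() are
hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical stand-ins for libevent's EV_READ / EV_WRITE flags. */
#define SKETCH_EV_READ  0x02
#define SKETCH_EV_WRITE 0x04

/*
 * Translate the event mask returned by epoll_wait() into read/write
 * interest bits.
 *
 * Assumption: an error or hangup is reported as both readable and
 * writable, so whichever handler is registered runs, its read() or
 * write() fails, and the handler can close the descriptor.
 */
static int
sketch_epoll_to_ev(uint32_t what)
{
	int which = 0;

	if (what & (EPOLLERR | EPOLLHUP))
		which |= SKETCH_EV_READ | SKETCH_EV_WRITE;
	if (what & EPOLLIN)
		which |= SKETCH_EV_READ;
	if (what & EPOLLOUT)
		which |= SKETCH_EV_WRITE;

	return (which);
}
```

With a mapping like this, a result mask containing only EPOLLHUP no longer
yields which == 0, so the dispatch loop would not skip the event with
"continue" and spin.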
--zhXaljGHf11kAtnf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=diff
Index: epoll.c
===================================================================
RCS file: /cvsroot/levent/libevent/epoll.c,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -r1.6 -r1.7
--- epoll.c 9 Apr 2003 18:12:11 -0000 1.6
+++ epoll.c 24 Jun 2003 14:45:21 -0000 1.7
@@ -173,7 +173,6 @@
 	struct epollop *epollop = arg;
 	struct epoll_event *events = epollop->events;
 	struct evepoll *evep;
-	struct event *ev;
 	int i, res, timeout;
 	if (evsignal_deliver(&epollop->evsigmask) == -1)
@@ -200,32 +199,33 @@
 	for (i = 0; i < res; i++) {
 		int which = 0, what;
+		struct event *evread = NULL, *evwrite = NULL;
 		evep = (struct evepoll *)events[i].data.ptr;
 		what = events[i].events;
 		if (what & EPOLLIN) {
+			evread = evep->evread;
 			which |= EV_READ;
 		}
 		if (what & EPOLLOUT) {
+			evwrite = evep->evwrite;
 			which |= EV_WRITE;
 		}
 		if (!which)
 			continue;
-		if (which & EV_READ) {
-			ev = evep->evread;
-			if (!(ev->ev_events & EV_PERSIST))
-				event_del(ev);
-			event_active(ev, EV_READ, 1);
-		}
-		if (which & EV_WRITE) {
-			ev = evep->evwrite;
-			if (!(ev->ev_events & EV_PERSIST))
-				event_del(ev);
-			event_active(ev, EV_WRITE, 1);
-		}
+		if (evread != NULL && !(evread->ev_events & EV_PERSIST))
+			event_del(evread);
+		if (evwrite != NULL && evwrite != evread &&
+		    !(evwrite->ev_events & EV_PERSIST))
+			event_del(evwrite);
+
+		if (evread != NULL)
+			event_active(evread, EV_READ, 1);
+		if (evwrite != NULL)
+			event_active(evwrite, EV_WRITE, 1);
 	}
 	return (0);
--zhXaljGHf11kAtnf--