ENOMEM error with plenty of free memory (was Crash on perlbal
1.53, possible Foundry influence)
dormando
dormando at rydia.net
Thu Feb 22 07:06:36 UTC 2007
Can you see how much memory each perlbal is taking up at different
points in time? Wondering if you're hitting a memory leak... otherwise,
standard checks... available ram, swap, etc? If you turn off vm
overcommit and remove swap (like I tend to do), you become pretty likely
to get malloc errors under memory pressure, even if it looks like you
have plenty free.
-Dormando
Brett G. Durrett wrote:
>
> I finally got a useful strace from a dying perlbal 1.53. It looks like
> a request to mmap2 for 130K of memory is failing. The odd thing is the
> host has more than three gigs of memory available (MemFree), so I am not
> sure why it would fail to allocate the memory. The strace below is from
> kernel 2.6.8-2-686, stock Debian kernel running Perlbal version 1.53. I
> am also seeing these failures on Perlbal versions 1.42 and 1.52 running
> on a custom 2.6.16 kernel. I have two machines in this pool that do not
> die like the six others, these are running a custom 2.6.19 kernel in
> 64-bit and they are running Perlbal 1.41.
>
> The failure pattern for these machines is usually a rotating failure
> where one will fail and then another will fail a few minutes later
> (rotation could be related to load balancing) and then there will be a
> quiet period of 15-20 minutes with no failures before the cycle begins
> again. Restarting the Perlbal process does not seem to help but,
> possibly coincidence, a reboot of four of the six failing servers
> eliminated the failures in our most recent cycle, after 2 hours of
> problems. The cycle will also stop on its own and seems to appear at
> random times, most commonly when we are not under heavy loads.
>
> I am hoping to dig into this more when I get some free time... sharing
> with the list in case this rings a bell with anybody...
>
>
> time(NULL) = 1171950537
> gettimeofday({1171950537, 835082}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=507, u64=507}}}, 1000, 1000) = 1
> time(NULL) = 1171950537
> read(507, "<?xml version=\'1.0\'?>\n<methodCal"..., 1372) = 1372
> epoll_ctl(5, EPOLL_CTL_MOD, 507, {EPOLLERR|EPOLLHUP|0x20, {u32=507,
> u64=507}}) = 0
> time(NULL) = 1171950537
> setsockopt(196, SOL_TCP, TCP_CORK, [1], 4) = 0
> write(196, "POST /catalog/skudb/test3.php HT"..., 335) = 335
> setsockopt(196, SOL_TCP, TCP_CORK, [0], 4) = 0
> write(196, "<?xml version=\'1.0\'?>\n<methodCal"..., 1372) = 1372
> time(NULL) = 1171950537
> setsockopt(507, SOL_TCP, TCP_CORK, [1], 4) = 0
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 836400}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=867, u64=867}}}, 1000, 1000) = 1
> time(NULL) = 1171950537
> read(867, "2/maDGSiR\nQO0NyL0FBwKmtb2yrJ/0kT"..., 131072) = 1460
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 836689}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=1857, u64=1857}}}, 1000, 1000) = 1
> read(1857, "HTTP/1.1 304 Not Modified\r\nDate:"..., 102400) = 338
> write(1242, "HTTP/1.0 304 Not Modified\r\nDate:"..., 339) = 339
> time(NULL) = 1171950537
> setsockopt(1242, SOL_TCP, TCP_CORK, [0], 4) = 0
> epoll_ctl(5, EPOLL_CTL_MOD, 1242, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> {u32=1242, u64=1242}}) = 0
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 838553}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=1268, u64=1268}}}, 1000, 1000) = 1
> time(NULL) = 1171950537
> read(1268, "POST /catalog/skudb/test3.php HT"..., 102400) = 361
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 839257}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=212, u64=212}},
> {EPOLLERR|EPOLLHUP|0x807a000, {u32=134717464, u64=133629452197912}}},
> 1000, 1000) = 2
> time(NULL) = 1171950537
> read(212, "POST /catalog/skudb/test3.php HT"..., 102400) = 361
> time(NULL) = 1171950537
> *mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)*
> brk(0) = 0x3ff90000
> brk(0x3ffd0000) = 0x3ffd0000
> read(867, "Kx3SrfAHKJAk\nf2VUc7dddUEKLC3R9hs"..., 131072) = 1460
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 842823}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=1579, u64=1579}}}, 1000, 1000) = 1
> read(1579, "HTTP/1.1 304 Not Modified\r\nDate:"..., 102400) = 338
> write(1265, "HTTP/1.0 304 Not Modified\r\nDate:"..., 339) = 339
> time(NULL) = 1171950537
> epoll_ctl(5, EPOLL_CTL_DEL, 1579, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> {u32=1579, u64=1579}}) = 0
> time(NULL) = 1171950537
> setsockopt(1265, SOL_TCP, TCP_CORK, [0], 4) = 0
> epoll_ctl(5, EPOLL_CTL_MOD, 1265, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> {u32=1265, u64=1265}}) = 0
> close(1579) = 0
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 848627}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=225, u64=225}}}, 1000, 1000) = 1
> read(225, "HTTP/1.1 200 OK\r\nDate: Tue, 20 F"..., 102400) = 2967
> write(447, "HTTP/1.0 200 OK\r\nDate: Tue, 20 F"..., 450) = 450
> write(447, "<?xml version=\"1.0\" encoding=\"UT"..., 2517) = 2517
> time(NULL) = 1171950537
> epoll_ctl(5, EPOLL_CTL_DEL, 225, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> {u32=225, u64=225}}) = 0
> time(NULL) = 1171950537
> setsockopt(447, SOL_TCP, TCP_CORK, [0], 4) = 0
> epoll_ctl(5, EPOLL_CTL_MOD, 447, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> {u32=447, u64=447}}) = 0
> close(225) = 0
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 850399}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=867, u64=867}}}, 1000, 1000) = 1
> time(NULL) = 1171950537
> read(867, "9NDuQvO9A2Cdq3T\nVWt5Wf5ils86u/9N"..., 131072) = 1460
> time(NULL) = 1171950537
> time(NULL) = 1171950537
> gettimeofday({1171950537, 850847}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=867, u64=867}}}, 1000, 1000) = 1
> time(NULL) = 1171950537
> *mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)*
> brk(0) = 0x3ffd0000
> brk(0x40010000) = 0x3ffd0000
> *mmap2(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = -1 ENOMEM (Cannot allocate memory)*
> *write(2, "Out of memory!\n", 15) = 15*
> write(8, "\270\365\322\10", 4) = 4
> write(8, "\20\334\244\10", 4) = 4
> write(8, " 8\266\10", 4) = 4
> select(10, [9], NULL, NULL, NULL) = 1 (in [9])
> read(9, "\270\365\322\10", 4) = 4
> read(9, "\20\334\244\10", 4) = 4
> read(9, " 8\266\10", 4) = 4
> read(9, 0xbffff908, 4) = -1 EAGAIN (Resource
> temporarily unavailable)
> close(1702) = 0
> close(1743) = 0
> close(1704) = 0
> close(1664) = 0
> close(1406) = 0
>
>
>
More information about the perlbal
mailing list