Perlbal reproxied downloads stalling

Brad Fitzpatrick brad at danga.com
Wed Sep 13 20:04:17 UTC 2006


We think we might be seeing this at work now too, so rest assured we're
working on it....

Thanks for the wonderful debug work you've done.  That'll help a bunch.

Will let you know when I know more.

- Brad


On Tue, 12 Sep 2006, Sam G wrote:

> On Sat, 09 Sep 2006 09:36:58 -0700, Brad Fitzpatrick <brad at danga.com>
> wrote:
>
> > Try increasing the server's "buffer_size_reproxy_url" setting.  Still, it
> > shouldn't freeze, so something more is messed up.
> >
> > I can't reproduce this here, even though I thought I'd be able to.  I get
> > 350 kB/second to my house from both my origin and reproxy-URL'd services,
> > which live on different hosts.
> >
> > What's your perlbal configuration and network look like?  What's the
> > latency between the three parties involved?  (ignoring the first server
> > that issues the x-reproxy-url header)
>
> There is very low latency between all three parties. The perlbal machine
> is on the same local network as the storage box, pulling data through
> Apache webdav. The downloading machine is geographically close and
> connected at high bandwidth to the datacenter which houses the first two
> machines.
>
> >
> > BTW, the aio mode won't matter for this... it's a pure network operation.
> > No filesystem calls involved to do async.
> >
> > When a download stalls, telnet into the Perlbal management port and type
> > "socks".  What's the output?  In particular, I'm wondering what state and
> > buffer sizes the different parties are at.
>
> Relevant Perlbal config:
>
> CREATE POOL php_apaches
>    SET nodefile = /etc/perlbal/nodelist.dat
>
> CREATE SERVICE balancer
>    SET listen          = 10.10.10.251:80
>    SET role            = reverse_proxy
>    SET enable_reproxy  = true
>    SET pool            = php_apaches
>    SET persist_client  = on
>    SET persist_backend = on
>    SET verify_backend  = on
> ENABLE balancer
>
> Perlbal machine IP: 10.10.10.251, storage IP: 10.10.30.101, client IP
> requesting download: 67.180.143.68.
>
> Before download starts, everything idle:
>
> socks
>     fd    age
>      4 405805s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80
> for service 'balancer'
>      8 405805s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001
> for service 'mgmt'
>     12    35s Perlbal::ClientManage(R): open to 127.0.0.1:38802
> .
>
> After download starts, is progressing:
>
> socks
>     fd    age
>      4 405813s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80
> for service 'balancer'
>      8 405813s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001
> for service 'mgmt'
>      9     4s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:
> localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;
> backend=10.10.30.101:80; responded
>     10     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:
> localport=80; reqs=2; persist_wait
>     11     4s Perlbal::BackendHTTP(R): open to 10.10.30.101:80:
> localport=50561; client=9; uses=0; xfer_res
>     12    43s Perlbal::ClientManage(R): open to 127.0.0.1:38802
>     13     4s Perlbal::BackendHTTP(R): open to 10.10.20.103:80:
> localport=55473; uses=4; bored; has_attention=yes
>     14     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:
> localport=80; reqs=1; persist_wait
>     15     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:
> localport=80; reqs=1; persist_wait
> .
> socks
>     fd    age
>      4 405815s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80
> for service 'balancer'
>      8 405815s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001
> for service 'mgmt'
>      9     6s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:
> localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;
> backend=10.10.30.101:80; responded
>     10     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:
> localport=80; reqs=2; persist_wait
>     11     6s Perlbal::BackendHTTP(): open to 10.10.30.101:80:
> localport=50561; client=9; uses=0; xfer_res
>     12    45s Perlbal::ClientManage(R): open to 127.0.0.1:38802
>     14     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:
> localport=80; reqs=1; persist_wait
>     15     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:
> localport=80; reqs=1; persist_wait
> .
>
> At this point the download is totally frozen:
>
> socks
>     fd    age
>      4 405818s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80
> for service 'balancer'
>      8 405818s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001
> for service 'mgmt'
>      9     9s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:
> localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;
> backend=10.10.30.101:80; responded
>     10     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:
> localport=80; reqs=2; persist_wait
>     11     9s Perlbal::BackendHTTP(): open to 10.10.30.101:80:
> localport=50561; client=9; uses=0; xfer_res
>     12    48s Perlbal::ClientManage(R): open to 127.0.0.1:38802
>     14     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:
> localport=80; reqs=1; persist_wait
>     15     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:
> localport=80; reqs=1; persist_wait
> .
>
>
>
> I cancel the download, and it returns to first state:
>
> socks
>     fd    age
>      4 405931s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80
> for service 'balancer'
>      8 405931s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001
> for service 'mgmt'
>     12   161s Perlbal::ClientManage(R): open to 127.0.0.1:38802
> .
>
>
> Let me know what else I can get you.
>
> -Sam
>
> >
> >
> >
> >
> > On Fri, 8 Sep 2006, Sam G wrote:
> >
> >> This seems to be the bulk of what's happening:
> >>
> >> ===========
> >>
> >> <lots of large chunks of text scrolling by as file downloads...>
> >> ....
> >> <snip>
> >> ....
> >> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
> >> {u32=9, u64=9}}) = 0
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 330342}, NULL) = 0
> >> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> >> read(12,
> >> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
> >> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
> >> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
> >> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
> >> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
> >> `\247\346 \232\257\356/b\205\223\'
> >> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
> >> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 330707}, NULL) = 0
> >> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> >> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
> >> u64=12}}) = 0
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 330867}, NULL) = 0
> >> epoll_wait(5, {{EPOLLOUT, {u32=9, u64=9}}}, 1000, 1000) = 1
> >> time(NULL)                              = 1157770135
> >> write(9,
> >> "Q\344\7\330,\301\355!k\3278\31:\256\371\264<\330\352\310j\225\30\325\310\361\360D\353\353w\344L\3\341*\345\227\3460\204\354\23Q\332\316-O\301\350\270w\206t\235v\365\300r;\367\377\220s\244\311\302\226\372\0\357\3025\3
> >> 60\266\336\341\0\377%O\3324\213\364\344\201D\232si\5\301JS?\341|o\250\274\377\240F\364\32\250$\24\372\315\325q\233_\350at\324\363\320\343qG\32\221\345\f\272\324\t\2524\\\232]^\301\241\220\212\241R\372\3240\360*A\3709}\203@\22\
> >> 313\n(_\200H\263\335\243\261\342Q\33\16\264\243_\230\372A:g*Y\331\324\325F\326N\27,~\200\235\306_\222@\301\246\266\r\374\202\216\311A\2mb\3\213+r\201\22K\323\374\'\277\5l\353\10\355+<~\35\\\272\320\277\323\34\37\345\361\335\26
> >> 1\27\305\276\224\321>\222JKNjwo\336\224PmJ\246\1A\210\374\27w\33\377\r\21\340\32\326\326f\332\255\214T0%\253,!1\212\206\374\351UE\25\274\360*>\310\236\262hcP[-ZW\27\0f\311+ at U\305\214\2701e\10\320\5\33S\302K\352\366\346.\314\36
> >> 7:Z9\316j\313]K\342\356X\31\245S\5\251\t8\223KXrZ\277\277\236I\352;T\200\327\r1\240iO\21\31\301\10"...,
> >> 26868) = 26868
> >> write(9,
> >> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
> >> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
> >> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
> >> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
> >> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
> >> `\247\346 \232\257\356/b\205\223\'
> >> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
> >> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
> >> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20, {u32=9,
> >> u64=9}}) = 0
> >> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,
> >> {u32=12,
> >> u64=12}}) = 0
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 505376}, NULL) = 0
> >> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> >> read(12,
> >> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
> >> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
> >> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
> >> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
> >> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
> >> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
> >> 61449) = 61449
> >> write(9,
> >> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
> >> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
> >> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
> >> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
> >> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
> >> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
> >> 61449) = 9963
> >> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
> >> {u32=9, u64=9}}) = 0
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 505943}, NULL) = 0
> >> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> >> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
> >> u64=12}}) = 0
> >> time(NULL)                              = 1157770135
> >> gettimeofday({1157770135, 506101}, NULL) = 0
> >> epoll_wait(5, {}, 1000, 1000)           = 0
> >> time(NULL)                              = 1157770136
> >> gettimeofday({1157770136, 507274}, NULL) = 0
> >> epoll_wait(5, {}, 1000, 1000)           = 0
> >> time(NULL)                              = 1157770137
> >> gettimeofday({1157770137, 507277}, NULL) = 0
> >> epoll_wait(5, {}, 1000, 1000)           = 0
> >> time(NULL)                              = 1157770138
> >> gettimeofday({1157770138, 507259}, NULL) = 0
> >> epoll_wait(5, {}, 1000, 881)            = 0
> >> time(NULL)                              = 1157770139
> >> gettimeofday({1157770139, 391390}, NULL) = 0
> >> time(NULL)                              = 1157770139
> >> gettimeofday({1157770139, 391504}, NULL) = 0
> >> epoll_wait(5, {}, 1000, 1000)           = 0
> >> time(NULL)                              = 1157770140
> >> gettimeofday({1157770140, 391353}, NULL) = 0
> >> epoll_wait(5, {{EPOLLIN, {u32=4, u64=4}}}, 1000, 1000) = 1
> >>
> >> ===========
> >>
> >> Right when the chunks stop is when the download freezes, at about 10%
> >> of a
> >> 50mb file. Let me know if you need more/different parts of the trace.
> >> Thanks!
> >>
> >> -Sam
> >>
> >> On Fri, 08 Sep 2006 10:47:33 -0700, Brad Fitzpatrick <brad at danga.com>
> >> wrote:
> >>
> >> > Hm.  Can you get an strace on the main Perlbal process while this is
> >> > happening?
> >> >
> >> > $ strace -s 500 -o perlbal.trace -p <pid-of-perlbal>
> >> > ...... (wait)
> >> > ^C
> >> >
> >> >
> >> >
> >> > On Fri, 8 Sep 2006, Sam G wrote:
> >> >
> >> >> Hi all,
> >> >>
> >> >> I have perlbal installed on a Centos 4.4 x64 dual Opteron server w/
> >> 4GB
> >> >> RAM, and am reproxying from a webserver to read files from a
> >> clustered
> >> >> storage backend running Apache/DAV. When I do a reproxied download,
> >> >> things
> >> >> seem to go well until a few megabytes into the file; then, the
> >> download
> >> >> seems to slow down sharply then hang. It will happen at different
> >> parts
> >> >> of
> >> >> the file, but the speed always starts out strong (500kb/s to 2mb/s,
> >> >> depending on the connection I'm downloading from) but after 10
> >> seconds
> >> >> or
> >> >> so comes to a halt. Sometimes it'll slow way down, speed back up, but
> >> >> then
> >> >> slow down again, finally hanging. This happens for virtually any
> >> large
> >> >> file.
> >> >>
> >> >> I've tried both 'ioaio' and 'none' for the aio_mode parameter
> >> (Linux::IO
> >> >> won't pass tests), but same thing happens in both cases. I can pull
> >> from
> >> >> the storage to the perlbal machine no problem with HTTP wget, at
> >> about
> >> >> 30
> >> >> MB/s uninterrupted. As the reeproxied download happens, perlbal CPU
> >> >> usage
> >> >> never goes above 1%. I have the default buffer settings; have tried
> >> >> tweaking a little but to no avail.
> >> >>
> >> >> Anyone have any ideas why this might be happening?
> >> >>
> >> >> --
> >> >> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> >>
> >>
>
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>
>


More information about the perlbal mailing list