Perlbal reproxied downloads stalling

Sam G ceptorial at comcast.net
Wed Sep 13 04:06:26 UTC 2006


On Sat, 09 Sep 2006 09:36:58 -0700, Brad Fitzpatrick <brad at danga.com>  
wrote:

> Try increasing the server's "buffer_size_reproxy_url" setting.  Still, it
> shouldn't freeze, so something more is messed up.
>
> I can't reproduce this here, even though I thought I'd be able to.  I get
> 350 kB/second to my house from both my origin and reproxy-URL'd services,
> which live on different hosts.
>
> What's your perlbal configuration and network look like?  What's the
> latency between the three parties involved?  (ignoring the first server
> that issues the x-reproxy-url header)

There is very low latency between all three parties. The perlbal machine  
is on the same local network as the storage box, pulling data through  
Apache webdav. The downloading machine is geographically close and  
connected at high bandwidth to the datacenter which houses the first two  
machines.

>
> BTW, the aio mode won't matter for this... it's a pure network operation.
> No filesystem calls involved to do async.
>
> When a download stalls, telnet into the Perlbal management port and type
> "socks".  What's the output?  In particular, I'm wondering what state and
> buffer sizes the different parties are at.

Relevant Perlbal config:

CREATE POOL php_apaches
   SET nodefile = /etc/perlbal/nodelist.dat

CREATE SERVICE balancer
   SET listen          = 10.10.10.251:80
   SET role            = reverse_proxy
   SET enable_reproxy  = true
   SET pool            = php_apaches
   SET persist_client  = on
   SET persist_backend = on
   SET verify_backend  = on
ENABLE balancer

Perlbal machine IP: 10.10.10.251, storage IP: 10.10.30.101, client IP  
requesting download: 67.180.143.68.

Before download starts, everything idle:

socks
    fd    age
     4 405805s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80  
for service 'balancer'
     8 405805s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001  
for service 'mgmt'
    12    35s Perlbal::ClientManage(R): open to 127.0.0.1:38802
.

After download starts, is progressing:

socks
    fd    age
     4 405813s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80  
for service 'balancer'
     8 405813s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001  
for service 'mgmt'
     9     4s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:  
localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;  
backend=10.10.30.101:80; responded
    10     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:  
localport=80; reqs=2; persist_wait
    11     4s Perlbal::BackendHTTP(R): open to 10.10.30.101:80:  
localport=50561; client=9; uses=0; xfer_res
    12    43s Perlbal::ClientManage(R): open to 127.0.0.1:38802
    13     4s Perlbal::BackendHTTP(R): open to 10.10.20.103:80:  
localport=55473; uses=4; bored; has_attention=yes
    14     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:  
localport=80; reqs=1; persist_wait
    15     4s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:  
localport=80; reqs=1; persist_wait
.
socks
    fd    age
     4 405815s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80  
for service 'balancer'
     8 405815s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001  
for service 'mgmt'
     9     6s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:  
localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;  
backend=10.10.30.101:80; responded
    10     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:  
localport=80; reqs=2; persist_wait
    11     6s Perlbal::BackendHTTP(): open to 10.10.30.101:80:  
localport=50561; client=9; uses=0; xfer_res
    12    45s Perlbal::ClientManage(R): open to 127.0.0.1:38802
    14     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:  
localport=80; reqs=1; persist_wait
    15     6s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:  
localport=80; reqs=1; persist_wait
.

At this point the download is totally frozen:

socks
    fd    age
     4 405818s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80  
for service 'balancer'
     8 405818s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001  
for service 'mgmt'
     9     9s Perlbal::ClientProxy(RW): open to 67.180.143.68:4791:  
localport=80; reqs=1; xfer_res; http://url/file_id=f_7926310;  
backend=10.10.30.101:80; responded
    10     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4792:  
localport=80; reqs=2; persist_wait
    11     9s Perlbal::BackendHTTP(): open to 10.10.30.101:80:  
localport=50561; client=9; uses=0; xfer_res
    12    48s Perlbal::ClientManage(R): open to 127.0.0.1:38802
    14     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4793:  
localport=80; reqs=1; persist_wait
    15     9s Perlbal::ClientProxy(R): open to 67.180.143.68:4794:  
localport=80; reqs=1; persist_wait
.



I cancel the download, and it returns to first state:

socks
    fd    age
     4 405931s Perlbal::TCPListener(R): open: listening on 10.10.10.251:80  
for service 'balancer'
     8 405931s Perlbal::TCPListener(R): open: listening on 127.0.0.1:60001  
for service 'mgmt'
    12   161s Perlbal::ClientManage(R): open to 127.0.0.1:38802
.


Let me know what else I can get you.

-Sam

>
>
>
>
> On Fri, 8 Sep 2006, Sam G wrote:
>
>> This seems to be the bulk of what's happening:
>>
>> ===========
>>
>> <lots of large chunks of text scrolling by as file downloads...>
>> ....
>> <snip>
>> ....
>> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
>> {u32=9, u64=9}}) = 0
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 330342}, NULL) = 0
>> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
>> read(12,
>> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
>> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
>> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
>> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
>> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
>> `\247\346 \232\257\356/b\205\223\'
>> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
>> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 330707}, NULL) = 0
>> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
>> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
>> u64=12}}) = 0
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 330867}, NULL) = 0
>> epoll_wait(5, {{EPOLLOUT, {u32=9, u64=9}}}, 1000, 1000) = 1
>> time(NULL)                              = 1157770135
>> write(9,
>> "Q\344\7\330,\301\355!k\3278\31:\256\371\264<\330\352\310j\225\30\325\310\361\360D\353\353w\344L\3\341*\345\227\3460\204\354\23Q\332\316-O\301\350\270w\206t\235v\365\300r;\367\377\220s\244\311\302\226\372\0\357\3025\3
>> 60\266\336\341\0\377%O\3324\213\364\344\201D\232si\5\301JS?\341|o\250\274\377\240F\364\32\250$\24\372\315\325q\233_\350at\324\363\320\343qG\32\221\345\f\272\324\t\2524\\\232]^\301\241\220\212\241R\372\3240\360*A\3709}\203@\22\
>> 313\n(_\200H\263\335\243\261\342Q\33\16\264\243_\230\372A:g*Y\331\324\325F\326N\27,~\200\235\306_\222@\301\246\266\r\374\202\216\311A\2mb\3\213+r\201\22K\323\374\'\277\5l\353\10\355+<~\35\\\272\320\277\323\34\37\345\361\335\26
>> 1\27\305\276\224\321>\222JKNjwo\336\224PmJ\246\1A\210\374\27w\33\377\r\21\340\32\326\326f\332\255\214T0%\253,!1\212\206\374\351UE\25\274\360*>\310\236\262hcP[-ZW\27\0f\311+ at U\305\214\2701e\10\320\5\33S\302K\352\366\346.\314\36
>> 7:Z9\316j\313]K\342\356X\31\245S\5\251\t8\223KXrZ\277\277\236I\352;T\200\327\r1\240iO\21\31\301\10"...,
>> 26868) = 26868
>> write(9,
>> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
>> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
>> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
>> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
>> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
>> `\247\346 \232\257\356/b\205\223\'
>> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
>> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
>> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20, {u32=9,
>> u64=9}}) = 0
>> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20,  
>> {u32=12,
>> u64=12}}) = 0
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 505376}, NULL) = 0
>> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
>> read(12,
>> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
>> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
>> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
>> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
>> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
>> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
>> 61449) = 61449
>> write(9,
>> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
>> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
>> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
>> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
>> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
>> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
>> 61449) = 9963
>> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
>> {u32=9, u64=9}}) = 0
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 505943}, NULL) = 0
>> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
>> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
>> u64=12}}) = 0
>> time(NULL)                              = 1157770135
>> gettimeofday({1157770135, 506101}, NULL) = 0
>> epoll_wait(5, {}, 1000, 1000)           = 0
>> time(NULL)                              = 1157770136
>> gettimeofday({1157770136, 507274}, NULL) = 0
>> epoll_wait(5, {}, 1000, 1000)           = 0
>> time(NULL)                              = 1157770137
>> gettimeofday({1157770137, 507277}, NULL) = 0
>> epoll_wait(5, {}, 1000, 1000)           = 0
>> time(NULL)                              = 1157770138
>> gettimeofday({1157770138, 507259}, NULL) = 0
>> epoll_wait(5, {}, 1000, 881)            = 0
>> time(NULL)                              = 1157770139
>> gettimeofday({1157770139, 391390}, NULL) = 0
>> time(NULL)                              = 1157770139
>> gettimeofday({1157770139, 391504}, NULL) = 0
>> epoll_wait(5, {}, 1000, 1000)           = 0
>> time(NULL)                              = 1157770140
>> gettimeofday({1157770140, 391353}, NULL) = 0
>> epoll_wait(5, {{EPOLLIN, {u32=4, u64=4}}}, 1000, 1000) = 1
>>
>> ===========
>>
>> Right when the chunks stop is when the download freezes, at about 10%  
>> of a
>> 50mb file. Let me know if you need more/different parts of the trace.
>> Thanks!
>>
>> -Sam
>>
>> On Fri, 08 Sep 2006 10:47:33 -0700, Brad Fitzpatrick <brad at danga.com>
>> wrote:
>>
>> > Hm.  Can you get an strace on the main Perlbal process while this is
>> > happening?
>> >
>> > $ strace -s 500 -o perlbal.trace -p <pid-of-perlbal>
>> > ...... (wait)
>> > ^C
>> >
>> >
>> >
>> > On Fri, 8 Sep 2006, Sam G wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have perlbal installed on a Centos 4.4 x64 dual Opteron server w/  
>> 4GB
>> >> RAM, and am reproxying from a webserver to read files from a  
>> clustered
>> >> storage backend running Apache/DAV. When I do a reproxied download,
>> >> things
>> >> seem to go well until a few megabytes into the file; then, the  
>> download
>> >> seems to slow down sharply then hang. It will happen at different  
>> parts
>> >> of
>> >> the file, but the speed always starts out strong (500kb/s to 2mb/s,
>> >> depending on the connection I'm downloading from) but after 10  
>> seconds
>> >> or
>> >> so comes to a halt. Sometimes it'll slow way down, speed back up, but
>> >> then
>> >> slow down again, finally hanging. This happens for virtually any  
>> large
>> >> file.
>> >>
>> >> I've tried both 'ioaio' and 'none' for the aio_mode parameter  
>> (Linux::IO
>> >> won't pass tests), but same thing happens in both cases. I can pull  
>> from
>> >> the storage to the perlbal machine no problem with HTTP wget, at  
>> about
>> >> 30
>> >> MB/s uninterrupted. As the reeproxied download happens, perlbal CPU
>> >> usage
>> >> never goes above 1%. I have the default buffer settings; have tried
>> >> tweaking a little but to no avail.
>> >>
>> >> Anyone have any ideas why this might be happening?
>> >>
>> >> --
>> >> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>> >>
>> >>
>>
>>
>>
>> --
>> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>>
>>



-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


More information about the perlbal mailing list