Perlbal reproxied downloads stalling

Brad Fitzpatrick brad at danga.com
Sat Sep 9 16:36:58 UTC 2006


Try increasing the server's "buffer_size_reproxy_url" setting.  Still, it
shouldn't freeze, so something more is messed up.

I can't reproduce this here, even though I thought I'd be able to.  I get
350 kB/second to my house from both my origin and reproxy-URL'd services,
which live on different hosts.

What's your perlbal configuration and network look like?  What's the
latency between the three parties involved?  (ignoring the first server
that issues the x-reproxy-url header)

BTW, the aio mode won't matter for this... it's a pure network operation.
No filesystem calls involved to do async.

When a download stalls, telnet into the Perlbal management port and type
"socks".  What's the output?  In particular, I'm wondering what state and
buffer sizes the different parties are at.




On Fri, 8 Sep 2006, Sam G wrote:

> This seems to be the bulk of what's happening:
>
> ===========
>
> <lots of large chunks of text scrolling by as file downloads...>
> ....
> <snip>
> ....
> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
> {u32=9, u64=9}}) = 0
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 330342}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> read(12,
> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
> `\247\346 \232\257\356/b\205\223\'
> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 330707}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
> u64=12}}) = 0
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 330867}, NULL) = 0
> epoll_wait(5, {{EPOLLOUT, {u32=9, u64=9}}}, 1000, 1000) = 1
> time(NULL)                              = 1157770135
> write(9,
> "Q\344\7\330,\301\355!k\3278\31:\256\371\264<\330\352\310j\225\30\325\310\361\360D\353\353w\344L\3\341*\345\227\3460\204\354\23Q\332\316-O\301\350\270w\206t\235v\365\300r;\367\377\220s\244\311\302\226\372\0\357\3025\3
> 60\266\336\341\0\377%O\3324\213\364\344\201D\232si\5\301JS?\341|o\250\274\377\240F\364\32\250$\24\372\315\325q\233_\350at\324\363\320\343qG\32\221\345\f\272\324\t\2524\\\232]^\301\241\220\212\241R\372\3240\360*A\3709}\203@\22\
> 313\n(_\200H\263\335\243\261\342Q\33\16\264\243_\230\372A:g*Y\331\324\325F\326N\27,~\200\235\306_\222@\301\246\266\r\374\202\216\311A\2mb\3\213+r\201\22K\323\374\'\277\5l\353\10\355+<~\35\\\272\320\277\323\34\37\345\361\335\26
> 1\27\305\276\224\321>\222JKNjwo\336\224PmJ\246\1A\210\374\27w\33\377\r\21\340\32\326\326f\332\255\214T0%\253,!1\212\206\374\351UE\25\274\360*>\310\236\262hcP[-ZW\27\0f\311+ at U\305\214\2701e\10\320\5\33S\302K\352\366\346.\314\36
> 7:Z9\316j\313]K\342\356X\31\245S\5\251\t8\223KXrZ\277\277\236I\352;T\200\327\r1\240iO\21\31\301\10"...,
> 26868) = 26868
> write(9,
> "\2100)\"\33\31\207\244\241\0254\210v\360W{\321$\221\331^\302\237\22480\33\257\35\2220\25\337f\17#e\7%\253\23\330\21\356\321\250fd\204\241%\372hD\374h\214\225\16\232w\322 at O\16m_!\375\237w)\20\n\256YF\353B\10|\231j7\23
> 1\363\337QnA\210R\365\337\\5,:\3737W\341pI\206\273\227~\2472\361\316Q\301\37V\307\371\216\354\234k
> \223\247\333\316\'\f\235\226h<\344\344uD\25\251\253\4\364*\361\215\264\234\377\202\222\27\262\6\257\223J\6\313\344\300\331\222\
> 243\304U\211\304\257\206vO\341\322\307\346\307\270I\300\362\330>xH\267\32+\210\250&/8u\262\223|9\7cK\26\256\234\254\207j,e|1\271\246.\304\215d\35\327\305\342\321\267\303>\323*\265\23\16\211\365\350\346\223\316\237,J\251\210/Ud
> g\377\231`KZ\34\342\302\'\320\240T\177\321\304\7\320*\241T)\326\256I\264\"\244\376\231&M\310\243\254\240^h#\237N\345i\273\221V3\253q\232\277\6\225\\\f\226\247\3!qi\341`\323\rV\251.`w\26\27\17
> `\247\346 \232\257\356/b\205\223\'
> ?\366\nz\242L\304\261\255\223\276-\231\361\371\314\31\303\303\367
> {\24ha(\374\34\315\365\270\212"..., 61449) = 61449
> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20, {u32=9,
> u64=9}}) = 0
> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLIN|EPOLLERR|EPOLLHUP|0x20, {u32=12,
> u64=12}}) = 0
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 505376}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> read(12,
> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
> 61449) = 61449
> write(9,
> "\\\270\t\311\260&\347y\300\7$*+\245M\364.\306\255\246\357\177\206F\'\3038&\335h#\233\33z\216z\276l;\340_U\352\27%2\33\347\217\220\251g\'\220R\20\274\201b\31\232r\370\263\240\331\305\30\323\3351Z\372\276>\3768[eg\f\32
> 3\330\265\276K\200\tR[\36\363\v\334\225\204\266\24)\351\34\345\20h_\245x\206\232\336\217\261\1\362
> \303sj\214\251\353qQ\235g\244CJ\372\344\343U\221?y*\373\1\341\274\203\265W\356\233\2=\331\303B\264U\211\231\333\235\253|\217a\2
> 60\224v>\335\354\212\316\334\252eU\226\3\n\n65\34\30\230zf\2\253\375#1f\276\17\352\212\30\313\"\376\347\307+\365\314\337?r\25745\354tj\23\306\250\27\30\255\213\2649`\30\22\223\316\351\240.\"\362e\335\201\255L\305\335\207\301\2
> 70\266@\226\347\272\373\231G\267\207I\262\352\272\247\352\17\231.\32\345\202\244H\335\346{{\37\25{\310]\32\305\teJ\226[\336X\232Wn\5\327\335Z\247\343y\327W\241l\277\231\375\232\262\261\177]Mur\275\271y\223\270\2020\267\366EE~e
> \226L\326\247\f\334Mi$b\v\377\255\351K\320ORR\225}Z\331k\344\213\337V\245b.b_]\36\27_\375f\211\25"...,
> 61449) = 9963
> epoll_ctl(5, EPOLL_CTL_MOD, 9, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP|0x20,
> {u32=9, u64=9}}) = 0
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 505943}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=12, u64=12}}}, 1000, 1000) = 1
> epoll_ctl(5, EPOLL_CTL_MOD, 12, {EPOLLERR|EPOLLHUP|0x20, {u32=12,
> u64=12}}) = 0
> time(NULL)                              = 1157770135
> gettimeofday({1157770135, 506101}, NULL) = 0
> epoll_wait(5, {}, 1000, 1000)           = 0
> time(NULL)                              = 1157770136
> gettimeofday({1157770136, 507274}, NULL) = 0
> epoll_wait(5, {}, 1000, 1000)           = 0
> time(NULL)                              = 1157770137
> gettimeofday({1157770137, 507277}, NULL) = 0
> epoll_wait(5, {}, 1000, 1000)           = 0
> time(NULL)                              = 1157770138
> gettimeofday({1157770138, 507259}, NULL) = 0
> epoll_wait(5, {}, 1000, 881)            = 0
> time(NULL)                              = 1157770139
> gettimeofday({1157770139, 391390}, NULL) = 0
> time(NULL)                              = 1157770139
> gettimeofday({1157770139, 391504}, NULL) = 0
> epoll_wait(5, {}, 1000, 1000)           = 0
> time(NULL)                              = 1157770140
> gettimeofday({1157770140, 391353}, NULL) = 0
> epoll_wait(5, {{EPOLLIN, {u32=4, u64=4}}}, 1000, 1000) = 1
>
> ===========
>
> Right when the chunks stop is when the download freezes, at about 10% of a
> 50mb file. Let me know if you need more/different parts of the trace.
> Thanks!
>
> -Sam
>
> On Fri, 08 Sep 2006 10:47:33 -0700, Brad Fitzpatrick <brad at danga.com>
> wrote:
>
> > Hm.  Can you get an strace on the main Perlbal process while this is
> > happening?
> >
> > $ strace -s 500 -o perlbal.trace -p <pid-of-perlbal>
> > ...... (wait)
> > ^C
> >
> >
> >
> > On Fri, 8 Sep 2006, Sam G wrote:
> >
> >> Hi all,
> >>
> >> I have perlbal installed on a Centos 4.4 x64 dual Opteron server w/ 4GB
> >> RAM, and am reproxying from a webserver to read files from a clustered
> >> storage backend running Apache/DAV. When I do a reproxied download,
> >> things
> >> seem to go well until a few megabytes into the file; then, the download
> >> seems to slow down sharply then hang. It will happen at different parts
> >> of
> >> the file, but the speed always starts out strong (500kb/s to 2mb/s,
> >> depending on the connection I'm downloading from) but after 10 seconds
> >> or
> >> so comes to a halt. Sometimes it'll slow way down, speed back up, but
> >> then
> >> slow down again, finally hanging. This happens for virtually any large
> >> file.
> >>
> >> I've tried both 'ioaio' and 'none' for the aio_mode parameter (Linux::IO
> >> won't pass tests), but same thing happens in both cases. I can pull from
> >> the storage to the perlbal machine no problem with HTTP wget, at about
> >> 30
> >> MB/s uninterrupted. As the reeproxied download happens, perlbal CPU
> >> usage
> >> never goes above 1%. I have the default buffer settings; have tried
> >> tweaking a little but to no avail.
> >>
> >> Anyone have any ideas why this might be happening?
> >>
> >> --
> >> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
> >>
> >>
>
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
>
>


More information about the perlbal mailing list