MogileFS::Backend: socket close on read & `make test` errors compiling Perlbal

Brad Fitzpatrick brad at danga.com
Fri Apr 27 20:37:34 UTC 2007


Steven,

No test failure should be considered 'normal', or 'harmless', so thanks
for pointing these out... even if it's only a bad assumption or race
condition in the test, rather than the core code itself, it's still a bug,
because "make test" should always pass.

As for the other bug, the one that's actually biting you ... it's
pretty funny.  Your system clock seems to not always be going forward:
your create_open query is taking negative time, which when output
to 4 decimal places is -0.0000.  The parent process isn't expecting any
child to complete work in negative time, so the negative sign trips up
its parsing of the worker's response line, the parent assumes the child
is on crack, kills it, and your connection drops.
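
To illustrate how a tiny negative elapsed time ends up printed as -0.0000, here is a minimal sketch.  The timestamps are made up, and it assumes the worker formats the elapsed time with something like sprintf "%.4f"; it is not the actual MogileFS worker code.

```perl
use strict;
use warnings;

# Hypothetical readings: the "end" time is slightly earlier than the
# "start" time, as could happen if the system clock steps backwards
# in the middle of a request.
my $start = 100.00003;
my $end   = 100.00001;

my $elapsed   = $end - $start;              # small negative number
my $formatted = sprintf("%.4f", $elapsed);  # rounds to zero, keeps the sign

print "$formatted\n";                       # prints -0.0000
```

The value rounds to zero at four decimal places, but the sign survives the formatting, which is the part the parent's parser does not expect.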

This line, in the mogile server's ProcManager.pm:

    if ($line =~ /^(\d+-\d+)\s+(\d+\.\d+)\s+(.+)$/) {

Can be changed to:

    if ($line =~ /^(\d+-\d+)\s+(\-?\d+\.\d+)\s+(.+)$/) {

And then it should all work for you.  I'll put that into the next release,
due early next week.
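
If you want to convince yourself before patching, the two patterns can be compared side by side in a standalone script.  This is only a sketch: the sample line is a shortened copy of the one in your tracker log, and the variable names are mine, not ProcManager.pm's.

```perl
use strict;
use warnings;

# The pattern as it stands, and the pattern with an optional minus sign.
my $old = qr/^(\d+-\d+)\s+(\d+\.\d+)\s+(.+)$/;
my $new = qr/^(\d+-\d+)\s+(\-?\d+\.\d+)\s+(.+)$/;

# Shortened version of the worker line from the tracker log.
my $line = '26680-24 -0.0000 OK dev_count=3&fid=5411';

my $old_matches = ($line =~ $old) ? 1 : 0;
my ($id, $elapsed, $rest) = $line =~ $new;

print "old pattern matches: $old_matches\n";
print "new pattern: id=$id elapsed=$elapsed rest=$rest\n";
```

The old pattern fails on the leading minus sign, so no worker id is captured and the parent reports "id <undef>"; the new one captures 26680-24 as expected.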

I'm curious, though:  run "dmesg" on your box.. do you see warnings about
your clocksource being whack?  It might also explain the Perlbal test
problems, if your clock's jumping all over the place.  What's the hardware
in the box (number/type of processors), and Linux kernel version?
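
Besides dmesg, a quick way to see whether the clock really steps backwards is to read it in a tight loop and count the regressions.  A rough sketch, assuming Time::HiRes is available (it ships with perl 5.8); count_backward_steps is a made-up helper name, and a result of 0 doesn't prove the clock is healthy, only that nothing went backwards during this run.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Read the wall clock repeatedly and count how often a reading is
# earlier than the one before it.
sub count_backward_steps {
    my ($samples) = @_;
    my $backward = 0;
    my $prev     = time();
    for (1 .. $samples) {
        my $now = time();
        $backward++ if $now < $prev;
        $prev = $now;
    }
    return $backward;
}

my $n = count_backward_steps(200_000);
print "backward steps seen: $n\n";
```

Anything above zero here would fit the negative timings in your tracker log.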

- Brad



On Fri, 27 Apr 2007, Steven Shou wrote:

> Hi all,
>
> I'm running MogileFS 2.0.10 with Perlbal 1.56 and IO::AIO all from CPAN on 3
> test servers (as both trackers and mogstored) under linux 2.6.18 and perl
> 5.8.8.  Whenever I compile Perlbal, I have to `make test` a few times in
> order for it to succeed. Below are some random errors I got while doing 'make
> test':
>
> t/15-webserver...........NOK 6/15
> #   Failed test 'Got not modified'
> #   in t/15-webserver.t at line 70.
> #          got: '200'
> #     expected: '304'
> t/15-webserver...........NOK 7/15
> #   Failed test 'Shouldn't get a Content-Length header'
> #   in t/15-webserver.t at line 71.
> #          got: '12000'
> #     expected: undef
> # Looks like you failed 2 tests of 15.
> t/15-webserver...........dubious
>         Test returned status 2 (wstat 512, 0x200)
> DIED. FAILED tests 6-7
>         Failed 2/15 tests, 86.67% okay
>
> t/45-buffereduploads.....ok 16/0
> #   Failed test 'no_buffer_on_rate: no buffer reason'
> #   in t/45-buffereduploads.t at line 203.
> t/45-buffereduploads.....ok 24/0
> #   Failed test 'no_buffer_on_time: no file'
> #   in t/45-buffereduploads.t at line 219.
> t/45-buffereduploads.....NOK 28/0
> #   Failed test 'no_buffer_on_time: no buffer reason'
> #   in t/45-buffereduploads.t at line 203.
> # Looks like you failed 3 tests of 30.
> t/45-buffereduploads.....dubious
>         Test returned status 3 (wstat 768, 0x300)
> DIED. FAILED tests 22, 28, 30
>         Failed 3/30 tests, 90.00% okay
>
> Eventually, the test will pass after a few tries.  I've also tried many older
> versions but same thing happens.  Just wondering if they're harmless.
>
>
>
> Also after setting up everything, I ran the following tight loop of file
> insertion code to do some simple testing to see how mogilefs handles stuff.
>
> Code:
>
> while(<FILENAMELIST>) {
>     chomp;
>     if(-e $_) {
>         while(! $mogc->store_file($key, "test_class", $_) ) { }
>     }
> }
>
> The program dies after a few inserts with the following error:
>
> SOCK: cached = Sock_192.168.0.20:6001, REQ: create_open
> domain=test_domain&fid=0&class=test_class&multi_dest=1&key=test11_122
> $VAR1 = undef;
> Use of uninitialized value in concatenation (.) or string at /usr/lib/perl5/
> site_perl/5.8.8/MogileFS/Backend.pm line 174, <Sock_192.168.0.20:6001> line
> 102.
> RESPONSE:
> $VAR1 = undef;
> MogileFS::Backend: socket closed on read at /usr/lib/perl5/site_perl/5.8.8/
> MogileFS/Client.pm line 255
>
>
> While the tracker shows the following error:
>
> Worker responded with id <undef> (line: [26680-24 -0.0000 OK
> dev_count=3&devid_3=3&devid_2=2&path_1=http://192.168.0.20:7500/
> dev1/0/000/005/0000005411.fid&fid=5411&devid_1=1&path_3=http://
> 192.168.0.17:7500/dev3/0/000/005/0000005411.fid&path_2=http://
> 192.168.0.22:7500/dev2/0/000/005/0000005411.fid]), but expected id 26680-24,
> killing
> Child 26678 (queryworker) died: 0 (expected)
> Job queryworker has only 1, wants 5, making 4.
>
>
> If I put a `sleep 1` after the while block, then no errors.  I can use `eval`
> to prevent the program from dying and retry until success, but just wondering
> if this is normal and just a sign of limited resource or might this be tied
> to the Perlbal errors at the top?
>
> Thanks!
> Steven
>

