bugs in PRINT/_write and do_request

Mark Smith marksmith at danga.com
Wed May 11 15:00:20 PDT 2005


> I've been getting reconnections during writes working in the python
> client.  I noticed for the perl client if you do
> ----------------
> use MogileFS;
> my $mogfs = MogileFS->new(domain => 'test',
>                           hosts  => [ 'peter:7001' ]);
> die "Unable to initialize MogileFS object.\n" unless $mogfs;
> 
> 
> my $file_contents = $mogfs->get_file_data("motd");
> print "$$file_contents\n";
> 
> sleep(10);
> my $file_contents = $mogfs->get_file_data("motd");
> print "$$file_contents\n";
> --------------------
> and during the sleep(10) restart the tracker, you get
> 
> MogileFS::Backend: socket closed on read at /usr/share/perl5/MogileFS.pm line 129
> 
> by the second call
> 
> this is because _get_sock never gets a chance to be called, I accounted
> for this in the python client by having it retry the do_request once
> which will then force _get_sock to be called.

Interesting.  It's okay that _get_sock isn't called a second time.  That's
how it's supposed to work.  The client caches the socket to the backend
across requests so it doesn't have to go through the handshake every
time.

do_request does some logic that lets it try to use the cached socket, and
if there's an error, it will then open a new connection.
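
To make the intended behavior concrete, here is a minimal sketch in Python
of "try the cached socket, reconnect on error".  It is not the actual
MogileFS client code.  The class name TrackerClient, the connects counter,
and the line-based request format are assumptions made for illustration;
only the names _get_sock and do_request come from this thread.

```python
import socket


class TrackerClient:
    """Hypothetical client that caches one socket across requests."""

    def __init__(self, host, port, timeout=3.0):
        self.addr = (host, port)
        self.timeout = timeout
        self.sock = None      # cached connection, reused across requests
        self.connects = 0     # number of handshakes, for illustration only

    def _get_sock(self):
        # Reuse the cached socket if there is one; otherwise connect.
        if self.sock is None:
            self.sock = socket.create_connection(self.addr, timeout=self.timeout)
            self.connects += 1
        return self.sock

    def _drop_sock(self):
        # Forget the cached socket so the next _get_sock() reconnects.
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def _read_line(self, sock):
        # Read until newline; an empty recv() means the peer closed.
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("socket closed on read")
            buf += chunk
        return buf

    def do_request(self, line):
        # The first attempt may hit a stale cached socket; the second
        # attempt always starts from a fresh connection.
        for attempt in range(2):
            try:
                sock = self._get_sock()
                sock.sendall(line.encode() + b"\r\n")
                return self._read_line(sock).decode().rstrip("\r\n")
            except OSError:
                self._drop_sock()
                if attempt == 1:
                    raise
```

Note that the retry covers the read as well as the send, so a dead socket
that only shows up as "closed on read" still triggers a reconnect.
Retrying like this is only safe for requests that can be repeated without
side effects.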

The problem is that for some reason, when the remote goes away, and we call
send() to send our request, we aren't getting an ECONNRESET (or really, any
error) and the system is saying it wrote the bytes.

I did some tracing (via strace) and confirmed that the OS is indeed saying
that it successfully wrote the bytes to the backend socket, which should
be dead, since the backend went away.  However, the OS is saying "alright,
bytes sent."

Perplexing and disturbing.

I honestly have no idea why this is happening.  Maybe it has to do with
talking to a local tracker?  I'll have to play around with a remote tracker
and see if I can cause the same behavior.

So what I'm saying: thanks for the report, it's verified, but no idea why
it's happening.  Definitely needs to be investigated and fixed, though.  If
you have any ideas on why we're getting an all-clear from send() on a dead
socket, I'd love to hear them...
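
One possible explanation, offered as an assumption and not confirmed
anywhere in this thread: on TCP, send() generally returns as soon as the
kernel has accepted the bytes into its send buffer, so the first write
after the peer has closed can still report success.  The failure then
only becomes visible on a later read or write.  The sketch below
reproduces that on the loopback interface.  The function name and the
payload are made up for the demonstration.

```python
import socket


def send_to_closed_peer(payload=b"get_paths\r\n"):
    """Send on a connection whose remote end has already been closed.

    Returns (bytes_sent, peer_closed_seen_on_read).
    """
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    client.settimeout(3.0)

    # Accept the connection, then close everything on the "tracker" side.
    server_side, _addr = listener.accept()
    server_side.close()
    listener.close()

    # The write is accepted by the local kernel even though the peer is gone.
    sent = client.send(payload)

    # Only the following read reveals the problem: either an empty read
    # (orderly close) or a connection error (reset).
    try:
        peer_closed = client.recv(4096) == b""
    except ConnectionError:
        peer_closed = True
    finally:
        client.close()
    return sent, peer_closed
```

If that is what is happening here, an error check on send() alone would
not be enough to notice a restarted tracker; the read would have to be
treated as a reconnect trigger as well.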

Anyway, I'll get to your other emails as I have time.  Life is busy around
here lately!  Good stuff, though.  :)

--
Mark Smith
junior at danga.com
junior at sixapart.com

