Tracker error "size_verify_error" on create_close

Adam Rosien adam at rosien.net
Fri Apr 20 00:36:18 UTC 2007


Interesting progress.  Your change locked up the tracker because
$first has some low bytes in it.  I changed it to write (hex $first)
instead, which returns '12'; that looks like bogus data from the socket.
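
One way to embed raw socket bytes in an error string without putting
control characters on the wire is to escape them first.  This is a
minimal Python sketch and not the tracker's code; the helper name
printable() and the 64-byte limit are hypothetical, and it is only a
guess that unescaped bytes were what caused the lockup:

```python
def printable(data, limit=64):
    """Render raw bytes as printable ASCII, escaping the rest as \\xNN."""
    out = []
    for byte in data[:limit]:
        # Keep visible ASCII as-is; escape control bytes, high bytes,
        # and the backslash itself so the result is unambiguous.
        if 0x20 <= byte < 0x7F and byte != 0x5C:
            out.append(chr(byte))
        else:
            out.append("\\x%02x" % byte)
    return "".join(out)


# Example: a status line with its CRLF made visible.
print(printable(b"HTTP/1.0 200 OK\r\n"))
```

The escaped form stays on a single line, which matters for a
line-oriented protocol.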

I'm running a series of unit tests, and have also tried removing some
of the tests to see whether that had any particular effect.  One of my
tests did this sequence:

1. create_open -> OK
2. create_close (specifying size 0) -> expected ERR from tracker
3. create_close (proper size specified) -> intermittent size_verify_error

If I take out step 2, I don't see the same size_verify_error ("HEAD
response to get_file_size looks bogus").  I do, however, very rarely
get a different size_verify_error:

ERR size_verify_error
Expected:+4%3B+actual:+0+%28missing%29%3B+path:+http://10.3.1.126:17500/dev2/0/000/001/0000001000.fid%3B+error:+Job+queryworker+has+only+0,+wants+5,+making+5.
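
The detail string in the ERR line is URL-encoded ('+' for space, %3B
for ';', and so on).  A minimal Python sketch for decoding it follows;
the function names are hypothetical, and splitting on "; " is an
assumption based on this one example, so it would break if an error
text itself contained "; ":

```python
from urllib.parse import unquote_plus

# Detail string copied from the ERR line above.
raw = ("Expected:+4%3B+actual:+0+%28missing%29%3B+path:+"
       "http://10.3.1.126:17500/dev2/0/000/001/0000001000.fid%3B+"
       "error:+Job+queryworker+has+only+0,+wants+5,+making+5.")


def decode_detail(detail):
    """Decode a URL-encoded error detail into readable text."""
    return unquote_plus(detail)


def split_fields(detail):
    """Split the decoded detail on '; ' into a {name: value} dict."""
    fields = {}
    for part in decode_detail(detail).split("; "):
        name, _, value = part.partition(": ")
        fields[name] = value
    return fields


print(decode_detail(raw))
for name, value in split_fields(raw).items():
    print(name, "=>", value)
```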

I think I can reasonably say that the non-unit-test code won't be
doing the 1-2-3 sequence, but the intermittent error is odd in any
case.  The newer "queryworker has only 0, wants 5, making 5" error
seems suitably rare.

In any case I can detect when a create_open/PUT/create_close sequence
fails and try again.
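
The retry idea can be sketched roughly as below.  This is Python for
brevity rather than the C++ client, and everything named here is
hypothetical: SizeVerifyError stands for "create_close returned ERR
size_verify_error", and upload_once stands for one complete
create_open / PUT / create_close cycle.  The attempt count and delay
are arbitrary placeholders.

```python
import time


class SizeVerifyError(Exception):
    """Hypothetical: create_close came back with size_verify_error."""


def store_with_retry(upload_once, attempts=3, delay=0.5, sleep=time.sleep):
    """Call upload_once() until it succeeds or attempts run out.

    upload_once is a caller-supplied function that performs one whole
    create_open / PUT / create_close cycle and raises SizeVerifyError
    when the tracker rejects the close.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return upload_once()
        except SizeVerifyError as err:
            last_error = err
            # Pause before the next full cycle, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    raise last_error
```

Each retry repeats the whole sequence rather than only create_close,
and any other exception is passed straight through to the caller.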

Shall I do any other tests?

.. Adam

On 4/19/07, Brad Fitzpatrick <brad at danga.com> wrote:
> Weird.
>
> But we're getting closer...
>
> Change this line:
>
>    return undeferr("HEAD response to get_file_size looks bogus");
>
> to:
>
>    return undeferr("HEAD response to get_file_size looks bogus: [$first]");
>
> And let me know what it says?
>
> - Brad
>
>
> On Thu, 19 Apr 2007, Adam Rosien wrote:
>
> > The message after upgrading to trunk is now:
> >
> > ERR size_verify_error
> > Expected:+4%3B+actual:+0+%28cantreach%29%3B+path:+http://10.3.1.104:17500/dev1/0/000/000/0000000484.fid%3B+error:+HEAD+response+to+get_file_size+looks+bogus
> >
> > If I do a HEAD request to the path in the error response with curl the
> > response is "200 OK", so one theory would be that there is some kind
> > of timing issue, unless you know more about the meaning behind the
> > above message.
> >
> > .. Adam
> >
> > On 4/19/07, Adam Rosien <adam at rosien.net> wrote:
> > > mogstored.  I'll update and get you the new message.
> > >
> > > .. Adam
> > >
> > > On 4/19/07, Brad Fitzpatrick <brad at danga.com> wrote:
> > > > Adam,
> > > >
> > > > Current trunk should be safe to run... nothing scary.  All big changes are
> > > > in Fsck.pm, and rest is just cleanups & docs.
> > > >
> > > > But the thing you want is the part where I (today?) improved this exact
> > > > error message to say more than the "HEAD request wasn't 200 OK" that
> > > > you're seeing, but to show you exactly what the remote server said during
> > > > the size check.
> > > >
> > > > Which storage node webserver are you using, btw?  mogstored, apache, lighttpd?
> > > >
> > > > - Brad
> > > >
> > > > On Thu, 19 Apr 2007, Adam Rosien wrote:
> > > >
> > > > > ERR size_verify_error
> > > > > Expected:+4%3B+actual:+0+%28error%29%3B+path:+http://10.3.1.104:17500/dev1/0/000/000/0000000372.fid%3B+error:+get_file_size%28%29%27s+HEAD+request+wasn%27t+a+200+OK
> > > > >
> > > > > (I'm writing a 4 byte file, and got a 200 OK from the PUT to the storage node)
> > > > >
> > > > > I'm running the mogile code from svn trunk, as of Mar 15, Perlbal
> > > > > 1.54.  I see there have been updates in trunk since then, but don't
> > > > > have them yet.
> > > > >
> > > > > .. Adam
> > > > >
> > > > > On 4/19/07, Brad Fitzpatrick <brad at danga.com> wrote:
> > > > > > What's the full response line?  size_verify_error should be returning
> > > > > > extra details about why it failed.
> > > > > >
> > > > > > And what version?
> > > > > >
> > > > > >
> > > > > > On Thu, 19 Apr 2007, Adam Rosien wrote:
> > > > > >
> > > > > > > Intermittently I get a "size_verify_error" error code from the tracker
> > > > > > > when calling create_close after first calling create_open and then
> > > > > > > PUTing the file to the storage node.  Is there a possible latency
> > > > > > > between completing the PUT to the storage node and when the tracker
> > > > > > > confirms the bytes written with the storage node, so that the
> > > > > > > create_close returns this error?
> > > > > > >
> > > > > > > I am using my own C++ code for the tracker protocol and libcurl for HTTP.
> > > > > > >
> > > > > > > .. Adam
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

