mogstored dying: redux
Greg Connor
gconnor at nekodojo.org
Wed May 21 07:17:00 UTC 2008
On May 20, 2008, at 11:27 AM, Mark Smith wrote:
>> Hi all, I very much appreciate the patient help and advice, but I'm still
>> having trouble getting even small files stored in my mogile setup.
>
> Given the error message you've pasted (403?) this seems like a
> configuration/setup problem. Are you sure that your MogileFS setup is
> even working at all, even without touching mogtool? Well, it's easy
> to figure out if it is or not. Here, this little script:
>
> ---
>
> If the process fails, can you copy the output of it and paste on the
> mailing list here? There should be a lot of text for all of the work
> that the library is doing that will tell you what's going on. Or
> anyway, will tell us what's going on, I don't expect most of it to
> make sense unless you know the internals of MogileFS. :)
Thanks Mark. The test script worked fine. The 403 errors were only
occurring with "lighttpd" used in place of perlbal. This was a
suggestion (Ask's) which seemed like a good thing to try, but lighttpd
actually made things worse. With lighttpd, about 1 in 5 requests
failed to store, or failed to close.
I've now reverted to the standard mogstored/perlbal config, and
it's *mostly* working, but I'm concerned about how often mogstored
just plain dies. I have to keep a "keepalive" script running that
checks my 16 storage nodes every 5 minutes and relaunches any
mogstored procs that have mysteriously stopped.
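For reference, a keepalive check of the kind described above might look
roughly like the following Python sketch. It is not the actual script in
use here. The hostnames are made up, port 7500 is assumed to be the
mogstored HTTP listen port (adjust to match your mogstored config), and
the relaunch step is left out because it is site-specific. It would be
run from cron every 5 minutes.

```python
import socket

# Hypothetical node list; the real hostnames are not given in this post.
NODES = [f"storage{n:02d}.example.com" for n in range(1, 17)]
# Assumed mogstored HTTP listen port; change to match your configuration.
MOGSTORED_PORT = 7500


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def find_dead_nodes(nodes, port=MOGSTORED_PORT, probe=is_listening):
    """Return the nodes whose mogstored port does not accept connections."""
    return [host for host in nodes if not probe(host, port)]


if __name__ == "__main__":
    for host in find_dead_nodes(NODES):
        # Relaunching is site-specific (ssh, init script, ...), so only report.
        print(f"mogstored not answering on {host}:{MOGSTORED_PORT}")
```

A plain TCP connect only shows that something is listening on the port.
It will not catch a mogstored that is still running but wedged.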
I'm also worried about intermittent problems when pushing large
numbers of files (currently using mogtool). I'm not sure if this
corresponds to mogstored dying, or trying to hit a dead node before
the restart kicks in, or what. The errors given out by mogtool in
these intermittent cases are one of these:
> MogileFS backend error message: unknown_key unknown_key
> System error message: MogileFS::NewHTTPFile: unable to write to any allocated storage node at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
> System error message: Close failed at /usr/bin/mogtool line 816, <Sock_minime336:7001> line 215.
I can live with transmit errors once in a while, and for now mogtool
seems to be retrying and recovering. But if they crash the storage
node, that's a showstopper. If it's not normal for mogstored to just
die like that, I will spend some time trying to figure out why that
is. If it *is* normal for mogstored to just die sometimes, I need to
get rid of it quickly and get lighttpd over its intermittent 403
problems. I don't think I have time to do both, so I need to pick a
direction that's more likely to succeed. My time to evaluate this
solution for our application is running out quickly.
Thanks again for the replies. I would be lost without the help from
the list (which probably means the documentation is weak and puny, but
c'est la vie).