mogstored dying: redux

Greg Connor gconnor at nekodojo.org
Wed May 21 07:17:00 UTC 2008


On May 20, 2008, at 11:27 AM, Mark Smith wrote:

>> Hi all, I very much appreciate the patient help and advice, but I'm still
>> having trouble getting even small files stored in my mogile setup.
>
> Given the error message you've pasted (403?) this seems like a
> configuration/setup problem.  Are you sure that your MogileFS setup is
> even working at all, even without touching mogtool?  Well, it's easy
> to figure out if it is or not.  Here, this little script:
>
> ---
>
> If the process fails, can you copy the output of it and paste on the
> mailing list here?  There should be a lot of text for all of the work
> that the library is doing that will tell you what's going on.  Or
> anyway, it will tell us what's going on; I don't expect most of it to
> make sense unless you know the internals of MogileFS.  :)


Thanks Mark.  The test script worked fine.  The 403 errors were only  
occurring with "lighttpd" used in place of perlbal.  This was a  
suggestion (Ask's) which seemed like a good thing to try, but lighttpd  
actually made things worse.  With lighttpd, about 1 in 5 requests  
failed to store, or failed to close.

I've now reverted to the standard mogstored/perlbal config, and
it's *mostly* working, but I'm concerned about how often mogstored
just plain dies... I have to keep a "keepalive" script running that
checks all 16 of my storage nodes every 5 minutes and relaunches any
mogstored processes that have mysteriously stopped.
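A keepalive of that kind can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the actual script: it assumes mogstored's HTTP listener is on port 7500 (the usual default; check your mogstored.conf), treats "accepts a TCP connection" as "alive", and leaves the relaunch step as a hypothetical `restart_mogstored` placeholder.

```python
import socket
import time
from typing import Callable, Iterable, List

# Assumption: mogstored's HTTP listener is on port 7500 (the common default).
MOGSTORED_PORT = 7500
CHECK_INTERVAL_SECS = 5 * 60  # check every 5 minutes


def is_listening(host: str, port: int = MOGSTORED_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, etc.
        return False


def find_dead_nodes(hosts: Iterable[str],
                    probe: Callable[[str], bool] = is_listening) -> List[str]:
    """Return the hosts whose probe fails, in input order."""
    return [h for h in hosts if not probe(h)]


def restart_mogstored(host: str) -> None:
    # Hypothetical placeholder: replace with however you really relaunch
    # mogstored (ssh plus an init script, a process supervisor, ...).
    print(f"would restart mogstored on {host}")


def watchdog(hosts: List[str], rounds: int = 1) -> None:
    """Run `rounds` check passes, sleeping CHECK_INTERVAL_SECS between them."""
    for i in range(rounds):
        for host in find_dead_nodes(hosts):
            restart_mogstored(host)
        if i + 1 < rounds:
            time.sleep(CHECK_INTERVAL_SECS)
```

A port check only catches a process that has exited; a mogstored that is hung but still accepting connections would pass, so a real check might also issue a small HTTP request.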

I'm also worried about intermittent problems when pushing large
numbers of files (currently using mogtool).  I'm not sure whether these
correspond to mogstored dying, to hitting a dead node before the
restart kicks in, or to something else.  In these intermittent cases,
mogtool reports one of these errors:
 > MogileFS backend error message: unknown_key unknown_key
 > System error message: MogileFS::NewHTTPFile: unable to write to any allocated storage node at /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/IO/Handle.pm line 399
 > System error message: Close failed at /usr/bin/mogtool line 816, <Sock_minime336:7001> line 215.
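For intermittent store failures like these, one common hedge is to retry each upload a few times with a growing delay. The Python sketch below is a generic retry-with-backoff wrapper; it is not how mogtool or the MogileFS client libraries implement their retries, and `op` is a hypothetical callable standing in for whatever performs a single store.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(op: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on an exception, wait base_delay * 2**n and try again.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return op()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the real error
            sleep(base_delay * (2 ** n))  # 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be tested without waiting; in real use the default `time.sleep` applies.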


I can live with occasional transmit errors, and for now mogtool
seems to be retrying and recovering.  But if these errors crash the
storage node, that's a showstopper.  If it's not normal for mogstored
to just die like that, I will spend some time figuring out why it
does.  If it *is* normal for mogstored to die sometimes, I need to
drop it quickly and get lighttpd past its intermittent 403 problems.
I don't think I have time to do both, so I need to pick the direction
that's more likely to succeed.  My time to evaluate this solution for
our application is running out quickly.

Thanks again for the replies.  I would be lost without the help from  
the list (which probably means the documentation is weak and puny, but  
c'est la vie).

