New install: mogstored dying, and mogtool problems

Wed Apr 30 15:49:53 UTC 2008

Hello... I'm new to MogileFS and the list, so please feel free to 
redirect me to other resources that may be out there which I haven't 
found yet.

I just now finished setting up MogileFS for the first time.  I have run 
into some problems that hopefully others have seen before and can help 
me with.

First, even with the current subversion trunk, I had to fix some errors 
in the code that prevented me from running make or starting the daemons 
for the first time.  They looked like they had been there a while, but 
they were also pretty easy to fix.  I'll post diffs if folks are 
interested, but I suspect almost everyone has fixed these on their own 
or their cluster wouldn't be operational.  I'm only mentioning it here 
in case it indicates *I've* done something wrong.  The changes were:

MogileFS/Worker.pm:
   # original; fails because @self doesn't exist (should be $self-> ?)
   #warn "proc ${self}[$$] read: [$out]\n";
   # changed to
   warn "proc \${self}[\$\$] read: [$out]\n";

Gearman/Client/Async/Connection.pm:
   # original; missing parens around "my"
   #socket my $sock, PF_INET, SOCK_STREAM, IPPROTO_TCP;
   # changed to
   my $sock ; socket $sock, PF_INET, SOCK_STREAM, IPPROTO_TCP;

...and there may be one other change I'm now forgetting.

I'm now using mogtool to store the contents of an entire directory, and 
encountering some problems.  (I'm starting with a 90G directory, but 
eventually these will be 1.5T directories.)

The first problem: when I first used mogtool to start injecting files, 
I got a lot of errors (now scrolled off my screen, but it was something 
like "could not put file, unknown_fileid") and I observed that 
mogstored had stopped running on all 16 nodes.  After I killed mogtool, 
fsck reported that there were a lot of files, but listing the domain 
showed 0 files.  I could not figure out how to hunt down and delete the 
chunks that mogtool had already uploaded, since the target (and only) 
domain seemed to be empty.  Since I could not figure out how to delete 
the files cleanly, I opted to drop the mogilefs database, nuke dev*/0 
and restart everything from scratch.

On the second attempt, mogstored stayed up and the upload completed 
quickly, but after the 52-minute upload mogtool proceeded to checksum 
everything and got stuck checksumming the same 6 blocks over and over, 
for 18 more hours before I stopped it.  It kept saying something like 
"retrying checksum for chunks: blah... md5sum mismatch".  I'm not sure 
what the correct behavior here should be, but if both copies of a chunk 
have failed checksum, and the original file (stream) is no longer 
available, at some point it should probably declare failure and stop 
fetching the bad chunks repeatedly.
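
To be concrete about "declare failure", here is a rough sketch of the 
bounded retry I have in mind.  It is Python rather than mogtool's Perl, 
and the names (verify_chunk, fetch_chunk, max_attempts) are made up for 
illustration; none of them exist in mogtool.

```python
import hashlib
from typing import Callable


def verify_chunk(fetch_chunk: Callable[[], bytes],
                 expected_md5: str,
                 max_attempts: int = 3) -> bool:
    """Fetch a chunk up to max_attempts times; True once its MD5 matches.

    fetch_chunk is a hypothetical callable standing in for whatever
    reads the stored chunk back from the cluster.
    """
    for _ in range(max_attempts):
        data = fetch_chunk()
        if hashlib.md5(data).hexdigest() == expected_md5:
            return True
    # Every attempt mismatched: report failure instead of looping forever.
    return False
```

The caller can then log the failed chunk and exit non-zero instead of 
retrying indefinitely.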

My first priority here would be to figure out why mogstored died and 
keep it from dying.  Has this happened to others, and does it happen 
frequently?  Is it common practice to put a wrapper or sentinel on 
mogstored to restart it when it fails?  Is there a log file where 
mogstored shows any warn or die messages?  (I used an 
/etc/init.d/mogstored start script found in the archives of this list, 
so perhaps I just need to replace the >/dev/null in that script with an 
actual file.)
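
By "wrapper or sentinel" I mean something along these lines.  This is 
only a sketch in Python, not anything that ships with MogileFS: 
supervise, log_path, max_restarts and delay are names I invented, and 
it assumes the supervised command stays in the foreground rather than 
daemonizing itself.

```python
import subprocess
import time


def supervise(cmd, log_path, max_restarts=5, delay=1.0):
    """Run cmd with stdout/stderr appended to log_path; restart on failure.

    Returns the number of times the command was started.  Stops when the
    command exits 0 or after max_restarts restarts.
    """
    starts = 0
    with open(log_path, "ab") as log:
        while True:
            starts += 1
            # Capture warn/die output in the log instead of /dev/null.
            rc = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
            if rc == 0 or starts > max_restarts:
                return starts
            log.write(f"supervise: exit {rc}, restarting\n".encode())
            log.flush()
            time.sleep(delay)
```

The restart cap is there so a daemon that dies immediately on startup 
does not spin forever; the log should then show why it died.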

My second priority would be to figure out how to recover from a failed 
mogtool injection.  I'm pretty sure the files exist in the tracker, and 
they definitely exist on the nodes, but if mogtool list domain doesn't 
show them, how can I find and delete them?  (I will probably try the 
MogileFS::Client direct interface next.)  If I ask mogtool to store the 
same ID again, also using --bigfile, will it overwrite the chunks it 
stored the first time, or will I need to invent something to find 
orphaned bigfile chunks and remove them after a certain time?

Thinking ahead to the fix, what's the correct/desired behavior in cases 
where a bigfile fails to inject... would it be fairly easy to make 
mogtool aware of the incomplete bigfile and its chunks (possibly under a 
different master fileid?) so future invocations of mogtool can delete 
them as expected?  In the case where we have put the chunks and fetching 
them back gives us a bad checksum, what's the proper behavior there? 
Would it be feasible to make the spawned child process wait a short time 
and then fetch its own chunk back, so that it has a chance to put the 
data up again if there is a mismatch?  I'm willing to spend some extra 
memory to have threads wait around and checksum before freeing the memory.
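
Roughly, the per-chunk behavior I'm imagining looks like the sketch 
below.  Again this is Python for illustration only; put, get, 
settle_delay and max_attempts are hypothetical stand-ins, not mogtool 
or MogileFS::Client calls.

```python
import hashlib
import time
from typing import Callable


def put_and_verify(data: bytes,
                   put: Callable[[bytes], None],
                   get: Callable[[], bytes],
                   max_attempts: int = 3,
                   settle_delay: float = 0.0) -> bool:
    """Store a chunk, read it back, and re-store it on checksum mismatch.

    The chunk stays in memory until it verifies or attempts run out.
    """
    expected = hashlib.md5(data).digest()
    for _ in range(max_attempts):
        put(data)
        time.sleep(settle_delay)  # give the store a moment before reading back
        if hashlib.md5(get()).digest() == expected:
            return True
    return False
```

The memory cost is one chunk per worker for the length of the settle 
delay plus the read-back.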

At this point I'm not sure if I'm doing something wrong or if my 
experience is expected/typical, so any feedback (even if it's not an 
answer/suggestion) would be helpful.  Have people used mogtool as part 
of a production system for storing huge files?  Is it more common for 
people to implement their own chunking/splitting?

Thanks for any feedback.
gregc