Mogstored orphans block parent revival

Nathan Schmidt nschmidt at gmail.com
Thu Mar 15 18:22:57 UTC 2007


Because of an earlier crash in mogstored (related to the 751 patch,
and resolved so far), we were seeing a tricky failure mode: the
mogstored parent process died but its children did not exit with it.

We run monit (http://www.tildeslash.com/monit/), which tried to
recover by running "/etc/init.d/mogstored start", but the surviving
children kept the new mogstored parent from starting up completely.

This is probably just a regular bug, but we put in a brute-force
workaround in the meantime; the patch is inlined below.

Regards,
-Nathan / PBwiki


Steps to reproduce:
 > /etc/init.d/mogstored start
 > kill -9 `cat /var/run/mogstored.pid`
 > pgrep mogstored
	# just the child pids remain (iostat, diskusage listeners)
 > /etc/init.d/mogstored start
 > pgrep mogstored
	# same child pids as before, no new mogstored parent running
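
The orphaned state from the steps above can also be detected by script. The following is only a sketch: the helper names parent_alive and report_orphans are hypothetical and not part of mogstored or its init script, and it assumes the script runs as root (otherwise "kill -0" can fail on a live process owned by another user).

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID recorded in the pidfile
# given as $1 belongs to a live process. Assumes we run as root, since
# "kill -0" also fails when we lack permission to signal the process.
parent_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical check: report leftover processes when the parent recorded
# in the pidfile is gone but processes named mogstored are still running.
report_orphans() {
    if ! parent_alive "$1" && pgrep mogstored >/dev/null; then
        echo "orphaned mogstored children: $(pgrep mogstored | tr '\n' ' ')"
    fi
}
```

For example, "report_orphans /var/run/mogstored.pid" run after the kill -9 step should list the remaining child pids.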




--- server/debian/mogstored.init        2007-02-13 00:50:36.000000000 +0000
+++ /etc/init.d/mogstored       2007-03-15 18:04:05.000000000 +0000
@@ -38,6 +38,14 @@
         fi
+       # Reproducible problem with parent pid dying off but leaving two children (iostat, diskusage).
+       # '/etc/init.d/mogstored start' succeeds but the orphans prevent full start of the new parent pid,
+       # disabling the storage node. The following assures the orphans are taken to the woods.
+
+       # pids of mogstored | except for this script which is also called mogstored | summary execution
+       pgrep mogstored | grep -v $$ | xargs kill -9
+       sleep 1
+
         start-stop-daemon --start --quiet --exec $DAEMON --pidfile $PIDFILE -b -m --name $NAME
         echo "$NAME."
         ;;
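
A slightly more careful variant of the cleanup step is sketched below. It is not part of the patch above, and the function names filter_self and kill_orphans are made up for illustration. It addresses two edge cases in the one-liner: "grep -v $$" matches substrings, so a script pid of 123 would also spare pid 1234, and "xargs kill -9" can run kill with no arguments when nothing is left to kill.

```shell
#!/bin/sh
# Hypothetical helper: copy PIDs from stdin to stdout, dropping the one
# given as $1. -x matches whole lines and -F treats the PID as a fixed
# string, so excluding 123 does not also exclude 1234.
filter_self() {
    # grep exits 1 when it selects no lines; that is not an error here.
    grep -v -x -F -- "$1" || true
}

# Hypothetical helper: kill leftover processes named mogstored, skipping
# this script (the init script is itself called mogstored).
kill_orphans() {
    pids=$(pgrep mogstored | filter_self "$$")
    if [ -n "$pids" ]; then
        # Unquoted on purpose: each PID becomes a separate argument.
        kill -9 $pids
        sleep 1
    fi
}
```

Calling kill_orphans just before start-stop-daemon would stand in for the pgrep/xargs line; when no orphans exist it does nothing and skips the sleep.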


