Mogstored orphans block parent revival
Nathan Schmidt
nschmidt at gmail.com
Thu Mar 15 18:22:57 UTC 2007
Because of an earlier crash in mogstored (related to the 751 patch,
and resolved so far), we were seeing a tricky failure mode: the
mogstored parent process died but its children did not exit with it.
We run monit (http://www.tildeslash.com/monit/), which tried to
recover by running "/etc/init.d/mogstored start", but the leftover
children kept the new mogstored parent from starting up completely.
This is probably just a regular bug, but we put in a brute-force
workaround in the meantime; the patch is inlined below.
Regards,
-Nathan / PBwiki
Steps to reproduce:
> /etc/init.d/mogstored start
> kill -9 `cat /var/run/mogstored.pid`
> pgrep mogstored
# just the child pids remain (iostat, diskusage listeners)
> /etc/init.d/mogstored start
> pgrep mogstored
# same child pids as before, no new mogstored parent running
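To confirm that the leftover pids really are orphans rather than children of a live parent, something like the following sketch may help. It assumes orphaned children get reparented to pid 1 (the usual behaviour under a classic init) and that the process name shows up as "mogstored" in ps output; list_orphans is a made-up helper name, not part of mogstored or its init script.

```shell
# Hypothetical helper (not part of mogstored): reads "PID PPID COMMAND"
# lines on stdin and prints the PIDs of processes named "$1" whose parent
# is PID 1. An optional "$2" is a PID to skip, e.g. a healthy daemonized
# parent, which also has PPID 1.
list_orphans() {
    awk -v name="$1" -v keep="$2" \
        '$2 == 1 && $3 == name && $1 != keep { print $1 }'
}

# Example invocation (assumes a procps-style ps):
#   ps -eo pid=,ppid=,comm= | list_orphans mogstored "$(cat /var/run/mogstored.pid 2>/dev/null)"
```

After the kill -9 in the steps above, the pids printed should be the same ones pgrep reports.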
--- server/debian/mogstored.init 2007-02-13 00:50:36.000000000 +0000
+++ /etc/init.d/mogstored 2007-03-15 18:04:05.000000000 +0000
@@ -38,6 +38,14 @@
fi
+ # Reproducible problem with parent pid dying off but leaving two children (iostat, diskusage).
+ # '/etc/init.d/mogstored start' succeeds but the orphans prevent full start of the new parent pid,
+ # disabling the storage node. The following assures the orphans are taken to the woods.
+
+ # pids of mogstored | except for this script which is also called mogstored | summary execution
+ pgrep mogstored | grep -v $$ | xargs kill -9
+ sleep 1
+
start-stop-daemon --start --quiet --exec $DAEMON --pidfile $PIDFILE -b -m --name $NAME
echo "$NAME."
;;
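One caveat with the pipeline in the patch: "grep -v $$" drops every line that contains the script's pid as a substring, so with $$ = 123 a child with pid 1234 would be spared. A slightly more careful variant might look like the sketch below. exclude_pid is a hypothetical helper name, and the -r flag mentioned in the usage comment is a GNU xargs extension (available on Debian) that skips running kill when no pids are left.

```shell
# Hypothetical helper (not in the original patch): reads one PID per line
# on stdin and drops only the line that is exactly equal to "$1".
# -x matches whole lines, so excluding 123 does not also exclude 1234.
exclude_pid() {
    grep -vx -- "$1"
}

# Possible replacement for the kill line in the init script
# (assumes GNU xargs for -r):
#   pgrep mogstored | exclude_pid "$$" | xargs -r kill -9
```

This is still the same brute-force approach; it only narrows which pids get excluded and avoids a spurious kill error when there is nothing to clean up.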