[PATCH] bug: some children do not receive state changes notifications

Radu Greab rg at yx.ro
Thu Sep 13 07:49:03 UTC 2007

There is a bug in state_change() in MogileFS::ProcManager: the
cache of known states is not updated for the child that sends a state
change. As a result, some children may not be notified of future state
changes because their cached known state is wrong.

In the setup where I found this bug, the load on the DB was growing
constantly because the file_to_delete_later table kept growing.

The sequence of events was:
- the delete job initially received a notification that host X is
reachable, and this state was cached by ProcManager.
- sometimes connections to the storage nodes timed out and the delete job
broadcast a "host X is unreachable" message. The ProcManager changed the
known state cache for all children except the delete job. The cache
for the delete job continued to say that host X is reachable.
- later, the monitor job broadcast that host X is reachable again, but
this state change was not forwarded to the delete job because its
cache entry already said that host X is reachable. So the delete job
couldn't advance: its internal state said that host X was
unreachable, and it never received a notification from the parent that
host X is reachable again.

The fix is simple: also update the known state cache for the child that
sent the message.
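
The sequence above can be reproduced with a small standalone model. This
is only a sketch, not the ProcManager code: FakeChild, run_scenario and
the state strings "reachable"/"unreachable" are hypothetical names made up
for the illustration, and the children are kept in a plain array instead
of the %child hash. The two state_change variants mirror the loop before
and after the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a child connection: it only records what it is sent.
package FakeChild;
sub new   { my ($class, $name) = @_; return bless { name => $name, known_state => {}, sent => [] }, $class }
sub write { my ($self, $msg) = @_; push @{ $self->{sent} }, $msg }

package main;

# Pre-patch loop: the sender is skipped entirely, so its cache entry goes stale.
sub state_change_old {
    my ($children, $what, $whatid, $state, $exclude) = @_;
    my $key = "$what-$whatid";
    foreach my $child (@$children) {
        next if $exclude && $child == $exclude;
        my $old = $child->{known_state}{$key} || "";
        if ($old ne $state) {
            $child->{known_state}{$key} = $state;
            $child->write(":state_change $what $whatid $state\r\n");
        }
    }
}

# Patched loop: every cache is updated, only the write to the sender is skipped.
sub state_change_new {
    my ($children, $what, $whatid, $state, $exclude) = @_;
    my $key = "$what-$whatid";
    foreach my $child (@$children) {
        my $old = $child->{known_state}{$key} || "";
        if ($old ne $state) {
            $child->{known_state}{$key} = $state;
            $child->write(":state_change $what $whatid $state\r\n")
                unless $exclude && $child == $exclude;
        }
    }
}

# Replays the three events and returns how many messages the delete job got.
sub run_scenario {
    my ($state_change) = @_;
    my $delete   = FakeChild->new("delete");
    my $monitor  = FakeChild->new("monitor");
    my $children = [$delete, $monitor];
    $state_change->($children, "host", "X", "reachable",   $monitor); # initial report
    $state_change->($children, "host", "X", "unreachable", $delete);  # delete job times out
    $state_change->($children, "host", "X", "reachable",   $monitor); # host is back
    return scalar @{ $delete->{sent} };
}

print "old: delete job received ", run_scenario(\&state_change_old), " message(s)\n";
print "new: delete job received ", run_scenario(\&state_change_new), " message(s)\n";
```

With the old loop the delete job receives only the initial "reachable"
message, so it never learns that the host came back. With the patched
loop it receives two messages, the second being the recovery.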

=== lib/MogileFS/ProcManager.pm
--- lib/MogileFS/ProcManager.pm	(revision 87784)
+++ lib/MogileFS/ProcManager.pm	(local)
@@ -717,11 +717,12 @@
     my ($what, $whatid, $state, $exclude) = @_;
     my $key = "$what-$whatid";
     foreach my $child (values %child) {
-        next if $exclude && $child == $exclude;
         my $old = $child->{known_state}{$key} || "";
         if ($old ne $state) {
             $child->{known_state}{$key} = $state;
-            $child->write(":state_change $what $whatid $state\r\n");
+            $child->write(":state_change $what $whatid $state\r\n")
+                unless $exclude && $child == $exclude;
