Restarting Mogilefsd after a database restarts

Mark Smith smitty at gmail.com
Fri Dec 21 17:29:15 UTC 2007


> I've been working on monitoring MogileFS using the check_tcp plugin with
> Nagios.  This will work to let me know if the Mogile ports are open, but
> I've noticed that when somebody restarts a MySQL database without first
> stopping the mogilefsd processes, the TCP ports for Mogile will remain open
> and the only way you realize it's down is when you no longer can inject or
> extract items.

Is that only if the SQL server stays down?  If it restarts, MogileFS
should reconnect and continue working, ideally with little or no
interruption.  Most operations end up calling validate_dbh, which does
a DBI ping on the connection to verify that it's still up and running.

If this isn't the case, it sounds like a bug and should be remedied.
MogileFS needs to be tolerant of database unavailability, insofar as
it can be, returning errors rather than imploding or failing silently.
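
The ping-then-reconnect idea described above can be illustrated with a
short sketch.  This is not MogileFS's validate_dbh (which is Perl and
uses DBI); it is a minimal Python illustration of the same pattern.
The ReconnectingHandle class and the connect factory are hypothetical
names, and the connection object is only assumed to offer a ping()
method that returns a true value while the link is alive.

```python
class ReconnectingHandle:
    """Hold one database connection and replace it when a ping fails."""

    def __init__(self, connect):
        # connect is a hypothetical zero-argument factory that returns
        # a connection object exposing ping().
        self._connect = connect
        self._conn = None

    def validated(self):
        """Return a live connection, reconnecting if the old one is dead."""
        if self._conn is not None:
            try:
                if self._conn.ping():
                    return self._conn
            except Exception:
                pass  # treat any ping error as a dead connection
        # No usable connection: build a new one.  If the database is
        # still down this raises, so the caller can report an error
        # instead of hanging or failing silently.
        self._conn = self._connect()
        return self._conn
```

Calling validated() before each query means a database restart costs
one failed ping and one reconnect, while a database that stays down
surfaces as an error on every request.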

> I was wondering if anyone has any monitoring that they've written for Mogile
> that might address this problem or any other monitoring solutions that
> they've tried?

For monitoring I have generally used the stats output (send !stats
to a MogileFS server) and verified that the number of queries is going
up, etc.  That ensures it's getting and processing traffic.  Of
course, if there are database issues, you won't see that through this
method, only that MogileFS is processing queries.  This also depends
somewhat on your site not having quiet periods.

If you want to get fancy you can set up your monitoring to actually put
files in and then get them out.  This is going to validate the process
end to end, which can be useful for alarms, but monitoring the
individual components feels like a better route overall.  It's easier
to figure out what's going on if you have more granular monitoring.
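
The put-then-get check could look something like the sketch below.  It
deliberately does not talk to MogileFS itself: put, get and delete are
hypothetical callables that you would wire up to whatever client
library you use, and the "monitor-" key prefix is likewise just an
example.

```python
import os
import time


def roundtrip_check(put, get, delete=None):
    """Store a unique payload, read it back, and compare the bytes.

    put(key, data), get(key) and delete(key) are supplied by the
    caller; this function only drives the round trip.
    """
    key = "monitor-%d-%s" % (int(time.time()), os.urandom(4).hex())
    payload = os.urandom(64)
    try:
        put(key, payload)
        return get(key) == payload
    except Exception:
        return False  # any failure along the path counts as "down"
    finally:
        if delete is not None:
            try:
                delete(key)  # best-effort cleanup of the probe file
            except Exception:
                pass
```

Using a fresh random key and payload on every run avoids a stale copy
of an old probe file making the check pass when new writes are
failing.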


-- 
Mark Smith / xb95
smitty at gmail.com

