Justin H. Brehm
jbrehm at icontact.com
Mon Apr 28 16:05:44 UTC 2008
I've had three different checks that I've used and all seem to have flaws.
The first was a simple TCP port check on the ports that MogileFS has open. This is fine if you want to make sure the daemons are still running, but I noticed there were cases where the DB could go down while the ports remained open.
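A port check of this kind might look roughly like the sketch below. It is only an illustration, not the check described in this message: the function name is made up, and the host and port in the usage note are placeholders. The return values follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=5.0):
    """Return (status, message) depending on whether host:port accepts a TCP connection.

    Hypothetical helper for illustration. It only proves that something is
    listening, which is exactly the weakness noted above: a daemon can keep
    its port open while the database behind it is down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable: {exc}"
```

A wrapper script would call something like `check_tcp_port("tracker.example.com", 7001)` (placeholder host, port taken from your own tracker configuration), print the message and exit with the status.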
Next I wrote something that used 'mogtool' to test injections and extractions. However, 'mogtool' does way more than I needed, and when mogile went down it would tend to keep retrying, which made the Nagios plugin time out under NRPE.
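One way to keep a retrying external tool from running into the NRPE timeout is to give the command its own, shorter deadline and report CRITICAL when it is exceeded. The sketch below assumes an 8 second deadline and a hypothetical helper name; the command itself is whatever you pass in, and no particular 'mogtool' arguments are assumed.

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def run_with_deadline(cmd, deadline=8.0):
    """Run cmd (a list of arguments) and return (status, message).

    Hypothetical helper: the child process is killed if it has not finished
    within `deadline` seconds, so the plugin answers before NRPE gives up.
    The 8 second default is an assumption; pick a value below your own
    NRPE timeout.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=deadline)
    except subprocess.TimeoutExpired:
        return CRITICAL, f"CRITICAL - no answer within {deadline}s"
    except OSError as exc:
        # The command could not be started at all (missing binary, permissions).
        return CRITICAL, f"CRITICAL - could not run command: {exc}"
    if proc.returncode != 0:
        return CRITICAL, f"CRITICAL - command exited with status {proc.returncode}"
    return OK, "OK - command completed"


# Stand-in command for demonstration only: a Python one-liner that exits cleanly.
status, message = run_with_deadline([sys.executable, "-c", "pass"])
```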
The last thing I wrote was a script that uses the MogileFS::Client Perl modules to do an injection and an extraction, and then compares the in/out file sizes as a simple check that we got the same file back. This is what we've been using so far. However, I have seen an instance where the database was down and MogileFS::Backend would have a return code of '82' or something in that range, and my Nagios check was giving me the UNKNOWN status. That was a long night of moving some development databases, so I wasn't up to debugging it that night and haven't revisited it yet.
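The shape of such a round-trip check, with backend failures mapped explicitly to CRITICAL so they do not surface as UNKNOWN, can be sketched as follows. This is Python rather than the Perl script described above and does not use MogileFS::Client: `store` and `fetch` are hypothetical callables standing in for whatever client calls perform the injection and extraction.

```python
import os

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def roundtrip_check(store, fetch, key="nagios-probe", size=1024):
    """Store a random payload, read it back, and compare sizes.

    `store(key, data)` and `fetch(key)` are assumed, caller-supplied
    callables. Any exception they raise is treated as a failed check and
    reported as CRITICAL, so an unexpected backend error cannot fall
    through to an UNKNOWN status.
    """
    payload = os.urandom(size)
    try:
        store(key, payload)
        returned = fetch(key)
    except Exception as exc:
        return CRITICAL, f"CRITICAL - round trip failed: {exc}"
    if returned is None:
        return CRITICAL, "CRITICAL - stored file could not be read back"
    if len(returned) != len(payload):
        return CRITICAL, (
            f"CRITICAL - size mismatch: wrote {len(payload)} bytes, "
            f"read {len(returned)}"
        )
    return OK, f"OK - round trip of {len(payload)} bytes succeeded"


# Demonstration with an in-memory dict standing in for the file store.
blobs = {}
status, message = roundtrip_check(blobs.__setitem__, blobs.get)
```

Comparing the bytes themselves instead of only the sizes would be a stricter test at very little extra cost.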
What I'm planning on doing, because most of the problems I've seen tend to revolve around the database side, is modifying my last Nagios plugin to do a 'select 1' query on the Mogile DB first and alert if that fails. At least I'll eliminate that first and then move on to testing whether the trackers are functioning, etc.
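The planned ordering (database first, then everything else) can be sketched like this. It is an assumption-laden illustration, not the actual plugin: `connect` is a hypothetical callable returning any Python DB-API connection, and `sqlite3` is used below purely as a stand-in so the example runs without a MogileFS database.

```python
import sqlite3

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_database(connect):
    """Run 'SELECT 1' on a fresh connection and return (status, message).

    `connect` is an assumed zero-argument callable that returns a DB-API
    connection to the database under test.
    """
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            row = cur.fetchone()
        finally:
            conn.close()
    except Exception as exc:
        return CRITICAL, f"DB CRITICAL - {exc}"
    if row is None or row[0] != 1:
        return CRITICAL, f"DB CRITICAL - unexpected answer to SELECT 1: {row!r}"
    return OK, "DB OK - SELECT 1 answered"


def run_in_order(checks):
    """Run zero-argument checks in order and stop at the first failure."""
    for check in checks:
        status, message = check()
        if status != OK:
            return status, message
    return OK, "OK - all checks passed"


# sqlite3 in-memory database as a stand-in for the real MogileFS database.
status, message = check_database(lambda: sqlite3.connect(":memory:"))
```

Putting the database check at the head of the list passed to `run_in_order` means a database outage is reported as such before any tracker or round-trip test gets a chance to fail in a less obvious way.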
----- Original Message -----
From: "Frieder Kundel" <frieder.kundel at gmail.com>
To: mogilefs at lists.danga.com
Sent: Monday, April 28, 2008 10:18:42 AM (GMT-0500) America/New_York
Subject: Nagios plugin?
how do you monitor your mogile? Has anyone written a nagios plugin?