disk monitoring

Nathan Schmidt nathan at pbwiki.com
Tue May 1 01:01:51 UTC 2007

All of our machines dump into some offworld syslogs -- from a  
representative crontab:

*/15 *  * * *   root    /usr/sbin/smartctl --all -d ata /dev/sda | grep -E "(SMART overall| Temperature)"|tr -s ' '|tr "\n" "," |logger -p daemon.info -t 'smartd: /dev/sda'
*/15 *  * * *   root    cat /sys/block/sda/stat |tr -s ' '| logger -t 'dstat-sda' -p daemon.info
*/15 *  * * *   root    sensors -f w83793-i2c-0-2f|grep -E "(Temp|Fan)"|tr -s ' '|cut -d'(' -f1|tr "\n" ","|logger -t 'hwmon' -p
*/15 *  * * *   root    cat /sys/class/net/eth0/statistics/*|tr "\n" " "| logger -t 'nstat-eth0' -p daemon.info

So far the only disk failure we've seen was correctly predicted by
smartctl, though with only about 30 minutes' warning (noted after the
fact, of course). FWIW, the smartctl lines above are a little too
terse for making a proper alarm -- smartctl died during its run,
which itself indicated imminent failure, but without any output there
was no useful info in the syslog stream.

I just saw James Byers' note with explicit flags in the smartctl call
- sort of points to the need for a more standard one-line-output mode
for smartctl, or at least a wrapper script which could handle those
more elaborate cases as well as our null output.
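
For what it's worth, a rough sketch of what such a wrapper might look like. This is untested against real failing hardware; the script name smart_oneline and the function names are made up for illustration, and nothing here is part of smartmontools. The idea is to log smartctl's exit status alongside the summary and to substitute an explicit marker when there is no output, so a dead smartctl run still leaves a line in syslog.

```shell
#!/bin/sh
# smart_oneline: hypothetical wrapper script, not part of smartmontools.

# Collapse smartctl output (read from stdin) plus its exit status ($1)
# into a single line suitable for syslog.
summarize_smart() {
    status=$1
    summary=$(grep -E '(SMART overall| Temperature)' | tr -s ' ' | tr '\n' ',')
    # Never emit an empty summary: no output is itself the alarm condition.
    [ -n "$summary" ] || summary='NO OUTPUT'
    printf 'exit=%s %s\n' "$status" "$summary"
}

# Run smartctl on one device and send the one-line summary to syslog.
check_disk() {
    dev=$1
    output=$(/usr/sbin/smartctl --all -d ata "$dev" 2>&1)
    status=$?   # exit status of smartctl, captured before anything else runs
    printf '%s\n' "$output" | summarize_smart "$status" |
        logger -p daemon.info -t "smartd: $dev"
}

# Only act when given a device argument, e.g. from cron:
#   */15 * * * *   root    /usr/local/sbin/smart_oneline /dev/sda
if [ "$#" -gt 0 ]; then
    check_disk "$1"
fi
```

The install path in the cron comment is an assumption too. smartctl's exit status is a bitmask (see its man page), so an alarm could key off a non-zero exit= value or the NO OUTPUT marker rather than parsing the attribute text.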

-Nathan / PBwiki

On Apr 30, 2007, at 5:45 PM, Brandon Ooi wrote:

> we've had a lot of disks fail with no real warning from  
> smartmontools, not really sure how it decides a drive is failing.  
> we tried using this with a bunch of sata/ide drives. i too would  
> like to know what other people are trying to do.
> brandon
> Eric Lambrecht wrote:
>> What does everybody here use for monitoring the health of disks?  
>> We've got quite a heterogeneous setup going on, and at the moment I  
>> think manual monitoring of syslog is all we've got since various  
>> machines seem to fail in all sorts of new and interesting ways  
>> with new and interesting error messages.
>> A cursory look on the web makes me think 'smartmontools.sf.net'  
>> might work pretty well, as I think all our machines ultimately use  
>> SATA drives.
>> We're using the latest mogile with the test writes, but that still  
>> doesn't seem to catch everything.
>> Any sage advice from experienced disk-tenders?
>> Eric...
