disk monitoring

Jared Klett jared at blip.tv
Tue May 1 17:36:58 UTC 2007


ooh! I had no idea.

MogileFS continues to be awesome. thanks Brad.

-----Original Message-----
From: Brad Fitzpatrick [mailto:brad at danga.com] 
Sent: Tuesday, May 01, 2007 1:33 PM
To: Jared Klett
Cc: mogilefs at lists.danga.com
Subject: RE: disk monitoring

Why aren't you using mogautomount?  It comes with the mogilefs server
and takes care of all this, so you don't accidentally make mistakes:

docs:
http://search.cpan.org/~bradfitz/mogilefs-server-2.10/mogautomount


On Tue, 1 May 2007, Jared Klett wrote:

> 	so far I've built all our MogileFS nodes with 3ware/AMCC controllers
> (we were using RAID-5 before Mogile), which do a great job of
> monitoring disks on a regular basis and sending e-mail notifications
> whenever anything from a bad sector to an entire drive failure is
> detected.
>
> 	the downside is that when a drive *does* fail and is replaced,
> the 3ware controller doesn't keep its internal unit IDs consistent,
> which in turn causes Linux device names to change, i.e. /dev/sdc
> fails and /dev/sdd shifts to become /dev/sdc. That requires extremely
> careful manual remounting at the Mogile mount points.
>
> 	hope that makes sense... if anyone has a solution to avoid that,
> please let me know. :)
>
> 	anyway, in the past I've used homegrown scripts to do things like
> write a temporary file to the device to be checked, make sure the file
> exists, and send an alert e-mail if a problem arises during those
> operations.
>
> cheers,
>
> - Jared
>
> --
> Jared Klett
> Co-founder, Blip.tv
> JaredAtWrok (aim)
> http://blog.blip.tv
>
> -----Original Message-----
> From: mogilefs-bounces at lists.danga.com 
> [mailto:mogilefs-bounces at lists.danga.com] On Behalf Of Eric Lambrecht
> Sent: Tuesday, May 01, 2007 12:21 PM
> To: mogilefs at lists.danga.com
> Subject: Re: disk monitoring
>
> That paper is fascinating!
>
> Honestly, though, I don't care if SMART predicts the drive failure; I
> just want to know when a drive has actually failed so we can 'dead' it
> and not end up in a situation where two drives are down and we start
> losing data.
>
> Eric....
> ...... Original Message .......
> On Tue, 1 May 2007 11:11:03 +0300 "Egor Egorov" <egor at fine.kiev.ua> wrote:
> >
> >On 1 may 2007, at 03:23, Eric Lambrecht wrote:
> >
> >>
> >> A cursory look on the web makes me think 'smartmontools.sf.net'
> >> might work pretty well, as I think all our machines ultimately use 
> >> SATA drives.
> >
> >
> >The Google team found that 36% of the failed drives did not exhibit a
> >single SMART-monitored failure. They concluded that SMART data is
> >almost useless for predicting the failure of a single drive.
> >
> >http://labs.google.com/papers/disk_failures.pdf
> >
> >Very interesting paper.
> >
> >--
> >     Egor Egorov
> >     http://www.fine.kiev.ua/
>
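The homegrown check Jared describes (write a temporary file to the device, make sure it exists, alert if anything goes wrong) can be sketched in Python roughly as follows. This is not his script, which wasn't posted; the function names `probe_mount` and `probe_all`, the probe payload, and the `/var/mogdata/devN` paths in the usage note are hypothetical. The sketch also refuses to report success when the path isn't a mount point, since a device that silently failed to mount would otherwise leave the probe writing happily to the parent filesystem.

```python
import os
import tempfile
from typing import List, Tuple

# Hypothetical payload; any small, recognizable byte string works.
PROBE_PAYLOAD = b"mogilefs-disk-probe\n"


def probe_mount(path: str, require_mount: bool = True) -> Tuple[bool, str]:
    """Write a temporary file under `path`, check it exists and reads back, then delete it.

    Returns (ok, message). Any OSError during the round trip counts as a failure.
    """
    # If the device never mounted, writes would quietly land on the parent
    # filesystem, so refuse to report success for a path that isn't a mount point.
    if require_mount and not os.path.ismount(path):
        return False, f"{path}: not a mount point"
    try:
        fd, probe = tempfile.mkstemp(prefix=".probe-", dir=path)
    except OSError as exc:
        return False, f"{path}: cannot create probe file: {exc}"
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(PROBE_PAYLOAD)
            f.flush()
            os.fsync(f.fileno())  # push the write to the device, not just the page cache
        if not os.path.exists(probe):
            return False, f"{path}: probe file missing after write"
        # The read-back is usually served from the page cache, so this confirms the
        # filesystem accepts writes, not that the platters read back cleanly.
        with open(probe, "rb") as f:
            if f.read() != PROBE_PAYLOAD:
                return False, f"{path}: probe file contents differ"
        return True, f"{path}: ok"
    except OSError as exc:
        return False, f"{path}: probe failed: {exc}"
    finally:
        try:
            os.remove(probe)
        except OSError:
            pass  # already gone, or the filesystem is too broken to delete it


def probe_all(paths: List[str], require_mount: bool = True) -> List[str]:
    """Probe every path and return the failure messages (empty list means all passed)."""
    failures = []
    for path in paths:
        ok, msg = probe_mount(path, require_mount=require_mount)
        if not ok:
            failures.append(msg)
    return failures
```

For example, `probe_all(["/var/mogdata/dev1", "/var/mogdata/dev2"])` returns an empty list when every mount passes; sending the alert e-mail (or marking the device dead, as Eric wants) is left to whatever wraps it, such as a cron job. Note that this only catches a missing or unwritable mount, not the /dev/sdc-to-/dev/sdd shuffle Jared describes, where a healthy but wrong disk ends up mounted; that is the situation Brad points to mogautomount for.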

