Lamont Granquist lamont at
Sat Jun 2 00:02:46 UTC 2007

On Thu, 31 May 2007, dormando wrote:
>> You still do have the possibility of failure of the whole machine.  You 
>> can scramble and swap drives into a working box (or boxes) though.  I 
> Have slightly more, less dense machines? Instead of having "cold spares" or 
> spare parts for my mogilefs cluster laying about, I just put them all into 
> production. So in case of a few machines failing over the course of a month I 
> don't have to do anything at all.

Yeah, having JBOD would mean that I could keep all the spindles 
functioning during a complete rebuild, so that the time is limited to a 
'find /' through a single disk.  I keep waffling on how good an idea I 
think this is at scale...  Leaving RAID behind makes me nervous; I just 
need to think through all the possible consequences...

I do really like that it makes the boxes simpler and cheaper, and you can 
operate all the spindles fully independently so performance will be 
extremely fast...  That also mitigates the problems with rebuilds or 
administrative operations since you can do your 'find /' kind of 
operations in parallel over all the drives on the box...
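
A rough sketch of the parallel 'find /' idea, assuming each JBOD drive is 
mounted at its own path.  The function names and mount-point layout are 
made up for illustration; this is not MogileFS code, just one way to walk 
every drive at once with one worker per spindle:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def scan_device(mount_point):
    """Walk one drive's tree and return (file_count, total_bytes)."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            count += 1
    return count, total


def scan_all(mount_points):
    """Scan every mount point concurrently, one worker thread per drive."""
    if not mount_points:
        return {}
    with ThreadPoolExecutor(max_workers=len(mount_points)) as pool:
        results = pool.map(scan_device, mount_points)
        return dict(zip(mount_points, results))
```

Since the drives are independent, the wall-clock time should be roughly 
that of the slowest single drive rather than the sum over all of them, 
assuming the controller and bus are not the bottleneck.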

> io's/sec are always going to kill you before you run out of space. Buy 
> slightly larger disks and crank up the devcount.

Yeah, that's what I'm expecting.  As long as the replication policy is 
pluggable or reasonably hackable, you're probably right that i/o load 
and hotspots will be more problematic than data availability.

>> For the apps that I'm considering the small blobs would need to be 
>> randomly accessed with a very fast SLA so I can't pull 64MB to get at 10kB 
>> inside of it.
> A perlbal plugin that allows seeking into chunks based on the URL would be 
> fine... I'm concerned however that you just contradicted your load pattern?
> What exactly is your load pattern expected to be? :) Chunking won't help if 
> you have to access a small file from 500 different chunks in different 
> places. Chunking will only help if you're mass-processing data. Such as in 
> data mining, text indexing, image analysis, small file backups, blah blah. 
> This should be true for both MogileFS and GoogleFS.

Well, I don't want to break up small objects.  The idea is to coalesce the 
small objects so that backup / restore / replicate types of administrative 
operations can occur on the larger coalesced objects.  I'll still have the 
primary random read/write I/O on smaller 1-10kB blobs of data though.
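
A minimal sketch of the coalescing idea, assuming an append-only container 
plus an index of (offset, length) per blob.  The function names and the 
index layout are hypothetical, not anything MogileFS or perlbal provides; 
the point is only that a small blob can be read by seeking, without 
pulling the whole container:

```python
import io


def pack_blobs(container, blobs):
    """Append each blob to `container`; return {key: (offset, length)}."""
    index = {}
    for key, data in blobs.items():
        # Always append at the end so existing offsets stay valid.
        offset = container.seek(0, io.SEEK_END)
        container.write(data)
        index[key] = (offset, len(data))
    return index


def read_blob(container, index, key):
    """Read one blob back by seeking to its recorded offset."""
    offset, length = index[key]
    container.seek(offset)
    return container.read(length)


if __name__ == "__main__":
    container = io.BytesIO()  # stands in for a large coalesced file
    index = pack_blobs(container, {"a": b"hello", "b": b"x" * 10240})
    print(read_blob(container, index, "a"))
```

Backup, restore, and replication would then operate on the container as 
one large object, while the random reads touch only the byte range of the 
blob they need.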

More information about the mogilefs mailing list