chunks?

dormando dormando at rydia.net
Thu May 31 00:30:16 UTC 2007


Sounds like a fun patch ;)

With a few million files on a device, rebuilding the replicants took 
about an hour. That's not too bad, since the files end up being fairly 
evenly distributed.

I haven't actually used the new fsck/rebalance code in production yet. 
We had to finish a few other upgrades first; I expect them to be 
reasonably fast but not snappy operations.

You might want to spend some more time reading mogile's docs and scan 
over the source code. In a typical use case (the same as at 
gaiaonline.com), if we have multiple mogile drives per box, a failure 
only takes out one drive. So if you have a machine with 5TB of storage 
JBOD, a single disk failure isn't so bad. Though I guess it'd be the 
same depending on RAID... Honestly, when a drive fails I just mark it 
dead and continue with life. I don't even bother pulling it, since in 
our specific case the OS runs out of a small memory filesystem and 
having an internal drive or two dead doesn't make a difference for its 
CPU-bound operations.

Whole machine failure? Probably pretty traumatic for 5TB... With that 
much data I'd keep my mindevcount at 3 or more, to ensure enough copies 
remain available in case of a failure.

If you wanted to, you could chunk the data up inside your application 
and write a plugin for mogilefs to help serve out the data you want.

So if you're storing "clusters" of 10k files you always need to access 
all-at-once and never one-at-a-time, just store them into mogilefs as 
one 64 megabyte chunkfile. Store an offset map in a database (or another 
mogile file, or as a header to your chunk), and parse data as you see 
fit. Then mogilefs will happily be ignorant of all this and make sure 
your chunks are highly available and maintain a well-balanced IO load.
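The chunkfile idea above can be sketched quickly. This is a minimal 
illustration (the names pack_chunk/read_member are made up for this 
example, and nothing here is a MogileFS API): concatenate the small 
files into one blob, keep an offset map, and slice members back out by 
offset.

```python
# Hypothetical sketch of the chunkfile-plus-offset-map scheme described
# above. pack_chunk/read_member are illustrative names, not MogileFS APIs.
import io

def pack_chunk(files):
    """files: dict of name -> bytes. Returns (chunk_bytes, offset_map),
    where offset_map maps name -> (offset, length) within the chunk."""
    buf = io.BytesIO()
    offsets = {}
    for name, data in files.items():
        offsets[name] = (buf.tell(), len(data))  # record where it starts
        buf.write(data)
    return buf.getvalue(), offsets

def read_member(chunk, offsets, name):
    """Pull one member file back out of the chunk via the offset map."""
    off, length = offsets[name]
    return chunk[off:off + length]

# Pack two tiny "files" and read one back.
chunk, offsets = pack_chunk({"a.jpg": b"aaaa", "b.jpg": b"bb"})
assert read_member(chunk, offsets, "b.jpg") == b"bb"
```

In practice you'd store the chunk as one mogile file and the offset map 
in your database (or as a header on the chunk, as mentioned above), so 
a whole 10k-file cluster costs one mogile fetch instead of 10k.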

-Dormando

Lamont Granquist wrote:
> 
> Yeah, I'm more interested in dealing with large scale data management 
> operations.  So when you've got 200 machines with 5TB of data per 
> machine and you fail one, how long will it take to rebuild it off of one 
> of the replicants?  If you ever need to fsck one of the partitions, how 
> long will it take?  If you need to replicate the data in one datacenter 
> to another, how long will it take?  The normal day-to-day read/write 
> operations may be adequately fast, but bulk operations like this can 
> take days/weeks/months when dealing with large amounts of data...
> 
> And as the size of disks increases much faster than the seek time on 
> disks it becomes more and more important that large amounts of data can 
> be accessed in bulk streaming operations rather than seeking through the 
> entire disk...  Streaming transfer rates have been scaling better than 
> seek times, which means that as time goes on it becomes more and more 
> preferable to reduce the number of seeks for bulk operations...  As we 
> go from cheap arrays of 12 250GB disks to cheap arrays of 12 2TB disks 
> this problem just gets worse...
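(For scale, the streaming-vs-seek point above can be put in rough 
numbers. The rates below are illustrative assumptions, not measurements 
of any real array: ~60 MB/s sequential throughput and a 10 ms average 
seek.)

```python
# Back-of-envelope arithmetic for the quoted concern: bulk-rebuilding a
# failed 5TB machine is tolerable if the copy streams, and far worse if
# it degenerates into one seek per small file. All rates are assumed.
TB = 10**12
STREAM_RATE = 60 * 10**6   # bytes/sec sequential (assumed)
SEEK_TIME = 0.010          # seconds per random seek (assumed)

def streaming_hours(total_bytes):
    """Hours to copy total_bytes as one sequential stream."""
    return total_bytes / STREAM_RATE / 3600

def seek_bound_hours(n_files, avg_file=100 * 10**3):
    """Hours when every file costs a seek plus a short read."""
    per_file = SEEK_TIME + avg_file / STREAM_RATE
    return n_files * per_file / 3600

# Streaming 5TB: roughly a day.
hours_stream = streaming_hours(5 * TB)      # ~23 hours
# The same 5TB as 50 million 100KB files, seek-bound: much longer.
hours_seeks = seek_bound_hours(5 * 10**7)   # ~162 hours
```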



More information about the mogilefs mailing list