dormando at rydia.net
Thu May 31 00:30:16 UTC 2007
Sounds like a fun patch ;)
With a few million files on a device, rebuilding the replicants took an
hour or so. It's not too bad since the files end up being fairly evenly
distributed across the remaining devices.
I haven't actually used the new fsck/rebalance code in production yet.
We had to finish a few other upgrades first; I expect them to be
reasonably fast but not snappy operations.
You might want to spend some more time reading mogile's docs and scan
over the source code. In a typical use case (the same as at
gaiaonline.com), if we have multiple mogile drives per box, a failure
only takes out one drive. So if you have a machine with 5TB of storage
as JBOD, a single disk failure isn't so bad. Though I guess it'd be
much the same with RAID, depending on the setup... Honestly, when a
drive fails I just mark it dead and continue with life. I don't even
bother pulling it, since in our specific case the OS runs out of a
small memory filesystem and having an internal drive or two dead
doesn't make a difference for its operation.
Whole machine failure? Probably pretty traumatic for 5TB... With that
much data I'd keep my mindevcount to 3 or more, to ensure enough
resources available in case of a failure.
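For reference, mindevcount is a per-class setting; with the standard
mogadm utility that looks roughly like the following (the domain and
class names here are made up for illustration):

```shell
# Hypothetical domain/class names. mindevcount=3 keeps at least three
# replicas of every file in the class, so losing a whole box still
# leaves two live copies to rebuild from.
mogadm class add mydomain bigfiles --mindevcount=3

# or raise it on an existing class:
mogadm class modify mydomain bigfiles --mindevcount=3
```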
If you wanted to, you could chunk the data up inside your application
and write a plugin for mogilefs to help serve out the data you want.
So if you're storing "clusters" of 10k files you always need to access
all-at-once and never one-at-a-time, just store them into mogilefs as
one 64 megabyte chunkfile. Store an offset map in a database (or another
mogile file, or as a header to your chunk), and parse data as you see
fit. Then mogilefs will happily be ignorant of all this, keep your
chunks highly available, and maintain a well-balanced IO load.
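The chunkfile idea above can be sketched in a few lines. This is just
an illustration of the offset-map layout, not anything mogilefs
provides; the function names and the header format (length-prefixed
JSON map in front of the payloads) are my own invention:

```python
import io
import json
import struct

def pack_chunk(files):
    """Pack a {name: bytes} dict into one chunkfile.

    Layout: 4-byte big-endian header length, a JSON map of
    name -> (offset, size), then the concatenated payloads.
    """
    offsets = {}
    payload = io.BytesIO()
    for name, data in files.items():
        offsets[name] = (payload.tell(), len(data))
        payload.write(data)
    header = json.dumps(offsets).encode()
    return struct.pack(">I", len(header)) + header + payload.getvalue()

def read_from_chunk(chunk, name):
    """Pull a single file's bytes back out of a packed chunk by name."""
    (hlen,) = struct.unpack(">I", chunk[:4])
    offsets = json.loads(chunk[4:4 + hlen])
    off, size = offsets[name]
    base = 4 + hlen
    return chunk[base + off:base + off + size]
```

In practice you'd store the whole packed blob as one mogilefs file and
fetch individual members with a ranged read against the offset map,
instead of slicing an in-memory bytes object like this sketch does.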
Lamont Granquist wrote:
> Yeah, I'm more interested in dealing with large scale data management
> operations. So when you've got 200 machines with 5TB of data per
> machine and you fail one, how long will it take to rebuild it off of one
> of the replicants? If you ever need to fsck one of the partitions, how
> long will it take? If you need to replicate the data in one datacenter
> to another, how long will it take? The normal day-to-day read/write
> operations may be adequately fast, but bulk operations like this can
> take days/weeks/months when dealing with large amounts of data...
> And as the size of disks increases much faster than the seek time on
> disks it becomes more and more important that large amounts of data can
> be accessed in bulk streaming operations rather than seeking through the
> entire disk... Streaming transfer rates have been scaling better than
> seek times, which means that as time goes on it becomes more and more
> preferable to reduce the number of seeks for bulk operations... As we
> go from cheap arrays of 12 250GB disks to cheap arrays of 12 2TB disks
> this problem just gets worse...