chunks?
Lamont Granquist
lamont at scriptkiddie.org
Thu May 31 17:17:17 UTC 2007
On Wed, 30 May 2007, dormando wrote:
> You might want to spend some more time reading mogile's docs and scan over
> the source code. In a typical use case (the same at gaiaonline.com), if we
> have multiple mogile drives per box, a failure only takes out one drive.
That's a very interesting trade-off. If you've got a RAID array you can
fail a drive and take zero impact, but then you've got the additional
complexity of the RAID controller, and failure of the entire array is a
much worse failure case.
You still have the possibility of failure of the whole machine. You
can scramble and swap drives into a working box (or boxes), though. I
don't tend to like those kinds of solutions, because at large sites
you'll be dealing with remote hands to try to get the work done
correctly... I'm going to have to think about that one more...
> Whole machine failure? Probably pretty traumatic for 5TB... With that much
> data I'd keep my mindevcount to 3 or more, to ensure enough resources
> available in case of a failure.
Yup.
Another question... Does MogileFS duplicate entire servers to other
entire servers, or does it use an entirely random allocation pattern? The
concern with the random allocation pattern is that with a mindevcount of
2, statistically you will lose some data every time two machines crash
at the same time...
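To put rough numbers on that concern, here is a back-of-the-envelope
sketch. It assumes a simplified model that is not taken from the MogileFS
source: each file gets exactly 2 copies on 2 distinct machines chosen
uniformly at random, exactly two machines fail together, and the
"mirrored" alternative pairs machines up evenly. The function names are
made up for illustration.

```python
from math import comb

def random_placement_loss(machines, files):
    """Two replicas per file on a uniformly random pair of distinct machines."""
    pairs = comb(machines, 2)           # number of possible machine pairs
    p_file_lost = 1 / pairs             # file lived on exactly the failed pair
    expected_lost = files * p_file_lost
    p_any_lost = 1 - (1 - p_file_lost) ** files
    return expected_lost, p_any_lost

def mirrored_pairs_loss(machines, files):
    """Machines grouped into fixed mirror pairs (assumes an even machine count)."""
    p_pair_hit = 1 / (machines - 1)     # the two failed machines are mirrors
    files_per_pair = files / (machines / 2)
    expected_lost = p_pair_hit * files_per_pair
    return expected_lost, p_pair_hit    # losing anything == hitting a pair

if __name__ == "__main__":
    for label, fn in (("random", random_placement_loss),
                      ("mirrored", mirrored_pairs_loss)):
        expected, p_any = fn(machines=20, files=1_000_000)
        print(f"{label:9s} expected files lost: {expected:10.1f}  "
              f"P(lose anything): {p_any:.4f}")
```

Under this model the expected number of lost files is the same either
way. What differs is how the loss is distributed: random placement loses
a little on nearly every double failure, while mirroring usually loses
nothing but occasionally loses a whole machine's worth of files.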
> If you wanted to, you could chunk the data up inside your application and
> write a plugin for mogilefs to help serve out the data you want.
>
> So if you're storing "clusters" of 10k files you always need to access
> all-at-once and never one-at-a-time, just store them into mogilefs as one 64
> megabyte chunkfile. Store an offset map in a database (or another mogile
> file, or as a header to your chunk), and parse data as you see fit. Then
> mogilefs will happily be ignorant of all this and make sure your chunks are
> highly available and maintain a well balanced IO load.
For the apps that I'm considering, the small blobs would need to be
randomly accessed under a very tight latency SLA, so I can't pull 64MB to
get at 10kB inside of it.
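For reference, the chunkfile layout quoted above (an offset map stored as
a header to the chunk) could look roughly like the sketch below. The
on-disk format here, a 4-byte big-endian header length followed by a JSON
offset map and then the concatenated blobs, is purely an assumption for
illustration, as are the function names. None of it is a MogileFS API.

```python
import json
import struct

def pack_chunk(blobs):
    """Pack {name: bytes} into one chunk: header length, JSON offset map, payload."""
    offsets = {}
    payload = bytearray()
    for name, data in blobs.items():
        offsets[name] = (len(payload), len(data))  # offset relative to payload start
        payload += data
    header = json.dumps(offsets).encode("utf-8")
    return struct.pack(">I", len(header)) + header + bytes(payload)

def read_offset_map(chunk):
    """Parse the header and return {name: (absolute_offset, length)}."""
    (header_len,) = struct.unpack(">I", chunk[:4])
    index = json.loads(chunk[4:4 + header_len])
    base = 4 + header_len                          # where the payload begins
    return {name: (base + off, length) for name, (off, length) in index.items()}

def read_blob(chunk, offset_map, name):
    """Slice one blob out of a chunk using a previously parsed offset map."""
    start, length = offset_map[name]
    return chunk[start:start + length]

if __name__ == "__main__":
    chunk = pack_chunk({"a.jpg": b"\x01" * 10, "b.jpg": b"\x02" * 20})
    offset_map = read_offset_map(chunk)
    print(offset_map["b.jpg"], len(read_blob(chunk, offset_map, "b.jpg")))
```

With the offset map cached separately, a reader only needs the bytes from
start to start + length - 1 of the chunk. Whether that slice can be
fetched without transferring the whole chunk depends on the storage
node's HTTP server honoring byte-range requests, which I have not
verified.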
It looks like MogileFS also just uses HTTP/DAV so I'd need to break the
mapping of URLs onto file paths in order to make it operate on chunks...