Mogile Deployment Layout: More Hosts or More Disks.
dormando
dormando at rydia.net
Tue Sep 18 06:58:19 UTC 2007
Hmm :) Maybe my next documentation spree should be a mogilefs FAQ :)
> Main question is, do we do more hosts per disk, or more disks per host.
I think the tradeoff here is pretty easy to spot:
As you spread out hosts:
- More local cache. mogstored relies on the OS's object cache to speed up
hot files, though you mention the CDN should take care of that...
- More bandwidth to the devices.
- Lessens the impact of losing a host (you should have enough mogilefs
hosts/devices that losing any one or two is something you don't have to
care about!).
- More CPU, I guess. It's rare but possible to load up mogstored on CPU.
As you add more devices:
- Fewer hosts to manage
- Losing an individual disk in a machine shouldn't hurt anything. In my
own setup I never bothered replacing dead disks in a host with multiple
drives. I just marked them as dead and got more HDs on the next server
order. Since you're somewhat more likely to lose a device than a whole
host, this isn't so bad.
You have to keep in mind:
- How full are your devices actually going to get before they become too
active to hold more files? 750G drives are nice, but usually I can't
even fill a 250G drive before it gets hosed with IO.
- The impact of losing a whole host with many 750G drives with many
(millions of?) files. It could take a long time for the reaper and
replicators to deal with this as they work in small batches of files.
Then again, it won't matter as much as you grow (and especially if you
can quickly deal with dead hosts).
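A back-of-envelope sketch of that last point, in Python. The function name and every number here are made-up assumptions for illustration, not measurements from a real MogileFS install; plug in your own file counts and whatever rate your replicators actually sustain.

```python
def rereplication_hours(devices_per_host, files_per_device, files_per_second):
    """Rough hours needed to re-replicate every file that lived on a lost host.

    Assumes a single steady aggregate replication rate, which is a
    simplification: real throughput depends on file size, IO load, and
    how many replicate jobs are running.
    """
    total_files = devices_per_host * files_per_device
    return total_files / files_per_second / 3600


# Hypothetical inputs: 1M files per device, 200 files/sec across all replicators.
for devices in (2, 4, 8):
    hours = rereplication_hours(devices, 1_000_000, 200)
    print(f"{devices} devices/host: ~{hours:.1f} hours to re-replicate")
```

The point is only that the recovery window scales with the number of files on the host, so packing more big drives into one box stretches it out.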
So on a really busy service, I'd have tons of 64-bit hosts with extra
RAM. On something with more streaming involved, you have to understand
your dataset well to understand which way to go. Think about the average
size/access type of your files, as well as how often they're added or
replaced in the system.
Just remember to think of spindles more than disk size. Unless your
dataset is very idle you won't end up filling the disk, and the more
devices you have the more you can parallelize your batch operations :)
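To make the spindles-versus-size point concrete, here is a rough Python sketch. It assumes a very simple model where every stored GB generates a fixed average IO demand; the function name, the model, and the numbers are all hypothetical, so treat it as a way to think about the tradeoff rather than a sizing tool.

```python
def usable_gb(capacity_gb, spindle_iops, iops_per_stored_gb):
    """GB you can store on one drive before it runs out of IO or space.

    spindle_iops:        IO operations/sec the drive can sustain (assumed).
    iops_per_stored_gb:  average IO demand each stored GB generates (assumed).
    """
    if iops_per_stored_gb <= 0:
        # An idle dataset is limited only by disk size.
        return float(capacity_gb)
    io_limited_gb = spindle_iops / iops_per_stored_gb
    return min(float(capacity_gb), io_limited_gb)


# Hypothetical busy dataset: 100 IOPS per spindle, 0.5 IOPS per stored GB.
for size in (250, 750):
    print(f"{size}G drive: ~{usable_gb(size, 100, 0.5):.0f}G usable")
```

With those made-up numbers both drives top out at the same amount of data, which is the situation described above: the bigger disk buys nothing until the dataset is idle enough to be limited by space instead of IO.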
> As a side note, any real reason not to run the trackers on the storage
> nodes?
I did it. Worked okay. Most of my storage nodes didn't have trackers,
but some did. The only issue is the trackers can get CPU heavy, which
could interact with other things on your box.
> also, anyone have any pros cons on running mysql master/slave
> with InnoDB on DRBD versus running let's say mysql cluster?
>
MySQL Cluster's probably not the greatest fit for the mogilefs database.
The dataset can be relatively small, but I don't think it's quite small
enough. Although honestly I only say that because I have limited
experience with cluster. My mogilefs DBs have been happy if they have
enough RAM for InnoDB to properly cache things...
DRBD should work okay. I've also done master:master with
auto_increment_offset, but that might scare the bejesus out of some
folks on the list. I like being able to optimize my tables though :)
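For anyone wondering why the auto_increment_offset trick keeps two masters from handing out the same id, here is a small Python simulation. It only models how MySQL's auto_increment_increment and auto_increment_offset settings space out generated values (offset, offset + increment, ...). The increment of 2 and offsets of 1 and 2 are assumed values for a two-master pair, not a copy of my actual config, and it says nothing about the other ways master:master can bite you.

```python
from itertools import count, islice


def auto_increment_ids(offset, increment):
    """Yield the id sequence one master would generate for new rows."""
    return count(start=offset, step=increment)


# Assumed settings: both masters use increment 2, with offsets 1 and 2.
master_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
master_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))

print("master A ids:", master_a)
print("master B ids:", master_b)

# The two sequences never overlap, so inserts on either side can't collide.
assert not set(master_a) & set(master_b)
```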
-Dormando