Architecture question

Dormando dormando at rydia.net
Fri Dec 1 09:00:42 UTC 2006


Hey,

How many nodes were you planning on setting up? MogileFS works best
with at least as many physically separate nodes as the number of
replicas you require. I would say double that if possible (so 3-6
nodes for 3x copies per file, or more). It will prefer to put copies
of files on different hosts for reliability.
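To sketch that "prefer different hosts" idea: the real placement logic lives inside the MogileFS replicator, but the gist is roughly the following (function and data shapes are illustrative assumptions, not MogileFS internals):

```python
import random

def pick_replica_devices(devices, replicas):
    """Pick devices for `replicas` copies, preferring distinct hosts.

    `devices` is a list of (host, device) pairs. Illustrative sketch
    only -- not the actual MogileFS replication code.
    """
    by_host = {}
    for host, dev in devices:
        by_host.setdefault(host, []).append(dev)
    hosts = list(by_host)
    random.shuffle(hosts)
    picked = []
    # First pass: at most one device per host, for failure isolation.
    for host in hosts:
        if len(picked) == replicas:
            break
        picked.append((host, random.choice(by_host[host])))
    # Fallback: reuse hosts only if there aren't enough distinct ones.
    while len(picked) < replicas:
        host = random.choice(hosts)
        picked.append((host, random.choice(by_host[host])))
    return picked
```

With 6 hosts and 3 replicas, every copy lands on a different machine, so losing any one host still leaves two good copies.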

Don't bother with RAID; just set them all up as independent drives. If
one fails, the other 7 keep working and serving/storing files. Recent
SVN handles this a lot better than it used to, thanks to some AIO
tweaks Brad did.

For our own setup we have around 23 million files stored 3x each.
These are all being heavily rewritten and accessed. I've, uh,
"accidentally" broken the deleter a few times and have ended up with
over 400 million files in my database, and it handled that fine. You
should get a machine with a fair amount of CPU and as much RAM as is
reasonable, but MogileFS has proven to be really easy on the database.
More so if you cache paths, so reads don't hit the trackers as often.
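The path-caching idea above can be sketched as a thin TTL cache wrapped around your tracker lookup. Here `get_paths` stands in for whatever path-lookup call your MogileFS client library provides; the wrapper name and the 300-second TTL are illustrative assumptions, not MogileFS API:

```python
import time

def make_cached_get_paths(get_paths, ttl=300):
    """Wrap a tracker path lookup with a simple in-process TTL cache.

    Cache hits skip the tracker entirely; expired or missing entries
    fall through to the real lookup and refresh the cache.
    """
    cache = {}

    def cached(key):
        now = time.time()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl:
            return hit[1]          # fresh cache hit: no tracker call
        paths = get_paths(key)     # miss or stale: ask the tracker
        cache[key] = (now, paths)
        return paths

    return cached
```

In production you'd probably put this in memcached rather than a per-process dict, but the effect is the same: most reads never touch the trackers or the database.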

I'd expect our DB to handle a few hundred million files without much
issue, depending on the read/write load. What kind of access
patterns do you use?

have fun,
-Dormando

On Thu, 30 Nov 2006 12:32:14 -0800
Carl Forsythe <carl at immi.com> wrote:

> Hi all,
>     We are currently studying utilizing MogileFS for our
> application, and I'm trying to figure out the best way to architect
> the system from a hardware/software perspective. I have a few
> questions regarding the overall architecture and perhaps best
> practices for implementation.
> 
> Consider this environment:
> 
> Potentially 3+ TB of new data monthly that will have X replicas of  
> the data stored. (X is still a number I'm trying to figure.. my  
> initial thoughts are 3 for a decent level of safety). There will be  
> multiple classes of files, each with their own storage requirements.  
> The data is generally written once, then read some number of times  
> based on the compute tasks necessary to deal with the data. After a  
> period of time (to be determined) the data is expired and cleaned up.
> 
> If I understand the documentation I've seen correctly, there is a  
> central MySQL DB (Clustered or not) that the trackers talk to. Then  
> the storage nodes just read/write based on what is given to them.
> Are there limitations I should be aware of? Our current system has  
> roughly 200 million unique pieces of data stored as blobs in MySQL  
> (across multiple servers), this quantity of files won't be a problem  
> for MogileFS will it? In Mogile terminology there would be 10+  
> domains with differing numbers of classes within each domain  
> dependent on the parent class. So the 200 million files would be  
> spread out across multiple servers based on class and the rules of  
> replication for the class.
> 
> Hardware-wise, I'm looking at some relatively generic 4U servers
> with 8 x 750GB SATA-2 drives in them. How would they be best
> utilized? As 8 separate volumes, or in a RAID 0 setup with some
> number of 1-2 TB volumes? I'm leaning towards the 8-volume
> option.
> 
> Any thoughts or input would be greatly appreciated.
> 
> -Carl
> 

