Architecture question

Thu Nov 30 20:32:14 UTC 2006

Hi all,
    We are currently evaluating MogileFS for our application,
and I'm trying to figure out the best way to architect the system
from a hardware/software perspective. I have a few questions
about the overall architecture and best practices for
implementation.

Consider this environment:

Potentially 3+ TB of new data monthly, with X replicas of each file
stored. (X is a number I'm still trying to settle on; my initial
thought is 3 for a decent level of safety.) There will be
multiple classes of files, each with their own storage requirements.  
The data is generally written once, then read some number of times  
based on the compute tasks necessary to deal with the data. After a  
period of time (to be determined) the data is expired and cleaned up.
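
To put rough numbers on the environment above, here is a minimal
back-of-the-envelope sketch. It assumes exactly 3 TB of new data per
month and X = 3 replicas (both only initial guesses), and the 12-month
retention period is purely hypothetical, since the real expiry period
is still to be determined.

```python
# Rough raw-storage estimate, replicas included.
# Assumptions: 3 TB/month of new data, 3 replicas, and a HYPOTHETICAL
# 12-month retention period (the real one is not decided yet).

def raw_tb_per_month(new_tb, replicas):
    """Raw disk consumed each month once every replica is counted."""
    return new_tb * replicas

def steady_state_tb(new_tb, replicas, retention_months):
    """Raw disk in use once expiry removes data as fast as it arrives."""
    return raw_tb_per_month(new_tb, replicas) * retention_months

if __name__ == "__main__":
    print("raw TB per month:", raw_tb_per_month(3, 3))
    print("steady-state TB :", steady_state_tb(3, 3, 12))
```

The figures scale linearly, so changing X or the retention period
changes the totals in direct proportion.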

If I understand the documentation I've seen correctly, there is a  
central MySQL DB (clustered or not) that the trackers talk to. Then
the storage nodes just read/write based on what is given to them. Are  
there limitations I should be aware of? Our current system has  
roughly 200 million unique pieces of data stored as blobs in MySQL
(across multiple servers). This quantity of files won't be a problem
for MogileFS, will it? In MogileFS terminology there would be 10+
domains, each with a differing number of classes depending on the
parent class. So the 200 million files would be
spread out across multiple servers based on class and the rules of  
replication for the class.

Hardware-wise, I'm looking at some relatively generic 4U servers with  
8 x 750GB SATA-2 drives in them. How would they be best utilized: as
8 separate volumes, or in a RAID 0 setup carved into some number of
1-2 TB volumes? I'm leaning towards the 8 separate volumes.
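
As a rough sketch of the trade-off, the arithmetic below compares the
two layouts. It assumes decimal units (750 GB drives, 1000 GB per TB),
ignores filesystem overhead, and uses 3 TB/month with 3 replicas as
placeholder figures. The one firm fact it relies on is that RAID 0 has
no redundancy, so a single drive failure loses the whole striped
volume.

```python
import math

# Figures taken from the hardware described above.
DRIVES_PER_SERVER = 8
DRIVE_GB = 750

def server_raw_tb():
    """Raw capacity of one server in decimal TB, before any overhead."""
    return DRIVES_PER_SERVER * DRIVE_GB / 1000

def servers_per_month(new_tb, replicas):
    """Whole servers needed to absorb one month of data plus replicas."""
    return math.ceil(new_tb * replicas / server_raw_tb())

def gb_lost_on_one_drive_failure(drives_per_volume):
    """Data to re-replicate when one drive dies.

    drives_per_volume = 1 models separate volumes; larger values model
    a RAID 0 stripe, where losing one drive loses the entire stripe.
    """
    return drives_per_volume * DRIVE_GB

if __name__ == "__main__":
    print("raw TB per server      :", server_raw_tb())
    print("servers per month      :", servers_per_month(3, 3))
    print("GB lost, separate disks:", gb_lost_on_one_drive_failure(1))
    print("GB lost, 2-drive RAID 0:", gb_lost_on_one_drive_failure(2))
```

Under these assumptions, separate volumes keep the amount of data
that has to be re-replicated after a drive failure to a single
drive's worth.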

Any thoughts or input would be greatly appreciated.

-Carl