Newbie - hardware recommendations

Wed Jan 4 14:59:32 UTC 2006

On Wed, Jan 04, 2006 at 01:47:46AM +0100, jb wrote:

> Hello and happy new year;
> 
> We plan to use MogileFS for an online storage service. In fact, we have
> been looking for several months for a way to scale our storage server, and
> I think it's time to make the right decision ;-)
> 
> Today:
> Docs are stored on a 1U server with RAID 5 and 600 GB.
> 300 GB - 4.5 million files. (99% are pictures, 1% are doc, pdf, small
> audio files..)
> During peak hours, we have 280 requests / sec on this server.
> On a typical day we serve 8.5 million requests.
> 
> -> This server is really a bottleneck, and we can see it is very near its
> limits.

What server software are you using? Apache? lighttpd?

> Ideally, we'd like to serve at least 600 requests / sec with our future
> MogileFS installation.
> 
> We plan to use this configuration:
> 
> - 2 storage nodes: 2 x 2U, each with 6 HDs (2 x 80 GB for Debian in RAID 1,
> and 4 x 400 GB for storage with no RAID)
> 
> - 2 trackers: 2 x 1U (with Perl client on it too)
> 
> - 1 MySQL: 1 x 1U with 3 GB of RAM
> 
> So my questions:

> 1. Do you think this configuration is sufficient to handle this traffic? I'd
> like to hear about your experience and maybe some recommendations.

Got me. I'm sure Brad would know, though :-)

> 2. I think MySQL can become the new bottleneck. Do you have any tips, based
> on your experience with MogileFS, for the hardware configuration of this
> server?

Maybe not the bottleneck, but it becomes a SPOF (single point of failure) unless you do regular backups/replication.

> 3. Can Perlbal cache documents (like Squid, for example), or is it good
> practice to cache documents on the tracker?

Perlbal doesn't need to cache documents; it runs on the same machine that holds the files.
MogileFS works like this:

1) client  -> tracker : I want to read foo.jpg
2) tracker -> mysql   : give me the record for foo.jpg
3) mysql   -> tracker -> client: that is file 00001, on x.x.x.2 or x.x.x.3
4) client  -> x.x.x.2(perlbal): hey, give me 00001 ( GET /......./00001 )

What I believe LiveJournal does is use memcached to cache steps 2 and 3, so repeated hits for the same file never touch the tracker or the MySQL server.
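
A minimal sketch of that caching idea, in Python rather than Perl. Everything named here is made up for illustration: PathCache is an in-process dict standing in for memcached, fake_tracker_lookup stands in for the tracker + MySQL round trip, and the URLs are placeholders. It is not how LiveJournal actually does it, just the cache-aside pattern applied to steps 2 and 3.

```python
import time


class PathCache:
    """Tiny in-process stand-in for memcached: key -> (expiry time, value)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it so the caller asks the tracker again.
            del self.store[key]
            return None
        return value

    def set(self, key, value):
        self.store[key] = (self.clock() + self.ttl, value)


def get_paths(key, cache, tracker_lookup):
    """Cache-aside: only call the tracker (steps 2 and 3) on a cache miss."""
    paths = cache.get(key)
    if paths is None:
        paths = tracker_lookup(key)
        cache.set(key, paths)
    return paths


# Hypothetical tracker: records each call so we can see how often it is hit.
tracker_calls = []


def fake_tracker_lookup(key):
    tracker_calls.append(key)
    # Placeholder URLs, one per storage node holding a copy of the file.
    return ["http://10.0.0.2/example/00001", "http://10.0.0.3/example/00001"]


cache = PathCache(ttl_seconds=60.0)
first = get_paths("foo.jpg", cache, fake_tracker_lookup)
second = get_paths("foo.jpg", cache, fake_tracker_lookup)
```

The second call is answered from the cache, so tracker_calls only contains one entry. The TTL matters because files can be moved or re-replicated, and a cached path list can go stale.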

Once you are at step 4, you simply have one or more URLs to your file and it is just plain old HTTP from that point on.
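
Since the client may hold more than one URL for the same file, a simple approach is to try them in order and use the first that answers. A rough Python sketch, under assumptions: fetch_first_available, http_get, fake_fetch and the URLs are all hypothetical names for illustration, and the real MogileFS client libraries may handle this differently.

```python
import urllib.request


def http_get(url, timeout=5.0):
    """Plain HTTP GET; raises an OSError subclass on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_first_available(urls, fetch=http_get):
    """Try each replica URL in order; return (url, body) from the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember the failure and fall through to the next replica.
            errors.append((url, exc))
    raise OSError(f"all {len(urls)} replicas failed: {errors}")


# Demo with a fake fetch function so no network is needed:
# the first storage node is "down", the second one serves the file.
def fake_fetch(url):
    if "10.0.0.2" in url:
        raise ConnectionError("storage node down")
    return b"jpeg bytes"


used_url, body = fetch_first_available(
    ["http://10.0.0.2/example/00001", "http://10.0.0.3/example/00001"],
    fetch=fake_fetch,
)
```

In real use you would drop the fetch argument and let it default to http_get.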

If you read some of the PDFs on http://danga.com/words/ they might give you some ideas.

> Thanks for your answers.

> JB

-- 
- Justin