mogilefs usage and performance

Anatoli Stoyanovski anatolia at gmail.com
Thu Aug 10 18:27:11 UTC 2006


Dear readers,

I'm testing MogileFS for a website which serves many pages and has a
complex scheme for generating and caching content.
I plan to store the generated HTML files (actually, something like
SSI templates) on MogileFS, replicated over a number of servers. I
set up a single brand-new Pentium IV 2.8 GHz test server and ran some
benchmarks. The MogileFS setup is Perlbal on the front, Apache w/
mod_perl, and mogstored/mogilefsd/MySQL (HTTP mode), on SuSE Linux
with the default Apache, MySQL, and distro configuration.

Perlbal (port 80) proxies to Apache/mod_perl (port 81), which contacts
the tracker (port 6001) via MogileFS.pm and then reproxies to
mogstored (port 7500), and so on.
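
The Perlbal side of this is configured roughly as below. This is a
minimal sketch, not my exact config file: the pool and service names
are my own, and enable_reproxy is the knob that lets Perlbal act on
the X-REPROXY-URL header from the backend:

  CREATE POOL apache_backends
    POOL apache_backends ADD 127.0.0.1:81

  CREATE SERVICE web
    SET listen         = 0.0.0.0:80
    SET role           = reverse_proxy
    SET pool           = apache_backends
    SET enable_reproxy = true
  ENABLE web
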
I used the 'ab' utility (Apache Benchmark) with 1000 requests over 10
concurrent connections against a single 10 KB image file, in several
configurations.
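Each run looked roughly like this (the host name and file path here
are placeholders):

  ab -n 1000 -c 10 http://testserver/images/test-10kb.jpg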

1) Apache GET: 1500 requests/sec

2) Perlbal->mod_perl->Tracker->Perlbal->mogstored: 190 r/s (the
x-reproxy-url path; a sketch of the handler follows this list)
mogstored (direct download of a known file URL): 1170 r/s

3) Perlbal->mod_perl->Tracker: 140 r/s (no x-reproxy-url; the file
contents are returned directly by mod_perl)

4) mod_perl->Tracker->mogstored: 180 r/s (we probably don't need
Perlbal if mod_perl reads the file itself)

5) mod_perl->Tracker: 530 r/s (no file content is fetched; this just
exercises the tracker's get_paths API)

6) local disk reads using Perl: about 50,000 reads/second (disk cache, etc.)
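
For reference, the handler behind configurations (2) and (5) looks
roughly like this. It's a simplified sketch, not the exact code I ran:
the domain name is made up, the client module is shown under its newer
name MogileFS::Client, it assumes the mod_perl 1 API, and error
handling is omitted.

  package MyApp::MogileHandler;
  use strict;
  use Apache::Constants qw(OK NOT_FOUND);
  use MogileFS::Client;

  # One client per Apache child; it talks to the tracker on port 6001.
  my $mogc = MogileFS::Client->new(
      domain => 'testdomain',            # assumed domain name
      hosts  => [ '127.0.0.1:6001' ],
  );

  sub handler {
      my $r   = shift;
      my $key = $r->uri;                 # use the request path as the key

      # Ask the tracker where the replicas live (test 5 stops here).
      my @paths = $mogc->get_paths($key);
      return NOT_FOUND unless @paths;

      # Hand the mogstored URLs back to Perlbal, which fetches and
      # streams the file itself (the full test-2 path).
      $r->header_out('X-REPROXY-URL' => join(' ', @paths));
      $r->send_http_header('text/plain');
      return OK;
  }

  1;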

As we can see, Perlbal in front is the optimal configuration, but it
is very slow compared with direct local disk access. That's bad.
Mogstored (which is based on Perlbal) is fast enough by itself. That's good.
The 5th test is just for information, as one stage of the MogileFS
request path.

Overall, the base configuration (2) has poor performance for us. Our
index page contains about 50 template files (let's not discuss how to
optimize that now), so I need 50 reads from MogileFS to assemble this
page. That takes about 0.4 seconds, which is bad. Under heavy load it
will be even slower.

So I want to ask: what is the optimal environment for these tasks,
where I need to read many small files very quickly and want to use a
replicating file system like MogileFS? As I understand it, I can scale
the trackers and MySQL and set up more storage servers, but the
minimum time for 50 files is still 0.4 seconds.

I've checked /var/mogdata and found that it contains regular files.
That gives me a rough idea for another way of using MogileFS: every
server that holds MogileFS replicas would first try to read each file
locally from /var/mogdata, and only fall back to fetching it via
get_file_data if it isn't there (this would need tuning for a real
project). Does anybody use this method? I need as many as 500 reads
of 10 KB files per second.
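
A rough sketch of that idea, assuming the same client setup as in the
handler above. The URL-to-local-path translation is my guess at the
mogstored docroot layout, and matching replicas by host name is
fragile if get_paths returns IP addresses:

  use strict;
  use MogileFS::Client;
  use Sys::Hostname qw(hostname);

  my $mogc = MogileFS::Client->new(
      domain => 'testdomain',            # assumed domain name
      hosts  => [ '127.0.0.1:6001' ],
  );

  # Read a key, preferring a local replica under /var/mogdata when
  # this host happens to hold one; otherwise fetch it over HTTP.
  sub read_local_first {
      my ($key) = @_;
      my $host  = hostname();

      for my $url ($mogc->get_paths($key)) {
          # e.g. http://thishost:7500/dev1/0/000/000/0000000123.fid
          if ($url =~ m{^http://\Q$host\E:\d+(/.+)\z}) {
              my $file = "/var/mogdata$1";
              if (open my $fh, '<', $file) {
                  local $/;              # slurp mode
                  return scalar <$fh>;
              }
          }
      }

      # Fall back to the client library (returns a scalar ref or undef).
      my $dataref = $mogc->get_file_data($key);
      return $dataref ? $$dataref : undef;
  }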

Finally, I tested the 'gfarm' file system in the same environment. It
achieved 40 reads/second, far less.

