mogilefs usage and performance
anatolia at gmail.com
Fri Aug 11 07:42:03 UTC 2006
> Just for clarification... All of your tests have the Mogile Tracker in
> there. Can you test using a local (in mod_perl) or external (memcached)
> for a path cache? Or have you already?
Good idea. I was planning to avoid memcached, since the site engine is
already too complex, but it will be a good test. I'll try it later today.
However, with cached file paths I lose fresh information about where the
files really are and which servers are currently down. That could be
handled with additional software design, of course.
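The staleness concern can be bounded rather than eliminated: cache paths with a short TTL and fall back to the tracker on expiry or fetch failure. Here is a minimal in-process sketch in Python for illustration (the real setup would use a memcached client instead of a dict, and `tracker_get_paths` is a hypothetical stand-in for the MogileFS get_paths call):

```python
import time

# Stand-in for memcached: an in-process dict mapping key -> (paths, stored_at).
CACHE = {}
TTL = 30  # seconds; a short TTL bounds how stale a cached path can get

def get_paths_cached(key, tracker_get_paths):
    """Return storage-node URLs for key, consulting the cache first."""
    entry = CACHE.get(key)
    if entry is not None:
        paths, stored_at = entry
        if time.time() - stored_at < TTL:
            return paths
    # Cache miss or expired: ask the tracker, which knows about dead hosts.
    paths = tracker_get_paths(key)
    CACHE[key] = (paths, time.time())
    return paths
```

On an actual fetch failure from a cached path, the entry should also be deleted so the next request goes back through the tracker.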
> In our setup the typical flow is:
> perlbal->php->memcached, then back up to perlbal->mogstored.
> If the path is not in memcached (or has expired), then it goes
> perlbal->php->tracker (storing the result to memcached), then hops up to
> perlbal->mogstored for the rest.
> I'd suggest a test where you cache paths locally in mod_perl, then try
> again with memcached if possible.
> So it'd be perlbal->mod_perl->perlbal->mogstored, which is still going
> to be slow but should be much faster.
> Also what is the concurrency you're testing against all this with? Given
> all of the network delays in serving up mogilefs files I've gotten more
> performance by widening the number of requests going in at once.
Almost all tests were done on a single server: 'ab' runs on localhost
and requests localhost. In this configuration it doesn't matter how many
concurrent connections you use - the dependency is absolutely linear.
When I run 'ab' from another server connected over 100 Mbps, throughput
drops by about 2-3%. So it doesn't matter where 'ab' runs; it adds no
significant cpu load.
When I moved mysql to another server with the same hardware, connected
over 100 Mbps (not crossover), throughput dropped about 5% (from 190 to
180 r/s) with a single concurrent connection (tracker-to-mysql network
lag), but rose about 5% (to 200 r/s) with 10 concurrent connections. So
with many connections, the gain from a dedicated database server
outweighs the network lag. Not revolutionary results, of course - just
numbers.
> Finally, I don't know if this can be replicated elsewhere, but I've
> found perlbal to be really hard to benchmark. I did not have enough time
> to play around with it, but full-proxy benchmarks pushing dynamic
> apache pages to their limits would sometimes lock up entirely or drop
> to almost negligible rates. However, in some other tests with perlbal load
> balancing two static apache servers, I was getting thousands of requests
> per second. Something to keep an eye out for I guess...
> Anatoli Stoyanovski wrote:
> > Dear readers,
> > I'm testing mogilefs for a website which serves many pages and has a
> > complex technique for generating and caching content.
> > I plan to store the generated html files (actually, something like
> > SSI templates) on mogilefs, replicated over a number of servers. I
> > set up a single test PIV 2.8 GHz server and ran some benchmarks.
> > The mogilefs setup is perlbal on the front, apache w/ mod_perl,
> > mogstored/mogilefsd/mysql (http mode). SuSE Linux, default apache,
> > default mysql, default distro setup.
> > Perlbal (port 80) proxies to apache/mod_perl (port 81), which
> > contacts the tracker (port 6001) via MogileFS.pm, then reproxies to
> > mogstored (port 7500), etc...
> > I used the 'ab' utility (apache benchmark) with 1000 requests over
> > 10 concurrent connections for a single 10 KB image file, in several
> > configurations:
> > 1) Apache get: 1500 requests/sec
> > 2) Perlbal->mod_perl->Tracker->Perlbal->mogstored: 190 r/s
> > mogstored (direct download of known file url): 1170 r/s
> > 3) Perlbal->mod_perl->Tracker: 140 r/s (don't use x-reproxy-url,
> > return file contents via mod_perl)
> > 4) mod_perl->Tracker->mogstored: 180 r/s (we probably don't need
> > perlbal if mod_perl read file)
> > 5) mod_perl->Tracker: 530 r/s (we don't get file content, just testing
> > the tracker get_paths api)
> > 6) local disk reads using perl: about 50,000 reads/second (disk
> > cache, etc.)
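The reproxy step in configuration (2) is the key trick: the application only resolves the key to storage URLs and hands the byte transfer off to Perlbal via the X-REPROXY-URL header, instead of streaming the file through itself. A minimal WSGI-style sketch in Python for illustration (the real handler is mod_perl; `get_paths` is passed in here as a hypothetical tracker-lookup callable):

```python
def app(environ, start_response, get_paths):
    """Resolve a MogileFS key to storage URLs and reproxy via Perlbal.

    get_paths(key) stands in for the tracker round-trip (the ~530 r/s
    step in test 5); Perlbal sees X-REPROXY-URL in the response and
    fetches the body from mogstored directly.
    """
    key = environ.get("PATH_INFO", "").lstrip("/")
    paths = get_paths(key)
    if not paths:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown key"]
    # Space-separated list of candidate storage URLs; empty body, since
    # Perlbal replaces it with the file fetched from mogstored.
    start_response("200 OK", [("X-REPROXY-URL", " ".join(paths))])
    return [b""]
```

This is why test (3), which returns file contents through mod_perl, is slower than test (2) despite having one fewer hop.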
> > As we see, perlbal in front is the optimal configuration, but it is
> > very slow compared with direct local disk access. That's bad.
> > Mogstored (based on perlbal) is fast enough by itself. That's good.
> > The 5th test is just for information, as one part of the mogilefs
> > request process.
> > Overall, the base configuration (2) has poor performance for us. Our
> > index page contains about 50 template files (let's not discuss how
> > to optimize that now), so I need 50 reads from mogilefs to assemble
> > the page. That takes about 0.4 seconds, which is bad. Under heavy
> > load it will be even slower.
> > So I want to ask: what's the optimal environment for these tasks,
> > where I need to read many small files very quickly and want to use
> > the replicating file system mogilefs? As I understand it, I can
> > scale the trackers and mysql and set up more storage servers, but
> > the minimum time for 50 files stays around 0.4 seconds.
> > I've checked /var/mogdata and found that it contains regular files.
> > That gives me a rough idea for another way of using mogilefs: every
> > server that holds mogilefs replicas would try to read each file
> > locally from /var/mogdata, and only if it's not found read it via
> > get_file_data (this would need tuning for a real project). Does
> > anybody use this method? I need as many as 500 10 KB reads per
> > second.
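The local-read shortcut described above could be sketched as follows (Python for illustration; note this relies on /var/mogdata being an internal, undocumented layout that MogileFS does not promise to keep stable, and `fetch_remote` is a hypothetical stand-in for a get_file_data call):

```python
import os

MOGDATA_ROOT = "/var/mogdata"  # assumption: local replica files live here

def read_local_first(rel_path, fetch_remote):
    """Try the local replica directory first; only on a miss fall back
    to fetching the file over HTTP (e.g. via MogileFS get_file_data).
    Avoids a tracker round-trip and a network hop for local replicas."""
    local = os.path.join(MOGDATA_ROOT, rel_path)
    try:
        with open(local, "rb") as f:
            return f.read()
    except OSError:
        # Not replicated here (or unreadable): go through mogilefs.
        return fetch_remote(rel_path)
```

A consistency caveat: a file may exist in the cluster but not yet be replicated locally, so the remote fallback must stay authoritative.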
> > Finally, I've tested the 'gfarm' filesystem in the same environment.
> > It gets 40 reads/second - much less.