MogileFS efficiency and external synchronization
dormando
dormando at rydia.net
Wed Jun 28 05:35:33 UTC 2006
Yo,
We want to use MogileFS for more images, or to at least store and manage
master copies of files. I have a feeling our load pattern would not work
well with MogileFS though.
Right now we have a code repo for high traffic static images, which gets
synchronized to ramdisks on machines running lighttpd. Running them
directly out of MogileFS is tempting since we can allow the artists easy
upload/management and avoid nasty crap.
From what I've measured, I really don't want small numbers of images
hit extremely hard with the MogileFS setup. For a well-cached image the
request path is:
user -> firewall -> perlbal -> webapp -> memcached
then back up through perlbal -> mogstored -> user
Lots of roundtripping. Lots of headers/protocols. Loading 30 images on a
page will take noticeably longer. Also, for our larger images with tons
of hits, we'd have to set a mindevcount of 6+ to ensure the mogstoreds
could handle the load.
The graphics servers are just:
user -> firewall -> graphics (PF load balancing + a health check daemon
I wrote).
So what I would like to do is add a cronjob to our graphics servers that
can synchronize a domain to local memory.
If I'm understanding everything correctly, every time a key gets updated
with a new file it gets a new fid? That fid should be an
autoincrement-like value, so it will be the next highest in the series.
So I add a tracker command that allows me to dump all keys in a domain,
and a command that dumps all keys in a domain with a fid greater than N.
Then I should be able to use those to ensure what's in mogilefs is what
I have on disk?
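To make the idea concrete, here's a minimal sketch of the decision logic such a cronjob could use. It assumes the proposed tracker commands exist and return key -> fid mappings (the commands and the dump format are hypothetical, not part of MogileFS today); fetching the actual file bodies over HTTP is left out.

```python
def plan_sync(local, remote):
    """Decide what to fetch/delete, given key -> fid maps.

    `local` is what's on disk now, `remote` is the dump from the
    (hypothetical) tracker command. A key needs fetching if it is new
    or its fid changed, since a re-uploaded key gets a fresh, higher
    fid. Keys missing from the remote dump were deleted. The returned
    high-water fid lets the next run ask only for keys with fid > N.
    """
    to_fetch = sorted(k for k, fid in remote.items() if local.get(k) != fid)
    to_delete = sorted(k for k in local if k not in remote)
    high_fid = max(remote.values(), default=0)
    return to_fetch, to_delete, high_fid
```

On each cron run you'd pass the previous high-water fid to the "keys with fid greater than N" command, merge the result into the local map, and only do a full-domain dump occasionally to catch deletes.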
Those queries won't be incredibly fast, but I don't intend to use them
on domains with more than a few thousand keys, and I can add an index if
absolutely necessary.
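If the tracker's store is MySQL and keys live in a file table carrying a domain id and fid (column and table names here are an assumption about the schema), the index would just be:

```sql
-- Assumed schema: file(fid, dmid, dkey, ...). A composite index on
-- (dmid, fid) turns "keys in domain X with fid > N" into a range scan.
ALTER TABLE file ADD INDEX dmid_fid (dmid, fid);
```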
Are there any better approaches which are roughly as simple? Perhaps a
way to make the mogstored hits not as nuts?
-Dormando