MogileFS summit review

Thu Sep 21 22:30:19 UTC 2006

Hint: don't press tab then space in Gmail until you're done writing the email.

Here are my notes.  Please feel free to add comments/corrections.

-lots of people (25-30) showed up
-most people are using mogile in development, but not in production
-much more documentation is needed before more people will start using this.

-Guba is making heavy use of mogile
 * they had a big problem with large files (2.5 GB) getting written hundreds of times
    . this was because the number of bytes written was reported incorrectly, so mogile retried
 * mogile was designed for small (< 5 MB) files.
 * some people are using chunking for this; Guba isn't.
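
A minimal sketch of the chunking idea, in Python (MogileFS itself is Perl). It assumes a hypothetical naming scheme where chunk n of key K is stored under "K,n" and a hypothetical 64 MB chunk size; neither is part of MogileFS. The point is that a misreported length would then only force one chunk to be re-sent instead of the whole 2.5 GB file.

```python
from typing import List, Tuple

# Hypothetical chunk size; MogileFS itself does not define one.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def chunk_plan(key: str, total_size: int,
               chunk_size: int = CHUNK_SIZE) -> List[Tuple[str, int, int]]:
    """Split a file into (chunk_key, offset, length) pieces.

    Chunk keys use a hypothetical "key,index" naming scheme.
    An empty file produces no chunks.
    """
    plan = []
    offset = 0
    index = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        plan.append((f"{key},{index}", offset, length))
        offset += length
        index += 1
    return plan


# Example: a 2.5 GB file becomes 40 chunks of 64 MB each.
plan = chunk_plan("video:42", int(2.5 * 1024 ** 3))
print(len(plan), plan[0])
```

Each chunk is then an ordinary small mogile file, which is the size range mogile was designed for.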

-we talked about range requests...
 * it's possible and probably a cool feature
 * not supported yet?
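
Since storage nodes serve files over HTTP, range support would mostly come down to honoring the standard HTTP Range header. Here is a minimal Python sketch of parsing a single byte range against a file's size; it is not mogstored code, and multi-range requests are deliberately left out.

```python
import re
from typing import Optional, Tuple

# Only the single-range form "bytes=first-last" is handled in this sketch.
_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte offsets to send, or None.

    None means there is no usable range: a real server would ignore a
    malformed header and answer 416 for an out-of-bounds one.
    """
    m = _RANGE.match(header.strip())
    if not m or size <= 0:
        return None
    first_s, last_s = m.groups()
    if first_s == "" and last_s == "":
        return None
    if first_s == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        n = int(last_s)
        if n == 0:
            return None
        return max(0, size - n), size - 1
    first = int(first_s)
    if first >= size:
        return None
    # Open-ended "bytes=N-" runs to EOF; an over-long end is clamped.
    last = size - 1 if last_s == "" else min(int(last_s), size - 1)
    if last < first:
        return None
    return first, last


data = b"0123456789"
first, last = parse_range("bytes=2-4", len(data))
print(data[first:last + 1])
```

The inclusive end offset is why the slice uses last + 1.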

-Jonathan gave a pretty good demo of mounting Mogile (using WDFS)
  * needs WebDAV support
  * This is useful for management tasks
  * about 70% done; I'd love to see this released

-we need stat calls built in
  * I thought this should be everything that stat(2) returns
     . but a lot of fields (like inode, device, etc) don't make sense
     . implement the ones that do make sense (like file size, last access)
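
A rough sketch of what a stat-style result could look like, keeping only the fields that make sense for a key. The class, the field set, and the row dict it is built from are all hypothetical: "dkey", "length", and "devcount" are assumed column names, and last-access time is optional because it is unclear whether anything records it.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class MogStat:
    """Hypothetical stat-like result for one key.

    stat(2) fields with no meaning here (inode, device, nlink, ...) are
    left out on purpose.
    """
    domain: str
    key: str
    size: int                     # bytes, like st_size
    devcount: int                 # how many copies exist right now
    atime: Optional[int] = None   # last access (Unix seconds), if recorded


def mog_stat(domain: str, row: Dict[str, Any]) -> MogStat:
    """Build a MogStat from a hypothetical DB row for a key."""
    return MogStat(
        domain=domain,
        key=row["dkey"],
        size=int(row["length"]),
        devcount=int(row["devcount"]),
        atime=row.get("atime"),
    )


st = mog_stat("photos", {"dkey": "1234", "length": "53211", "devcount": 3})
print(st.size, st.devcount, st.atime)
```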

-Mark "Junior" Smith gave a demo of how plugins work
  * I was a little unclear on what the plugin he demoed did
  * if you write your own, upload it to CPAN as MogileFS::Plugin::YourPluginName
  * You can write your own plugin for replication policies
     . e.g., don't replicate files to machines on the same power strip
  * more documentation needs to be written on which hooks are available
      . we really need to do a better job on the wiki
      . wikis are great, but someone needs to organize them
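
Real plugins are Perl modules, but the core of a "different power strip" replication policy is just a selection rule. A minimal Python sketch, assuming a hypothetical map from device id to power strip label (nothing like that exists in mogile today); the function name is made up too.

```python
from typing import Dict, List, Set


def pick_devices(candidates: List[int], existing: List[int],
                 strip_of: Dict[int, str], wanted: int) -> List[int]:
    """Pick up to `wanted` new devices, never reusing a power strip.

    candidates: device ids with free space, in order of preference
    existing:   device ids that already hold a copy of the file
    strip_of:   hypothetical device id -> power strip label map
    """
    used_strips: Set[str] = {strip_of[d] for d in existing}
    chosen: List[int] = []
    for dev in candidates:
        if len(chosen) >= wanted:
            break
        if dev in existing:
            continue
        strip = strip_of[dev]
        if strip in used_strips:
            continue  # another copy already lives on this strip
        chosen.append(dev)
        used_strips.add(strip)
    return chosen


strips = {1: "A", 2: "A", 3: "B", 4: "C"}
print(pick_devices([1, 2, 3, 4], [], strips, 2))
print(pick_devices([2, 3, 4], [1], strips, 2))
```

If fewer strips than copies are available, this returns fewer devices than asked for; a real policy would have to decide whether to relax the rule or leave the file under-replicated.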

-Database abstraction is a good thing for mogile
  * I looked into Oracle support and it looks like I just need to change these things:
       . REPLACE INTO is a MySQL-ism and is used in a couple of places
       . validate_dbh() needs to be generic
       . there are hints like /*!40000 SQL_CACHE */; do these work in Oracle?
       . autoincrement is used on a table; replace it with a sequence?
       . more that I probably haven't found yet
  * SQLite would be nice to lower barrier to entry
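
One portable way around REPLACE INTO is an UPDATE followed by an INSERT when no row matched, since both statements exist in every SQL dialect. A sketch using Python's sqlite3 against a hypothetical "settings" table; the ? placeholders are sqlite3's style (Oracle drivers use named or numbered binds instead). MySQL's REPLACE actually deletes and re-inserts the row, so this is close but not identical.

```python
import sqlite3


def upsert(conn: sqlite3.Connection, field: str, value: str) -> None:
    """Portable stand-in for REPLACE INTO on a hypothetical settings table.

    Not atomic on its own: run it in a transaction and keep a unique key
    on `field` so a concurrent insert fails instead of duplicating a row.
    """
    cur = conn.execute("UPDATE settings SET value = ? WHERE field = ?",
                       (value, field))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO settings (field, value) VALUES (?, ?)",
                     (field, value))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings "
             "(field VARCHAR(50) PRIMARY KEY, value VARCHAR(255))")
upsert(conn, "schema_version", "9")
upsert(conn, "schema_version", "10")
conn.commit()
rows = conn.execute("SELECT field, value FROM settings").fetchall()
print(rows)  # [('schema_version', '10')]
```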

-we discussed WebDAV support
 * supported with lighttpd and Apache mod_dav
 * extending mogstored to support both the existing API and WebDAV?

-hardware suggestions
 * we asked that everyone send around their configuration
 * I promise to do this as soon as we buy hardware (I'm not running it in prod yet)

-documentation
 * this would really help get more people involved
 * I'll write some POD for all the classes (I already started this)
 * Brett's How-To is an excellent start (http://durrett.net/mogilefs_setup.html)
 * I'd like to create a screencast demoing MogileFS

-tests
 * 2.0 has the beginnings of a test suite
 * you can never have too many tests.

-load testing
  * burn in a server using bonnie (http://www.textuality.com/bonnie/)
  * I suggested using JMeter (http://jakarta.apache.org/jmeter/)
      . I will write a JMeter script for this and publish it
      . really useful for benchmarking against other storage systems
  * someone also suggested PushToTest (http://pushtotest.com/)
  * published statistics would be nice

-monitoring
  * lots of people use these:
     . cacti       http://cacti.net/
     . ganglia   http://ganglia.sourceforge.net/
     . collectd  http://collectd.org/
  * I'd love to see a wiki article explaining how this works

-Xen and VMware images
 * these would be nice to help people get started

-currently no way to change a class
-currently no job that cleans up extra replicated copies
  * if you decrease mindevcount, files stored before the change keep their extra copies (they don't get reaped)
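
A sketch of the decision a cleanup job would have to make for each file once mindevcount drops: keep mindevcount copies, spreading them across hosts where possible, and mark the rest as reapable. The device-to-host map and the function name are hypothetical, and this says nothing about how mogile would actually delete the copies.

```python
from typing import Dict, List, Set


def surplus_copies(devs: List[int], mindevcount: int,
                   host_of: Dict[int, str]) -> List[int]:
    """Return device ids whose copy of one file could be deleted.

    devs:    devices currently holding the file, in order of preference
    host_of: hypothetical device id -> host name map
    """
    keep: List[int] = []
    seen_hosts: Set[str] = set()
    # First pass: one copy per host until we have enough.
    for d in devs:
        if len(keep) < mindevcount and host_of[d] not in seen_hosts:
            keep.append(d)
            seen_hosts.add(host_of[d])
    # Second pass: with fewer hosts than mindevcount, fill up anyway.
    for d in devs:
        if len(keep) < mindevcount and d not in keep:
            keep.append(d)
    return [d for d in devs if d not in keep]


hosts = {1: "a", 2: "a", 3: "b", 4: "c"}
print(surplus_copies([1, 2, 3, 4], 2, hosts))
```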

-new features in 2.0
  * see Brad's slides for a list of these
  * we talked a lot about a file system check (fsck) job
      . combined with meta files, this could rebuild the DB

- backing up mogile
  * mogile is the backup
  * but some people (me) want it stored offsite on tape just in case
  * Brad mentioned something I forgot about how to handle this
     . something about just backing up fids larger than the largest one from the last time you backed up?
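
If fids only ever increase (as an auto-increment column would) and stored files never change in place, the "only back up fids larger than last time" idea is a high-water-mark incremental backup. A minimal sketch with a hypothetical function name; in practice the fid list would come from the database rather than a Python list. It won't notice deleted files, so the tape copy only ever grows.

```python
from typing import Iterable, List, Tuple


def fids_since(all_fids: Iterable[int],
               last_max_fid: int) -> Tuple[List[int], int]:
    """High-water-mark incremental backup.

    Returns the fids newer than the previous run and the new mark to save
    for next time. Assumes fids only grow and stored files never change.
    """
    new = sorted(f for f in all_fids if f > last_max_fid)
    new_mark = new[-1] if new else last_max_fid
    return new, new_mark


# First run backs up everything; the second only picks up new uploads.
batch1, mark = fids_since([3, 1, 2], 0)
batch2, mark = fids_since([3, 1, 2, 7, 5], mark)
print(batch1, batch2, mark)
```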

- wishlist
  * automounting is another thing that would lower the barrier of entry
  * rebalance worker for when all your servers fill up and you add a new, empty one

-file descriptor limit in Red Hat is too small?
  * should there be a test for this?
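
A test for this could be as simple as comparing the process's RLIMIT_NOFILE soft limit against a minimum. Python's resource module (Unix-only) exposes it; the 4096 threshold below is an arbitrary assumption, not a number anyone agreed on.

```python
import resource
from typing import Tuple

# Arbitrary threshold for illustration; pick one based on expected connections.
MIN_NOFILE = 4096


def nofile_ok(minimum: int = MIN_NOFILE) -> Tuple[bool, int, int]:
    """Return (ok, soft_limit, hard_limit) for open file descriptors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return ok, soft, hard


if __name__ == "__main__":
    ok, soft, hard = nofile_ok()
    if not ok:
        print(f"WARNING: open file limit is {soft} (hard {hard}); "
              f"raise it to at least {MIN_NOFILE}, e.g. with ulimit -n")
    else:
        print(f"open file limit OK: soft={soft}, hard={hard}")
```

Run it as the same user the tracker and mogstored run as, since limits are per process and per user.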

-Huge datasets (how scalable is Mogile)
  * I have 200 million images to put into Mogile
     . 4 "sizes" (thumbnail, screennail, orig, etc.) mean almost a billion keys
  * the database would grow huge (MySQL Cluster runs out of memory)
  * I could partition the database
  * or I could write a plug in for dynamic keys
     . say the key for the thumbnail is 1234
     . the key for the screennail (which doesn't have a db entry) is 1234s
     . I like this idea the best
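
The dynamic-key idea in a few lines: only the bare key gets a row in the database, and every other size is derived from it by a suffix. A minimal sketch assuming numeric base keys (so a trailing letter is unambiguous) and a hypothetical suffix table; only "s" for the screennail comes from the notes.

```python
from typing import Dict, Tuple

# Hypothetical suffix table. Only the bare key ("thumbnail") has a DB row;
# "s" for the screennail follows the notes, the rest is made up.
SUFFIX_FOR: Dict[str, str] = {
    "thumbnail": "",
    "screennail": "s",
    "original": "o",
}
SIZE_FOR: Dict[str, str] = {suf: size for size, suf in SUFFIX_FOR.items() if suf}


def variant_key(base_key: str, size: str) -> str:
    """1234 + screennail -> 1234s."""
    return base_key + SUFFIX_FOR[size]


def split_key(key: str) -> Tuple[str, str]:
    """Map any key back to (key that has a DB row, size).

    Assumes base keys are numeric, so a trailing letter is always a suffix.
    """
    if key and key[-1] in SIZE_FOR:
        return key[:-1], SIZE_FOR[key[-1]]
    return key, "thumbnail"


print(variant_key("1234", "screennail"))  # 1234s
print(split_key("1234s"))                 # ('1234', 'screennail')
```

With one row per image, 200 million images need 200 million rows instead of roughly 800 million (4 x 200 million). Locating the derived files on disk is the part a plugin would still have to solve.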

-meta data
  * for every file like 123456789.fid there would be a 123456789.meta
  * it would contain things like the key, so fsck could rebuild the database if necessary.
  * the API would allow you to store any other data you want in this file as well
     . e.g., a photo site could store which "size" this is (thumbnail, original, etc.)
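
A sketch of the .meta sidecar idea plus the fsck side that reads it back. JSON, the "domain" field, and every function name here are assumptions for illustration; the notes only say the file would hold the key and arbitrary extra data.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator, Tuple


def meta_path(fid_path: str) -> str:
    """Map .../123456789.fid to .../123456789.meta in the same directory."""
    root, ext = os.path.splitext(fid_path)
    if ext != ".fid":
        raise ValueError(f"not a .fid path: {fid_path}")
    return root + ".meta"


def write_meta(fid_path: str, key: str, domain: str,
               extra: Dict[str, Any]) -> str:
    """Write the sidecar atomically so fsck never reads a half-written file."""
    path = meta_path(fid_path)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"key": key, "domain": domain, "extra": extra}, fh)
    os.replace(tmp, path)
    return path


def read_meta(fid_path: str) -> Dict[str, Any]:
    with open(meta_path(fid_path), encoding="utf-8") as fh:
        return json.load(fh)


def rebuild_rows(root: str) -> Iterator[Tuple[int, str, str]]:
    """fsck-style walk: yield (fid, key, domain) for each .fid with a .meta."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".fid"):
                continue
            fid_path = os.path.join(dirpath, name)
            if os.path.exists(meta_path(fid_path)):
                meta = read_meta(fid_path)
                yield int(name[:-len(".fid")]), meta["key"], meta["domain"]


# Usage: store one file plus its sidecar, then "rebuild" from disk.
with tempfile.TemporaryDirectory() as store:
    fid_file = os.path.join(store, "0000001234.fid")
    with open(fid_file, "wb") as fh:
        fh.write(b"image bytes")
    write_meta(fid_file, "photo:1234", "photos", {"size": "thumbnail"})
    rows = list(rebuild_rows(store))
    size = read_meta(fid_file)["extra"]["size"]

print(rows)  # [(1234, 'photo:1234', 'photos')]
```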

The meeting lasted about four hours.  My notes sucked, so please help
add stuff I forgot.

This event got me totally motivated.  I'm working on documenting the
modules with POD so I can understand how the internals work and then
I can start hacking on it.

Thanks to Brad and Six Apart for hosting this event.  They provided
pizza and drinks for the event and it was much appreciated.

Jay