brad at danga.com
Wed Nov 15 21:18:05 UTC 2006
Wow, where do I start? This is all over the place. :)
Do you want comments already (and on what?), or do you want to break this
into smaller pieces and start getting it into svn?
Are you planning on staying caught up on svn but maintaining differences,
getting your stuff into svn, or just forking? (I'm doubting forking,
since you sent this code drop... :))
I'd love to help you get the common parts in and anything guba-specific
into clean hooks/policy classes, but it'll be some work on both our parts.
How do you want to proceed?
On Wed, 15 Nov 2006, Eric Lambrecht wrote:
> I've attached a patch that takes revision 458 from the sixapart svn and
> patches it to include the changes we've made internally to mogile. I can
> break this down, but I wanted to at least get something back in case I
> get distracted again...
> The big changes we made:
> All the decisions for where to put a particular file (when storing for
> the first time or replicating) are now part of the replication policy.
> We added a 'store_on' function to the ReplicationPolicy class that is
> called by the 'create_open' command to ask for a place to store the
> first instance of a file. The 'Mgd::find_device_id' code was moved to
> the default MultipleHosts replication policy.
> The replication code was updated so that a ReplicationPolicy class can
> tell the replication worker to delete a replica. It does this by
> returning a negative device id from the 'replicate_to' function. We also
> pass the size of the file to be replicated to the ReplicationPolicy.
> Also, the 'files_to_replicate' table has an 'extrareplicacount' column
> added that lets us request more than the minimum number of replicas for
> some file (see below).
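
For illustration, here is a minimal Python sketch of the return-value convention described above. It is not the MogileFS code (which is Perl): the names toy_replicate_to and run_worker, the "allowed devices" set, and the meaning of 0 and None are assumptions made for the example. Only the idea that a negative device id means "delete the replica on that device" comes from the description.

```python
def toy_replicate_to(on_devs, allowed_devs, min_count):
    """Hypothetical policy: return one decision per call.

    > 0  : copy the file to that device id
    < 0  : delete the replica on device abs(id)
    0    : current placement is acceptable (assumed convention)
    None : no progress possible right now (assumed convention)
    Device ids are assumed to be positive integers.
    """
    misplaced = sorted(d for d in on_devs if d not in allowed_devs)
    wanted = sorted(d for d in allowed_devs if d not in on_devs)
    good = len(on_devs) - len(misplaced)
    if good < min_count:
        # Make enough well-placed copies before removing anything.
        return wanted[0] if wanted else None
    if misplaced:
        return -misplaced[0]
    return 0


def run_worker(on_devs, allowed_devs, min_count, max_steps=20):
    """Apply policy decisions until the policy is satisfied or stuck."""
    on_devs = set(on_devs)
    for _ in range(max_steps):
        decision = toy_replicate_to(on_devs, allowed_devs, min_count)
        if decision is None or decision == 0:
            break
        if decision > 0:
            on_devs.add(decision)       # stand-in for copying the file
        else:
            on_devs.discard(-decision)  # stand-in for deleting a replica
    return on_devs
```

In this sketch the policy only asks for a deletion once enough correctly placed copies exist, so a file never drops below the minimum while it is being moved.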
> A new 'increp' script lets you tell mogile to make extra replicas of
> some file (see below).
> 'listpaths' command added to mogtool to assist our admins/developers in
> finding out where things are in mogile and checking their size (we had a
> lot of truncated/missing files for some reason). It just prints out the
> URLs of the requested file along with their actual size as determined by
> a HEAD request.
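
The size check described above could look roughly like the following Python sketch. This is not the mogtool implementation: head_size and list_paths are hypothetical names, and reading the size from the Content-Length header of the HEAD response is an assumption about how "actual size" is determined.

```python
import urllib.request


def head_size(url, timeout=5.0):
    """Return the Content-Length reported for url by a HEAD request, or None."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def list_paths(urls, size_of=head_size):
    """Pair each replica URL with its reported size (None if unreachable)."""
    results = []
    for url in urls:
        try:
            size = size_of(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            size = None
        results.append((url, size))
    return results
```

Comparing the reported sizes across replicas of the same file is then enough to spot a truncated or missing copy.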
> The 'host_meta' table was added, along with the code to read it in when
> caching host information.
> Our MultipleHostsWithHints replication policy was added (see my previous
> email and the comments in the code for how it works).
> Our 'Reduce' worker was added (see below).
> Updates to make the Checker work (not heavily tested yet).
> Update to mogdbsetup to make all our database changes.
> With respect to the 'Reduce' worker, the 'extrareplica' count stuff, and
> the ability for a ReplicationPolicy to mark something for deletion:
> Our internal goal has been to update mogile to push content around to
> different machines to deal with different file size/access patterns
> (without changing the API for interacting with mogile). Our
> MultipleHostsWithHints replication policy solves that and lets us throw
> things to specific machines upon insertion (thumbnails to low
> storage/high ram boxes, big ol' DVDs to really dense/slow machines).
> To handle content that suddenly becomes very popular, but is on slow
> machines, we came up with the notion of 'overreplicating' it. We realize
> that fid XXX is popular (via the lighttpd logs), so we tell mogile to
> make a couple extra replicas of XXX by throwing a new entry in the
> 'files_to_replicate' table with 'extrareplicacount' set to some non-zero
> value.
> Just making more copies of a file doesn't necessarily speed up access to
> it, but when we combine this with our replication policy (which says
> 'put replicas 3 and 4 of any file on these really fast boxes'), we can
> ensure that popular content gets migrated to our really fast machines
> and we don't beat the hell out of our higher density archive boxes.
> We added the 'Reduce' worker to randomly delete those extra replicas
> from the fast machines. Our system continuously pushes popular stuff to
> the high speed machines while randomly removing older (hopefully less
> popular) stuff from those boxes.
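
A rough Python sketch of the reduce step as described, for discussion only. The function name reduce_once, the explicit set of "fast" device ids, and the rule of never going below the minimum replica count are assumptions made for the example, not details taken from the patch.

```python
import random


def reduce_once(replicas, fast_devs, min_count, rng=random):
    """Remove one randomly chosen replica held on a fast device.

    replicas  : set of device ids currently holding the file
    fast_devs : set of device ids considered "fast" (hypothetical)
    min_count : minimum number of replicas to keep (assumed safeguard)
    Returns a new set; the input is left unchanged.
    """
    replicas = set(replicas)
    if len(replicas) <= min_count:
        return replicas  # nothing extra to remove
    candidates = sorted(d for d in replicas if d in fast_devs)
    if not candidates:
        return replicas  # the extras are not on fast devices
    victim = rng.choice(candidates)
    return replicas - {victim}
```

Run repeatedly, this trims a file back to its baseline copies, while a fresh 'files_to_replicate' entry with a non-zero 'extrareplicacount' would push popular files back onto the fast devices.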
> We updated the ReplicationPolicy code to allow it to delete replicas so
> that it can push things around if their existing locations don't match
> up with where the policy wants them to be. This is useful if you've
> stored something under replication policy X, but now change it to
> replication policy Y or if you change the minimum devcount for some class.
> If you have more questions, let me know. I'd like to help push any
> changes into the official distro, but I understand if they don't work
> with your broader goals...