updates

Thu Nov 16 18:37:01 UTC 2006

To handle hot files, I have pondered putting a couple layers of cache in
front of mogile.  For very small (less than a meg) files, stick them in
memcache; for common files, stick them in tugela-cache; then check
mogile.  An off-line process would populate memcache and tugela.  The
tugela box would have a bunch of ram to try and cache the fs, and maybe
an iram or two:

http://www.newegg.com/Product/Product.asp?Item=N82E16815168001

Of course, mogilefs would have to honor/update the caches.
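
A rough sketch of that lookup order, in Python. The three stores are plain
dicts standing in for memcache, tugela-cache and mogile, and the function
names (fetch, populate, invalidate) are made up for illustration; none of
this is a real client API.

```python
# Tiered read path: memcache, then tugela-cache, then mogile.
# All three stores are ordinary dicts used as stand-ins (assumption).
SMALL_FILE_LIMIT = 1024 * 1024  # "less than a meg"


def fetch(key, memcache, tugela, mogile):
    """Return (data, tier_name), checking the fastest tier first."""
    for tier_name, store in (("memcache", memcache), ("tugela", tugela)):
        data = store.get(key)
        if data is not None:
            return data, tier_name
    return mogile[key], "mogile"


def populate(hot_keys, memcache, tugela, mogile):
    """Off-line pass: copy hot files into the tier that fits their size."""
    for key in hot_keys:
        data = mogile[key]
        if len(data) < SMALL_FILE_LIMIT:
            memcache[key] = data
        else:
            tugela[key] = data


def invalidate(key, memcache, tugela):
    """What mogile would have to do on update/delete to keep caches honest."""
    memcache.pop(key, None)
    tugela.pop(key, None)
```

The invalidate step is the part mogilefs itself would need to call whenever
a key is overwritten or deleted.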

Earl

-----Original Message-----
From: mogilefs-bounces at lists.danga.com
[mailto:mogilefs-bounces at lists.danga.com] On Behalf Of Eric Lambrecht
Sent: Wednesday, November 15, 2006 3:50 PM
To: mogilefs at lists.danga.com
Subject: Re: updates

Brad Fitzpatrick wrote:
> We're building the realtime IO balancing into mogilefsd and mogstored
> this week.
> 
> But how is the Guba code trying to do IO balancing?  I obviously
> missed a part.

Here's how the system works with the new code:

We have two types of machines here: high density servers with around 4TB
of storage on them and maybe 4GB of RAM (we'll call them the 'slow'
boxes), and low density servers with around 8GB of RAM on them (we'll
call them the 'fast' boxes).

When we store stuff, we store two copies of everything on those slow 
boxes. When the site is slammed, however, those machines can't serve 
stuff up fast enough because there aren't enough spindles and they don't 
have enough RAM to cache things effectively.

Our system watches the lighttpd logs on those slow servers and 
periodically picks out the most frequently hit fids.
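
A minimal sketch of that log-scanning step, in Python. It assumes the
lighttpd access log uses a common-log-style request field and that mogile
paths end in "<digits>.fid"; the regex and the hot_fids name are
illustrative, not the actual Guba code.

```python
import re
from collections import Counter

# Assumed request shape: "GET /dev1/0/000/000/0000000123.fid HTTP/1.1"
FID_RE = re.compile(r'"(?:GET|HEAD) \S*/(\d+)\.fid[ ?]')


def hot_fids(log_lines, top_n=10):
    """Return the top_n most frequently requested fids, most popular first."""
    counts = Counter()
    for line in log_lines:
        match = FID_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return [fid for fid, _ in counts.most_common(top_n)]
```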

We then tell the tracker to 'overreplicate' those files. Ultimately that
translates to mogile making extra copies of those files onto the fast 
servers and thus reducing the load on the slow servers.

Our system periodically (and randomly) takes some of those 
overreplicated files off the fast servers, so they're only served via 
the slow servers again.
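
The push-then-randomly-remove cycle can be modelled in a few lines of
Python. This is only a toy in-memory model: the FastTier class and its
method names are hypothetical, and how many replicas get removed per pass
is an assumption (here the caller picks the number).

```python
import random


class FastTier:
    """Toy model of the set of fids that have extra replicas on fast boxes."""

    def __init__(self, rng=None):
        self.fids = set()
        self.rng = rng or random.Random()

    def overreplicate(self, fid):
        """Record that fid now has extra replicas on the fast servers."""
        self.fids.add(fid)

    def reduce(self, count):
        """Randomly drop the extra replicas of up to `count` fids."""
        victims = self.rng.sample(sorted(self.fids), min(count, len(self.fids)))
        self.fids.difference_update(victims)
        return victims
```

Anything that is still popular after being dropped would simply show up in
the next log scan and get pushed back to the fast servers.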

Additionally, the new system lets us specify that 'thumbnail images', 
for instance, should be stored only on a subset of machines that have 
enough RAM to effectively hold all of them in the filesystem cache.
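
As a sketch of the idea only (not the actual MultipleHostsWithHints code),
restricting a class to a subset of hosts could look like this in Python.
The class_hints mapping and the eligible_hosts name are assumptions made up
for the example.

```python
def eligible_hosts(file_class, all_hosts, class_hints):
    """Return the hosts a file of file_class may be stored on.

    class_hints maps a class name to the set of hosts allowed for it;
    classes without a hint may go on any host.
    """
    allowed = class_hints.get(file_class)
    if allowed is None:
        return list(all_hosts)
    return [host for host in all_hosts if host in allowed]
```

For example, with class_hints = {"thumbnail": {"fast1", "fast2"}}, a
thumbnail is only ever offered the two fast hosts, while any other class
sees the full host list.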

Eric..

> 
> 
> On Wed, 15 Nov 2006, Jussi Vaihia wrote:
> 
> 
>>This seems to partly address http://brad.livejournal.com/2177644.html
>>? Is the code behind the article part of mogileFS by now?
>>
>>On 11/15/06, Brad Fitzpatrick <brad at danga.com> wrote:
>>
>>>Wow, where do I start?  This is all over the place.  :)
>>>
>>>Do you want comments already (and on what?), or do you want to break
>>>this into smaller pieces and start getting it into svn?
>>>
>>>Are you planning on staying caught up on svn but maintaining
>>>differences, getting your stuff into svn, or just forking?  (I'm
>>>doubting forking, since you sent this code drop... :))
>>>
>>>I'd love to help you get the common parts in and anything
>>>guba-specific into clean hooks/policy classes, but it'll be some work
>>>on both our parts.
>>>
>>>How do you want to proceed?
>>>
>>>
>>>
>>>On Wed, 15 Nov 2006, Eric Lambrecht wrote:
>>>
>>>
>>>>I've attached a patch that takes revision 458 from the sixapart svn
>>>>and patches it to include the changes we've made internally to mogile.
>>>>I can break this down, but I wanted to at least get something back in
>>>>case I get distracted again...
>>>>
>>>>The big changes we made:
>>>>
>>>>All the decisions for where to put a particular file (when storing
>>>>for the first time or replicating) are now part of the replication
>>>>policy. We added a 'store_on' function to the ReplicationPolicy class
>>>>that is called by the 'create_open' command to ask for a place to
>>>>store the first instance of a file. The 'Mgd::find_device_id' code was
>>>>moved to the default MultipleHosts replication policy.
>>>>
>>>>The replication code was updated so that a ReplicationPolicy class
>>>>can tell the replication worker to delete a replica. It does this by
>>>>returning a negative device id from the 'replicate_to' function. We
>>>>also pass the size of the file to be replicated to the
>>>>ReplicationPolicy. Also, the 'files_to_replicate' table has an
>>>>'extrareplicacount' column added that lets us request more than the
>>>>minimum number of replicas for some file (see below).
>>>>
>>>>New 'increp' script that lets you tell mogile to make extra replicas
>>>>of some file (see below).
>>>>
>>>>'listpaths' command added to mogtool to assist our admins/developers
>>>>in finding out where things are in mogile and checking their size (we
>>>>had a lot of truncated/missing files for some reason). It just prints
>>>>out the URLs of the requested file along with their actual size as
>>>>determined by a HEAD request.
>>>>
>>>>The 'host_meta' table was added, along with the code to read it in
>>>>when caching host information.
>>>>
>>>>Our MultipleHostsWithHints replication policy was added (see my
>>>>previous email and the comments in the code for how it works).
>>>>
>>>>Our 'Reduce' worker was added (see below).
>>>>
>>>>Updates to make the Checker work (not heavily tested yet).
>>>>
>>>>Update to mogdbsetup to make all our database changes.
>>>>
>>>>--
>>>>
>>>>With respect to the 'Reduce' worker, the 'extrareplica' count stuff,
>>>>and the ability for a ReplicationPolicy to mark something for deletion:
>>>>
>>>>Our internal goal has been to update mogile to push content around
>>>>to different machines to deal with different file size/access patterns
>>>>(without changing the API for interacting with mogile). Our
>>>>MultipleHostsWithHints replication policy solves that and lets us
>>>>throw things to specific machines upon insertion (thumbnails to low
>>>>storage/high ram boxes, big ol' DVDs to really dense/slow machines).
>>>>
>>>>To handle content that suddenly becomes very popular, but is on slow
>>>>machines, we came up with the notion of 'overreplicating' it. We
>>>>realize that fid XXX is popular (via the lighttpd logs), so we tell
>>>>mogile to make a couple extra replicas of XXX by throwing a new entry
>>>>in the 'files_to_replicate' table with 'extrareplicacount' set to some
>>>>non-zero amount.
>>>>
>>>>Just making more copies of a file doesn't necessarily speed up
>>>>access to it, but when we combine this with our replication policy
>>>>(which says 'put replicas 3 and 4 of any file on these really fast
>>>>boxes'), we can ensure that popular content gets migrated to our
>>>>really fast machines and we don't beat the hell out of our higher
>>>>density archive boxes.
>>>>
>>>>We added the 'Reduce' worker to randomly delete those extra replicas
>>>>from the fast machines. Our system continuously pushes popular stuff
>>>>to the high speed machines while randomly removing older (hopefully
>>>>less popular) stuff from those boxes.
>>>>
>>>>We updated the ReplicationPolicy code to allow it to delete replicas
>>>>so that it can push things around if their existing locations don't
>>>>match up with where the policy wants them to be. This is useful if
>>>>you've stored something under replication policy X, but now change it
>>>>to replication policy Y or if you change the minimum devcount for some
>>>>class.
>>>>
>>>>If you have more questions, let me know. I'd like to help push any
>>>>changes into the official distro, but I understand if they don't
>>>>work with your broader goals...
>>>>
>>>>Eric...
>>>>
>>>
>>