Recovery?

Mark Smith marksmith at danga.com
Tue Mar 15 09:01:18 PST 2005


> Ok. Does the copying happen the moment the new machine is added? Or only
> once a file is requested and the tracker notices it's missing?
> What if one of the content servers is down for a short while? Will the trackers
> somehow monitor this and copy all pending data to it when the content
> server is available again?

If a machine is down temporarily, content is not automatically replicated
to other machines.  You have to specifically indicate that a machine is
'dead'.  At that point, the reaper job goes through all files that used to
be on the now dead machine and marks them as not being on that machine
anymore.

The replication job, which runs continuously, notices that there are now a
bunch of files that are only on N-1 machines and starts replicating them to
other machines to bring them back up to N copies.

So, a machine going "down" is different from a machine being "dead".
Downed machines are only temporarily gone and the files aren't replicated
any further as it's assumed the machine will be back.  Requests for content
are directed to the available machines.
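
To make the down/dead distinction concrete, here is a rough sketch in
Python.  It is only a toy model of the behaviour described above, not
MogileFS code: the Cluster class, its field names and the "alive" / "down" /
"dead" state strings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model only; every name here is hypothetical, not MogileFS's API."""
    min_copies: int                                  # the "N" from the text
    state: dict = field(default_factory=dict)        # host -> "alive" | "down" | "dead"
    locations: dict = field(default_factory=dict)    # file key -> set of hosts

    def readable_hosts(self, key):
        # Requests are only directed to machines that are currently available.
        return sorted(h for h in self.locations[key] if self.state[h] == "alive")

    def reap(self, host):
        # Reaper: an operator has declared the host dead, so forget every
        # copy that used to live on it.
        self.state[host] = "dead"
        for hosts in self.locations.values():
            hosts.discard(host)

    def replicate(self):
        # One pass of the replication job: top up any file that has fewer
        # than min_copies recorded locations. Copies on a "down" host still
        # count, because that host is expected to come back.
        alive = sorted(h for h, s in self.state.items() if s == "alive")
        for hosts in self.locations.values():
            for candidate in alive:
                if len(hosts) >= self.min_copies:
                    break
                hosts.add(candidate)  # no-op if the host already has a copy
```

In this model, setting a host to "down" and running replicate() changes
nothing, while reap() followed by replicate() creates a new copy on another
available host.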

> But it requires a select for each and every request, right? Or does it get
> cached in memory somehow?

Even with millions of files like we have, the data set is relatively tiny
and can fit in memory even on a 32-bit machine.  MySQL is able to load all
of it into a few hundred megabytes of memory, so lookups rarely touch the
disk.  Disk access is mostly needed for recording new files and newly
replicated files, which is a fairly simple access pattern.

However -- that doesn't mean we don't cache.  :)  Our most popular file
paths are cached in memcache so that we don't need to contact the tracker
for them.  It saves a couple of steps and is a good first step to reducing
load on the MogileFS database.
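
The caching idea is the usual check-the-cache-first pattern.  A minimal
sketch in Python, assuming a hypothetical tracker_lookup callable that
returns the paths for a file key, and using a plain dict as a stand-in for
memcached (none of these names come from MogileFS or any memcached client):

```python
class PathCache:
    """Cache-aside lookup of file paths; all names here are hypothetical."""

    def __init__(self, tracker_lookup):
        self._lookup = tracker_lookup  # hypothetical: key -> list of paths
        self._cache = {}               # plain dict standing in for memcached
        self.tracker_calls = 0         # counts how often the tracker was asked

    def get_paths(self, key):
        paths = self._cache.get(key)
        if paths is None:
            # Cache miss: ask the tracker once and remember the answer.
            self.tracker_calls += 1
            paths = self._lookup(key)
            self._cache[key] = paths
        return paths
```

A real setup would presumably also want cached entries to expire, so that
paths pointing at a machine that has since been marked dead do not linger.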

--
Mark Smith
junior at danga.com

