job_reaper and job_delete and a dead device
a.witt at marktplaats.nl
Mon Feb 26 14:25:12 UTC 2007
We are using MogileFS to store images.
We have 6 storage nodes that also run mogilefsd, but only with the
queryworker process.
We have two dedicated trackers running mogilefsd with queryworkers,
replicate workers, a delete worker, and a reaper job.
We are running the latest stable version from the CVS repository,
version 1.0.2 if I remember correctly.
The 6 storage nodes each have 1 or 2 disks. Some of the data disks are
We have lots of small files.
Recently one of the disks crashed and we marked it as dead, so that the
reaper would reschedule its files for replication. There were about 12M
files on that disk.
By default, the reaper job takes 1000 files per loop, with at least 10
seconds between loops, and reschedules them for replication.
We saw the load grow on the database node, so I decreased the number of
files taken per 10-second interval.
We actually changed the code a bit, so that it fetches a record from the
database telling it how many files it may take per reaper loop. This
record is then updated on a schedule, so that during low-traffic hours
the reaper can do more, and during high-traffic periods it does less.
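The throttle we added could be sketched roughly like this; the table and column names here are invented for illustration (the real change is in the attached diff, and MogileFS itself is Perl, not Python):

```python
import sqlite3

# Illustrative sketch: the reaper reads its per-loop file limit from a
# settings row, which a scheduled job updates around the clock.
# "reaper_settings" and "files_per_loop" are made-up names for this example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reaper_settings (name TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO reaper_settings VALUES ('files_per_loop', 1000)")

def files_per_loop(db, default=100):
    """Fetch the current per-loop limit; fall back to a safe default."""
    row = db.execute(
        "SELECT value FROM reaper_settings WHERE name = 'files_per_loop'"
    ).fetchone()
    return row[0] if row else default

# A cron-style job lowers the limit for high-traffic hours...
db.execute("UPDATE reaper_settings SET value = 200 WHERE name = 'files_per_loop'")
print(files_per_loop(db))  # 200
```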
At the same time, quite a few files are inserted into the file_to_delete
table each day, about 300,000.
At some point we saw that, because of the dead device, no files were
being deleted anymore.
So I changed the code a bit: job_delete now checks whether a device is
dead, and if the file resided on the dead device it simply deletes the
entry from the file_on table, skipping the delete attempt against the
dead device itself.
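The shape of that change is roughly the following; the function and parameter names here are illustrative stand-ins, not MogileFS's actual Perl internals:

```python
# Sketch of the job_delete change: if the file's device is marked dead,
# don't attempt the real delete against it; just drop the file_on row.
DEAD = "dead"

def process_delete(fid, devid, device_status, file_on, do_http_delete):
    """Delete one (fid, devid) pair.

    device_status  -- dict mapping devid -> status string
    file_on        -- set of (fid, devid) pairs standing in for the table
    do_http_delete -- callable performing the real delete on a live device
    """
    if device_status.get(devid) != DEAD:
        do_http_delete(fid, devid)    # only touch live devices
    file_on.discard((fid, devid))     # always clear the file_on row

# Usage: fid 42 lives on devices 1 (dead) and 2 (alive).
file_on = {(42, 1), (42, 2)}
calls = []
status = {1: "dead", 2: "alive"}
process_delete(42, 1, status, file_on, lambda f, d: calls.append((f, d)))
process_delete(42, 2, status, file_on, lambda f, d: calls.append((f, d)))
print(calls)    # [(42, 2)] -- no delete attempted on dead devid 1
print(file_on)  # set()     -- both rows removed
```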
Furthermore, the reaper job SELECTs records from the file_on table where
the devid equals that of the dead device. I changed the code a bit so
that it uses a LEFT JOIN against the file_to_delete table, keeping only
rows where file_to_delete.fid IS NULL, since it makes no sense to
replicate files that are about to be deleted.
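The changed query amounts to an anti-join; a minimal demonstration (column names follow the MogileFS schema as described above, but the exact SQL we run may differ):

```python
import sqlite3

# Select files on the dead device, excluding any fid already queued
# in file_to_delete.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE file_on (fid INTEGER, devid INTEGER);
    CREATE TABLE file_to_delete (fid INTEGER PRIMARY KEY);
    INSERT INTO file_on VALUES (1, 7), (2, 7), (3, 7);
    INSERT INTO file_to_delete VALUES (2);
""")

dead_devid = 7
rows = db.execute("""
    SELECT fo.fid
      FROM file_on fo
      LEFT JOIN file_to_delete ftd ON ftd.fid = fo.fid
     WHERE fo.devid = ?
       AND ftd.fid IS NULL
     ORDER BY fo.fid
""", (dead_devid,)).fetchall()
print([fid for (fid,) in rows])  # [1, 3] -- fid 2 is pending deletion
```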
I have included a diff file that shows the changes we made.
Now some questions:
1. It takes more than 7 days for all the files on the dead device to be
reaped and replicated again. What can we do to decrease this time? I was
thinking of creating more storage nodes with smaller disks. Does somebody
have other ideas?
2. In the logs we often see lines like:
Feb 26 12:50:38 tracker1 mogilefsd: [replicate(22880)] replicated=20,
What conclusions can we draw from those lines?
Does a low ratio mean that we have too many replicate jobs running?
We currently have 30 of them running on each of the two dedicated trackers.
I have been experimenting with the number of replicate jobs, and
increasing it seems to help a little. It also increases the load on the
database server and, obviously, on the trackers.
3. We also see a lot of lines in the log like:
Feb 26 15:22:21 tracker2 mogilefsd: [replicate(32426)] Unable to
obtain lock mgfs:fid:348165238:replicate
What conclusions can we draw from these lines? It does not seem to be a