job_reaper and job_delete and a dead device

Mon Feb 26 14:25:12 UTC 2007

Dear All,

We are using MogileFS to store images.
We have 6 storage nodes, which also run mogilefsd, but only with the 
queryworker process.
We have two dedicated trackers running mogilefsd with queryworkers, 
replicate workers, a delete worker and the reaper job.

We are running the latest stable version from the CVS repository, 
version 1.0.2 if I remember correctly.

The 6 storage nodes each have 1 or 2 disks. Some of the data disks are 
300GB.
We have lots of small files.

Recently one of the disks crashed and we marked it as dead, in order for the 
reaper to reschedule files to be replicated. There were about 12M files on 
this disk.
By default the reaper job takes 1000 files per loop, with at least 10 
seconds between loops, and reschedules them.
We saw the loads grow on the database node, so I decreased the number of 
files per 10 seconds.
We actually changed the code a bit, so that it will fetch a record from the 
database telling how many files it may take per reaper loop. This record is 
then updated at scheduled time, so that during low-traffic hours, the reaper 
can do more and at high traffic periods, it will do less.

At the same time, quite a lot of files are inserted into the file_to_delete 
table each day, about 300,000.

At some point we saw that because of the dead device no files were deleted 
anymore.
So I changed the code a bit, so that job_delete checks whether a device is 
dead and, if the file resided on the dead device, just deletes the entry 
from the file_on table. It no longer attempts the delete on the dead device.
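
The logic of that change can be sketched as follows. This is a Python 
illustration under assumptions, not the code from our diff: a set of 
(fid, devid) pairs stands in for the file_on table, and delete_fid and 
delete_from_device are hypothetical names.

```python
def delete_fid(fid, file_on, dead_devids, delete_from_device):
    """Remove all copies of fid.

    file_on is a set of (fid, devid) pairs standing in for the file_on table.
    Returns the devids on which the delete failed and must be retried.
    """
    remaining = []
    for f, devid in sorted(file_on):      # sorted() copies, so we may mutate
        if f != fid:
            continue
        if devid in dead_devids:
            # Dead device: do not contact it, just forget the copy.
            file_on.discard((f, devid))
        elif delete_from_device(devid, fid):
            file_on.discard((f, devid))
        else:
            remaining.append(devid)
    return remaining

# Example: fid 1 lives on devices 10 (dead) and 11 (alive).
file_on = {(1, 10), (1, 11), (2, 10)}
calls = []

def fake_delete(devid, fid):
    calls.append((devid, fid))
    return True

remaining = delete_fid(1, file_on, {10}, fake_delete)
```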

Furthermore, the reaper job SELECTs records from the file_on table where 
the devid equals the devid of the dead device. I changed the code a bit, so 
that it uses a join on the file_to_delete table and keeps only the rows 
where file_to_delete.fid IS NULL, since it makes no sense to replicate 
files that are about to be deleted.
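
The shape of such a query is shown below as a small Python/SQLite sketch. 
The tables are reduced to the columns mentioned above, the sample rows and 
the limit are made up, and the exact statement in our diff may differ. Note 
that the IS NULL test only works with an outer (LEFT) join.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE file_on (fid INTEGER, devid INTEGER);
CREATE TABLE file_to_delete (fid INTEGER PRIMARY KEY);
INSERT INTO file_on VALUES (1, 7), (2, 7), (3, 7), (4, 8);
INSERT INTO file_to_delete VALUES (2);
""")

DEAD_DEVID = 7   # made-up id of the dead device
LIMIT = 1000     # files per reaper loop

rows = db.execute("""
    SELECT file_on.fid
    FROM file_on
    LEFT JOIN file_to_delete ON file_to_delete.fid = file_on.fid
    WHERE file_on.devid = ?
      AND file_to_delete.fid IS NULL   -- skip files queued for deletion
    ORDER BY file_on.fid
    LIMIT ?
""", (DEAD_DEVID, LIMIT)).fetchall()

fids_to_reap = [row[0] for row in rows]
```

Here fid 2 is skipped because it is queued for deletion, and fid 4 is 
skipped because it is not on the dead device.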

I have included a diff file that shows the changes we made.

Now some questions:
1. It takes more than 7 days for all the files on the dead device to be 
reaped and replicated again. What can we do to decrease this time? I was 
thinking of creating more storage nodes with smaller disks. Does anybody 
have other ideas?

2. In the logs we often see lines like:
Feb 26 12:50:38 tracker1 mogilefsd[22877]: [replicate(22880)] replicated=20, 
attempted=50, ratio=40.00%
What conclusions can we draw from those lines?
Does a low ratio mean that we have too many replicate jobs running?
We currently have 30 of them running on each of the two dedicated trackers.
I have been playing around a little with the number of replicate jobs, and 
it seems that increasing the number helps a little. When increasing the 
number of replicate jobs, the load increases on the database server. And it 
increases the load on the trackers, obviously.

3. We also see a lot of lines in the log like:
Feb 26 15:22:21 tracker2 mogilefsd[30180]: [replicate(32426)] Unable to 
obtain lock mgfs:fid:348165238:replicate
What conclusions can we draw from these lines? It does not seem to be a 
problem.
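
For reference on question 1, a rough back-of-the-envelope calculation in 
Python, using only the approximate figures from this mail (about 12M files, 
1000 files per 10-second loop by default, more than 7 days observed) and 
assuming a single reaper working at a constant rate:

```python
FILES_ON_DEAD_DEVICE = 12_000_000   # "about 12M files"
DEFAULT_BATCH = 1000                # files per reaper loop (default)
MIN_LOOP_SECONDS = 10               # minimum time per loop

def drain_hours(files, batch, loop_seconds):
    """Hours needed to reschedule all files at a fixed batch size."""
    loops = -(-files // batch)      # ceiling division
    return loops * loop_seconds / 3600

# Best case at the default rate: 12000 loops of 10 seconds each.
default_hours = drain_hours(FILES_ON_DEAD_DEVICE, DEFAULT_BATCH, MIN_LOOP_SECONDS)

# Average rate implied by a full 7 days for the same number of files.
observed_rate = FILES_ON_DEAD_DEVICE / (7 * 86400)

print(f"default rate: about {default_hours:.1f} hours")
print(f"7 days implies about {observed_rate:.1f} files/second")
```

So the default reaper rate alone would need roughly a day and a half, while 
7 days corresponds to about 20 files per second on average.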

Thanks!

Arnoud Witt