Suspicious behaviour of the delete process

Wed Oct 11 15:48:35 UTC 2006

>>>>> On Tue, 10 Oct 2006 12:03:40 +0200, andreas.koenig.gmwojprw at franz.ak.mind.de (Andreas J. Koenig) said:

Following up on my own bug report, as I've gained new insights.

The problem occurs only when a MogileFS node is down for a while. As
soon as the node is brought up again, the jam clears. The reason is this:

A delete job always tries to delete LIMIT files from the beginning of
the table file_to_delete. If it cannot delete them because the node
holding them is unreachable, it leaves the process_deletes routine
without having done anything. It then signals the caller that it has
more work to do, so the caller immediately calls process_deletes again,
and we observed high load with nothing getting done.
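The loop described above can be modelled in a few lines. This is a minimal sketch, not MogileFS code (MogileFS is written in Perl and backed by a database). The in-memory list standing in for file_to_delete, the (fid, devid) row shape, `reachable_devs`, `run_delete_job` and `max_calls` are all hypothetical. The sketch only reproduces the control flow: when the first LIMIT rows all sit on a down device, every call reports "more work" without deleting anything.

```python
# Hypothetical model: each row of file_to_delete is a (fid, devid) tuple,
# and the table is a plain Python list kept in insertion order.

def process_deletes(queue, limit, reachable_devs):
    """Try to delete the first `limit` rows of the queue.

    Returns (deleted, more_work). Rows on unreachable devices stay at the
    head of the queue, exactly where the next call will read them again.
    """
    batch = queue[:limit]                # like SELECT ... LIMIT limit
    deleted = 0
    for row in batch:
        fid, devid = row
        if devid in reachable_devs:
            queue.remove(row)            # like DELETE ... WHERE fid = ?
            deleted += 1
    more_work = len(queue) > 0           # "there is more to do"
    return deleted, more_work


def run_delete_job(queue, limit, reachable_devs, max_calls):
    """Call process_deletes back to back while it reports more work.

    max_calls stands in for the job's lifetime so the demo terminates.
    Returns (calls_made, total_deleted).
    """
    calls = total = 0
    more = True
    while more and calls < max_calls:
        deleted, more = process_deletes(queue, limit, reachable_devs)
        calls += 1
        total += deleted
    return calls, total
```

Consider a queue whose first ten rows are on device 2 and whose next ten rows are on device 1. With device 2 unreachable and a limit of 5, `run_delete_job(q, 5, {1}, 1000)` returns `(1000, 0)`: a thousand calls and nothing deleted. With device 2 reachable again, a fresh copy of the same queue drains in four calls. This matches the observation that bringing the node back clears the jam.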

I don't know a solution for that. My first idea was to introduce a
random offset for the LIMIT clause, so that we do not always read from
the beginning of the table. But this immediately put such a high load on
the database that normal MogileFS operation suffered. I tried to remedy
that by introducing more sleep periods, but then the Watchdog started
to kill the delete job and spawn a new one.
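The random-offset idea can be sketched with the same kind of hypothetical in-memory model. The sketch is self-contained, and the function name, the `rng` parameter and the cost model are all assumptions. A random offset does escape the stuck head of the table. However, if the database must step over `offset` rows before returning a batch, as MySQL does for `LIMIT offset, count`, each query can read up to the whole table instead of just LIMIT rows. That is one plausible contributor to the database load described above; the report did not measure the cause.

```python
import random

# Hypothetical model: file_to_delete is a list of (fid, devid) tuples.

def process_deletes_random_offset(queue, limit, reachable_devs, rng):
    """Read a batch starting at a random offset instead of the head.

    Returns (deleted, rows_scanned). rows_scanned approximates the cost of
    `SELECT ... LIMIT offset, limit` when the server has to walk past the
    first `offset` rows before returning any (an assumed cost model).
    """
    if not queue:
        return 0, 0
    offset = rng.randrange(len(queue))   # random start within the table
    batch = queue[offset:offset + limit]
    rows_scanned = offset + len(batch)   # skipped rows plus returned rows
    deleted = 0
    for row in batch:
        if row[1] in reachable_devs:     # row[1] is the devid
            queue.remove(row)
            deleted += 1
    return deleted, rows_scanned
```

Repeated calls eventually delete every row on reachable devices and leave the unreachable device's rows in place. The price shows up in `rows_scanned`, which grows with the offset instead of staying at LIMIT.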

In the end I saw no quick solution other than bringing up the missing
node again and letting the normal algorithm resolve the congestion,
which it then did quite efficiently.

What do you think?
-- 
andreas