First round of small crash fixes for svn
Brad Fitzpatrick
brad at danga.com
Mon Nov 6 00:25:27 UTC 2006
Alan,
On Fri, 27 Oct 2006, dormando wrote:
> (does this list do attachments?)
yes.
> Attached is a handful of small fixes to the svn mogilefs (not running
> the release, sorry :P).
>
> - Fix for HTTPFile, didn't import the 'error' subroutine, so it'd bomb
> out if trying to error.
Thanks. Committed.
> - Added a decent watchdog to the delete job as a default.
Committed.
> Given how many
> files it selects it almost never gets to update in time... I was
> thinking of a better way to do this though. Should delete ping every N
> files it deletes? Every few percent of files it has to delete? That
> would prevent it from timing out if a device is lagging significantly.
Added a ->still_alive call in the loop. It rate-limits itself to once a
second.
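
For anyone following along, here is a minimal sketch of that kind of rate-limited heartbeat. It is written in Python rather than MogileFS's Perl and is not the committed code; the `Watchdog` class, `delete_files`, `delete_one`, and the injectable `clock` are hypothetical names made up for illustration. Only the idea is taken from the thread: call `still_alive` on every pass through the loop and let it decide whether a ping is actually due.

```python
import time


class Watchdog:
    """Hypothetical stand-in for a worker's heartbeat to its parent."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between real pings
        self.clock = clock                # injectable so it can be tested
        self.last_ping = None
        self.pings_sent = 0

    def still_alive(self):
        """Ping the parent unless we already did so within min_interval."""
        now = self.clock()
        if self.last_ping is not None and now - self.last_ping < self.min_interval:
            return False  # pinged too recently, skip this one
        self.last_ping = now
        self.pings_sent += 1  # a real worker would notify its parent here
        return True


def delete_files(fids, watchdog, delete_one):
    """Delete each file, calling the watchdog on every iteration."""
    for fid in fids:
        delete_one(fid)
        watchdog.still_alive()  # cheap: most calls return immediately
```

Because the rate limit lives inside `still_alive`, the loop does not need to count files or percentages, and a slow device still produces a ping on each iteration that takes longer than the interval.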
> - Also a fix for checking if the path returned is undef. The new
> make_path code has a lot of paths that can return undef, but a lot of
> code in mogile doesn't check to see if what it got back is undef.
Hm. Yeah, some auditing is needed. Somewhat harmless, though.
> - Fix a warning in Worker/Query.pm
Checked in a similar version.
> - Make the bogus error code death message more useful in Worker/Replicate.pm
Checked in.
> - Change a socket error to a src err in Worker/Replicate.pm. I'm not
> sure if this gets the intended result though. There're a lot of cases in
> that replicate section where it'll return a "bogus error code". The case
> this fixes is when a mogstored simply dies; if I took down a mogstored
> while heavy replication was going on, it would crash flood and
> eventually kill the parent process (!).
>
> I noticed if a lot of jobs were dying or errors were flying about, the
> parent has a tendency to crash, and the children don't necessarily
> notice :) I haven't tracked down how/why this happens yet.
An strace of the parent would almost certainly give the answer.
Can you reproduce easily enough? I'd love to fix that. Sounds scary.
> Finally, the deleter job in the new trackers sucks. We're already 2
> million files behind for deletion. I'll have the bottleneck narrowed
> down sometime on Monday.
Yeah, it needs that rescheduling rewrite, so that deletes which error out
get put off into the future, when things are alive again.
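
A rough sketch of what such rescheduling could look like, again in Python and purely illustrative: the `DeleteQueue` class, its doubling backoff, and the `base_delay`/`max_delay` values are assumptions, not the planned MogileFS design. The point is only that a failed delete gets a retry time in the future instead of being retried on every pass.

```python
import heapq


class DeleteQueue:
    """Hypothetical delete queue that postpones failed deletes."""

    def __init__(self, base_delay=60, max_delay=3600):
        self.base_delay = base_delay  # seconds before the first retry
        self.max_delay = max_delay    # cap on the retry delay
        self._heap = []               # entries: (next_try_at, fid, failures)

    def add(self, fid, now):
        heapq.heappush(self._heap, (now, fid, 0))

    def pending(self):
        return len(self._heap)

    def run_due(self, now, try_delete):
        """Attempt every delete that is due; return the fids that succeeded."""
        done = []
        while self._heap and self._heap[0][0] <= now:
            _, fid, failures = heapq.heappop(self._heap)
            if try_delete(fid):
                done.append(fid)
                continue
            # Failure: double the delay each time, up to max_delay.
            failures += 1
            delay = min(self.base_delay * 2 ** (failures - 1), self.max_delay)
            heapq.heappush(self._heap, (now + delay, fid, failures))
        return done
```

With something like this, files on a dead device stop blocking the rest of the queue: they sit at the back with a later retry time while deletes for healthy devices keep draining.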
> Other than that, it works great! :P We're running it in production as of
> today and they're hella fast... for everything but deletes. It's also
> really nice having mogadm not suck anymore. We were able to toy with the
> trackers, and add 16 hosts + devices without having to touch the database.
Wonderful! Let's make it better, though.
- Brad