have a hard time injecting when doing replication or deleting

dormando dormando at rydia.net
Sun Jun 29 02:34:11 UTC 2008


Hey,

Can you monitor your mysql database during this issue? Are there queries
stacking up? Can you dump the processlist and show us what they are?
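
If the processlist is long, a rough sketch like the one below could summarize it. It assumes you captured tab-separated output with something like mysql --batch -e 'SHOW FULL PROCESSLIST' (batch mode prints a header row and one tab-separated row per thread). The function name and the 5 second threshold are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_processlist(tsv_text, min_seconds=5):
    """Group non-idle threads running at least min_seconds by (State, Info).

    tsv_text is assumed to be batch-mode output of SHOW FULL PROCESSLIST,
    i.e. a header line (Id, User, Host, db, Command, Time, State, Info)
    followed by tab-separated rows.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    stuck = Counter()
    for row in reader:
        # Idle connections are not interesting here.
        if row["Command"] == "Sleep":
            continue
        try:
            seconds = int(row["Time"])
        except (TypeError, ValueError):
            # Time can be NULL or missing; skip those rows.
            continue
        if seconds >= min_seconds:
            stuck[(row["State"], row["Info"])] += 1
    # Most frequent (State, Info) pairs first.
    return stuck.most_common()
```

Feeding it the saved dump, e.g. summarize_processlist(open("processlist.tsv").read()), should show whether many threads are piling up behind the same statement or lock state.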

Any chance you could fire up the mogilefsd (tracker) in a screen
session? Maybe redirect stderr/stdout to a file... Then slowly increase
the number of replicate and delete workers in there.

Try 1 replicator, then 5, then 10. Unfortunately stderr/stdout carries a
little more information than mogilefs bubbles up into !watch, especially
if workers are dying.
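
Once that output is in a file, a small script can tally how often workers get killed at each replicator count. This is only a sketch: the two regular expressions are modeled on lines of the form "Watchdog killing worker 22658 (queryworker)" and "Child 22824 (queryworker) died: 9 (UNEXPECTED)", and the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns modeled on tracker output lines; adjust if your output differs.
KILL_RE = re.compile(r"Watchdog killing worker (\d+) \((\w+)\)")
DIED_RE = re.compile(r"Child (\d+) \((\w+)\) died: (\d+) \((\w+)\)")


def tally_worker_deaths(lines):
    """Return (kills, deaths) counters built from tracker log lines.

    kills  maps job name -> number of watchdog kill messages.
    deaths maps (job name, exit status, label) -> number of death messages.
    """
    kills = Counter()
    deaths = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            kills[match.group(2)] += 1
            continue
        match = DIED_RE.search(line)
        if match:
            key = (match.group(2), int(match.group(3)), match.group(4))
            deaths[key] += 1
    return kills, deaths
```

Running tally_worker_deaths(open("tracker.log")) once per replicator setting would make it easier to see at which point the watchdog starts killing workers.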

Are your MySQL tables using MyISAM by any chance? Or are they all
InnoDB? You can confirm with:

    mysql -u whatever -h whatever -pwhatever -e 'SHOW TABLE STATUS' mogilefs

(or whatever your database is named).
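
If the table status output is hard to read by eye, something like this sketch could pick out the tables that are not InnoDB. It assumes you add --batch to the mysql command so the output is tab-separated with a header row containing Name and Engine columns. The function name is made up.

```python
import csv
import io


def non_innodb_tables(tsv_text):
    """Return (table name, engine) pairs for tables not using InnoDB.

    tsv_text is assumed to be batch-mode output of SHOW TABLE STATUS.
    Views report NULL for Engine and will show up here too.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    result = []
    for row in reader:
        engine = row["Engine"]
        # Compare case-insensitively in case the engine name is lowercased.
        if engine.lower() != "innodb":
            result.append((row["Name"], engine))
    return result
```

An empty list would mean everything is InnoDB; any MyISAM entries would be worth a closer look, since MyISAM locks the whole table on writes.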

I'm not convinced there isn't something wrong with the storage nodes.
Any chance you can add monitoring to the lighttpd instances? Or strace
one of them during the high load issue and see if it's blocking on
something?
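
For the strace part, here is a rough sketch for finding slow calls. It assumes the trace was taken with strace -T (which appends the time spent in each call as <seconds> at the end of the line), for example strace -T -p PID -o trace.txt. The function name and the 0.1 second threshold are arbitrary. Keep in mind that long waits in poll or epoll_wait are normal for an idle server, so the calls to look for are slow reads, writes and opens.

```python
import re

# Matches lines such as: read(7, "abc", 4096) = 3 <0.000045>
# An optional pid prefix ("12345 " or "[pid 12345] ") is tolerated.
# Unfinished/resumed lines and signal lines do not match and are skipped.
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*<(\d+\.\d+)>\s*$"
)


def slow_syscalls(lines, threshold=0.1):
    """Return (seconds, syscall name) pairs at or above threshold, slowest first."""
    slow = []
    for line in lines:
        match = SYSCALL_RE.match(line.rstrip("\n"))
        if not match:
            continue
        seconds = float(match.group(2))
        if seconds >= threshold:
            slow.append((seconds, match.group(1)))
    slow.sort(reverse=True)
    return slow
```

Calling slow_syscalls(open("trace.txt")) on a trace captured during the high load period would list the calls lighttpd spent the most time in.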

We'll find it. I didn't figure we'd have to go into this sort of detail,
though.

-Dormando

Komtanoo Pinpimai wrote:
> Hi,
> 
> Thanks for mentioning the FD limit; it was the default -- 1024. So I set
> it to a high value for mogstored, fsd, and lighttpd.
> However, it didn't seem to fix the problem.
> 
> I too suspect the DB and the storage nodes, because they are centralized
> resources that all fsds connect to.
> 
> Still, I couldn't find anything wrong with the DB or storage nodes. The
> status in mogadm check is always "writable" for all nodes.
> 
>> Do you have a huge replication queue?
> 
> Yes, I have about half a million to replicate.
> 
> Another piece of evidence from when I ran that many replicators: the
> colo2/colo4 trackers (they don't have replicator or delete running, just
> queryworker/listener jobs) showed something like this:
> 
> [monitor(22673)] Monitor running; scanning usage files
> [reaper(22674)] Reaper running; looking for dead devices
> Watchdog killing worker 22658 (queryworker)
> Child 22824 (queryworker) died: 9 (UNEXPECTED)
> Job queryworker has only 34, wants 35, making 1.
> [monitor(22673)] dev10: used = 245859688, total = 304629952, writeable = 1
> Child 22658 (queryworker) died: 9 (UNEXPECTED)
> Job queryworker has only 34, wants 35, making 1.
> Watchdog killing worker 22652 (queryworker)
> Child 22829 (queryworker) died: 9 (UNEXPECTED)
> Job queryworker has only 34, wants 35, making 1.
> [monitor(22673)] Monitor running; scanning usage files
> [monitor(22673)] dev17: used = 37421628, total = 305723656, writeable = 1
> Watchdog killing worker 22651 (queryworker)
> Watchdog killing worker 22652 (queryworker)
> Watchdog killing worker 22647 (queryworker)
> Child 22652 (queryworker) died: 9 (UNEXPECTED)
> Job queryworker has only 34, wants 35, making 1.
> Watchdog killing worker 22654 (queryworker)