How to monitor [WAS Re: Can't fetch stats]
jfrias at gmail.com
Fri Mar 21 05:42:38 UTC 2008
doh, forgot to reply-all
On Fri, Mar 21, 2008 at 1:38 AM, Javier Frias <jfrias at gmail.com> wrote:
> On Wed, Mar 19, 2008 at 1:38 AM, Robin H. Johnson <robbat2 at gentoo.org> wrote:
> > On Tue, Mar 18, 2008 at 08:44:13PM -0400, Javier Frias wrote:
> > > So i guess searching the lists *well* should have been my first recourse...
> > >
> > > http://lists.danga.com/pipermail/mogilefs/2007-June/001043.html
> > >
> > > mentions that the stats command is pretty heavy. I'd assume this is
> > > still the case, so the follow-up question would be:
> > >
> > > how does everyone here monitor mogilefs?
> > >
> > > I have db monitors, and port monitors, and a simple transaction
> > > monitor is in the works ( write a file, delete a file ), but it'd be
> > > nice to map growth across domains/classes.
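[The write/delete transaction monitor mentioned above can be sketched as below. The client is injected as a plain object, so the same check works with whatever MogileFS client library is in use; the `store`/`fetch`/`delete` method names are illustrative stand-ins, not a real library API.]

```python
import time

def transaction_check(client, key="healthcheck", payload=b"ping"):
    """Round-trip monitor: write a file, read it back, delete it.

    `client` is any object with store(key, data), fetch(key) -> bytes,
    and delete(key) methods -- a stand-in for your MogileFS client
    wrapper, not a real API.  Returns (ok, elapsed_seconds).
    """
    start = time.time()
    try:
        client.store(key, payload)
        ok = client.fetch(key) == payload
        client.delete(key)
    except Exception:
        ok = False
    return ok, time.time() - start

# In-memory stub standing in for a real tracker, just to exercise the logic.
class FakeClient:
    def __init__(self):
        self.files = {}
    def store(self, key, data):
        self.files[key] = data
    def fetch(self, key):
        return self.files[key]
    def delete(self, key):
        del self.files[key]

ok, elapsed = transaction_check(FakeClient())
print(ok)  # True when the round trip succeeds
```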
> > I used to graph this in Munin (run the query every 5 minutes, store in
> > RRD):
> > SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid;
> > (with the dmid/classid names cached).
> > So all domain/classes on the same graph.
> > I stopped doing it because, due to the differences in magnitude of the
> > numbers and growth rates, I needed to have multiple separate graphs,
> > and I just couldn't be bothered.
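[Fed the rows that query returns, a Munin plugin body is just a few print statements. The sketch below formats (count, dmid, classid) rows into Munin's `field.value` output; the field-naming scheme is illustrative, and a real plugin would also answer Munin's `config` request and map dmid/classid back to their names.]

```python
def munin_values(rows):
    """Format rows of (count, dmid, classid) -- as returned by
    SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid
    -- into Munin 'field.value' lines, one field per domain/class pair."""
    return ["dom%d_cls%d.value %d" % (dmid, classid, count)
            for count, dmid, classid in rows]

# Example rows as the query might return them (numbers are made up):
rows = [(150000, 1, 1), (42000, 1, 2), (900, 2, 1)]
for line in munin_values(rows):
    print(line)
# dom1_cls1.value 150000
# dom1_cls2.value 42000
# dom2_cls1.value 900
```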
> Yeah, this is a similar query to the one I'm trying to implement. The
> script running it would compute the difference from the last run, as
> well as the time frame, so as to compute both the counts (a worthwhile
> statistic, though not really worth graphing; more just a thing you
> check), but mostly the growth rate, as I can use that to plan out when
> in the future I will need more storage nodes.
> Another thing worth graphing is the replication stats, ie, files in
> replication queue, etc etc.
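[The diff-against-last-run arithmetic described above is simple; here is a minimal sketch. It takes the previous and current (timestamp, count) samples and computes files added per day, which is the number you'd extrapolate for capacity planning. Persisting the previous sample between cron runs is left to the caller.]

```python
def growth_rate(prev_sample, cur_sample):
    """Given two (unix_timestamp, file_count) samples, return the
    growth rate in files per day.  The caller persists prev_sample
    between runs (e.g. in a state file next to the cron script)."""
    (t0, n0), (t1, n1) = prev_sample, cur_sample
    days = (t1 - t0) / 86400.0
    return (n1 - n0) / days

# Two samples one day apart, 12,000 new files -> 12,000 files/day.
rate = growth_rate((1206000000, 500000), (1206086400, 512000))
print(rate)  # 12000.0
```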
> > Instead, I do use the per-database graphs that Munin has for Postgresql.
> > postgres_block_read_
> > postgres_commits_
> > postgres_queries_
> > postgres_space_
> Yeah, I'm graphing these stats in MySQL as well.
> > This actually raises one interesting bit that I'm not sure if anybody
> > else has seen. Approximately once a day, Mogile is doing a SELECT query
> > that returns a massive 50-80k rows, while the normal 5-minute average is
> > ~500.
> Hadn't noticed, but I do have spikes in my usage, so I attributed
> those to my traffic spikes.
> > Beyond that performance monitoring of Postgres, I do have a lot of
> > stuff watched via Nagios: daemon-running and port-connection tests for
> > all the mogilefsd nodes (3), all the mogstored nodes (8), and the
> > haproxy nodes (local on each web client).
> same here.
> > *haproxy: None of the MogileFS client code does load-balancing/failover
> > between MogileFS instances very well, so we use haproxy on
> > loopback on each of our web nodes. If you want to just contact the
> > Mogile system, instead of looking for a mogilefsd instance that is up,
> > you just hit localhost:7001, and it directs you to one that IS actually
> > up. haproxy keeps state of ones that are up, so it works well. Doing it
> > on loopback cuts down on any latency and failure issue we might have if
> > we were to have it on a standalone system.
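[A minimal haproxy fragment for the loopback setup described above might look like the following. The tracker addresses are made up, and 7001 is just the conventional mogilefsd port; only the shape of the config (TCP mode, health-checked servers, bind on loopback) is the point.]

```
# Sketch: local haproxy fronting the mogilefsd trackers.
# Addresses are illustrative; adjust to your own trackers.
listen mogilefsd
    bind 127.0.0.1:7001
    mode tcp
    balance roundrobin
    server tracker1 10.0.0.11:7001 check
    server tracker2 10.0.0.12:7001 check
    server tracker3 10.0.0.13:7001 check
```

With this in place, clients just connect to localhost:7001 and haproxy's health checks keep dead trackers out of rotation.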
> I handle it at the client level. Aside from mogstored, we run an
> image transformation proxy on each storage node that can
> upscale/downscale images. So when my client requests a file from
> mogile, I do a get_paths, get the path, and then translate the port to
> my image transformation proxy; if this fails, I try the next path, and
> so on. Since files are distributed, I wouldn't be able to use my load
> balancer on the storage daemons, since a file is not guaranteed to be
> on the same path/device across all the servers.
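[That get_paths-then-rewrite-then-failover loop can be sketched like this. The fetch function is injected so the sketch is self-contained and testable; `IMG_PROXY_PORT` and the URL shapes are assumptions about this particular setup, not anything MogileFS itself defines.]

```python
from urllib.parse import urlsplit, urlunsplit

IMG_PROXY_PORT = 8080  # illustrative: port the image-transform proxy listens on

def rewrite_port(url, port):
    """Swap the port in a mogstored path URL for the image-proxy port."""
    parts = urlsplit(url)
    netloc = "%s:%d" % (parts.hostname, port)
    return urlunsplit((parts.scheme, netloc) + tuple(parts)[2:])

def fetch_with_failover(paths, fetch):
    """Try each get_paths result in turn through the image proxy.

    `fetch` is any callable url -> bytes that raises on failure;
    `paths` is the URL list that get_paths returned."""
    for url in paths:
        try:
            return fetch(rewrite_port(url, IMG_PROXY_PORT))
        except Exception:
            continue  # that replica (or its proxy) is down; try the next
    raise IOError("all paths failed")

# Stub fetch: pretend the first storage node is down, the second answers.
def fake_fetch(url):
    if "10.0.0.21" in url:
        raise IOError("connection refused")
    return b"image-bytes"

paths = ["http://10.0.0.21:7500/dev1/0/000/123.fid",
         "http://10.0.0.22:7500/dev2/0/000/123.fid"]
print(fetch_with_failover(paths, fake_fetch))  # b'image-bytes'
```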
> Thx for the reply, it's cool to know what others are doing to monitor mogile.
> > --
> > Robin Hugh Johnson
> > Gentoo Linux Developer & Infra Guy
> > E-Mail : robbat2 at gentoo.org
> > GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85