How to monitor [WAS Re: Can't fetch stats]

Fri Mar 21 05:42:38 UTC 2008

Doh, forgot to reply-all.

On Fri, Mar 21, 2008 at 1:38 AM, Javier Frias <jfrias at gmail.com> wrote:
> On Wed, Mar 19, 2008 at 1:38 AM, Robin H. Johnson <robbat2 at gentoo.org> wrote:
>  > On Tue, Mar 18, 2008 at 08:44:13PM -0400, Javier Frias wrote:
>  >  > So I guess searching the lists *well* should have been my first recourse...
>  >  >
>  >  > http://lists.danga.com/pipermail/mogilefs/2007-June/001043.html
>  >  >
>  >  > mentions that the stats command is pretty heavy. I'd assume this is
>  >  > still the case, so the follow-up question would be:
>  >  >
>  >  > how does everyone here monitor MogileFS?
>  >  >
>  >  > I have DB monitors, port monitors, and a simple transaction
>  >  > monitor in the works (write a file, delete a file), but it'd be
>  >  > nice to map growth across domains/classes.
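A write/read/delete transaction monitor of the kind described above could look roughly like this. It is a minimal sketch, assuming hypothetical `store`, `fetch` and `delete` callables that wrap whichever MogileFS client library you use (their names and signatures are not taken from any real client); the probe-key naming is made up too. It adds a read-back step between the write and the delete so that silent corruption also fails the check.

```python
import time
import uuid
from typing import Callable, Optional, Tuple


def transaction_check(
    store: Callable[[str, bytes], None],
    fetch: Callable[[str], Optional[bytes]],
    delete: Callable[[str], None],
) -> Tuple[bool, float, str]:
    """Write a probe file, read it back, then delete it.

    store/fetch/delete are hypothetical wrappers around whatever MogileFS
    client you use. Returns (ok, elapsed_seconds, message).
    """
    key = "monitor-probe-" + uuid.uuid4().hex  # hypothetical probe-key scheme
    payload = key.encode()
    start = time.monotonic()
    try:
        store(key, payload)
        matched = fetch(key) == payload
        delete(key)  # clean up even if the read-back did not match
    except Exception as exc:  # any client error means the transaction failed
        return False, time.monotonic() - start, f"{type(exc).__name__}: {exc}"
    elapsed = time.monotonic() - start
    if not matched:
        return False, elapsed, f"read-back mismatch for {key}"
    return True, elapsed, "ok"
```

For a quick local test a plain dict works as the fake store: `transaction_check(files.__setitem__, files.get, files.pop)`. The elapsed time doubles as a crude latency metric worth graphing next to the pass/fail result.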
>  >  I used to graph this in Munin (run the query every 5 minutes, store in
>  >  RRD):
>  >  SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid;
>  >  (with the dmid/classid lookups cached).
>  >  So all domains/classes were on the same graph.
>  >  I stopped doing it because the magnitudes and growth rates differed so
>  >  much that I needed multiple separate graphs, and I just
>  >  couldn't be bothered.
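A Munin plugin that runs the query above might be sketched as follows, assuming a DB-API connection to the tracker database (for example through a MySQL or PostgreSQL driver); in the demo an in-memory `sqlite3` table with only the `fid`, `dmid` and `classid` columns stands in for the real `file` table. The field names (`d<dmid>_c<classid>`) and graph title are illustrative choices, not anything Munin or MogileFS prescribes. Munin calls a plugin with the argument `config` to get the graph definition and with no argument to get the current values.

```python
import sqlite3
import sys
from typing import List, Tuple

QUERY = "SELECT COUNT(fid), dmid, classid FROM file GROUP BY dmid, classid"


def file_counts(conn) -> List[Tuple[int, int, int]]:
    """Run the per-domain/class count query on any DB-API connection."""
    cur = conn.cursor()
    cur.execute(QUERY)
    return [(int(n), int(dmid), int(classid)) for n, dmid, classid in cur.fetchall()]


def munin_output(rows: List[Tuple[int, int, int]], config: bool) -> List[str]:
    """Format rows as Munin plugin output, in 'config' mode or value mode."""
    lines = []
    if config:
        lines += ["graph_title MogileFS files per domain/class",
                  "graph_vlabel files",
                  "graph_category mogilefs"]
    for n, dmid, classid in sorted(rows, key=lambda r: (r[1], r[2])):
        field = f"d{dmid}_c{classid}"  # Munin field names: letters, digits, underscore
        if config:
            lines.append(f"{field}.label dmid {dmid} class {classid}")
        else:
            lines.append(f"{field}.value {n}")
    return lines


if __name__ == "__main__":
    # Stand-in database; a real plugin would connect to the tracker DB instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file (fid INTEGER, dmid INTEGER, classid INTEGER)")
    conn.executemany("INSERT INTO file VALUES (?, ?, ?)",
                     [(1, 1, 0), (2, 1, 0), (3, 1, 2), (4, 3, 0)])
    want_config = len(sys.argv) > 1 and sys.argv[1] == "config"
    print("\n".join(munin_output(file_counts(conn), config=want_config)))
```

Munin treats each field as a GAUGE by default, so this graphs absolute counts; emitting `field.type DERIVE` and `field.min 0` in the config output would plot per-second growth instead.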
>
>  Yeah, this is similar to the query I'm trying to implement. The script
>  running it would compute the difference from the last run, as well as
>  the time elapsed, so as to compute both the counts (a worthwhile but not
>  really worth-graphing statistic, more just a thing you check) and,
>  mostly, the growth rate, which I can use to plan when in the
>  future I will need more storage nodes.
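The growth-rate calculation could be sketched like this: diff two snapshots of the per-domain/class counts, divide by the elapsed time, and extrapolate linearly to a capacity limit. The `capacity` figure is a hypothetical input expressed in files purely for illustration; in practice you would more likely project bytes or free device space, and linear extrapolation is itself an assumption about how growth behaves.

```python
from typing import Dict, Hashable, Optional

SECONDS_PER_DAY = 86400.0


def growth_per_day(prev: Dict[Hashable, int], curr: Dict[Hashable, int],
                   prev_ts: float, curr_ts: float) -> Dict[Hashable, float]:
    """Files added per day for each (dmid, classid) key between two snapshots.

    Timestamps are Unix seconds; keys missing from a snapshot count as 0.
    """
    days = (curr_ts - prev_ts) / SECONDS_PER_DAY
    if days <= 0:
        raise ValueError("current snapshot must be newer than the previous one")
    keys = set(prev) | set(curr)
    return {k: (curr.get(k, 0) - prev.get(k, 0)) / days for k in keys}


def days_until_full(current: float, capacity: float, rate_per_day: float) -> Optional[float]:
    """Linear projection of days until `current` reaches `capacity`.

    Returns None when nothing is growing, since no date can be projected.
    """
    if rate_per_day <= 0:
        return None
    return max(capacity - current, 0.0) / rate_per_day
```

For example, going from 1,000 to 1,700 files in class (1, 0) and from 0 to 70 in class (1, 2) over one week gives 100 and 10 files/day; with 1,770 files stored and a hypothetical limit of 6,170, `days_until_full(1770, 6170, 110)` projects 40 days.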
>
>  Another thing worth graphing is the replication stats, i.e. files in the
>  replication queue, etc.
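Replication-queue depth can be checked with a simple count query. This sketch assumes the tracker database has a `file_to_replicate` table (table name assumed from the MogileFS schema; verify it against your own version), and the warning/critical thresholds are arbitrary placeholders. The status codes and the `|` performance-data suffix follow the Nagios plugin convention, so the same output can drive both an alert and a graph.

```python
import sqlite3
from typing import Tuple

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit-code convention

# Assumes the tracker schema has a file_to_replicate table (check yours).
REPL_QUERY = "SELECT COUNT(*) FROM file_to_replicate"


def queue_depth(conn) -> int:
    """Number of rows waiting in the replication queue (any DB-API connection)."""
    cur = conn.cursor()
    cur.execute(REPL_QUERY)
    return int(cur.fetchone()[0])


def classify(depth: int, warn: int = 1000, crit: int = 10000) -> Tuple[int, str]:
    """Map queue depth to a Nagios status and a message with perfdata."""
    perf = f"| queue={depth}"
    if depth >= crit:
        return CRITICAL, f"CRITICAL - {depth} files awaiting replication {perf}"
    if depth >= warn:
        return WARNING, f"WARNING - {depth} files awaiting replication {perf}"
    return OK, f"OK - {depth} files awaiting replication {perf}"


if __name__ == "__main__":
    # Stand-in database; a real check would connect to the tracker DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_to_replicate (fid INTEGER)")
    conn.executemany("INSERT INTO file_to_replicate VALUES (?)", [(i,) for i in range(3)])
    status, message = classify(queue_depth(conn))
    print(status, message)
```

A Nagios wrapper would print the message and exit with the returned status.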
>
>  >
>  >  Instead, I do use the per-database graphs that Munin has for PostgreSQL:
>  >  postgres_block_read_
>  >  postgres_commits_
>  >  postgres_queries_
>  >  postgres_space_
>
>  Yeah, I'm graphing these stats in MySQL as well.
>
>  >
>  >  This actually raises one interesting point that I'm not sure anybody
>  >  else has seen. Approximately once a day, Mogile is doing a SELECT query
>  >  that returns a massive 50-80k rows, while the normal 5-minute average is
>  >  ~500.
>
>  I hadn't noticed, but I do have spikes in my usage, so I attributed
>  those to my traffic spikes.
>
>  >
>  >  Beyond that performance monitoring of PostgreSQL, I do have a lot of stuff
>  >  watched via Nagios - daemon running and port connection tests for
>  >  all the mogilefsd nodes (3), and all the mogstored nodes (8), and the
>  >  haproxy nodes (local on each web client).
>
>  Same here.
>
>  >
>  >  *haproxy: None of the MogileFS client code does load-balancing/failover
>  >  between MogileFS instances very well, so we use haproxy on
>  >  loopback on each of our web nodes. If you want to just contact the
>  >  Mogile system, instead of looking for a mogilefsd instance that is up,
>  >  you just hit localhost:7001, and it directs you to one that IS actually
>  >  up. haproxy tracks which instances are up, so it works well. Running it
>  >  on loopback cuts down on any latency and failure issues we might have if
>  >  we were to run it on a standalone system.
>
>  I handle it at the client level. Aside from mogstored, we run an
>  image transformation proxy on each storage node that can
>  upscale/downscale images. So when my client requests a file from
>  Mogile, I do a get_paths, take a path, and translate its port to
>  my image transformation proxy's port; if that fails, I try the next
>  path, and so on. Since files are distributed, I wouldn't be able to use
>  my load balancer in front of the storage daemons, because a given file
>  is not guaranteed to be at the same path/device on every server.
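The client-side failover described above (take the paths from get_paths, swap in the proxy's port, try the next path on failure) can be sketched as follows. The proxy port and the `fetch` callable are hypothetical; any function that raises `OSError` on failure will do, which covers `urllib.request.urlopen` errors since `URLError` subclasses `OSError`.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit, urlunsplit


def to_proxy_url(path: str, proxy_port: int) -> str:
    """Rewrite a storage-node URL to hit the proxy port on the same host."""
    parts = urlsplit(path)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    return urlunsplit((parts.scheme, f"{host}:{proxy_port}",
                       parts.path, parts.query, parts.fragment))


def fetch_first(paths: Iterable[str], proxy_port: int,
                fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try each replica in order via the proxy; return the first successful body."""
    for path in paths:
        try:
            return fetch(to_proxy_url(path, proxy_port))
        except OSError:
            continue  # node or proxy down: fall through to the next replica
    return None  # every replica failed
```

With the standard library, `fetch_first(paths, 8080, lambda u: urllib.request.urlopen(u, timeout=5).read())` would work, where 8080 is a made-up proxy port. Trying paths in the order returned keeps whatever ordering the tracker chose.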
>
>  Thx for the reply, it's cool to know what others are doing to monitor Mogile.
>
>  >  --
>  >  Robin Hugh Johnson
>  >  Gentoo Linux Developer & Infra Guy
>  >  E-Mail     : robbat2 at gentoo.org
>  >  GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
>  >
>