Intermittent memcached pauses

Chris Hondl chris at imvu.com
Mon Dec 4 07:09:44 UTC 2006


I eventually took the time to track down our bug using iostat, at Brad's
suggestion.  We had an application bug that occasionally generated a very
large multi-get request (10,000 keys).  I discovered the bug when I noticed
the realloc in try_read_network increasing the buffer to 1MB.  The server
got tied up processing the request, which caused various problems as other
machines in the cluster timed the machine out and marked it as failed.
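
For what it's worth, the simplest guard is on the client side: split any
large multi-get into bounded batches before it goes out on the wire.  A
rough sketch of the idea in C (issue_multiget() and the 256-key cap are
hypothetical, not from our actual client):

    #include <stddef.h>

    /* Hypothetical stand-in for the real client's multi-get call. */
    extern void issue_multiget(const char **keys, size_t nkeys);

    #define MAX_KEYS_PER_GET 256  /* illustrative cap, not a tuned value */

    /* Split one huge multi-get into bounded batches so no single
     * request can monopolize a server. */
    void batched_multiget(const char **keys, size_t nkeys)
    {
        size_t i, n;
        for (i = 0; i < nkeys; i += MAX_KEYS_PER_GET) {
            n = nkeys - i;
            if (n > MAX_KEYS_PER_GET)
                n = MAX_KEYS_PER_GET;
            issue_multiget(&keys[i], n);  /* one bounded request */
        }
    }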

A nice-to-have feature would be a limit on the number of keys memcached
will attempt to process in a single multi-get request.
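
On the server side, such a limit could be a simple key count in the command
parser that answers with an error past a threshold.  A standalone sketch of
the check, not actual memcached source (the 1,000-key cap is arbitrary):

    #include <string.h>

    #define MAX_GET_KEYS 1000  /* arbitrary threshold for illustration */

    /* Count the whitespace-separated keys in a "get k1 k2 ..." line and
     * refuse the request when there are too many.  Returns 0 if the line
     * is acceptable, -1 if the server should respond with an error. */
    int check_get_key_count(const char *line)
    {
        int nkeys = 0;
        const char *p = line + strlen("get");

        while (*p != '\0') {
            while (*p == ' ')
                p++;                      /* skip separators */
            if (*p == '\0')
                break;
            if (++nkeys > MAX_GET_KEYS)
                return -1;                /* too many keys */
            while (*p != '\0' && *p != ' ')
                p++;                      /* skip over the key */
        }
        return 0;
    }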

Chris



On 11/7/06, Brett G. Durrett <brett at imvu.com> wrote:
>
>
> I don't believe we are using either "stats cachedump" or "stats sizes".
> While trying to isolate the problem I disabled all monitoring scripts,
> which only use the "stats" command.  It is possible (but highly
> unlikely) that a diagnostic tool was running from another host, but the
> timing of these failures suggests it is something other than a monitoring
> script.
>
> Oddly enough, the behavior seemed to go away when I opened a telnet
> session to the memcached port and issued the 'stats' command... I assume
> this was just a coincidence.
>
> B-
>
> Steven Grimm wrote:
>
> > Are you by any chance using the "stats cachedump" or "stats sizes"
> > commands? Both of those will pause the server while they run; they are
> > intended for occasional debugging use rather than regular production use.
> >
> > -Steve
> >
> >
> > Chris Hondl wrote:
> >
> >>
> >> We've had memcached deployed at IMVU for about 9 months now.  It
> >> works great, scales easily, never goes down, and totally saved us as
> >> traffic on our site went up and up.
> >>
> >> We are now seeing a problem where one of our memcached hosts
> >> periodically stops handling requests for several seconds.  This
> >> particular host has a traffic pattern with about 30k get requests per
> >> second, a small number of set requests, all data values stored are
> >> small (booleans), and 90+% of the gets return a cache miss.
> >>
> >> Watching the machine, it looks like it is actually memcached that is
> >> experiencing the delays and using all of the CPU.  We tried disabling
> >> all non-critical resources and still saw the pauses.  When the CPU does
> >> get used up for 3-5 seconds, the number of processes waiting for CPU
> >> time typically goes to 2 or 3, and in top the memcached process's CPU
> >> utilization jumps from ~30% to 99+%.
> >>
> >> Has anyone else seen behavior like this?
> >>
> >> Chris
> >>
> >>
> >>
> >> Normal:
> >> top - 13:00:00 up 123 days, 14:02,  4 users,  load average: 0.36, 0.52, 0.43
> >> Tasks:  51 total,   3 running,  48 sleeping,   0 stopped,   0 zombie
> >> Cpu(s): 22.3% us,  5.0% sy,  0.0% ni, 68.8% id,  0.0% wa,  0.0% hi,  4.0% si
> >> Mem:   3969768k total,   260392k used,  3709376k free,    22176k buffers
> >> Swap:  6144852k total,        0k used,  6144852k free,    60380k cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> * 6465 root      16   0  108m  81m 1380 R 28.0  2.1   7:25.40 memcached*
> >>  6829 root      16   0  1876  928 1672 R  0.7  0.0   0:00.11 top
> >>     1 root      16   0  1504  512 1352 S  0.0  0.0   0:13.55 init
> >>
> >> Problem:
> >>
> >> top - 13:00:42 up 123 days, 14:03,  4 users,  load average: 0.31, 0.49, 0.42
> >> Tasks:  52 total,   2 running,  50 sleeping,   0 stopped,   0 zombie
> >> Cpu(s): 98.3% us,  0.3% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  1.3% si
> >> Mem:   3969768k total,   265136k used,  3704632k free,    22176k buffers
> >> Swap:  6144852k total,        0k used,  6144852k free,    60388k cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >> * 6465 root      20   0  110m  82m 1380 R 99.6  2.1   7:40.27 memcached*
> >>     1 root      16   0  1504  512 1352 S  0.0  0.0   0:13.55 init
> >>     2 root      34  19     0    0    0 S  0.0  0.0   0:00.04 ksoftirqd/0
> >>
> >>
> >>
> >>
> >> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >>  2  0      0 3713424  22176  60332    0    0     0     0    0     0 22  9 69  0
> >>  1  0      0 3713296  22176  60332    0    0     0     0    0     0 23  9 68  0
> >>  2  0      0 3713360  22176  60332    0    0     0     0    0     0 22 10 68  0
> >>  1  0      0 3713312  22176  60332    0    0     0     0    0     0 24 10 66  0
> >>  1  0      0 3713312  22176  60332    0    0     0     0    0     0 22 12 66  0
> >>  1  0      0 3713320  22176  60332    0    0     0     0    0     0 23 11 66  0
> >>  1  0      0 3713320  22176  60332    0    0     0    25    0     0 23  8 69  0
> >>  2  0      0 3713336  22176  60332    0    0     0    17    0     0 22 10 68  0
> >>  1  0      0 3713400  22176  60332    0    0     0     0    0     0 25 10 65  0
> >>  1  0      0 3713400  22176  60336    0    0     0     0    0     0 24 10 66  0
> >>  1  0      0 3713400  22176  60336    0    0     0     0    0     0 22  8 70  0
> >>  1  0      0 3713336  22176  60336    0    0     0     0    0     0 24  8 68  0
> >>  1  0      0 3713272  22176  60336    0    0     0    24    0     0 22 11 67  0
> >>  1  0      0 3713272  22176  60336    0    0     0     0    0     0 24 10 66  0
> >>  1  0      0 3713208  22176  60336    0    0     0     0    0     0 23  9 68  0
> >>  1  0      0 3713208  22176  60336    0    0     0     0    0     0 20 10 70  0
> >> * 2  0      0 3711032  22176  60336    0    0     0     0    0     0 73  2 25  0
> >>  2  0      0 3705720  22176  60340    0    0     0   132    0     0 98  2  0  0
> >>  2  0      0 3703544  22176  60344    0    0     0     0    0     0 99  1  0  0
> >>  2  0      0 3706488  22176  60344    0    0     0     0    0     0 97  3  0  0
> >>  2  0      0 3711416  22176  60348    0    0     0     0    0     0 98  2  0  0
> >>  2  0      0 3709624  22176  60348    0    0     0     0    0     0 99  1  0  0
> >>  2  0      0 3708848  22176  60348    0    0     0    72    0     0 98  2  0  0
> >>  2  0      0 3712048  22176  60348    0    0     0     0    0     0 87  3 10  0*
> >>  1  0      0 3712048  22176  60348    0    0     0     0    0     0 16  2 82  0
> >>  1  0      0 3712048  22176  60348    0    0     0     0    0     0 15  1 84  0
> >>  1  0      0 3712048  22176  60348    0    0     0     0    0     0 15  3 82  0
> >>  1  0      0 3712048  22176  60348    0    0     0    54    0     0 15  2 83  0
> >>  1  0      0 3712048  22176  60348    0    0     0     0    0     0 15  2 83  0
> >>  1  0      0 3712056  22176  60348    0    0     0     0    0     0 15  2 83  0
> >>  1  0      0 3712056  22176  60348    0    0     0     0    0     0 17  2 81  0
> >>  1  0      0 3712056  22176  60348    0    0     0     0    0     0 14  3 83  0
> >>  1  0      0 3712056  22176  60348    0    0     0    12    0     0 16  2 82  0
> >>
> >>
> >>
> >> --
> >> http://avatars.imvu.com/chris
> >
> >
>



-- 
http://avatars.imvu.com/chris