out of memory errors

Miguel DeAvila miguel.j.deavila at gmail.com
Thu Mar 20 19:05:27 UTC 2008


On Wednesday 19 March 2008 23:50:09 dormando wrote:
> Huzzah!
> 
> Actually that's a bad thing. That means when it tries to evict an object
> there are 50+ "locked" objects ahead of it, which may mean a refcount
> leak (but I haven't proved this yet).
> 
> Do you have the full text of the out of memory error? For 1.2.5 I
> changed all of the errors to have more context, so we can tell exactly
> where in the code it came from.

We get errors while storing objects and also when incrementing counters.
On the client side the responses from the server look like this,

	SERVER_ERROR out of memory storing object

or

	SERVER_ERROR out of memory in incr/decr

We're not doing any logging on the server side.

> Do you have the 'stats items' outofmemory counters? Are the errors 
> isolated to specific slab classes? 


The errors are limited to three of the 12 servers (memcache02, memcache04,
memcache08), and on each of them to slab class 1.

Here are the first stats,

	memcache02	STAT items:1:outofmemory 90936
	memcache04	STAT items:1:outofmemory 36973
	memcache08	STAT items:1:outofmemory 847494

Here are the stats at the moment,

	memcache02	STAT items:1:outofmemory 717719
	memcache04	STAT items:1:outofmemory 815924
	memcache08	STAT items:1:outofmemory 1512202

(~48 hours elapsed between the two sets of stats.)
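
To put those counters in perspective, here is a rough calculation of the growth
per server over the interval. It is only a sketch: the 48-hour window is
approximate, and the ~1300 requests/sec figure is the per-server rate quoted for
the cluster as a whole, assumed here to hold steady on each of the three servers.

```python
# 'STAT items:1:outofmemory' values copied from the two snapshots above.
FIRST = {"memcache02": 90936, "memcache04": 36973, "memcache08": 847494}
LATER = {"memcache02": 717719, "memcache04": 815924, "memcache08": 1512202}

ELAPSED_SECONDS = 48 * 3600   # approximate interval between the snapshots
ASSUMED_REQ_PER_SEC = 1300    # assumption: cluster-wide per-server request rate


def oom_growth(first, later, elapsed_seconds, req_per_sec):
    """Return {server: (new_errors, errors_per_sec, fraction_of_requests)}."""
    total_requests = elapsed_seconds * req_per_sec
    growth = {}
    for server, start in first.items():
        new_errors = later[server] - start
        growth[server] = (
            new_errors,
            new_errors / elapsed_seconds,
            new_errors / total_requests,
        )
    return growth


if __name__ == "__main__":
    result = oom_growth(FIRST, LATER, ELAPSED_SECONDS, ASSUMED_REQ_PER_SEC)
    for server, (count, per_sec, fraction) in sorted(result.items()):
        print(f"{server}: {count} new errors, "
              f"{per_sec:.2f}/sec, {fraction:.2%} of requests")
```

Under those assumptions each server is failing a few requests per second, which
is a small fraction of its traffic.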


> Does the 'evictions' stat for those classes ever increase?

No.

Here are the first stats,

	memcache02	STAT items:1:evicted 168443
	memcache04	STAT items:1:evicted 237803
	memcache08	STAT items:1:evicted 67210

Here's the next set (48 hours later),

	memcache02	STAT items:1:evicted 168443
	memcache04	STAT items:1:evicted 237803
	memcache08	STAT items:1:evicted 67210

> Do you get an error setting an item into those 
> classes 100% of the time? (from your text it'd appear that answer is no?).

The number of items in the class on the three servers has not changed over
the last 48 hours (nor has the age of the oldest item in the LRU), so I
suspect that all the 'set' operations for the class are failing.
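
The pattern above (outofmemory climbing while evicted stays flat) can be checked
mechanically from two 'stats items' dumps. The following is a minimal sketch,
not an official tool: the function names are made up for illustration, and
fetch_stats_items assumes a server reachable over the plain text protocol on
the default port 11211.

```python
import socket


def fetch_stats_items(host, port=11211, timeout=5.0):
    """Send 'stats items' over the text protocol and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats items\r\n")
        data = b""
        # The reply is a series of STAT lines terminated by END.
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")


def parse_stats_items(text):
    """Parse 'STAT items:<class>:<name> <value>' lines into a dict
    keyed by (class_id, stat_name)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT":
            continue
        key = parts[1].split(":")
        if (len(key) == 3 and key[0] == "items"
                and key[1].isdigit() and parts[2].isdigit()):
            stats[(int(key[1]), key[2])] = int(parts[2])
    return stats


def stuck_classes(before, after):
    """Return slab classes whose outofmemory counter grew between the two
    snapshots while their evicted counter did not move."""
    stuck = []
    for cls in sorted({cls for (cls, _name) in after}):
        oom_delta = (after.get((cls, "outofmemory"), 0)
                     - before.get((cls, "outofmemory"), 0))
        evicted_delta = (after.get((cls, "evicted"), 0)
                         - before.get((cls, "evicted"), 0))
        if oom_delta > 0 and evicted_delta == 0:
            stuck.append(cls)
    return stuck
```

Fed the class 1 counters shown above for any of the three servers,
stuck_classes reports class 1.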

> You're definitely not running with -M (LRU disabled) mode?

Correct. We are *not* running with -M.

thanks,

Miguel


> 
> -Dormando
> 
> Miguel DeAvila wrote:
> > We have a 12-node memcache (v 1.2.5) cluster with ~72GB of memory (6GB
> > per server, ~1300 requests/sec per server).
> > 
> > We've started getting "SERVER_ERROR out of memory" errors during both object 
> > stores and counter increments. The errors are isolated to 3 of the 12 servers, 
> > and to the same slab class (class 1) on each server.
> > 
> > It seems like an out-of-memory error occurs when there are no free chunks in the class,
> > no additional slabs can be allocated, and no items can be evicted from the LRU
> > (due to non-zero refcounts).
> > 
> > The cluster stores items with a wide range of sizes. It is certainly possible that the
> > item sizes that were prevalent while the cache was filling are different than the
> > item sizes on an ongoing basis (leading to an imperfect slab-to-class allocation).
> > 
> > We're using the default "powers-of-N" value (1.25).
> > 
> > The number of errors, relative to the number of successes, is quite small, but previously
> > there were no out-of-memory errors at all.
> > 
> > Are these types of errors typical for a busy, mid-sized cluster with a wide item-size
> > distribution? (Or is this a harbinger of things to come ...)
> > 
> > thanks,
> > 
> > Miguel
> 
> 
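
The out-of-memory condition described in the quoted message (no free chunks,
no new slabs, nothing evictable) can be illustrated with a toy model. This is
a sketch for discussion only, not memcached's code: the class and method names
are invented, and the search limit of 50 is an assumption borrowed from the
"50+ locked objects" remark earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class Item:
    key: str
    refcount: int = 0   # > 0 means something still holds a reference


class ToySlabClass:
    """Toy model of one slab class; not how memcached is implemented."""

    def __init__(self, free_chunks, can_grow=False, max_tries=50):
        self.free_chunks = free_chunks
        self.can_grow = can_grow      # may another slab page be allocated?
        self.max_tries = max_tries    # assumed eviction search limit
        self.lru = []                 # index 0 is the oldest item
        self.evicted = 0
        self.outofmemory = 0

    def _evict_oldest_unlocked(self):
        # Look only at the max_tries oldest items for one with refcount 0.
        for i, item in enumerate(self.lru[:self.max_tries]):
            if item.refcount == 0:
                del self.lru[i]
                self.evicted += 1
                return True
        return False

    def store(self, key):
        if self.free_chunks > 0:
            self.free_chunks -= 1
        elif self.can_grow:
            pass  # toy shortcut: a newly allocated slab page supplies the chunk
        elif not self._evict_oldest_unlocked():
            # All three options failed, so the store is rejected.
            self.outofmemory += 1
            return "SERVER_ERROR out of memory storing object"
        self.lru.append(Item(key))
        return "STORED"
```

In this model, if every item near the old end of the LRU keeps a non-zero
refcount, outofmemory rises on each store while evicted stays put, which is
the shape of the counters reported in this thread.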



