memory fragmentation issues in 1.2?
Steven Grimm
sgrimm at facebook.com
Thu Dec 7 18:47:25 UTC 2006
Paul T wrote:
> Slabs come with a price - they make valgrind useless,
> they were causing wasting up to 50% of RAM (dunno if
> Facebook fixed that, I suppose they did, but there
> was no benchmark to see how much RAM is still getting
> wasted in those 'regions') ... and it turns out that
> memory is still leaking ...
>
First of all, if you don't like the slabs, you can compile with
-DUSE_SYSTEM_MALLOC to bypass them completely and use the system malloc
instead. At that point you should be able to use valgrind as you see
fit. You will have to live with slightly higher CPU consumption (5-7%
higher in my tests, though that's obviously highly dependent on your
system malloc implementation). That said...
It's trivial to see the memory overhead, at least once the cache fills
up. The "stats" command will show it to you. Here's the relevant part of
the output from one of our servers:
STAT bytes 12224855073
STAT limit_maxbytes 13631488000
The second number is the configured memory limit, i.e., the total size
of all the slabs once the cache is full. The first number is the total
size of all the items in the cache, including their per-item headers. So
from that, you can see that this particular cache instance has a memory
efficiency of about 90%. Still plenty of room for improvement, no
argument there (and I happen to know that at least one person is working
on it) but it's in the realm of reasonableness, at least for us,
considering the CPU efficiency gain and considering that we might well
lose a similar amount of memory to fragmentation in a non-slab
environment anyway.
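If you want to check the same figure on your own instances, it's just the ratio of those two stats. Here's a quick sketch in Python (the stat names are real; the hardcoded reply text is simply the excerpt above, and in practice you'd read it off the socket):

```python
# Compute cache memory efficiency from memcached "stats" output.
# "bytes" is the total size of stored items (including per-item
# headers); "limit_maxbytes" is the configured memory limit, i.e.
# the total size of all slabs once the cache is full.
stats_reply = """\
STAT bytes 12224855073
STAT limit_maxbytes 13631488000
"""

stats = {}
for line in stats_reply.splitlines():
    _, name, value = line.split()
    stats[name] = int(value)

efficiency = stats["bytes"] / stats["limit_maxbytes"]
print(f"memory efficiency: {efficiency:.1%}")  # about 90%
```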
As for memory leaks, I'm not denying it's *possible* there's a leak, but
as one of the biggest memcached installations out there, we haven't run
into it. Here are another few excerpts from our stats output:
STAT uptime 3180512
STAT cmd_get 24710319401
STAT cmd_set 876632609
That translates to about 36 days of uptime, and as you can see from the
command counts, our instances aren't exactly sitting around idle. And
their memory footprints are not growing steadily over time. We peak at
over 30,000 connections (this instance doesn't use UDP for various
reasons), so we also see:
STAT connection_structures 36656
which does cause some memory overhead. As you can see, our maximum
configured size is 13000 megabytes; right now, according to "top", the
process size of the instance in question is 14.0GB. So we have just over
1GB of overhead -- but that holds steady once we've hit peak load a
couple times and all the connection structures that need to get
allocated are allocated. It is not a steadily increasing number -- we'd
certainly know if it was, since the machine in question only has 16GB of
RAM and we'd be in a world of hurt if memcached started swapping.
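For anyone following along, the arithmetic behind those figures is straightforward (the numbers come straight from the stats output and the "top" reading above; treating top's 14.0GB as gibibytes is my assumption):

```python
# Uptime is reported in seconds; 3180512 s is about 36.8 days.
uptime_days = 3180512 / 86400

# Overhead: process size (from top) minus the configured cache limit.
process_size = 14.0 * 2**30           # 14.0GB as reported by top
limit_maxbytes = 13631488000          # 13000 megabytes
overhead_gb = (process_size - limit_maxbytes) / 2**30

print(f"uptime: {uptime_days:.1f} days")     # about 36.8 days
print(f"overhead: {overhead_gb:.2f} GB")     # just over 1GB
```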
> Once upon a time instead of arguing for years about
> (many) moments like slabs I just bit the bullet and
> rewrote the whole thing without the slabs, without
> timers, without proprietory semi-binary protocol,
> without fancy (but logically questionable) 'automata'
> protocol implementation, without 'custom hash' e t.c.
> e t.c.
>
How is a state machine in any way logically questionable? More to the
point, how *else* would one implement any protocol at all in a
nonblocking, async-I/O-based environment where only part of a request
might have arrived at any given point? How do you handle getting a
partial HTTP request in Univca, if not with a state machine of some kind?
That's assuming by "automata" you mean finite-state automata, i.e.,
state machines. If that's not what you're referring to then I'm not sure
what you mean.
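To make the point concrete, here is a minimal sketch (not memcached's actual code) of the kind of state machine you need when a request can arrive in arbitrary-sized pieces: the "state" is simply the buffered partial line carried between reads, and a command is only emitted once its terminating CRLF shows up.

```python
class LineProtocolParser:
    """Tiny incremental parser for a CRLF-terminated text protocol.

    feed() may be called with any fragment of the input stream and
    returns whatever complete commands have arrived so far; the rest
    stays buffered until the next read completes it.
    """

    def __init__(self):
        self.buffer = b""

    def feed(self, data):
        self.buffer += data
        commands = []
        while b"\r\n" in self.buffer:
            line, self.buffer = self.buffer.split(b"\r\n", 1)
            commands.append(line.decode("ascii"))
        return commands


parser = LineProtocolParser()
parser.feed(b"get f")              # partial request: nothing emitted yet
parser.feed(b"oo\r\nget b")        # first command completes: ["get foo"]
parser.feed(b"ar\r\n")             # second completes: ["get bar"]
```

A nonblocking server does exactly this on every readable-socket event; without some such state machine it has no way to resume a half-received request.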
> Univca is *more* portable already - univca is using
> HTTP for a protocol hence it works with all the tools
> out there, that support HTTP protocol - memcached is
> using proprietory protocol that suffers from
> big/little endian problems.
>
Okay, you've thrown me for a real loop here. Memcached's protocol is
*text-based*. Human-readable, as in not binary. One can (and I
frequently do) telnet to its TCP port and type commands into it. I'm not
aware of anywhere in the memcached protocol where you could even
*detect* what byte order the server is using, let alone where there's a
dependency on it or a problem resulting from it. If you know
differently, please tell me where it is, specifically!
I can do this:
pinklady% telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
set foo 0 0 5
hello
STORED
set bar 0 0 6
pounce
STORED
get foo bar
VALUE foo 0 5
hello
VALUE bar 0 6
pounce
END
All perfectly human-readable (not binary) and no byte order
dependencies. Which parts of the protocol are you referring to? The only
thing I can think of that you might be referring to is the UDP header,
but (a) the UDP protocol is totally optional, and (b) all its header
fields are explicitly defined in the protocol spec to be in network byte
order, so I'm not sure what little/big-endian problems there are with it.
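For reference, that UDP frame header is four 16-bit fields (request ID, sequence number, total datagrams, reserved), all in network byte order per the protocol spec. Building one portably is trivial in any language; a sketch using Python's struct module with its "!" network-order prefix:

```python
import struct

def udp_header(request_id, seq, total, reserved=0):
    """Pack a memcached UDP frame header: four 16-bit fields,
    network byte order, exactly as the protocol spec defines them."""
    return struct.pack("!HHHH", request_id, seq, total, reserved)

# The same bytes come out regardless of the host's native endianness.
header = udp_header(request_id=0x1234, seq=0, total=1)
assert header == b"\x12\x34\x00\x00\x00\x01\x00\x00"
```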
-Steve