memory fragmentation issues in 1.2?

Thu Dec 7 19:11:17 UTC 2006

Everything I refer to in my email is in the archives of the
memcached mailing list.

If you want to see how to implement a keepalive HTTP
protocol without a state machine, you can look at the univca
source code. You were able to grasp the memcached code, so
reading the univca code should be no problem at all.

I see no problem with there being memory leaks; people write
bugs all the time, and it is no big deal.

http://cvs.danga.com/browse.cgi/wcmtools/memcached/doc/protocol.txt?rev=HEAD&content-type=text/plain

<quote>
There are two kinds of data sent in the memcache protocol: text lines
and unstructured data.  Text lines are used for commands from clients
and responses from servers. Unstructured data is sent when a client
wants to store or retrieve data. The server will transmit back
unstructured data in exactly the same way it received it, as a byte
stream. The server doesn't care about byte order issues in
unstructured data and isn't aware of them. There are no limitations on
characters that may appear in unstructured data; however, the reader
of such data (either a client or a server) will always know, from a
preceding text line, the exact length of the data block being
transmitted.

Text lines are always terminated by \r\n. Unstructured data is _also_
terminated by \r\n, even though \r, \n or any other 8-bit characters
may also appear inside the data. Therefore, when a client retrieves
data from a server, it must use the length of the data block (which it
will be provided with) to determine where the data block ends, and not
the fact that \r\n follows the end of the data block, even though it
does.
</quote>

If this is *not* a semi-binary protocol, then I don't
know what else to call it.
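The framing rule in the quoted spec (text lines end at \r\n, but a data block ends after exactly the number of bytes announced on the preceding VALUE line) can be sketched in Python. This is a minimal illustration, assuming the whole `get` response is already in memory; `parse_get_response` is a hypothetical helper, not taken from memcached or any client library.

```python
def parse_get_response(buf: bytes) -> dict:
    """Parse a complete 'get' response held in memory into {key: (flags, data)}."""
    items = {}
    pos = 0
    while True:
        # Text lines are located by their \r\n terminator...
        eol = buf.index(b"\r\n", pos)
        parts = buf[pos:eol].decode("ascii").split()
        pos = eol + 2
        if parts == ["END"]:
            return items
        if len(parts) < 4 or parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {parts!r}")
        key, flags, nbytes = parts[1], int(parts[2]), int(parts[3])
        # ...but a data block is located by the length from its VALUE line,
        # because \r or \n may legitimately appear inside the data itself.
        data = buf[pos:pos + nbytes]
        if len(data) != nbytes or buf[pos + nbytes:pos + nbytes + 2] != b"\r\n":
            raise ValueError("truncated data block or missing \\r\\n terminator")
        items[key] = (flags, data)
        pos += nbytes + 2
```

For a response such as `VALUE bar 0 4\r\na\r\nb\r\nEND\r\n`, a reader that simply split on \r\n would cut the 4-byte value in two; reading by length keeps `b"a\r\nb"` intact.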

The bug report about the big/little endian problem is in the
archives.

Last time I checked, USE_SYSTEM_MALLOC was not
delivering what it promised - but that was in
memcached 1.* code.

I suggest you ignore everything I said and let's
forget this thread, because, seriously, I don't have a
single day left before Xmas to spend on this mailing list.

The guy had a memory leak and was talking about "memory
fragmentation". I said "look elsewhere", and now this is a
flamewar. Geez.

Sorry it turned into *this*.

Happy holidays, gentlemen.

Rgds.Paul.

> First of all, if you don't like the slabs, you can compile with
> -DUSE_SYSTEM_MALLOC to bypass them completely and use the system malloc
> instead. At that point you should be able to use valgrind as you see
> fit. You will have to live with slightly higher CPU consumption (5-7%
> higher in my tests, though that's obviously highly dependent on your
> system malloc implementation). That said...
> 
> It's trivial to see the memory overhead, at least once the cache fills
> up. The "stats" command will show it to you. Here's the relevant part of
> the output from one of our servers:
> 
> STAT bytes 12224855073
> STAT limit_maxbytes 13631488000
> 
> The second number is the configured memory limit, i.e., the total size
> of all the slabs once the cache is full. The first number is the total
> size of all the items in the cache, including their per-item headers. So
> from that, you can see that this particular cache instance has a memory
> efficiency of about 90%. Still plenty of room for improvement, no
> argument there (and I happen to know that at least one person is working
> on it), but it's in the realm of reasonableness, at least for us,
> considering the CPU efficiency gain and considering that we might well
> lose a similar amount of memory to fragmentation in a non-slab
> environment anyway.
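The efficiency figure can be reproduced from the two STAT lines above. A minimal Python sketch, assuming the stats output has already been captured as text; `parse_stats` is a hypothetical helper.

```python
def parse_stats(text: str) -> dict:
    """Collect 'STAT <name> <value>' lines from a stats response (hypothetical helper)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats

# The two lines quoted above, followed by the END line that closes a stats reply.
sample = """STAT bytes 12224855073
STAT limit_maxbytes 13631488000
END"""

stats = parse_stats(sample)
# Fraction of the configured slab memory actually occupied by items (incl. headers).
efficiency = int(stats["bytes"]) / int(stats["limit_maxbytes"])
print(f"memory efficiency: {efficiency:.1%}")
```

The ratio comes out just under 90%. Note also that 13631488000 is exactly 13000 * 1024 * 1024 bytes, which matches the 13000-megabyte limit given later in the message.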
> 
> As for memory leaks, I'm not denying it's *possible* there's a leak, but
> as one of the biggest memcached installations out there, we haven't run
> into it. Here are a few more excerpts from our stats output:
> 
> STAT uptime 3180512
> STAT cmd_get 24710319401
> STAT cmd_set 876632609
> 
> That translates to about 36 days of uptime, and as you can see from the
> command counts, our instances aren't exactly sitting around idle. And
> they are not growing steadily over time. We peak at over 30,000
> connections (this instance doesn't use UDP for various reasons), so we
> also see:
> 
> STAT connection_structures 36656
> 
> which does cause some memory overhead. As you can see, our maximum
> configured size is 13000 megabytes; right now, according to "top", the
> process size of the instance in question is 14.0GB. So we have just over
> 1GB of overhead -- but that holds steady once we've hit peak load a
> couple of times and all the connection structures that need to get
> allocated are allocated. It is not a steadily increasing number -- we'd
> certainly know if it was, since the machine in question only has 16GB of
> RAM and we'd be in a world of hurt if memcached started swapping.
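The uptime, load and overhead figures in the paragraphs above follow from simple arithmetic on the quoted numbers. A sketch in Python, assuming that "13000 megabytes" means MiB and that the 14.0GB reported by top means GiB (both are assumptions about units, not stated in the message).

```python
# Figures copied from the stats excerpt and the paragraph above.
uptime_s = 3180512
cmd_get = 24710319401
cmd_set = 876632609

days = uptime_s / 86400                 # 86400 seconds per day
gets_per_s = cmd_get / uptime_s         # average over the whole uptime
sets_per_s = cmd_set / uptime_s
get_set_ratio = cmd_get / cmd_set

# Assumption: the limit is 13000 MiB and top's "14.0GB" is GiB.
limit_gib = 13000 / 1024
rss_gib = 14.0
overhead_gib = rss_gib - limit_gib

print(f"uptime {days:.1f} days, {gets_per_s:,.0f} gets/s, {sets_per_s:,.0f} sets/s")
print(f"get/set ratio {get_set_ratio:.0f}:1, overhead {overhead_gib:.2f} GiB")
```

Under these assumptions the instance averaged roughly 7,770 gets and 276 sets per second over about 36.8 days, and the overhead works out to about 1.3 GiB, in line with "just over 1GB".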
> 
> > Once upon a time, instead of arguing for years about
> > (many) issues like slabs, I just bit the bullet and
> > rewrote the whole thing without the slabs, without
> > timers, without a proprietary semi-binary protocol,
> > without a fancy (but logically questionable) 'automata'
> > protocol implementation, without a 'custom hash', etc.,
> > etc.
> >   
> 
> How is a state machine in any way logically questionable? More to the
> point, how *else* would one implement any protocol at all in a
> nonblocking, async-I/O-based environment where only part of a request
> might have arrived at any given point? How do you handle getting a
> partial HTTP request in univca, if not with a state machine of some kind?
> 
> That's assuming by "automata" you mean finite-state automata, i.e.,
> state machines. If that's not what you're referring to, then I'm not
> sure what you mean.
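For a concrete picture of the kind of state machine under discussion, here is a minimal Python sketch of an incremental parser for a single `set` request whose bytes may arrive in arbitrary pieces, as they do on a nonblocking socket. `SetRequestParser` and its states are hypothetical names chosen for illustration; this is not memcached's C implementation, and it does no error handling beyond checking the data terminator.

```python
class SetRequestParser:
    """Incrementally parse one text-protocol 'set' request (hypothetical helper).

    Bytes can arrive in arbitrary chunks, so the parser records which state
    it is in between calls to feed().
    """

    READ_LINE, READ_DATA, DONE = range(3)

    def __init__(self):
        self.state = self.READ_LINE
        self.buf = b""
        self.key = None
        self.nbytes = 0
        self.value = None

    def feed(self, chunk: bytes) -> bool:
        """Append a chunk; return True once the whole request has arrived."""
        self.buf += chunk
        if self.state == self.READ_LINE:
            eol = self.buf.find(b"\r\n")
            if eol < 0:
                return False  # command line not complete yet; wait for more
            # "set <key> <flags> <exptime> <bytes>"
            parts = self.buf[:eol].decode("ascii").split()
            self.key, self.nbytes = parts[1], int(parts[4])
            self.buf = self.buf[eol + 2:]
            self.state = self.READ_DATA
        if self.state == self.READ_DATA:
            if len(self.buf) < self.nbytes + 2:
                return False  # data block or its trailing \r\n still missing
            if self.buf[self.nbytes:self.nbytes + 2] != b"\r\n":
                raise ValueError("data block not terminated by \\r\\n")
            self.value = self.buf[:self.nbytes]
            self.buf = self.buf[self.nbytes + 2:]
            self.state = self.DONE
        return self.state == self.DONE
```

Feeding it `b"set foo 0 0"`, `b" 5\r\nhel"`, `b"lo"` and `b"\r\n"` in turn returns False three times and then True; the stored state is what lets each call resume exactly where the previous chunk left off.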
> 
> > Univca is *more* portable already: univca uses
> > HTTP as its protocol, hence it works with all the
> > tools out there that support HTTP, while memcached
> > uses a proprietary protocol that suffers from
> > big/little endian problems.
> >   
> 
> Okay, you've thrown me for a real loop here. Memcached's protocol is
> *text-based*. Human-readable, as in not binary. One can (and I
> frequently do) telnet to its TCP port and type commands into it. I'm not
> aware of anywhere in the memcached protocol where you could even
> *detect* what byte order the server is using, let alone where there's a
> dependency on it or a problem resulting from it. If you know
> differently, please tell me where it is, specifically!
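To illustrate the point about byte order: in a text-protocol storage command, every number (flags, expiry time, byte count) is written as ASCII decimal digits, so nothing in the command depends on how either machine stores integers. A minimal Python sketch; `build_set` is a hypothetical helper that produces the same bytes as typing `set foo 0 0 5` followed by `hello` into a telnet session.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Format a text-protocol 'set' command (hypothetical helper)."""
    # flags, exptime and the byte count are rendered as decimal text,
    # so the wire format is identical on big- and little-endian hosts.
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    # The value itself is sent as raw bytes, followed by the \r\n terminator.
    return header + value + b"\r\n"
```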
> 
> I can do this:
> 
> pinklady% telnet localhost 11211
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> set foo 0 0 5
> hello
> STORED
> set bar 0 0 6
> pounce
> STORED
> get foo bar
> VALUE foo 0 5
> hello
> VALUE bar 0 6
> pounce
> END
> 
> All perfectly human-readable (not binary) and no byte order
> dependencies. Which parts of the protocol are you referring to? The only
> thing I can think of that you might be referring to is the UDP header,
> but (a) the UDP protocol is totally optional, and (b) all its header
> fields are explicitly defined in the protocol spec to be in network byte
> order, so I'm not sure what little/big-endian problems there are with it.
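The UDP frame header mentioned here can be packed and unpacked explicitly in network byte order. A minimal Python sketch, assuming the layout given in the protocol document: four 16-bit integers (request ID, sequence number, total number of datagrams in the message, and a reserved field that must be 0). The helper names are hypothetical.

```python
import struct

# "!" selects network (big-endian) byte order with no padding;
# "HHHH" is four unsigned 16-bit fields, 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")

def pack_udp_header(request_id: int, seq: int, total: int) -> bytes:
    """Build the 8-byte frame header; the reserved field is always 0."""
    return UDP_HEADER.pack(request_id, seq, total, 0)

def unpack_udp_header(data: bytes) -> tuple:
    """Return (request_id, seq, total) from the start of a datagram."""
    request_id, seq, total, _reserved = UDP_HEADER.unpack_from(data)
    return request_id, seq, total
```

Because the format string fixes the byte order, the packed header is byte-for-byte identical whether the host is little- or big-endian.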
> 
> -Steve
