memcached - text protocol?

Andy Bakun memcached@thwartedefforts.org
Fri, 16 Jul 2004 14:05:50 -0500


On Fri, 2004-07-16 at 04:23, Michal Suszycki wrote:

> Protocol for memcached can be simple, without versioning.

I don't think this is in dispute.  Of course you can make it simple
without versioning.  But it is folly to generalize and say that all
protocols are better as binary.  One size does not fit all.  If the goal
of a protocol is to be extensible, then a non-binary protocol makes
sense, as it is easier to design it so as not to be versioned.  If a goal
of a protocol is to be simple to write clients in C, then a binary
protocol might make sense.

> Validating client input is _ALWAYS_ necessary but by checking if lengths
> are not beyond limit or by checking if <u8 code> has defined function
> pointer in commands[] table, or by checking if <u16 len1> values (see
> above) fit inside DATA length.  This is simple integer comparison.
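
For concreteness, the checks described in the quoted text might look
roughly like the sketch below.  The field layout, the MAX_DATA_LEN
limit, and the cmd_get handler are assumptions made for illustration;
only the commands[] table, the u8 code, and the u16 len1 field come from
the quoted message.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DATA_LEN 1024u            /* hypothetical upper limit */

typedef int (*cmd_fn)(const uint8_t *data, size_t data_len);

/* Hypothetical handler, present only so the table has one entry. */
static int cmd_get(const uint8_t *data, size_t data_len)
{
    (void)data;
    (void)data_len;
    return 0;
}

/* Indexed by the u8 command code; slots without a handler stay NULL. */
static cmd_fn commands[256] = {
    [1] = cmd_get,
};

/* Returns 1 if the request passes all three checks, 0 otherwise. */
static int request_is_valid(uint8_t code, uint16_t len1, size_t data_len)
{
    if (data_len > MAX_DATA_LEN)      /* length beyond the limit */
        return 0;
    if (commands[code] == NULL)       /* no handler for this code */
        return 0;
    if (len1 > data_len)              /* len1 does not fit inside DATA */
        return 0;
    return 1;
}
```

Each check is indeed a single comparison, which is the point the quoted
text is making.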

"This is simple integer comparison."  I see this a lot, as if to say
that a "simple integer comparison" is somehow more efficient than a
"simple character comparison".  The ONLY thing that is less efficient
about string comparisons is that you do more of them (and comparing four
u8 characters can take exactly the same number of instructions as
comparing a single u32 integer if you are diligent about the word
alignment of your architecture and optimize your string (read "series of
bytes") comparison functions/abstractions).
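
As a minimal sketch of what I mean (the function names are made up for
this example, and the caller is assumed to guarantee at least four
readable bytes in each buffer), here is a four-character command check
written both ways:

```c
#include <stdint.h>
#include <string.h>

/* Compare the first four bytes of buf against cmd, one byte at a time. */
static int is_cmd_bytewise(const char *buf, const char *cmd)
{
    return buf[0] == cmd[0] && buf[1] == cmd[1] &&
           buf[2] == cmd[2] && buf[3] == cmd[3];
}

/* The same test as one 32-bit comparison.  Copying into locals with
 * memcpy() sidesteps alignment and aliasing problems; compilers
 * commonly reduce each copy to a plain load. */
static int is_cmd_word(const char *buf, const char *cmd)
{
    uint32_t a, b;
    memcpy(&a, buf, sizeof a);
    memcpy(&b, cmd, sizeof b);
    return a == b;
}
```

Both functions answer the same question about the same four bytes; the
second just treats the "string" as the series of bytes it is.  Whether
the two compile to the same instructions depends on the compiler and
architecture.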

The small number of string comparisons that memcached does is dwarfed by
the number of string comparisons that Apache does or that SpamAssassin
does, and AFAIK no reasonable person would push for a binary HTTP
protocol (given the goals of HTTP).  Removing string comparisons in the
name of "performance" is an extremely small gain for memcached, compared
to things like using a better polling method, zero-copy network drivers,
or a different malloc implementation.

But of course, it's always fun to try to bum the last few instructions
out of something -- I surely won't deny that.  Unfortunately, memcached
seems to be designed to do a limited number of things really well, and
one of those things is the ability to have any kind of client
interface.  A binary protocol doesn't necessarily make interfacing
easier on the clients; in the case of scripting languages, it can make
it harder.  Options are always nice, but let's not forget to weigh
the possible problems of multiple protocols against the goals.  With
the right level of abstraction, these problems can, of course, be
mitigated.

Does anyone have any stats on how often memcached is waiting for network
activity?  Does gigabit ethernet make a difference?  Does FreeBSD, for
example, perform better as a memcached server than Linux or HP-UX?
(Although, at this point, unless there was something seriously wrong
with the performance under a certain OS, would it be wise to switch host
OSes for memcached for a reason other than to squeeze that last bit of
performance out of it?)  Is there a list of currently pending
bottlenecks that will eventually be taken care of?  Is "does too many
string comparisons" on this list?

-- 
Andy Bakun: unrepentantly ignorant 
        <abakun@thwartedefforts.org>