Updated C client, performance, etc...

Fri Nov 12 10:56:28 PST 2004

>> *) Using kqueue(2) instead of select(2) would probably be a nice
>> winner, but I don't want to deal with interface portability quite yet
>> (besides, one file descriptor on a select(2) call isn't too terrible).
>
> Note that anyone using memcached has already figured out wrangling
> libevent.  But why are you using select at all?

I needed a cheap and easy way to avoid sucking up cycles in the kernel.  
Non-blocking IO + select(2) gives me that.

> Looking through the
> code, it looks like you're using non-blocking sockets?

In part of the code, yes.

> Why?

When reading responses from the server, I use non-blocking IO to piece 
together a buffer that contains a full line of data.  If I don't have a 
full line of data, I go back and try to read another chunk of data.  
With blocking sockets, if I try to read(2) too much or too little, 
I'm screwed and the client hangs.

Right now I read(2) a bit of data into a buffer.  If it completes a 
line of input, then I continue processing.  If it doesn't, then I 
select(2) on the descriptor waiting for more data.  Once data comes in, 
select(2) returns control to the program and I read(2) it onto the 
buffer.  Repeat, wash.  Continue until a newline is found.  Normally 
this happens on the first read(2) call so the select(2) never gets 
called.  I've tried just wrapping the read(2) in a loop.  It takes the 
same amount of time, but it saves kernel time to use select(2).  On 
FreeBSD, things are amazingly fast.  On OS X, it's like Apple crippled 
kqueue(2) and introduced a delay, which sucks, but gives me a chance to 
test what a slow memcached server feels like and allowed me to 
program/benchmark accordingly.
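
A minimal sketch of the read-then-select loop described above.  This is 
not the actual libmemcache code: set_nonblocking(), wait_readable(), and 
read_line() are names made up for illustration, and the sketch assumes 
the caller starts with an empty buffer (*used == 0).

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.  0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: sleep in select(2) until fd is readable. */
static int wait_readable(int fd)
{
    fd_set rfds;
    int rc;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = select(fd + 1, &rfds, NULL, NULL, NULL);
    } while (rc == -1 && errno == EINTR);
    return rc == -1 ? -1 : 0;
}

/*
 * Read from a non-blocking fd until buf holds a '\n'.
 * Returns the line length (newline included), or -1 on error, EOF,
 * or a full buffer.  *used counts the bytes buffered so far; bytes
 * that arrived after the newline stay in buf for the caller.
 */
static ssize_t read_line(int fd, char *buf, size_t size, size_t *used)
{
    for (;;) {
        ssize_t rb;
        char *nl;

        if (*used == size)
            return -1;                  /* no room left for a newline */
        rb = read(fd, buf + *used, size - *used);
        if (rb > 0) {
            *used += (size_t)rb;
            nl = memchr(buf, '\n', *used);
            if (nl != NULL)
                return (ssize_t)(nl - buf) + 1;
        } else if (rb == 0) {
            return -1;                  /* peer closed the connection */
        } else if (errno == EINTR) {
            continue;                   /* interrupted, just retry */
        } else if (errno != EAGAIN && errno != EWOULDBLOCK) {
            return -1;                  /* real error */
        }
        /* Partial line or nothing available yet: wait in the kernel. */
        if (wait_readable(fd) == -1)
            return -1;
    }
}
```

When the whole line shows up in the first read(2), select(2) is never 
called, which matches the common case described above.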

I could get rid of all of this if there was a binary protocol, however.  :)

On the read(2)'s for the data, I readv(2) the data with blocking 
sockets since I don't want to return until I've gotten the data from 
the server.
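
A rough sketch of what a blocking readv(2) for the data block could 
look like.  It assumes the byte count is already known from the 
response line and that the data is followed by the protocol's trailing 
"\r\n".  read_value() is a hypothetical name, not a libmemcache 
function, and reading the terminator in the same call is my assumption.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read exactly len bytes of data plus the trailing "\r\n" from a
 * blocking fd.  Returns 0 on success, -1 on error, EOF, or a bad
 * terminator.
 */
static int read_value(int fd, void *data, size_t len)
{
    char crlf[2];
    struct iovec iov[2];
    struct iovec *cur = iov;
    int cnt = 2;

    iov[0].iov_base = data;
    iov[0].iov_len = len;
    iov[1].iov_base = crlf;
    iov[1].iov_len = sizeof(crlf);

    while (cnt > 0) {
        size_t n;
        ssize_t rb = readv(fd, cur, cnt);

        if (rb == 0)
            return -1;                  /* EOF before all data arrived */
        if (rb < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* A blocking readv(2) can still return short: skip the
         * vectors that are full and trim the partly filled one. */
        n = (size_t)rb;
        while (cnt > 0 && n >= cur->iov_len) {
            n -= cur->iov_len;
            cur++;
            cnt--;
        }
        if (cnt > 0) {
            cur->iov_base = (char *)cur->iov_base + n;
            cur->iov_len -= n;
        }
    }
    return (crlf[0] == '\r' && crlf[1] == '\n') ? 0 : -1;
}
```

Reading the terminator into its own small vector keeps it out of the 
caller's buffer without a second system call.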

> All of the comments by mc_server_block just say "Switch to non-blocking
> io"...

It also unblocks the connection in the right places.  I'm not terribly 
happy with what it does to the code, but it's effective and doesn't seem 
to be slow.  It does 120K gets/sets a second.  I'm hacking bind-dlz to 
use libmemcache and postgresql... working on caching the compiled DNS 
responses...  bind-dlz only gets 16000 req/sec, but I can squeeze out 
over 100K for memcached.  :)

> This, in particular, is worrisome:
> {
> [...]
>      try_read:
>
> #ifdef HAVE_SELECT
>   [...]
> #endif
>       rb = read(ms->fd, mc->read_cur, mc->size - (size_t)(mc->cur - mc->buf));
>       switch(rb) {
>       case -1:
> 	/* We're in non-blocking mode, don't abort because of EAGAIN or
> 	 * EINTR */
> 	if (errno == EAGAIN || errno == EINTR)
> 	  goto try_read;
> [...]
> }
>
> Won't that just spin if HAVE_SELECT is not defined?

Yup.  It doesn't incur any time delay, but it's nicer on the kernel to 
use select(2), which is why it's in there by default.  I want to add 
HAVE_KQUEUE and HAVE_POLL and then move to those, which is why I added 
those bits in the first place.
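
For illustration, here is roughly how a wait routine could be split by 
compile-time feature macros.  HAVE_POLL is the macro name mentioned 
above; wait_readable() is a hypothetical name, and forcing HAVE_POLL on 
when nothing is defined is only there so the sketch compiles by itself.

```c
#include <errno.h>
#include <stddef.h>

#if !defined(HAVE_POLL) && !defined(HAVE_SELECT)
#define HAVE_POLL 1     /* assumption for this sketch; normally set by configure */
#endif

#if defined(HAVE_POLL)
#include <poll.h>
#else
#include <sys/select.h>
#endif

/* Block in the kernel until fd is readable.  0 on success, -1 on error. */
static int wait_readable(int fd)
{
    int rc;

#if defined(HAVE_POLL)
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    do {
        rc = poll(&pfd, 1, -1);         /* -1: wait with no timeout */
    } while (rc == -1 && errno == EINTR);
#else
    fd_set rfds;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = select(fd + 1, &rfds, NULL, NULL, NULL);
    } while (rc == -1 && errno == EINTR);
#endif
    return rc == -1 ? -1 : 0;
}
```

With something like this in place, the EAGAIN branch can call 
wait_readable() before jumping back to try_read instead of spinning.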

>> *) Using mmap(2) for buffers instead of malloc(3)'ed memory (not having
>> to copy data from kernel to user and vice versa is always good).
>>
>> *) A binary protocol.  The current protocol requires a tad bit of
>> searching, but it's not bad.  I don't imagine there would be much of a
>> speedup to be had here, but it's an option.
>>
>> ...but I doubt it'll buy much to go down those routes... no app is
>> going to be limited by libmemcache's performance (that I'm aware of).
>
> (Certainly, profiling would be in order first.)

mc_get_line() takes up most of the time.  System calls are the culprit.  
Nothing else compares.  What algorithms are in there are mostly O(1).  I 
could also use a cache to prevent the malloc(3) that I perform in 
mc_get(), but I'm not worried about it.  malloc(3) overhead isn't even 
showing up on the radar.

I forgot to mention in my previous commit that this last release added 
the ability to install your own memory functions via mcSetupMem().  
Handy for writing wrappers for things like Ruby or PostgreSQL.
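
To show the general idea of pluggable memory functions, here is a 
generic function-pointer sketch.  It is not the libmemcache API: 
demo_setup_mem() and every other demo_* name are invented for this 
example, and the real mcSetupMem() signature may look different.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; not taken from libmemcache. */
typedef void *(*demo_malloc_fn)(size_t);
typedef void *(*demo_realloc_fn)(void *, size_t);
typedef void (*demo_free_fn)(void *);

/* The library calls through these pointers; default to libc. */
static demo_malloc_fn demo_malloc = malloc;
static demo_realloc_fn demo_realloc = realloc;
static demo_free_fn demo_free = free;

/* Install caller-supplied memory functions.  0 on success, -1 if any is NULL. */
static int demo_setup_mem(demo_malloc_fn m, demo_realloc_fn r, demo_free_fn f)
{
    if (m == NULL || r == NULL || f == NULL)
        return -1;
    demo_malloc = m;
    demo_realloc = r;
    demo_free = f;
    return 0;
}

/* Example replacement: count allocations, then defer to libc. */
static size_t demo_alloc_count;

static void *demo_counting_malloc(size_t n)
{
    demo_alloc_count++;
    return malloc(n);
}
```

A wrapper would call demo_setup_mem() once at startup with the host 
environment's allocator functions, before the library allocates 
anything.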

> Random other comment:  in your "license", you write: "Use of this
> software in programs released under the GPL programs is expressly
> prohibited by the author (ie, BSD, closed source, or artistic license
> is okay, but GPL is not)."
>
> Without initiating a license flamewar, I would like to point out that
> weird licences such as this prevent this from being used by software
> such as LiveJournal itself, which is what memcached was written for in
> the first place.

Yeah, I know it's strange.  Most people care about having their code 
included in commercial products; I couldn't care less.  I care about 
making sure my bits stay out of GPL software.  I can't use GPL bits 
because of their license[1], so why should I return the favor?  I just 
hate the GPL.  If you're not going to use the GPL, then I couldn't give 
a shit how you use my software.  I'm working on a license with OSI that 
covers my interests (version two of the OSSAL license and an 
OSSAL-light http://people.FreeBSD.org/~seanc/ossal/) and is easy for 
everyone to comply with (BSD + some GPL protection foo).  I want to 
protect against GPL forks.  For now, I'm reserving all rights, which 
makes things simple.  In 99.9% of all instances, ask and I shall grant 
usage.  I want it used, just not in GPL software unless it's already 
established software (e.g., I have patches to integrate libmemcache into 
perdition...).  So, as you can see, I haven't quite figured things out, 
which is why it's goofy for now.  I'll get it squared away by 1.0.

[1] TenDRA, before anyone asks (offlist, however).

-- 
Sean Chittenden