A C client for memcached

Wed Oct 27 00:21:42 PDT 2004

> Wow, nice surprise.

Yeah, I'd been feeling like a dick for sitting on this code.  It'd be a 
crime to have someone else duplicate the work 'cause I was too lazy to 
send an email.

> Are you going to host this somewhere that I can link to from the 
> clients page, or would you rather I host it?

I'd like to see it incorporated into the tree so that other language 
authors can wrap it for their own language APIs (ex: the PHP API, 
which doesn't support multiple servers... I could also write a Ruby 
wrapper with relative ease if there was interest).  I'm not familiar 
with the autofuck tool chain... could you put together the necessary 
glue to have it built and installed by default?  I could be of help if 
you were using pmk, but I've managed to largely avoid auto* and its 
headaches.

Speaking of APIs, there is a rather ugly pimple in the memcached 
system: propagation of a server list is a PITA and always ends up 
getting hand rolled for each installation/site.  As things stand, each 
client has to grab their own list and maintain it.  For persistent 
running programs that only update their list of servers on startup... 
it's problematic.  What I propose doing is adding a few things to 
memcached/libmemcache.

1) Change libmemcache so that it uses shared memory for its list of 
servers.  Then, when an application starts up, it defines an 
application/key space domain (the key for the shared memory segment), 
which it uses to grab a unique set of memcached servers that are 
available for that domain.  This is handy for folks who have different 
memcached instances for different applications.  Right now, when you 
create a new memcache object, you call mc_new(void).  I'd like to see 
this become a wrapper around mc_new2(const char *domain), where 
mc_new(void) calls mc_new2("default").  This would preserve the API but 
would allow all kinds of useful things to happen with a memcached admin 
tool, which would manipulate the various server lists for the available 
domains.  The other pieces in shared memory would include a count of 
the number of servers, and a u_int64_t version number to version the 
server list.

For users who don't have shared memory or don't want it, mc_new() would 
call mc_new_private(), which would be the same as calling 
mc_new2("private").  A private server mapping is specific to the 
memcache instance (identical to the current behavior of the C API).
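A rough sketch of how those wrappers could fit together, in C.  Only 
mc_new(), mc_new2() and mc_new_private() come from the proposal above; 
struct mc_handle, struct mc_server_map, mc_handle_free(), the MC_* 
limits and the MC_NO_SHARED_MEMORY build switch are hypothetical names 
made up for illustration.  The sketch uses uint64_t from <stdint.h> in 
place of u_int64_t, and it allocates the map on the heap instead of 
attaching a real shared memory segment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MC_MAX_SERVERS 64   /* assumed cap, not taken from libmemcache */
#define MC_HOST_LEN    256  /* assumed size of a "host:port" buffer */

/* Hypothetical layout of the per-domain segment: version, count, list. */
struct mc_server_map {
    uint64_t version;                              /* versions the list */
    uint32_t num_servers;                          /* valid entries below */
    char     servers[MC_MAX_SERVERS][MC_HOST_LEN]; /* "host:port" strings */
};

/* Stand-in for the real memcache handle; only what this sketch needs. */
struct mc_handle {
    char domain[64];
    int  is_private;             /* 1 when the map belongs to this handle */
    struct mc_server_map *map;
};

struct mc_handle *mc_new2(const char *domain)
{
    assert(domain != NULL);
    struct mc_handle *mc = calloc(1, sizeof(*mc));
    if (mc == NULL)
        return NULL;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(mc->domain, domain, sizeof(mc->domain) - 1);
    mc->is_private = (strcmp(domain, "private") == 0);
    /* A real implementation would attach the shared segment keyed by
     * the domain here unless is_private is set.  This sketch always
     * uses a process-local map. */
    mc->map = calloc(1, sizeof(*mc->map));
    if (mc->map == NULL) {
        free(mc);
        return NULL;
    }
    return mc;
}

struct mc_handle *mc_new_private(void)
{
    return mc_new2("private");
}

/* Existing callers keep calling mc_new() and never see the domain. */
struct mc_handle *mc_new(void)
{
#ifdef MC_NO_SHARED_MEMORY   /* hypothetical build-time switch */
    return mc_new_private();
#else
    return mc_new2("default");
#endif
}

void mc_handle_free(struct mc_handle *mc)
{
    if (mc != NULL) {
        free(mc->map);
        free(mc);
    }
}
```

With this shape, an application that wants its own key space would call 
mc_new2("myapp") and everything else keeps working unchanged.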

2) Add a memcached administration program that manages the server lists 
that reside in shared memory... say, mcadmin(8).  If a memcache client 
is using a shared list, someone should be able to execute, `mcadmin 
--domain myapp add new_memcache_host:11211` and instantly have all 
libmemcache users take advantage of the memcache instance.  A delete 
command should be available as well.  Hell, why not have a generic 
mclient(1) program that can be integrated with shell scripts (`mclient 
get key`, or `some_cmd | mclient set foo`)?

3) A "clients" command.  It prints out a list of the client IP 
addresses.  This would primarily be used by the mcadmin(8) program, 
which, when run on any client that has a server list, would get the 
list of servers for a domain, connect to each server and issue the 
"clients" command, and record the list of consumers.  From this list, it should 
be possible to have mcadmin(8) run around to the various servers (some 
kerberized service, etc.) and run the appropriate mcadmin command.

A better way to do #3 would be to have the server return something like 
SERVER_UPDATE right before an END command.  Here is an example:

get foo\r\n
bar\r\n
SERVER_UPDATE <domain> <version>\r\n
END\r\n
SERVERS <domain> <version>\r\n
SERVER mc1.example.com:11211\r\n
SERVER mc2.example.com:11211\r\n
SERVER mc4.example.com:11211\r\n
END\r\n

Or, in the event of a cache miss:

get non-exist\r\n
SERVER_UPDATE <domain> <version>\r\n
...
END\r\n

Where <version> represents the version number for its server list 
stored in shared memory (date-derived and stored in a u_int64_t... 
something like 200410260000000, which would allow for 10 million 
updates in a day).  If the version number given by the server is newer 
than the version number the client already has loaded, it reads the 
server list from the server, updates the shared map, and proceeds with 
its queries.  All clients connected to a memcached server would receive 
this command, but only one on a given host should update the shared 
map.  The only problem with this is that the memcached server would 
send every server in the list.  Not a huge issue, but still an issue.  
It's not like a routing protocol, where one can justify the overhead of 
supporting incremental changes.
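To spell out the version arithmetic and the client-side check, here is 
a small C sketch.  It assumes the version is YYYYMMDD followed by a 
seven-digit sequence number (which is what gives 10 million updates per 
day), and that the line format is exactly the SERVER_UPDATE line shown 
above.  The function names mc_make_version(), mc_parse_server_update() 
and mc_needs_update() are hypothetical, and uint64_t stands in for 
u_int64_t.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build a date-derived version: YYYYMMDD followed by a 7-digit sequence. */
uint64_t mc_make_version(unsigned year, unsigned month, unsigned day,
                         uint32_t seq)
{
    assert(seq < 10000000u);  /* 7 digits: 10 million updates per day */
    uint64_t date = (uint64_t)year * 10000u + month * 100u + day;
    return date * UINT64_C(10000000) + seq;
}

/* Parse "SERVER_UPDATE <domain> <version>\r\n".
 * Returns 1 and fills domain/version on success, 0 otherwise. */
int mc_parse_server_update(const char *line, char *domain,
                           size_t domain_len, uint64_t *version)
{
    char dom[64];
    uint64_t ver;
    int consumed = 0;

    if (strncmp(line, "SERVER_UPDATE ", 14) != 0)
        return 0;
    if (sscanf(line, "SERVER_UPDATE %63s %" SCNu64 "%n",
               dom, &ver, &consumed) != 2)
        return 0;
    if (strcmp(line + consumed, "\r\n") != 0)  /* must end the line */
        return 0;
    if (strlen(dom) >= domain_len)
        return 0;
    strcpy(domain, dom);
    *version = ver;
    return 1;
}

/* Only a strictly newer server version should trigger a reload. */
int mc_needs_update(uint64_t local_version, uint64_t server_version)
{
    return server_version > local_version;
}
```

For the example date, mc_make_version(2004, 10, 26, 0) yields 
200410260000000, and the first update that day would carry sequence 1.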

4) A SERVER command so that a client can propagate changes that it 
learns about.  For example:

server <domain> <version>\r\n
delete mc6.example.com:11211\r\n
add mc5.example.com:11211\r\n
END\r\n

Then, if the version is newer than the version stored on the server, the 
server applies the listed additions and deletions to its server map and 
announces the changes to its clients.  Having a client such as mcadmin(8) 
query a memcached 
server, get a list of servers, then issue updates to all servers is 
more appealing to me than having servers aware of their neighbors.  It 
sure is tempting to have the memcached cluster aware of its other 
servers and propagating changes that way, but that seems like too big 
of a logistics headache to me (seems like the same headache that 
routing software is plagued with).  Having a single mcadmin(8) program 
connect to all servers seems like a better way to go.  Simple is best.
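As an illustration of the version check plus add/delete handling 
described for the SERVER command, here is a C sketch of what the 
receiving side might do.  Everything in it is an assumption made for 
the example: struct server_map, the MAP_* limits, and the map_*() 
functions are hypothetical, the body lines are taken with their \r\n 
already stripped, and a malformed line is treated as rejecting the 
whole update.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_MAX_SERVERS 16  /* assumed cap for the sketch */
#define MAP_HOST_LEN    64  /* assumed "host:port" buffer size */

struct server_map {
    uint64_t version;
    int      count;
    char     hosts[MAP_MAX_SERVERS][MAP_HOST_LEN];
};

static int map_find(const struct server_map *m, const char *host)
{
    for (int i = 0; i < m->count; i++)
        if (strcmp(m->hosts[i], host) == 0)
            return i;
    return -1;
}

/* Apply one body line, "add host:port" or "delete host:port".
 * Returns 1 if handled (no-ops included), 0 if malformed or full. */
static int map_apply_line(struct server_map *m, const char *line)
{
    if (strncmp(line, "add ", 4) == 0) {
        const char *host = line + 4;
        if (map_find(m, host) >= 0)
            return 1;                       /* already present */
        if (m->count == MAP_MAX_SERVERS || strlen(host) >= MAP_HOST_LEN)
            return 0;
        strcpy(m->hosts[m->count++], host);
        return 1;
    }
    if (strncmp(line, "delete ", 7) == 0) {
        int i = map_find(m, line + 7);
        if (i >= 0) {
            /* Shift later entries down so the list order is kept. */
            memmove(m->hosts[i], m->hosts[i + 1],
                    (size_t)(m->count - i - 1) * MAP_HOST_LEN);
            m->count--;
        }
        return 1;
    }
    return 0;
}

/* Apply a whole update only if its version is newer than ours.
 * Returns 1 if applied, 0 if stale or malformed (map left untouched). */
int map_apply_update(struct server_map *m, uint64_t version,
                     const char *const *lines, int nlines)
{
    assert(m != NULL);
    if (version <= m->version)
        return 0;
    struct server_map tmp = *m;  /* work on a copy; commit at the end */
    for (int i = 0; i < nlines; i++)
        if (!map_apply_line(&tmp, lines[i]))
            return 0;
    tmp.version = version;
    *m = tmp;
    return 1;
}
```

Working on a copy keeps the map consistent if an update is rejected 
halfway through, and a repeated or out-of-date update is simply ignored.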

I know I can get this information from sockstat(1)/netstat(1), but 
sockstat(1)/netstat(1) don't exist everywhere and I'd just as soon 
integrate this simple functionality into the base so that it'll 
propagate very quickly.

5) A server <domain> list\r\n command.  When a client first connects to 
the memcached server, it *should* (doesn't have to) issue this command 
to get an updated server list.  With long running connections, this 
overhead seems negligible to me and easily justified.
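On the client side, building that request line is about all there is to 
it.  A minimal C sketch, assuming the command text is exactly 
"server <domain> list\r\n" as written above; the function name 
mc_format_server_list() is hypothetical, and rejecting domains that 
contain whitespace is my own assumption, since a space would break the 
space-delimited line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "server <domain> list\r\n" into buf.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * domain is empty, contains whitespace, or the line does not fit. */
int mc_format_server_list(char *buf, size_t buflen, const char *domain)
{
    assert(buf != NULL && domain != NULL);
    if (domain[0] == '\0' || strpbrk(domain, " \t\r\n") != NULL)
        return -1;
    int n = snprintf(buf, buflen, "server %s list\r\n", domain);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

A client would send the resulting bytes once, right after connecting, 
and then read the server list from the reply.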

Yeah, I know these aren't small changes and would probably require a 
major version bump, but, I think it'd be worth it.  :)

> As for your other comments, I'll look into them.

Thanks.  If you have any questions, please let me know.  I'm going to 
knock out an mclient(1) program as a start, then go about adding the 
above functionality unless I hear some kind of overwhelming objection.  
Right now I have to have each client maintain its own server list, and 
now that I've got libmemcache embedded in PostgreSQL, postfix, dbmail, 
and a few other places, maintaining and distributing those lists, and 
notifying long-running processes of changes to them, is a *huge* pain 
in the ass.  Having 
it built into the protocol/system would be exceedingly convenient for 
developers and admins who want to bring machines up and down with 
little notice.  -sc

-- 
Sean Chittenden