C API

Fri, 16 Jan 2004 10:55:21 -0800 (PST)

First off, there's a lot of confusion here about hosts, groups, domains, etc.

Let's get some terminology down.  Here's what I'd say:

host:  a memcached instance, running on an IP+port.

group:  a weighted group of hosts, probably weighted by the amount of
        memory on each host

domain:  a namespace id for keys on a host/group.  For instance,
         Slashcode had to add a key prefix so that multiple Slashcode
         installs don't collide when hitting the same memcached servers.
         memcached doesn't support this yet, so you have to do a
         Slashcode-style hack for it to work.
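
A minimal sketch of that kind of prefix hack, in C.  The function name
mc_domain_key and the ":" separator are assumptions made for illustration;
neither comes from Slashcode or from memcached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<domain>:<key>" into out.
   Returns 0 on success, -1 if the prefixed key does not fit in out. */
int mc_domain_key(char *out, size_t out_size,
                  const char *domain, const char *key)
{
    assert(out != NULL && domain != NULL && key != NULL);
    int n = snprintf(out, out_size, "%s:%s", domain, key);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Two installs using domains "slash1" and "slash2" would then store the same
logical key under different names on the same servers.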

You could make a client work on a host-level or group-level, but I'd argue
the group-level is what needs to be primarily exposed.  Certainly if we
make an XS version of the Perl module (which we plan to), the underlying C
module must support groups to get the performance we want.

Now, your original API was very host-focused.  This might be okay for
your application, but it's not for a general library.  But the thing
is, it won't take much work to make a library support both:  just
make the handle object either represent a single machine or a group of
machines.
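
One way such a handle could look, as a rough sketch.  Only the name
memcached_client comes from this thread; the mc_host type, the field names,
and mc_client_init are assumptions.  A single machine is just a group with
one host in it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host entry: one memcached instance on an IP+port. */
typedef struct {
    const char    *ip;
    unsigned short port;
    unsigned       weight;   /* e.g. proportional to memory on the host */
} mc_host;

/* Hypothetical handle layout: a weighted group of one or more hosts. */
typedef struct {
    const mc_host *hosts;
    size_t         n_hosts;
    unsigned       total_weight;
} memcached_client;

/* Points the handle at a caller-owned host list and sums the weights. */
void mc_client_init(memcached_client *mc,
                    const mc_host *hosts, size_t n_hosts)
{
    assert(mc != NULL && hosts != NULL && n_hosts > 0);
    mc->hosts = hosts;
    mc->n_hosts = n_hosts;
    mc->total_weight = 0;
    for (size_t i = 0; i < n_hosts; i++)
        mc->total_weight += hosts[i].weight;
}
```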

> > /* gets a handle for a group of servers.  could set the servers here,
> >    the domain, the time-out restrictions, etc.  like the Perl API.
> >    also, perhaps a function pointer to the memory allocator, if not
> >    malloc.  */
>
> This sounds fine, except that if you lose one server, you end up losing
> them all I would suspect (this is why I would really like to say that an
> object is in a particular pool and just purge only that pool on a
> failure).

Absolutely not.

You need to go read the Perl module source first.  memcached is a hash of
hashes.  A memcached instance (a host) is a big hashtable and nothing
else.  The client (Cache::Memcached) is the first layer of the hash:  it
maps a key to a specific host in a group.  So if a host dies, you only
lose the keys that mapped to that host, not the whole group.
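
To make the "first layer" concrete, here is a sketch of mapping a key to a
host in a weighted group.  The hash (32-bit FNV-1a) is a stand-in picked for
the example, not necessarily what Cache::Memcached uses, and the names
mc_hash and mc_pick_host are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: a stand-in hash, chosen only for illustration. */
uint32_t mc_hash(const char *key)
{
    uint32_t h = 2166136261u;
    for (; *key; key++) {
        h ^= (unsigned char)*key;
        h *= 16777619u;
    }
    return h;
}

/* Returns the index of the host that owns key.  weights[i] is the
   weight of host i; a host with weight w owns w of the hash slots. */
size_t mc_pick_host(const char *key, const unsigned *weights, size_t n_hosts)
{
    unsigned total = 0;
    for (size_t i = 0; i < n_hosts; i++)
        total += weights[i];
    assert(total > 0);

    unsigned slot = mc_hash(key) % total;
    for (size_t i = 0; i < n_hosts; i++) {
        if (slot < weights[i])
            return i;
        slot -= weights[i];
    }
    return n_hosts - 1;   /* not reached when total > 0 */
}
```

In this sketch a given key always lands on the same host as long as the
host list and weights stay the same, so a failed host only affects the
keys that hash to it.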

> > /* gets a single item, using mc's allocator, or nothing. */
> > char* memcached_get(memcached_client* mc, const char *key);
>
> See this wouldn't work for me. I have buffers to fill in MySQL and don't
> want the library to allocate anything at all (no mallocs). Possibly have
> the return value be a ptr to the position in the buffer that
> memcached_client. Could pass in a size_t that could be used to say how
> much data is being returned. Still this solution wouldn't tell the user
> how many times they would have to call memcached_get() to complete a
> fetch.

This is a bizarre enough fetch model that you should make it a separate
function, so the common case has a simple API.

Something like:

   memcached_get_partial_begin:  you give it starting address and max size_t,
                                 it tells you the total size_t, and a
                                 handle to get more with, if there's more.

   memcached_get_partial_continue:  you give it that handle, and another
                                    buffer and size_t, it gives you
                                    another handle.

   memcached_get_partial_end:  releases stuff, given a handle
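
A rough sketch of how those three calls could fit together without the
library allocating anything.  The function names are the ones above, but
every signature and the mc_partial struct are assumptions.  To keep it
runnable without a server, begin takes the value directly; a real client
would take the mc handle and a key and read from the socket instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fetch state, stored by the caller (no malloc). */
typedef struct {
    const char *value;    /* stand-in for data arriving from a server */
    size_t      total;
    size_t      offset;
} mc_partial;

/* Copies the next chunk into buf and returns how many bytes it copied. */
static size_t copy_chunk(mc_partial *h, char *buf, size_t buf_size)
{
    size_t left = h->total - h->offset;
    size_t n = left < buf_size ? left : buf_size;
    memcpy(buf, h->value + h->offset, n);
    h->offset += n;
    return n;
}

/* Starts a fetch: reports the total size, fills buf with the first
   chunk, and returns the handle if there's more, or NULL if done. */
mc_partial *memcached_get_partial_begin(mc_partial *h,
                                        const char *value, size_t value_len,
                                        char *buf, size_t buf_size,
                                        size_t *total, size_t *copied)
{
    assert(h != NULL && value != NULL && buf != NULL);
    h->value = value;
    h->total = value_len;
    h->offset = 0;
    *total = value_len;
    *copied = copy_chunk(h, buf, buf_size);
    return h->offset < h->total ? h : NULL;
}

/* Fills buf with the next chunk; returns NULL once everything is read. */
mc_partial *memcached_get_partial_continue(mc_partial *h,
                                           char *buf, size_t buf_size,
                                           size_t *copied)
{
    assert(h != NULL && buf != NULL);
    *copied = copy_chunk(h, buf, buf_size);
    return h->offset < h->total ? h : NULL;
}

/* Releases the fetch state (here there is nothing to free). */
void memcached_get_partial_end(mc_partial *h)
{
    memset(h, 0, sizeof *h);
}
```

The caller loops on continue until it gets NULL back, then calls end.
Because begin reports the total size up front, the caller also knows how
many calls a given buffer size will take.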

> Separate fetch sounds better since you can keep calling it until it
> returns no data (and you can keep your buffers small).
>
> Is there any advantage to making a get call with multiple keys?

Hell yeah.... Latency!

If you're fetching 500 keys from a server, you can send all keys at once
and get all the replies in one round-trip, or you can do them all one at a
time and pay for 500 round-trips, which even at a millisecond each means
waiting *at least* a half second (unacceptable).

The Perl module supports get_multi to a group of hosts, and splits the
keys up and does parallel get_multis to each host.
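
The "splits the keys up" step could look roughly like this in C.  It's a
sketch under assumptions:  mc_split_keys is a made-up name, and host_of[]
is taken as input from whatever key-to-host mapping the client uses.  It
only does the grouping; sending one multi-key get per host is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Groups key indices by host so each host gets one multi-key request.
   host_of[i] is the host index for key i.  start must have n_hosts + 1
   entries and order must have n_keys entries.  On return, the keys for
   host h are order[start[h]] .. order[start[h + 1] - 1]. */
void mc_split_keys(const size_t *host_of, size_t n_keys, size_t n_hosts,
                   size_t *start, size_t *order)
{
    /* Count the keys per host, shifted by one slot. */
    for (size_t h = 0; h <= n_hosts; h++)
        start[h] = 0;
    for (size_t i = 0; i < n_keys; i++) {
        assert(host_of[i] < n_hosts);
        start[host_of[i] + 1]++;
    }

    /* Prefix sum: start[h] becomes the first slot for host h. */
    for (size_t h = 0; h < n_hosts; h++)
        start[h + 1] += start[h];

    /* Place each key, using start[h] as a write cursor. */
    for (size_t i = 0; i < n_keys; i++)
        order[start[host_of[i]]++] = i;

    /* The cursors now sit at each host's end; shift them back. */
    for (size_t h = n_hosts; h > 0; h--)
        start[h] = start[h - 1];
    start[0] = 0;
}
```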

> > The "connect" stuff is an internal detail, not part of the public API.
> > You don't make clients deal with that.  They just want to get/set and not
> > know from where.
>
> They need to know hosts and ports.

Yes, the hosts/ports/weights for servers are in the memcached_client
struct which each function takes.  See the Perl module.

> > Also, you'll want internal functions which return the non-blocking
> > socket fds to wait on in your select loop.  (memcached should never assume
> > a server is up or functioning quickly.  your select timeout is specified
> > in *memcached_client.... something like a half second or a second at
> > most.)
>
> So a non-blocking cursor for a fetch method?
>
>
> Is a boolean really good enough for error? If I go do an add I may want
> to know why the failure occurred (aka was the object already there...
> did I get a chunk error?).

So if it returns false, call one of:

    int get_error_code(memcached_client *mc);
    char* get_error_string(memcached_client *mc);
    /* C has no overloading, so the by-code lookup needs its own name */
    char* get_error_string_for_code(int error_code);

(Like $dbh->err, $dbh->errstr, etc)

Don't make the state global to the library... put it in the mc handle
struct.  (which could represent a single host or a group)
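
A small sketch of keeping the error state in the handle.  The error codes,
the last_error field, the message strings, and mc_add_stub are all
assumptions for illustration; the struct is trimmed to the one field this
example needs, and the by-code lookup gets its own name because C can't
have two functions called get_error_string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes. */
enum { MC_OK = 0, MC_ERR_EXISTS, MC_ERR_TIMEOUT, MC_ERR_SERVER };

/* Only the error field of the handle is shown here. */
typedef struct {
    int last_error;   /* per-handle, so nothing is global to the library */
} memcached_client;

int get_error_code(const memcached_client *mc)
{
    return mc->last_error;
}

const char *get_error_string_for_code(int error_code)
{
    switch (error_code) {
    case MC_OK:          return "no error";
    case MC_ERR_EXISTS:  return "key already exists";
    case MC_ERR_TIMEOUT: return "server timed out";
    case MC_ERR_SERVER:  return "server error";
    default:             return "unknown error";
    }
}

const char *get_error_string(const memcached_client *mc)
{
    return get_error_string_for_code(mc->last_error);
}

/* Stand-in for an "add" call: returns 1 on success and 0 on failure,
   recording the reason in the handle. */
int mc_add_stub(memcached_client *mc, int already_present)
{
    assert(mc != NULL);
    mc->last_error = already_present ? MC_ERR_EXISTS : MC_OK;
    return mc->last_error == MC_OK;
}
```

Two handles keep separate error state, so a failure on one group doesn't
clobber what another handle last reported.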

- Brad