[memcached] sgrimm, r508: Merge multithreaded branch into trunk.

commits at code.sixapart.com commits at code.sixapart.com
Mon Apr 16 15:32:47 UTC 2007


Merge multithreaded branch into trunk.


A   trunk/server/doc/threads.txt
A   trunk/server/stats.c
A   trunk/server/stats.h
A   trunk/server/t/stats-detail.t
A   trunk/server/thread.c


Added: trunk/server/doc/threads.txt
===================================================================
--- trunk/server/doc/threads.txt	2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/doc/threads.txt	2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,68 @@
+Multithreading support in memcached
+
+OVERVIEW
+
+By default, memcached is compiled as a single-threaded application. This is
+the most CPU-efficient mode of operation, and it is appropriate for memcached
+instances running on single-processor servers or whose request volume is
+low enough that available CPU power is not a bottleneck.
+
+More heavily-used memcached instances can benefit from multithreaded mode.
+To enable it, use the "--enable-threads" option to the configure script:
+
+./configure --enable-threads
+
+You must have the POSIX thread functions (pthread_*) on your system in order
+to use memcached's multithreaded mode.
+
+Once you have a thread-capable memcached executable, you can control the
+number of threads using the "-t" option; the default is 4. On a machine
+that's dedicated to memcached, you will typically want one thread per
+processor core. Due to memcached's nonblocking architecture, there is no
+real advantage to using more threads than the number of CPUs on the machine;
+doing so will increase lock contention and is likely to degrade performance.
+
+
+INTERNALS
+
+The threading support is mostly implemented as a series of wrapper functions
+that protect calls to underlying code with one of a small number of locks.
+In single-threaded mode, the wrappers are replaced with direct invocations
+of the target code using #define; that is done in memcached.h. This approach
+allows memcached to be compiled in either single- or multi-threaded mode.
+
+Each thread has its own instance of libevent ("base" in libevent terminology).
+The only direct interaction between threads is for new connections. One of
+the threads handles the TCP listen socket; each new connection is passed to
+a different thread on a round-robin basis. After that, each thread operates
+on its set of connections as if it were running in single-threaded mode,
+using libevent to manage nonblocking I/O as usual.
+
+UDP requests are a bit different, since there is only one UDP socket that's
+shared by all clients. The UDP socket is monitored by all of the threads.
+When a datagram comes in, all the threads that aren't already processing
+another request will receive "socket readable" callbacks from libevent.
+Only one thread will successfully read the request; the others will go back
+to sleep or, in the case of a very busy server, will read whatever other
+UDP requests are waiting in the socket buffer. Note that in the case of
+moderately busy servers, this results in increased CPU consumption since
+threads will constantly wake up and find no input waiting for them. But
+short of much more major surgery on the I/O code, this is not easy to avoid.
+
+
+TO DO
+
+The locking is currently very coarse-grained.  There is, for example, one
+lock that protects all the calls to the hashtable-related functions. Since
+memcached spends much of its CPU time on command parsing and response
+assembly, rather than managing the hashtable per se, this is not a huge
+bottleneck for small numbers of processors. However, the locking will likely
+have to be refined in the event that memcached needs to run well on
+massively-parallel machines.
+
+One cheap optimization to reduce contention on that lock: move the hash value
+computation so it occurs before the lock is obtained whenever possible.
+Right now the hash is performed at the lowest levels of the functions in
+assoc.c. If instead it was computed in memcached.c, then passed along with
+the key and length into the items.c code and down into assoc.c, that would
+reduce the amount of time each thread needs to keep the hashtable lock held.

Added: trunk/server/stats.c
===================================================================
--- trunk/server/stats.c	2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/stats.c	2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,359 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * Detailed statistics management. For simple stats like total number of
+ * "get" requests, we use inline code in memcached.c and friends, but when
+ * stats detail mode is activated, the code here records more information.
+ *
+ * Author:
+ *   Steven Grimm <sgrimm at facebook.com>
+ *
+ * $Id$
+ */
+#include "memcached.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/*
+ * Stats are tracked on the basis of key prefixes. This is a simple
+ * fixed-size hash of prefixes; we run the prefixes through the same
+ * CRC function used by the cache hashtable.
+ */
+typedef struct _prefix_stats PREFIX_STATS;
+struct _prefix_stats {
+    char               *prefix;
+    int                prefix_len;
+    unsigned long long num_gets;
+    unsigned long long num_sets;
+    unsigned long long num_deletes;
+    unsigned long long num_hits;
+    PREFIX_STATS       *next;
+};
+
+#define PREFIX_HASH_SIZE 256
+static PREFIX_STATS *prefix_stats[PREFIX_HASH_SIZE];
+static int num_prefixes = 0;
+static int total_prefix_size = 0;
+
+void stats_prefix_init() {
+    memset(prefix_stats, 0, sizeof(prefix_stats));
+}
+
+/*
+ * Cleans up all our previously collected stats. NOTE: the stats lock is
+ * assumed to be held when this is called.
+ */
+void stats_prefix_clear() {
+    int i;
+    PREFIX_STATS *cur, *next;
+
+    for (i = 0; i < PREFIX_HASH_SIZE; i++) {
+        for (cur = prefix_stats[i]; cur != NULL; cur = next) {
+            next = cur->next;
+            free(cur->prefix);
+            free(cur);
+        }
+        prefix_stats[i] = NULL;
+    }
+    num_prefixes = 0;
+    total_prefix_size = 0;
+}
+
+/*
+ * Returns the stats structure for a prefix, creating it if it's not already
+ * in the list.
+ */
+static PREFIX_STATS *stats_prefix_find(char *key) {
+    PREFIX_STATS *pfs;
+    int hashval;
+    int length;
+
+    for (length = 0; key[length] != '\0'; length++)
+        if (key[length] == settings.prefix_delimiter)
+            break;
+
+    hashval = hash(key, length, 0) % PREFIX_HASH_SIZE;
+
+    for (pfs = prefix_stats[hashval]; NULL != pfs; pfs = pfs->next) {
+        if (! strncmp(pfs->prefix, key, length))
+            return pfs;
+    }
+
+    pfs = calloc(sizeof(PREFIX_STATS), 1);
+    if (NULL == pfs) {
+        perror("Can't allocate space for stats structure: calloc");
+        return NULL;
+    }
+
+    pfs->prefix = malloc(length + 1);
+    if (NULL == pfs->prefix) {
+        perror("Can't allocate space for copy of prefix: malloc");
+        free(pfs);
+        return NULL;
+    }
+
+    strncpy(pfs->prefix, key, length);
+    pfs->prefix[length] = '\0';      // because strncpy() sucks
+    pfs->prefix_len = length;
+
+    pfs->next = prefix_stats[hashval];
+    prefix_stats[hashval] = pfs;
+
+    num_prefixes++;
+    total_prefix_size += length;
+
+    return pfs;
+}
+
+/*
+ * Records a "get" of a key.
+ */
+void stats_prefix_record_get(char *key, int is_hit) {
+    PREFIX_STATS *pfs;
+
+    STATS_LOCK();
+    pfs = stats_prefix_find(key);
+    if (NULL != pfs) {
+        pfs->num_gets++;
+        if (is_hit) {
+            pfs->num_hits++;
+        }
+    }
+    STATS_UNLOCK();
+}
+
+/*
+ * Records a "delete" of a key.
+ */
+void stats_prefix_record_delete(char *key) {
+    PREFIX_STATS *pfs;
+
+    STATS_LOCK();
+    pfs = stats_prefix_find(key);
+    if (NULL != pfs) {
+        pfs->num_deletes++;
+    }
+    STATS_UNLOCK();
+}
+
+/*
+ * Records a "set" of a key.
+ */
+void stats_prefix_record_set(char *key) {
+    PREFIX_STATS *pfs;
+
+    STATS_LOCK();
+    pfs = stats_prefix_find(key);
+    if (NULL != pfs) {
+        pfs->num_sets++;
+    }
+    STATS_UNLOCK();
+}
+
+/*
+ * Returns stats in textual form suitable for writing to a client.
+ */
+char *stats_prefix_dump(int *length) {
+    char *format = "PREFIX %s get %llu hit %llu set %llu del %llu\r\n";
+    PREFIX_STATS *pfs;
+    char *buf;
+    int i, pos;
+    int size;
+
+    /*
+     * Figure out how big the buffer needs to be. This is the sum of the
+     * lengths of the prefixes themselves, plus the size of one copy of
+     * the per-prefix output with 20-digit values for all the counts,
+     * plus space for the "END" at the end.
+     */
+    STATS_LOCK();
+    size = strlen(format) + total_prefix_size +
+           num_prefixes * (strlen(format) - 2 /* %s */
+                           + 4 * (20 - 4)) /* %llu replaced by 20-digit num */
+                           + sizeof("END\r\n");
+    buf = malloc(size);
+    if (NULL == buf) {
+        perror("Can't allocate stats response: malloc");
+        STATS_UNLOCK();
+        return NULL;
+    }
+
+    pos = 0;
+    for (i = 0; i < PREFIX_HASH_SIZE; i++) {
+        for (pfs = prefix_stats[i]; NULL != pfs; pfs = pfs->next) {
+            pos += sprintf(buf + pos, format,
+                           pfs->prefix, pfs->num_gets, pfs->num_hits,
+                           pfs->num_sets, pfs->num_deletes);
+        }
+    }
+
+    STATS_UNLOCK();
+    strcpy(buf + pos, "END\r\n");
+
+    *length = pos + 5;
+    return buf;
+}
+
+
+#ifdef UNIT_TEST
+
+/****************************************************************************
+      To run unit tests, compile with $(CC) -DUNIT_TEST stats.c assoc.o
+      (need assoc.o to get the hash() function).
+****************************************************************************/
+
+struct settings settings;
+
+static char *current_test = "";
+static int test_count = 0;
+static int fail_count = 0;
+
+static void fail(char *what) { printf("\tFAIL: %s\n", what); fflush(stdout); fail_count++; }
+static void test_equals_int(char *what, int a, int b) { test_count++; if (a != b) fail(what); }
+static void test_equals_ptr(char *what, void *a, void *b) { test_count++; if (a != b) fail(what); }
+static void test_equals_str(char *what, const char *a, const char *b) { test_count++; if (strcmp(a, b)) fail(what); }
+static void test_equals_ull(char *what, unsigned long long a, unsigned long long b) { test_count++; if (a != b) fail(what); }
+static void test_notequals_ptr(char *what, void *a, void *b) { test_count++; if (a == b) fail(what); }
+static void test_notnull_ptr(char *what, void *a) { test_count++; if (NULL == a) fail(what); }
+
+static void test_prefix_find() {
+    PREFIX_STATS *pfs1, *pfs2;
+
+    pfs1 = stats_prefix_find("abc");
+    test_notnull_ptr("initial prefix find", pfs1);
+    test_equals_ull("request counts", 0ULL,
+        pfs1->num_gets + pfs1->num_sets + pfs1->num_deletes + pfs1->num_hits);
+    pfs2 = stats_prefix_find("abc");
+    test_equals_ptr("find of same prefix", pfs1, pfs2);
+    pfs2 = stats_prefix_find("abc:");
+    test_equals_ptr("find of same prefix, ignoring delimiter", pfs1, pfs2);
+    pfs2 = stats_prefix_find("abc:d");
+    test_equals_ptr("find of same prefix, ignoring extra chars", pfs1, pfs2);
+    pfs2 = stats_prefix_find("xyz123");
+    test_notequals_ptr("find of different prefix", pfs1, pfs2);
+    pfs2 = stats_prefix_find("ab:");
+    test_notequals_ptr("find of shorter prefix", pfs1, pfs2);
+}
+
+static void test_prefix_record_get() {
+    PREFIX_STATS *pfs;
+
+    stats_prefix_record_get("abc:123", 0);
+    pfs = stats_prefix_find("abc:123");
+    test_equals_ull("get count after get #1", 1, pfs->num_gets);
+    test_equals_ull("hit count after get #1", 0, pfs->num_hits);
+    stats_prefix_record_get("abc:456", 0);
+    test_equals_ull("get count after get #2", 2, pfs->num_gets);
+    test_equals_ull("hit count after get #2", 0, pfs->num_hits);
+    stats_prefix_record_get("abc:456", 1);
+    test_equals_ull("get count after get #3", 3, pfs->num_gets);
+    test_equals_ull("hit count after get #3", 1, pfs->num_hits);
+    stats_prefix_record_get("def:", 1);
+    test_equals_ull("get count after get #4", 3, pfs->num_gets);
+    test_equals_ull("hit count after get #4", 1, pfs->num_hits);
+}
+
+static void test_prefix_record_delete() {
+    PREFIX_STATS *pfs;
+
+    stats_prefix_record_delete("abc:123");
+    pfs = stats_prefix_find("abc:123");
+    test_equals_ull("get count after delete #1", 0, pfs->num_gets);
+    test_equals_ull("hit count after delete #1", 0, pfs->num_hits);
+    test_equals_ull("delete count after delete #1", 1, pfs->num_deletes);
+    test_equals_ull("set count after delete #1", 0, pfs->num_sets);
+    stats_prefix_record_delete("def:");
+    test_equals_ull("delete count after delete #2", 1, pfs->num_deletes);
+}
+
+static void test_prefix_record_set() {
+    PREFIX_STATS *pfs;
+
+    stats_prefix_record_set("abc:123");
+    pfs = stats_prefix_find("abc:123");
+    test_equals_ull("get count after set #1", 0, pfs->num_gets);
+    test_equals_ull("hit count after set #1", 0, pfs->num_hits);
+    test_equals_ull("delete count after set #1", 0, pfs->num_deletes);
+    test_equals_ull("set count after set #1", 1, pfs->num_sets);
+    stats_prefix_record_delete("def:");
+    test_equals_ull("set count after set #2", 1, pfs->num_sets);
+}
+
+static void test_prefix_dump() {
+    int hashval = hash("abc", 3, 0) % PREFIX_HASH_SIZE;
+    char tmp[500];
+    char *expected;
+    int keynum;
+    int length;
+
+    test_equals_str("empty stats", "END\r\n", stats_prefix_dump(&length));
+    test_equals_int("empty stats length", 5, length);
+    stats_prefix_record_set("abc:123");
+    expected = "PREFIX abc get 0 hit 0 set 1 del 0\r\nEND\r\n";
+    test_equals_str("stats after set", expected, stats_prefix_dump(&length));
+    test_equals_int("stats length after set", strlen(expected), length);
+    stats_prefix_record_get("abc:123", 0);
+    expected = "PREFIX abc get 1 hit 0 set 1 del 0\r\nEND\r\n";
+    test_equals_str("stats after get #1", expected, stats_prefix_dump(&length));
+    test_equals_int("stats length after get #1", strlen(expected), length);
+    stats_prefix_record_get("abc:123", 1);
+    expected = "PREFIX abc get 2 hit 1 set 1 del 0\r\nEND\r\n";
+    test_equals_str("stats after get #2", expected, stats_prefix_dump(&length));
+    test_equals_int("stats length after get #2", strlen(expected), length);
+    stats_prefix_record_delete("abc:123");
+    expected = "PREFIX abc get 2 hit 1 set 1 del 1\r\nEND\r\n";
+    test_equals_str("stats after del #1", expected, stats_prefix_dump(&length));
+    test_equals_int("stats length after del #1", strlen(expected), length);
+
+    /* The order of results might change if we switch hash functions. */
+    stats_prefix_record_delete("def:123");
+    expected = "PREFIX abc get 2 hit 1 set 1 del 1\r\n"
+               "PREFIX def get 0 hit 0 set 0 del 1\r\n"
+               "END\r\n";
+    test_equals_str("stats after del #2", expected, stats_prefix_dump(&length));
+    test_equals_int("stats length after del #2", strlen(expected), length);
+
+    /* Find a key that hashes to the same bucket as "abc" */
+    for (keynum = 0; keynum < PREFIX_HASH_SIZE * 100; keynum++) {
+        sprintf(tmp, "%d", keynum);
+        if (hashval == hash(tmp, strlen(tmp), 0) % PREFIX_HASH_SIZE) {
+            break;
+        }
+    }
+    stats_prefix_record_set(tmp);
+    sprintf(tmp, "PREFIX %d get 0 hit 0 set 1 del 0\r\n"
+                 "PREFIX abc get 2 hit 1 set 1 del 1\r\n"
+                 "PREFIX def get 0 hit 0 set 0 del 1\r\n"
+                 "END\r\n", keynum);
+    test_equals_str("stats with two stats in one bucket",
+                    tmp, stats_prefix_dump(&length));
+    test_equals_int("stats length with two stats in one bucket",
+                    strlen(tmp), length);
+}
+
+static void run_test(char *what, void (*func)(void)) {
+    current_test = what;
+    test_count = fail_count = 0;
+    puts(what);
+    fflush(stdout);
+
+    stats_prefix_clear();
+    (func)();
+    printf("\t%d / %d pass\n", (test_count - fail_count), test_count);
+}
+
+/* In case we're compiled in thread mode */
+void mt_stats_lock() { }
+void mt_stats_unlock() { }
+
+main(int argc, char **argv) {
+    stats_prefix_init();
+    settings.prefix_delimiter = ':';
+    run_test("stats_prefix_find", test_prefix_find);
+    run_test("stats_prefix_record_get", test_prefix_record_get);
+    run_test("stats_prefix_record_delete", test_prefix_record_delete);
+    run_test("stats_prefix_record_set", test_prefix_record_set);
+    run_test("stats_prefix_dump", test_prefix_dump);
+}
+
+#endif

Added: trunk/server/stats.h
===================================================================
--- trunk/server/stats.h	2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/stats.h	2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,7 @@
+/* stats */
+void stats_prefix_init(void);
+void stats_prefix_clear(void);
+void stats_prefix_record_get(char *key, int is_hit);
+void stats_prefix_record_delete(char *key);
+void stats_prefix_record_set(char *key);
+char *stats_prefix_dump(int *length);

Added: trunk/server/t/stats-detail.t
===================================================================
--- trunk/server/t/stats-detail.t	2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/t/stats-detail.t	2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,63 @@
+#!/usr/bin/perl
+
+use strict;
+use Test::More tests => 24;
+use FindBin qw($Bin);
+use lib "$Bin/lib";
+use MemcachedTest;
+
+my $server = new_memcached();
+my $sock = $server->sock;
+my $expire;
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "END\r\n", "verified empty stats at start");
+
+print $sock "stats detail on\r\n";
+is(scalar <$sock>, "OK\r\n", "detail collection turned on");
+
+print $sock "set foo:123 0 0 6\r\nfooval\r\n";
+is(scalar <$sock>, "STORED\r\n", "stored foo");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 0 hit 0 set 1 del 0\r\n", "details after set");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 1 del 0\r\n", "details after get with hit");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+mem_get_is($sock, "foo:124", undef);
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 2 hit 1 set 1 del 0\r\n", "details after get without hit");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "delete foo:125 0\r\n";
+is(scalar <$sock>, "NOT_FOUND\r\n", "sent delete command");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 2 hit 1 set 1 del 1\r\n", "details after delete");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "stats reset\r\n";
+is(scalar <$sock>, "RESET\r\n", "stats cleared");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "END\r\n", "empty stats after clear");
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 0 del 0\r\n", "details after clear and get");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "stats detail off\r\n";
+is(scalar <$sock>, "OK\r\n", "detail collection turned off");
+
+mem_get_is($sock, "foo:124", undef);
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 0 del 0\r\n", "details after stats turned off");
+is(scalar <$sock>, "END\r\n", "end of details");

Added: trunk/server/thread.c
===================================================================
--- trunk/server/thread.c	2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/thread.c	2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,614 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * Thread management for memcached.
+ *
+ *  $Id$
+ */
+#include "memcached.h"
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <errno.h>
+
+#ifdef HAVE_MALLOC_H
+#include <malloc.h>
+#endif
+
+#ifdef USE_THREADS
+
+#include <pthread.h>
+
+#define ITEMS_PER_ALLOC 64
+
+/* An item in the connection queue. */
+typedef struct conn_queue_item CQ_ITEM;
+struct conn_queue_item {
+    int     sfd;
+    int     init_state;
+    int     event_flags;
+    int     read_buffer_size;
+    int     is_udp;
+    CQ_ITEM *next;
+};
+
+/* A connection queue. */
+typedef struct conn_queue CQ;
+struct conn_queue {
+    CQ_ITEM *head;
+    CQ_ITEM *tail;
+    pthread_mutex_t lock;
+    pthread_cond_t  cond;
+};
+
+/* Lock for connection freelist */
+static pthread_mutex_t conn_lock;
+
+/* Lock for cache operations (item_*, assoc_*) */
+static pthread_mutex_t cache_lock;
+
+/* Lock for slab allocator operations */
+static pthread_mutex_t slabs_lock;
+
+/* Lock for global stats */
+static pthread_mutex_t stats_lock;
+
+/* Free list of CQ_ITEM structs */
+static CQ_ITEM *cqi_freelist;
+static pthread_mutex_t cqi_freelist_lock;
+
+/*
+ * Each libevent instance has a wakeup pipe, which other threads
+ * can use to signal that they've put a new connection on its queue.
+ */
+typedef struct {
+    pthread_t thread_id;        /* unique ID of this thread */
+    struct event_base *base;    /* libevent handle this thread uses */
+    struct event notify_event;  /* listen event for notify pipe */
+    int notify_receive_fd;      /* receiving end of notify pipe */
+    int notify_send_fd;         /* sending end of notify pipe */
+    CQ  new_conn_queue;         /* queue of new connections to handle */
+} LIBEVENT_THREAD;
+
+static LIBEVENT_THREAD *threads;
+
+/*
+ * Number of threads that have finished setting themselves up.
+ */
+static int init_count = 0;
+static pthread_mutex_t init_lock;
+static pthread_cond_t init_cond;
+
+
+static void thread_libevent_process(int fd, short which, void *arg);
+
+/*
+ * Initializes a connection queue.
+ */
+static void cq_init(CQ *cq) {
+    pthread_mutex_init(&cq->lock, NULL);
+    pthread_cond_init(&cq->cond, NULL);
+    cq->head = NULL;
+    cq->tail = NULL;
+}
+
+/*
+ * Waits for work on a connection queue.
+ */
+static CQ_ITEM *cq_pop(CQ *cq) {
+    CQ_ITEM *item;
+
+    pthread_mutex_lock(&cq->lock);
+    while (NULL == cq->head)
+        pthread_cond_wait(&cq->cond, &cq->lock);
+    item = cq->head;
+    cq->head = item->next;
+    if (NULL == cq->head)
+        cq->tail = NULL;
+    pthread_mutex_unlock(&cq->lock);
+
+    return item;
+}
+
+/*
+ * Looks for an item on a connection queue, but doesn't block if there isn't
+ * one.
+ */
+static CQ_ITEM *cq_peek(CQ *cq) {
+    CQ_ITEM *item;
+
+    pthread_mutex_lock(&cq->lock);
+    item = cq->head;
+    if (NULL != item) {
+        cq->head = item->next;
+        if (NULL == cq->head)
+            cq->tail = NULL;
+    }
+    pthread_mutex_unlock(&cq->lock);
+
+    return item;
+}
+
+/*
+ * Adds an item to a connection queue.
+ */
+static void cq_push(CQ *cq, CQ_ITEM *item) {
+    item->next = NULL;
+
+    pthread_mutex_lock(&cq->lock);
+    if (NULL == cq->tail)
+        cq->head = item;
+    else
+        cq->tail->next = item;
+    cq->tail = item;
+    pthread_cond_signal(&cq->cond);
+    pthread_mutex_unlock(&cq->lock);
+}
+
+/*
+ * Returns a fresh connection queue item.
+ */
+static CQ_ITEM *cqi_new() {
+    CQ_ITEM *item = NULL;
+    pthread_mutex_lock(&cqi_freelist_lock);
+    if (cqi_freelist) {
+        item = cqi_freelist;
+        cqi_freelist = item->next;
+    }
+    pthread_mutex_unlock(&cqi_freelist_lock);
+
+    if (NULL == item) {
+        int i;
+
+        /* Allocate a bunch of items at once to reduce fragmentation */
+        item = malloc(sizeof(CQ_ITEM) * ITEMS_PER_ALLOC);
+        if (NULL == item)
+            return NULL;
+
+        /*
+         * Link together all the new items except the first one
+         * (which we'll return to the caller) for placement on
+         * the freelist.
+         */
+        for (i = 2; i < ITEMS_PER_ALLOC; i++)
+            item[i - 1].next = &item[i];
+
+        pthread_mutex_lock(&cqi_freelist_lock);
+        item[ITEMS_PER_ALLOC - 1].next = cqi_freelist;
+        cqi_freelist = &item[1];
+        pthread_mutex_unlock(&cqi_freelist_lock);
+    }
+
+    return item;
+}
+
+
+/*
+ * Frees a connection queue item (adds it to the freelist.)
+ */
+static void cqi_free(CQ_ITEM *item) {
+    pthread_mutex_lock(&cqi_freelist_lock);
+    item->next = cqi_freelist;
+    cqi_freelist = item;
+    pthread_mutex_unlock(&cqi_freelist_lock);
+}
+
+
+/*
+ * Creates a worker thread.
+ */
+static void create_worker(void *(*func)(void *), void *arg) {
+    pthread_t       thread;
+    pthread_attr_t  attr;
+    int             ret;
+
+    pthread_attr_init(&attr);
+
+    if (ret = pthread_create(&thread, &attr, func, arg)) {
+        fprintf(stderr, "Can't create thread: %s\n",
+                strerror(ret));
+        exit(1);
+    }
+}
+
+
+/*
+ * Pulls a conn structure from the freelist, if one is available.
+ */
+conn *mt_conn_from_freelist() {
+    conn *c;
+
+    pthread_mutex_lock(&conn_lock);
+    c = do_conn_from_freelist();
+    pthread_mutex_unlock(&conn_lock);
+
+    return c;
+}
+
+
+/*
+ * Adds a conn structure to the freelist.
+ *
+ * Returns 0 on success, 1 if the structure couldn't be added.
+ */
+int mt_conn_add_to_freelist(conn *c) {
+    int result;
+
+    pthread_mutex_lock(&conn_lock);
+    result = do_conn_add_to_freelist(c);
+    pthread_mutex_unlock(&conn_lock);
+
+    return result;
+}
+
+/****************************** LIBEVENT THREADS *****************************/
+
+/*
+ * Set up a thread's information.
+ */
+static void setup_thread(LIBEVENT_THREAD *me) {
+    if (! me->base) {
+        me->base = event_init();
+        if (! me->base) {
+            fprintf(stderr, "Can't allocate event base\n");
+            exit(1);
+        }
+    }
+
+    /* Listen for notifications from other threads */
+    event_set(&me->notify_event, me->notify_receive_fd,
+              EV_READ | EV_PERSIST, thread_libevent_process, me);
+    event_base_set(me->base, &me->notify_event);
+
+    if (event_add(&me->notify_event, 0) == -1) {
+        fprintf(stderr, "Can't monitor libevent notify pipe\n");
+        exit(1);
+    }
+
+    cq_init(&me->new_conn_queue);
+}
+
+
+/*
+ * Worker thread: main event loop
+ */
+static void *worker_libevent(void *arg) {
+    LIBEVENT_THREAD *me = arg;
+
+    /* Any per-thread setup can happen here; thread_init() will block until
+     * all threads have finished initializing.
+     */
+
+    pthread_mutex_lock(&init_lock);
+    init_count++;
+    pthread_cond_signal(&init_cond);
+    pthread_mutex_unlock(&init_lock);
+
+    event_base_loop(me->base, 0);
+}
+
+
+/*
+ * Processes an incoming "handle a new connection" item. This is called when
+ * input arrives on the libevent wakeup pipe.
+ */
+static void thread_libevent_process(int fd, short which, void *arg) {
+    LIBEVENT_THREAD *me = arg;
+    CQ_ITEM *item;
+    char buf[1];
+
+    if (read(fd, buf, 1) != 1)
+        if (settings.verbose > 0)
+            fprintf(stderr, "Can't read from libevent pipe\n");
+
+    if (item = cq_peek(&me->new_conn_queue)) {
+    conn *c = conn_new(item->sfd, item->init_state, item->event_flags,
+                       item->read_buffer_size, item->is_udp, me->base);
+    if (!c) {
+        if (item->is_udp) {
+            fprintf(stderr, "Can't listen for events on UDP socket\n");
+            exit(1);
+            }
+        else {
+        if (settings.verbose > 0) {
+                fprintf(stderr, "Can't listen for events on fd %d\n",
+                    item->sfd);
+        }
+        close(item->sfd);
+        }
+    }
+        cqi_free(item);
+    }
+}
+
+/* Which thread we assigned a connection to most recently. */
+static int last_thread = -1;
+
+/*
+ * Dispatches a new connection to another thread. This is only ever called
+ * from the main thread, either during initialization (for UDP) or because
+ * of an incoming connection.
+ */
+void dispatch_conn_new(int sfd, int init_state, int event_flags,
+                       int read_buffer_size, int is_udp) {
+    CQ_ITEM *item = cqi_new();
+    int thread = (last_thread + 1) % settings.num_threads;
+
+    last_thread = thread;
+
+    item->sfd = sfd;
+    item->init_state = init_state;
+    item->event_flags = event_flags;
+    item->read_buffer_size = read_buffer_size;
+    item->is_udp = is_udp;
+
+    cq_push(&threads[thread].new_conn_queue, item);
+    if (write(threads[thread].notify_send_fd, "", 1) != 1) {
+        perror("Writing to thread notify pipe");
+    }
+}
+
+/*
+ * Returns true if this is the thread that listens for new TCP connections.
+ */
+int mt_is_listen_thread() {
+    return pthread_self() == threads[0].thread_id;
+}
+
+/********************************* ITEM ACCESS *******************************/
+
+/*
+ * Walks through the list of deletes that have been deferred because the items
+ * were locked down at the tmie.
+ */
+void mt_run_deferred_deletes() {
+    pthread_mutex_lock(&cache_lock);
+    do_run_deferred_deletes();
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Allocates a new item.
+ */
+item *mt_item_alloc(char *key, size_t nkey, int flags, rel_time_t exptime, int nbytes) {
+    item *it;
+    pthread_mutex_lock(&cache_lock);
+    it = do_item_alloc(key, nkey, flags, exptime, nbytes);
+    pthread_mutex_unlock(&cache_lock);
+    return it;
+}
+
+/*
+ * Returns an item if it hasn't been marked as expired or deleted,
+ * lazy-expiring as needed.
+ */
+item *mt_item_get_notedeleted(char *key, size_t nkey, int *delete_locked) {
+    item *it;
+    pthread_mutex_lock(&cache_lock);
+    it = do_item_get_notedeleted(key, nkey, delete_locked);
+    pthread_mutex_unlock(&cache_lock);
+    return it;
+}
+
+/*
+ * Returns an item whether or not it's been marked as expired or deleted.
+ */
+item *mt_item_get_nocheck(char *key, size_t nkey) {
+    item *it;
+
+    pthread_mutex_lock(&cache_lock);
+    it = assoc_find(key, nkey);
+    it->refcount++;
+    pthread_mutex_unlock(&cache_lock);
+    return it;
+}
+
+/*
+ * Links an item into the LRU and hashtable.
+ */
+int mt_item_link(item *item) {
+    int ret;
+
+    pthread_mutex_lock(&cache_lock);
+    ret = do_item_link(item);
+    pthread_mutex_unlock(&cache_lock);
+    return ret;
+}
+
+/*
+ * Decrements the reference count on an item and adds it to the freelist if
+ * needed.
+ */
+void mt_item_remove(item *item) {
+    pthread_mutex_lock(&cache_lock);
+    do_item_remove(item);
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Replaces one item with another in the hashtable.
+ */
+int mt_item_replace(item *old, item *new) {
+    int ret;
+
+    pthread_mutex_lock(&cache_lock);
+    ret = do_item_replace(old, new);
+    pthread_mutex_unlock(&cache_lock);
+    return ret;
+}
+
+/*
+ * Unlinks an item from the LRU and hashtable.
+ */
+void mt_item_unlink(item *item) {
+    pthread_mutex_lock(&cache_lock);
+    do_item_unlink(item);
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Moves an item to the back of the LRU queue.
+ */
+void mt_item_update(item *item) {
+    pthread_mutex_lock(&cache_lock);
+    do_item_update(item);
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Adds an item to the deferred-delete list so it can be reaped later.
+ */
+char *mt_defer_delete(item *item, time_t exptime) {
+    char *ret;
+
+    pthread_mutex_lock(&cache_lock);
+    ret = do_defer_delete(item, exptime);
+    pthread_mutex_unlock(&cache_lock);
+    return ret;
+}
+
+/*
+ * Does arithmetic on a numeric item value.
+ */
+char *mt_add_delta(item *item, int incr, unsigned int delta, char *buf) {
+    char *ret;
+
+    pthread_mutex_lock(&cache_lock);
+    ret = do_add_delta(item, incr, delta, buf);
+    pthread_mutex_unlock(&cache_lock);
+    return ret;
+}
+
+/*
+ * Stores an item in the cache (high level, obeys set/add/replace semantics)
+ */
+int mt_store_item(item *item, int comm) {
+    int ret;
+
+    pthread_mutex_lock(&cache_lock);
+    ret = do_store_item(item, comm);
+    pthread_mutex_unlock(&cache_lock);
+    return ret;
+}
+
+/*
+ * Flushes expired items after a flush_all call
+ */
+void mt_item_flush_expired() {
+    pthread_mutex_lock(&cache_lock);
+    do_item_flush_expired();
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/****************************** HASHTABLE MODULE *****************************/
+
+void mt_assoc_move_next_bucket() {
+    pthread_mutex_lock(&cache_lock);
+    do_assoc_move_next_bucket();
+    pthread_mutex_unlock(&cache_lock);
+}
+
+/******************************* SLAB ALLOCATOR ******************************/
+
+void *mt_slabs_alloc(size_t size) {
+    void *ret;
+
+    pthread_mutex_lock(&slabs_lock);
+    ret = do_slabs_alloc(size);
+    pthread_mutex_unlock(&slabs_lock);
+    return ret;
+}
+
+void mt_slabs_free(void *ptr, size_t size) {
+    pthread_mutex_lock(&slabs_lock);
+    do_slabs_free(ptr, size);
+    pthread_mutex_unlock(&slabs_lock);
+}
+
+char *mt_slabs_stats(int *buflen) {
+    char *ret;
+
+    pthread_mutex_lock(&slabs_lock);
+    ret = do_slabs_stats(buflen);
+    pthread_mutex_unlock(&slabs_lock);
+    return ret;
+}
+
+#ifdef ALLOW_SLABS_REASSIGN
+int mt_slabs_reassign(unsigned char srcid, unsigned char dstid) {
+    int ret;
+
+    pthread_mutex_lock(&slabs_lock);
+    ret = do_slabs_reassign(srcid, dstid);
+    pthread_mutex_unlock(&slabs_lock);
+    return ret;
+}
+#endif
+
+/******************************* GLOBAL STATS ******************************/
+
+void mt_stats_lock() {
+    pthread_mutex_lock(&stats_lock);
+}
+
+void mt_stats_unlock() {
+    pthread_mutex_unlock(&stats_lock);
+}
+
+/*
+ * Initializes the thread subsystem, creating various worker threads.
+ *
+ * nthreads  Number of event handler threads to spawn
+ * main_base Event base for main thread
+ */
+void thread_init(int nthreads, struct event_base *main_base) {
+    int         i;
+    pthread_t   *thread;
+
+    pthread_mutex_init(&cache_lock, NULL);
+    pthread_mutex_init(&conn_lock, NULL);
+    pthread_mutex_init(&slabs_lock, NULL);
+    pthread_mutex_init(&stats_lock, NULL);
+
+    pthread_mutex_init(&init_lock, NULL);
+    pthread_cond_init(&init_cond, NULL);
+
+    pthread_mutex_init(&cqi_freelist_lock, NULL);
+    cqi_freelist = NULL;
+
+    threads = malloc(sizeof(LIBEVENT_THREAD) * nthreads);
+    if (! threads) {
+        perror("Can't allocate thread descriptors");
+        exit(1);
+    }
+
+    threads[0].base = main_base;
+    threads[0].thread_id = pthread_self();
+
+    for (i = 0; i < nthreads; i++) {
+        int fds[2];
+        if (pipe(fds)) {
+            perror("Can't create notify pipe");
+            exit(1);
+        }
+
+        threads[i].notify_receive_fd = fds[0];
+        threads[i].notify_send_fd = fds[1];
+
+    setup_thread(&threads[i]);
+    }
+
+    /* Create threads after we've done all the libevent setup. */
+    for (i = 1; i < nthreads; i++) {
+        create_worker(worker_libevent, &threads[i]);
+    }
+
+    /* Wait for all the threads to set themselves up before returning. */
+    pthread_mutex_lock(&init_lock);
+    init_count++; // main thread
+    while (init_count < nthreads) {
+        pthread_cond_wait(&init_cond, &init_lock);
+    }
+    pthread_mutex_unlock(&init_lock);
+}
+
+#endif




More information about the memcached-commits mailing list