[memcached] sgrimm, r508: Merge multithreaded branch into trunk.
commits at code.sixapart.com
Mon Apr 16 15:32:47 UTC 2007
Merge multithreaded branch into trunk.
A trunk/server/doc/threads.txt
A trunk/server/stats.c
A trunk/server/stats.h
A trunk/server/t/stats-detail.t
A trunk/server/thread.c
Added: trunk/server/doc/threads.txt
===================================================================
--- trunk/server/doc/threads.txt 2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/doc/threads.txt 2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,68 @@
+Multithreading support in memcached
+
+OVERVIEW
+
+By default, memcached is compiled as a single-threaded application. This is
+the most CPU-efficient mode of operation, and it is appropriate for memcached
+instances running on single-processor servers or whose request volume is
+low enough that available CPU power is not a bottleneck.
+
+More heavily used memcached instances can benefit from multithreaded mode.
+To enable it, use the "--enable-threads" option to the configure script:
+
+./configure --enable-threads
+
+You must have the POSIX thread functions (pthread_*) on your system in order
+to use memcached's multithreaded mode.
+
+Once you have a thread-capable memcached executable, you can control the
+number of threads using the "-t" option; the default is 4. On a machine
+that's dedicated to memcached, you will typically want one thread per
+processor core. Due to memcached's nonblocking architecture, there is no
+real advantage to using more threads than the number of CPUs on the machine;
+doing so will increase lock contention and is likely to degrade performance.
+
+
+INTERNALS
+
+The threading support is mostly implemented as a series of wrapper functions
+that protect calls to underlying code with one of a small number of locks.
+In single-threaded mode, the wrappers are replaced with direct invocations
+of the target code using #define; that is done in memcached.h. This approach
+allows memcached to be compiled in either single- or multi-threaded mode.
+
+Each thread has its own instance of libevent ("base" in libevent terminology).
+The only direct interaction between threads is for new connections. One of
+the threads handles the TCP listen socket; each new connection is passed to
+a different thread on a round-robin basis. After that, each thread operates
+on its set of connections as if it were running in single-threaded mode,
+using libevent to manage nonblocking I/O as usual.
+
+UDP requests are a bit different, since there is only one UDP socket that's
+shared by all clients. The UDP socket is monitored by all of the threads.
+When a datagram comes in, all the threads that aren't already processing
+another request will receive "socket readable" callbacks from libevent.
+Only one thread will successfully read the request; the others will go back
+to sleep or, in the case of a very busy server, will read whatever other
+UDP requests are waiting in the socket buffer. Note that in the case of
+moderately busy servers, this results in increased CPU consumption since
+threads will constantly wake up and find no input waiting for them. But
+short of much more major surgery on the I/O code, this is not easy to avoid.
+
+
+TO DO
+
+The locking is currently very coarse-grained. There is, for example, one
+lock that protects all the calls to the hashtable-related functions. Since
+memcached spends much of its CPU time on command parsing and response
+assembly, rather than managing the hashtable per se, this is not a huge
+bottleneck for small numbers of processors. However, the locking will likely
+have to be refined in the event that memcached needs to run well on
+massively-parallel machines.
+
+One cheap optimization to reduce contention on that lock: move the hash value
+computation so it occurs before the lock is obtained whenever possible.
+Right now the hash is performed at the lowest levels of the functions in
+assoc.c. If instead it was computed in memcached.c, then passed along with
+the key and length into the items.c code and down into assoc.c, that would
+reduce the amount of time each thread needs to keep the hashtable lock held.
Added: trunk/server/stats.c
===================================================================
--- trunk/server/stats.c 2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/stats.c 2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,359 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * Detailed statistics management. For simple stats like total number of
+ * "get" requests, we use inline code in memcached.c and friends, but when
+ * stats detail mode is activated, the code here records more information.
+ *
+ * Author:
+ * Steven Grimm <sgrimm at facebook.com>
+ *
+ * $Id$
+ */
+#include "memcached.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/*
+ * Stats are tracked on the basis of key prefixes. This is a simple
+ * fixed-size hash of prefixes; we run the prefixes through the same
+ * CRC function used by the cache hashtable.
+ */
+typedef struct _prefix_stats PREFIX_STATS;
+struct _prefix_stats {
+ char *prefix;
+ int prefix_len;
+ unsigned long long num_gets;
+ unsigned long long num_sets;
+ unsigned long long num_deletes;
+ unsigned long long num_hits;
+ PREFIX_STATS *next;
+};
+
+#define PREFIX_HASH_SIZE 256
+static PREFIX_STATS *prefix_stats[PREFIX_HASH_SIZE];
+static int num_prefixes = 0;
+static int total_prefix_size = 0;
+
+void stats_prefix_init() {
+ memset(prefix_stats, 0, sizeof(prefix_stats));
+}
+
+/*
+ * Cleans up all our previously collected stats. NOTE: the stats lock is
+ * assumed to be held when this is called.
+ */
+void stats_prefix_clear() {
+ int i;
+ PREFIX_STATS *cur, *next;
+
+ for (i = 0; i < PREFIX_HASH_SIZE; i++) {
+ for (cur = prefix_stats[i]; cur != NULL; cur = next) {
+ next = cur->next;
+ free(cur->prefix);
+ free(cur);
+ }
+ prefix_stats[i] = NULL;
+ }
+ num_prefixes = 0;
+ total_prefix_size = 0;
+}
+
+/*
+ * Returns the stats structure for a prefix, creating it if it's not already
+ * in the list.
+ */
+static PREFIX_STATS *stats_prefix_find(char *key) {
+ PREFIX_STATS *pfs;
+ int hashval;
+ int length;
+
+ for (length = 0; key[length] != '\0'; length++)
+ if (key[length] == settings.prefix_delimiter)
+ break;
+
+ hashval = hash(key, length, 0) % PREFIX_HASH_SIZE;
+
+ for (pfs = prefix_stats[hashval]; NULL != pfs; pfs = pfs->next) {
+ if (! strncmp(pfs->prefix, key, length))
+ return pfs;
+ }
+
+    pfs = calloc(1, sizeof(PREFIX_STATS));
+ if (NULL == pfs) {
+ perror("Can't allocate space for stats structure: calloc");
+ return NULL;
+ }
+
+ pfs->prefix = malloc(length + 1);
+ if (NULL == pfs->prefix) {
+ perror("Can't allocate space for copy of prefix: malloc");
+ free(pfs);
+ return NULL;
+ }
+
+ strncpy(pfs->prefix, key, length);
+ pfs->prefix[length] = '\0'; // because strncpy() sucks
+ pfs->prefix_len = length;
+
+ pfs->next = prefix_stats[hashval];
+ prefix_stats[hashval] = pfs;
+
+ num_prefixes++;
+ total_prefix_size += length;
+
+ return pfs;
+}
+
+/*
+ * Records a "get" of a key.
+ */
+void stats_prefix_record_get(char *key, int is_hit) {
+ PREFIX_STATS *pfs;
+
+ STATS_LOCK();
+ pfs = stats_prefix_find(key);
+ if (NULL != pfs) {
+ pfs->num_gets++;
+ if (is_hit) {
+ pfs->num_hits++;
+ }
+ }
+ STATS_UNLOCK();
+}
+
+/*
+ * Records a "delete" of a key.
+ */
+void stats_prefix_record_delete(char *key) {
+ PREFIX_STATS *pfs;
+
+ STATS_LOCK();
+ pfs = stats_prefix_find(key);
+ if (NULL != pfs) {
+ pfs->num_deletes++;
+ }
+ STATS_UNLOCK();
+}
+
+/*
+ * Records a "set" of a key.
+ */
+void stats_prefix_record_set(char *key) {
+ PREFIX_STATS *pfs;
+
+ STATS_LOCK();
+ pfs = stats_prefix_find(key);
+ if (NULL != pfs) {
+ pfs->num_sets++;
+ }
+ STATS_UNLOCK();
+}
+
+/*
+ * Returns stats in textual form suitable for writing to a client.
+ */
+char *stats_prefix_dump(int *length) {
+ char *format = "PREFIX %s get %llu hit %llu set %llu del %llu\r\n";
+ PREFIX_STATS *pfs;
+ char *buf;
+ int i, pos;
+ int size;
+
+ /*
+ * Figure out how big the buffer needs to be. This is the sum of the
+ * lengths of the prefixes themselves, plus the size of one copy of
+ * the per-prefix output with 20-digit values for all the counts,
+ * plus space for the "END" at the end.
+ */
+ STATS_LOCK();
+ size = strlen(format) + total_prefix_size +
+ num_prefixes * (strlen(format) - 2 /* %s */
+ + 4 * (20 - 4)) /* %llu replaced by 20-digit num */
+ + sizeof("END\r\n");
+ buf = malloc(size);
+ if (NULL == buf) {
+ perror("Can't allocate stats response: malloc");
+ STATS_UNLOCK();
+ return NULL;
+ }
+
+ pos = 0;
+ for (i = 0; i < PREFIX_HASH_SIZE; i++) {
+ for (pfs = prefix_stats[i]; NULL != pfs; pfs = pfs->next) {
+ pos += sprintf(buf + pos, format,
+ pfs->prefix, pfs->num_gets, pfs->num_hits,
+ pfs->num_sets, pfs->num_deletes);
+ }
+ }
+
+ STATS_UNLOCK();
+ strcpy(buf + pos, "END\r\n");
+
+ *length = pos + 5;
+ return buf;
+}
+
+
+#ifdef UNIT_TEST
+
+/****************************************************************************
+ To run unit tests, compile with $(CC) -DUNIT_TEST stats.c assoc.o
+ (need assoc.o to get the hash() function).
+****************************************************************************/
+
+struct settings settings;
+
+static char *current_test = "";
+static int test_count = 0;
+static int fail_count = 0;
+
+static void fail(char *what) { printf("\tFAIL: %s\n", what); fflush(stdout); fail_count++; }
+static void test_equals_int(char *what, int a, int b) { test_count++; if (a != b) fail(what); }
+static void test_equals_ptr(char *what, void *a, void *b) { test_count++; if (a != b) fail(what); }
+static void test_equals_str(char *what, const char *a, const char *b) { test_count++; if (strcmp(a, b)) fail(what); }
+static void test_equals_ull(char *what, unsigned long long a, unsigned long long b) { test_count++; if (a != b) fail(what); }
+static void test_notequals_ptr(char *what, void *a, void *b) { test_count++; if (a == b) fail(what); }
+static void test_notnull_ptr(char *what, void *a) { test_count++; if (NULL == a) fail(what); }
+
+static void test_prefix_find() {
+ PREFIX_STATS *pfs1, *pfs2;
+
+ pfs1 = stats_prefix_find("abc");
+ test_notnull_ptr("initial prefix find", pfs1);
+ test_equals_ull("request counts", 0ULL,
+ pfs1->num_gets + pfs1->num_sets + pfs1->num_deletes + pfs1->num_hits);
+ pfs2 = stats_prefix_find("abc");
+ test_equals_ptr("find of same prefix", pfs1, pfs2);
+ pfs2 = stats_prefix_find("abc:");
+ test_equals_ptr("find of same prefix, ignoring delimiter", pfs1, pfs2);
+ pfs2 = stats_prefix_find("abc:d");
+ test_equals_ptr("find of same prefix, ignoring extra chars", pfs1, pfs2);
+ pfs2 = stats_prefix_find("xyz123");
+ test_notequals_ptr("find of different prefix", pfs1, pfs2);
+ pfs2 = stats_prefix_find("ab:");
+ test_notequals_ptr("find of shorter prefix", pfs1, pfs2);
+}
+
+static void test_prefix_record_get() {
+ PREFIX_STATS *pfs;
+
+ stats_prefix_record_get("abc:123", 0);
+ pfs = stats_prefix_find("abc:123");
+ test_equals_ull("get count after get #1", 1, pfs->num_gets);
+ test_equals_ull("hit count after get #1", 0, pfs->num_hits);
+ stats_prefix_record_get("abc:456", 0);
+ test_equals_ull("get count after get #2", 2, pfs->num_gets);
+ test_equals_ull("hit count after get #2", 0, pfs->num_hits);
+ stats_prefix_record_get("abc:456", 1);
+ test_equals_ull("get count after get #3", 3, pfs->num_gets);
+ test_equals_ull("hit count after get #3", 1, pfs->num_hits);
+ stats_prefix_record_get("def:", 1);
+ test_equals_ull("get count after get #4", 3, pfs->num_gets);
+ test_equals_ull("hit count after get #4", 1, pfs->num_hits);
+}
+
+static void test_prefix_record_delete() {
+ PREFIX_STATS *pfs;
+
+ stats_prefix_record_delete("abc:123");
+ pfs = stats_prefix_find("abc:123");
+ test_equals_ull("get count after delete #1", 0, pfs->num_gets);
+ test_equals_ull("hit count after delete #1", 0, pfs->num_hits);
+ test_equals_ull("delete count after delete #1", 1, pfs->num_deletes);
+ test_equals_ull("set count after delete #1", 0, pfs->num_sets);
+ stats_prefix_record_delete("def:");
+ test_equals_ull("delete count after delete #2", 1, pfs->num_deletes);
+}
+
+static void test_prefix_record_set() {
+ PREFIX_STATS *pfs;
+
+ stats_prefix_record_set("abc:123");
+ pfs = stats_prefix_find("abc:123");
+ test_equals_ull("get count after set #1", 0, pfs->num_gets);
+ test_equals_ull("hit count after set #1", 0, pfs->num_hits);
+ test_equals_ull("delete count after set #1", 0, pfs->num_deletes);
+ test_equals_ull("set count after set #1", 1, pfs->num_sets);
+ stats_prefix_record_delete("def:");
+ test_equals_ull("set count after set #2", 1, pfs->num_sets);
+}
+
+static void test_prefix_dump() {
+ int hashval = hash("abc", 3, 0) % PREFIX_HASH_SIZE;
+ char tmp[500];
+ char *expected;
+ int keynum;
+ int length;
+
+ test_equals_str("empty stats", "END\r\n", stats_prefix_dump(&length));
+ test_equals_int("empty stats length", 5, length);
+ stats_prefix_record_set("abc:123");
+ expected = "PREFIX abc get 0 hit 0 set 1 del 0\r\nEND\r\n";
+ test_equals_str("stats after set", expected, stats_prefix_dump(&length));
+ test_equals_int("stats length after set", strlen(expected), length);
+ stats_prefix_record_get("abc:123", 0);
+ expected = "PREFIX abc get 1 hit 0 set 1 del 0\r\nEND\r\n";
+ test_equals_str("stats after get #1", expected, stats_prefix_dump(&length));
+ test_equals_int("stats length after get #1", strlen(expected), length);
+ stats_prefix_record_get("abc:123", 1);
+ expected = "PREFIX abc get 2 hit 1 set 1 del 0\r\nEND\r\n";
+ test_equals_str("stats after get #2", expected, stats_prefix_dump(&length));
+ test_equals_int("stats length after get #2", strlen(expected), length);
+ stats_prefix_record_delete("abc:123");
+ expected = "PREFIX abc get 2 hit 1 set 1 del 1\r\nEND\r\n";
+ test_equals_str("stats after del #1", expected, stats_prefix_dump(&length));
+ test_equals_int("stats length after del #1", strlen(expected), length);
+
+ /* The order of results might change if we switch hash functions. */
+ stats_prefix_record_delete("def:123");
+ expected = "PREFIX abc get 2 hit 1 set 1 del 1\r\n"
+ "PREFIX def get 0 hit 0 set 0 del 1\r\n"
+ "END\r\n";
+ test_equals_str("stats after del #2", expected, stats_prefix_dump(&length));
+ test_equals_int("stats length after del #2", strlen(expected), length);
+
+ /* Find a key that hashes to the same bucket as "abc" */
+ for (keynum = 0; keynum < PREFIX_HASH_SIZE * 100; keynum++) {
+ sprintf(tmp, "%d", keynum);
+ if (hashval == hash(tmp, strlen(tmp), 0) % PREFIX_HASH_SIZE) {
+ break;
+ }
+ }
+ stats_prefix_record_set(tmp);
+ sprintf(tmp, "PREFIX %d get 0 hit 0 set 1 del 0\r\n"
+ "PREFIX abc get 2 hit 1 set 1 del 1\r\n"
+ "PREFIX def get 0 hit 0 set 0 del 1\r\n"
+ "END\r\n", keynum);
+ test_equals_str("stats with two stats in one bucket",
+ tmp, stats_prefix_dump(&length));
+ test_equals_int("stats length with two stats in one bucket",
+ strlen(tmp), length);
+}
+
+static void run_test(char *what, void (*func)(void)) {
+ current_test = what;
+ test_count = fail_count = 0;
+ puts(what);
+ fflush(stdout);
+
+ stats_prefix_clear();
+ (func)();
+ printf("\t%d / %d pass\n", (test_count - fail_count), test_count);
+}
+
+/* In case we're compiled in thread mode */
+void mt_stats_lock() { }
+void mt_stats_unlock() { }
+
+int main(int argc, char **argv) {
+ stats_prefix_init();
+ settings.prefix_delimiter = ':';
+ run_test("stats_prefix_find", test_prefix_find);
+ run_test("stats_prefix_record_get", test_prefix_record_get);
+ run_test("stats_prefix_record_delete", test_prefix_record_delete);
+ run_test("stats_prefix_record_set", test_prefix_record_set);
+ run_test("stats_prefix_dump", test_prefix_dump);
+}
+
+#endif
Added: trunk/server/stats.h
===================================================================
--- trunk/server/stats.h 2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/stats.h 2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,7 @@
+/* stats */
+void stats_prefix_init(void);
+void stats_prefix_clear(void);
+void stats_prefix_record_get(char *key, int is_hit);
+void stats_prefix_record_delete(char *key);
+void stats_prefix_record_set(char *key);
+char *stats_prefix_dump(int *length);
Added: trunk/server/t/stats-detail.t
===================================================================
--- trunk/server/t/stats-detail.t 2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/t/stats-detail.t 2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,63 @@
+#!/usr/bin/perl
+
+use strict;
+use Test::More tests => 24;
+use FindBin qw($Bin);
+use lib "$Bin/lib";
+use MemcachedTest;
+
+my $server = new_memcached();
+my $sock = $server->sock;
+my $expire;
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "END\r\n", "verified empty stats at start");
+
+print $sock "stats detail on\r\n";
+is(scalar <$sock>, "OK\r\n", "detail collection turned on");
+
+print $sock "set foo:123 0 0 6\r\nfooval\r\n";
+is(scalar <$sock>, "STORED\r\n", "stored foo");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 0 hit 0 set 1 del 0\r\n", "details after set");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 1 del 0\r\n", "details after get with hit");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+mem_get_is($sock, "foo:124", undef);
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 2 hit 1 set 1 del 0\r\n", "details after get without hit");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "delete foo:125 0\r\n";
+is(scalar <$sock>, "NOT_FOUND\r\n", "sent delete command");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 2 hit 1 set 1 del 1\r\n", "details after delete");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "stats reset\r\n";
+is(scalar <$sock>, "RESET\r\n", "stats cleared");
+
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "END\r\n", "empty stats after clear");
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 0 del 0\r\n", "details after clear and get");
+is(scalar <$sock>, "END\r\n", "end of details");
+
+print $sock "stats detail off\r\n";
+is(scalar <$sock>, "OK\r\n", "detail collection turned off");
+
+mem_get_is($sock, "foo:124", undef);
+
+mem_get_is($sock, "foo:123", "fooval");
+print $sock "stats detail dump\r\n";
+is(scalar <$sock>, "PREFIX foo get 1 hit 1 set 0 del 0\r\n", "details after stats turned off");
+is(scalar <$sock>, "END\r\n", "end of details");
Added: trunk/server/thread.c
===================================================================
--- trunk/server/thread.c 2007-04-16 15:18:35 UTC (rev 507)
+++ trunk/server/thread.c 2007-04-16 15:32:45 UTC (rev 508)
@@ -0,0 +1,614 @@
+/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
+/*
+ * Thread management for memcached.
+ *
+ * $Id$
+ */
+#include "memcached.h"
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+
+#ifdef HAVE_MALLOC_H
+#include <malloc.h>
+#endif
+
+#ifdef USE_THREADS
+
+#include <pthread.h>
+
+#define ITEMS_PER_ALLOC 64
+
+/* An item in the connection queue. */
+typedef struct conn_queue_item CQ_ITEM;
+struct conn_queue_item {
+ int sfd;
+ int init_state;
+ int event_flags;
+ int read_buffer_size;
+ int is_udp;
+ CQ_ITEM *next;
+};
+
+/* A connection queue. */
+typedef struct conn_queue CQ;
+struct conn_queue {
+ CQ_ITEM *head;
+ CQ_ITEM *tail;
+ pthread_mutex_t lock;
+ pthread_cond_t cond;
+};
+
+/* Lock for connection freelist */
+static pthread_mutex_t conn_lock;
+
+/* Lock for cache operations (item_*, assoc_*) */
+static pthread_mutex_t cache_lock;
+
+/* Lock for slab allocator operations */
+static pthread_mutex_t slabs_lock;
+
+/* Lock for global stats */
+static pthread_mutex_t stats_lock;
+
+/* Free list of CQ_ITEM structs */
+static CQ_ITEM *cqi_freelist;
+static pthread_mutex_t cqi_freelist_lock;
+
+/*
+ * Each libevent instance has a wakeup pipe, which other threads
+ * can use to signal that they've put a new connection on its queue.
+ */
+typedef struct {
+ pthread_t thread_id; /* unique ID of this thread */
+ struct event_base *base; /* libevent handle this thread uses */
+ struct event notify_event; /* listen event for notify pipe */
+ int notify_receive_fd; /* receiving end of notify pipe */
+ int notify_send_fd; /* sending end of notify pipe */
+ CQ new_conn_queue; /* queue of new connections to handle */
+} LIBEVENT_THREAD;
+
+static LIBEVENT_THREAD *threads;
+
+/*
+ * Number of threads that have finished setting themselves up.
+ */
+static int init_count = 0;
+static pthread_mutex_t init_lock;
+static pthread_cond_t init_cond;
+
+
+static void thread_libevent_process(int fd, short which, void *arg);
+
+/*
+ * Initializes a connection queue.
+ */
+static void cq_init(CQ *cq) {
+ pthread_mutex_init(&cq->lock, NULL);
+ pthread_cond_init(&cq->cond, NULL);
+ cq->head = NULL;
+ cq->tail = NULL;
+}
+
+/*
+ * Waits for work on a connection queue.
+ */
+static CQ_ITEM *cq_pop(CQ *cq) {
+ CQ_ITEM *item;
+
+ pthread_mutex_lock(&cq->lock);
+ while (NULL == cq->head)
+ pthread_cond_wait(&cq->cond, &cq->lock);
+ item = cq->head;
+ cq->head = item->next;
+ if (NULL == cq->head)
+ cq->tail = NULL;
+ pthread_mutex_unlock(&cq->lock);
+
+ return item;
+}
+
+/*
+ * Looks for an item on a connection queue, but doesn't block if there isn't
+ * one.
+ */
+static CQ_ITEM *cq_peek(CQ *cq) {
+ CQ_ITEM *item;
+
+ pthread_mutex_lock(&cq->lock);
+ item = cq->head;
+ if (NULL != item) {
+ cq->head = item->next;
+ if (NULL == cq->head)
+ cq->tail = NULL;
+ }
+ pthread_mutex_unlock(&cq->lock);
+
+ return item;
+}
+
+/*
+ * Adds an item to a connection queue.
+ */
+static void cq_push(CQ *cq, CQ_ITEM *item) {
+ item->next = NULL;
+
+ pthread_mutex_lock(&cq->lock);
+ if (NULL == cq->tail)
+ cq->head = item;
+ else
+ cq->tail->next = item;
+ cq->tail = item;
+ pthread_cond_signal(&cq->cond);
+ pthread_mutex_unlock(&cq->lock);
+}
+
+/*
+ * Returns a fresh connection queue item.
+ */
+static CQ_ITEM *cqi_new() {
+ CQ_ITEM *item = NULL;
+ pthread_mutex_lock(&cqi_freelist_lock);
+ if (cqi_freelist) {
+ item = cqi_freelist;
+ cqi_freelist = item->next;
+ }
+ pthread_mutex_unlock(&cqi_freelist_lock);
+
+ if (NULL == item) {
+ int i;
+
+ /* Allocate a bunch of items at once to reduce fragmentation */
+ item = malloc(sizeof(CQ_ITEM) * ITEMS_PER_ALLOC);
+ if (NULL == item)
+ return NULL;
+
+ /*
+ * Link together all the new items except the first one
+ * (which we'll return to the caller) for placement on
+ * the freelist.
+ */
+ for (i = 2; i < ITEMS_PER_ALLOC; i++)
+ item[i - 1].next = &item[i];
+
+ pthread_mutex_lock(&cqi_freelist_lock);
+ item[ITEMS_PER_ALLOC - 1].next = cqi_freelist;
+ cqi_freelist = &item[1];
+ pthread_mutex_unlock(&cqi_freelist_lock);
+ }
+
+ return item;
+}
+
+
+/*
+ * Frees a connection queue item (adds it to the freelist.)
+ */
+static void cqi_free(CQ_ITEM *item) {
+ pthread_mutex_lock(&cqi_freelist_lock);
+ item->next = cqi_freelist;
+ cqi_freelist = item;
+ pthread_mutex_unlock(&cqi_freelist_lock);
+}
+
+
+/*
+ * Creates a worker thread.
+ */
+static void create_worker(void *(*func)(void *), void *arg) {
+ pthread_t thread;
+ pthread_attr_t attr;
+ int ret;
+
+ pthread_attr_init(&attr);
+
+    if ((ret = pthread_create(&thread, &attr, func, arg)) != 0) {
+ fprintf(stderr, "Can't create thread: %s\n",
+ strerror(ret));
+ exit(1);
+ }
+}
+
+
+/*
+ * Pulls a conn structure from the freelist, if one is available.
+ */
+conn *mt_conn_from_freelist() {
+ conn *c;
+
+ pthread_mutex_lock(&conn_lock);
+ c = do_conn_from_freelist();
+ pthread_mutex_unlock(&conn_lock);
+
+ return c;
+}
+
+
+/*
+ * Adds a conn structure to the freelist.
+ *
+ * Returns 0 on success, 1 if the structure couldn't be added.
+ */
+int mt_conn_add_to_freelist(conn *c) {
+ int result;
+
+ pthread_mutex_lock(&conn_lock);
+ result = do_conn_add_to_freelist(c);
+ pthread_mutex_unlock(&conn_lock);
+
+ return result;
+}
+
+/****************************** LIBEVENT THREADS *****************************/
+
+/*
+ * Set up a thread's information.
+ */
+static void setup_thread(LIBEVENT_THREAD *me) {
+ if (! me->base) {
+ me->base = event_init();
+ if (! me->base) {
+ fprintf(stderr, "Can't allocate event base\n");
+ exit(1);
+ }
+ }
+
+ /* Listen for notifications from other threads */
+ event_set(&me->notify_event, me->notify_receive_fd,
+ EV_READ | EV_PERSIST, thread_libevent_process, me);
+ event_base_set(me->base, &me->notify_event);
+
+ if (event_add(&me->notify_event, 0) == -1) {
+ fprintf(stderr, "Can't monitor libevent notify pipe\n");
+ exit(1);
+ }
+
+ cq_init(&me->new_conn_queue);
+}
+
+
+/*
+ * Worker thread: main event loop
+ */
+static void *worker_libevent(void *arg) {
+ LIBEVENT_THREAD *me = arg;
+
+ /* Any per-thread setup can happen here; thread_init() will block until
+ * all threads have finished initializing.
+ */
+
+ pthread_mutex_lock(&init_lock);
+ init_count++;
+ pthread_cond_signal(&init_cond);
+ pthread_mutex_unlock(&init_lock);
+
+ event_base_loop(me->base, 0);
+    return NULL;
+}
+
+
+/*
+ * Processes an incoming "handle a new connection" item. This is called when
+ * input arrives on the libevent wakeup pipe.
+ */
+static void thread_libevent_process(int fd, short which, void *arg) {
+ LIBEVENT_THREAD *me = arg;
+ CQ_ITEM *item;
+ char buf[1];
+
+ if (read(fd, buf, 1) != 1)
+ if (settings.verbose > 0)
+ fprintf(stderr, "Can't read from libevent pipe\n");
+
+    if ((item = cq_peek(&me->new_conn_queue)) != NULL) {
+ conn *c = conn_new(item->sfd, item->init_state, item->event_flags,
+ item->read_buffer_size, item->is_udp, me->base);
+ if (!c) {
+ if (item->is_udp) {
+ fprintf(stderr, "Can't listen for events on UDP socket\n");
+ exit(1);
+ }
+ else {
+ if (settings.verbose > 0) {
+ fprintf(stderr, "Can't listen for events on fd %d\n",
+ item->sfd);
+ }
+ close(item->sfd);
+ }
+ }
+ cqi_free(item);
+ }
+}
+
+/* Which thread we assigned a connection to most recently. */
+static int last_thread = -1;
+
+/*
+ * Dispatches a new connection to another thread. This is only ever called
+ * from the main thread, either during initialization (for UDP) or because
+ * of an incoming connection.
+ */
+void dispatch_conn_new(int sfd, int init_state, int event_flags,
+ int read_buffer_size, int is_udp) {
+ CQ_ITEM *item = cqi_new();
+ int thread = (last_thread + 1) % settings.num_threads;
+
+ last_thread = thread;
+
+ item->sfd = sfd;
+ item->init_state = init_state;
+ item->event_flags = event_flags;
+ item->read_buffer_size = read_buffer_size;
+ item->is_udp = is_udp;
+
+ cq_push(&threads[thread].new_conn_queue, item);
+ if (write(threads[thread].notify_send_fd, "", 1) != 1) {
+ perror("Writing to thread notify pipe");
+ }
+}
+
+/*
+ * Returns true if this is the thread that listens for new TCP connections.
+ */
+int mt_is_listen_thread() {
+ return pthread_self() == threads[0].thread_id;
+}
+
+/********************************* ITEM ACCESS *******************************/
+
+/*
+ * Walks through the list of deletes that have been deferred because the items
+ * were locked down at the time.
+ */
+void mt_run_deferred_deletes() {
+ pthread_mutex_lock(&cache_lock);
+ do_run_deferred_deletes();
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Allocates a new item.
+ */
+item *mt_item_alloc(char *key, size_t nkey, int flags, rel_time_t exptime, int nbytes) {
+ item *it;
+ pthread_mutex_lock(&cache_lock);
+ it = do_item_alloc(key, nkey, flags, exptime, nbytes);
+ pthread_mutex_unlock(&cache_lock);
+ return it;
+}
+
+/*
+ * Returns an item if it hasn't been marked as expired or deleted,
+ * lazy-expiring as needed.
+ */
+item *mt_item_get_notedeleted(char *key, size_t nkey, int *delete_locked) {
+ item *it;
+ pthread_mutex_lock(&cache_lock);
+ it = do_item_get_notedeleted(key, nkey, delete_locked);
+ pthread_mutex_unlock(&cache_lock);
+ return it;
+}
+
+/*
+ * Returns an item whether or not it's been marked as expired or deleted.
+ */
+item *mt_item_get_nocheck(char *key, size_t nkey) {
+ item *it;
+
+ pthread_mutex_lock(&cache_lock);
+    it = assoc_find(key, nkey);
+    if (NULL != it)
+        it->refcount++;
+ pthread_mutex_unlock(&cache_lock);
+ return it;
+}
+
+/*
+ * Links an item into the LRU and hashtable.
+ */
+int mt_item_link(item *item) {
+ int ret;
+
+ pthread_mutex_lock(&cache_lock);
+ ret = do_item_link(item);
+ pthread_mutex_unlock(&cache_lock);
+ return ret;
+}
+
+/*
+ * Decrements the reference count on an item and adds it to the freelist if
+ * needed.
+ */
+void mt_item_remove(item *item) {
+ pthread_mutex_lock(&cache_lock);
+ do_item_remove(item);
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Replaces one item with another in the hashtable.
+ */
+int mt_item_replace(item *old, item *new) {
+ int ret;
+
+ pthread_mutex_lock(&cache_lock);
+ ret = do_item_replace(old, new);
+ pthread_mutex_unlock(&cache_lock);
+ return ret;
+}
+
+/*
+ * Unlinks an item from the LRU and hashtable.
+ */
+void mt_item_unlink(item *item) {
+ pthread_mutex_lock(&cache_lock);
+ do_item_unlink(item);
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Moves an item to the back of the LRU queue.
+ */
+void mt_item_update(item *item) {
+ pthread_mutex_lock(&cache_lock);
+ do_item_update(item);
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/*
+ * Adds an item to the deferred-delete list so it can be reaped later.
+ */
+char *mt_defer_delete(item *item, time_t exptime) {
+ char *ret;
+
+ pthread_mutex_lock(&cache_lock);
+ ret = do_defer_delete(item, exptime);
+ pthread_mutex_unlock(&cache_lock);
+ return ret;
+}
+
+/*
+ * Does arithmetic on a numeric item value.
+ */
+char *mt_add_delta(item *item, int incr, unsigned int delta, char *buf) {
+ char *ret;
+
+ pthread_mutex_lock(&cache_lock);
+ ret = do_add_delta(item, incr, delta, buf);
+ pthread_mutex_unlock(&cache_lock);
+ return ret;
+}
+
+/*
+ * Stores an item in the cache (high level, obeys set/add/replace semantics)
+ */
+int mt_store_item(item *item, int comm) {
+ int ret;
+
+ pthread_mutex_lock(&cache_lock);
+ ret = do_store_item(item, comm);
+ pthread_mutex_unlock(&cache_lock);
+ return ret;
+}
+
+/*
+ * Flushes expired items after a flush_all call
+ */
+void mt_item_flush_expired() {
+ pthread_mutex_lock(&cache_lock);
+ do_item_flush_expired();
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/****************************** HASHTABLE MODULE *****************************/
+
+void mt_assoc_move_next_bucket() {
+ pthread_mutex_lock(&cache_lock);
+ do_assoc_move_next_bucket();
+ pthread_mutex_unlock(&cache_lock);
+}
+
+/******************************* SLAB ALLOCATOR ******************************/
+
+void *mt_slabs_alloc(size_t size) {
+ void *ret;
+
+ pthread_mutex_lock(&slabs_lock);
+ ret = do_slabs_alloc(size);
+ pthread_mutex_unlock(&slabs_lock);
+ return ret;
+}
+
+void mt_slabs_free(void *ptr, size_t size) {
+ pthread_mutex_lock(&slabs_lock);
+ do_slabs_free(ptr, size);
+ pthread_mutex_unlock(&slabs_lock);
+}
+
+char *mt_slabs_stats(int *buflen) {
+ char *ret;
+
+ pthread_mutex_lock(&slabs_lock);
+ ret = do_slabs_stats(buflen);
+ pthread_mutex_unlock(&slabs_lock);
+ return ret;
+}
+
+#ifdef ALLOW_SLABS_REASSIGN
+int mt_slabs_reassign(unsigned char srcid, unsigned char dstid) {
+ int ret;
+
+ pthread_mutex_lock(&slabs_lock);
+ ret = do_slabs_reassign(srcid, dstid);
+ pthread_mutex_unlock(&slabs_lock);
+ return ret;
+}
+#endif
+
+/******************************* GLOBAL STATS ******************************/
+
+void mt_stats_lock() {
+ pthread_mutex_lock(&stats_lock);
+}
+
+void mt_stats_unlock() {
+ pthread_mutex_unlock(&stats_lock);
+}
+
+/*
+ * Initializes the thread subsystem, creating various worker threads.
+ *
+ * nthreads Number of event handler threads to spawn
+ * main_base Event base for main thread
+ */
+void thread_init(int nthreads, struct event_base *main_base) {
+ int i;
+ pthread_t *thread;
+
+ pthread_mutex_init(&cache_lock, NULL);
+ pthread_mutex_init(&conn_lock, NULL);
+ pthread_mutex_init(&slabs_lock, NULL);
+ pthread_mutex_init(&stats_lock, NULL);
+
+ pthread_mutex_init(&init_lock, NULL);
+ pthread_cond_init(&init_cond, NULL);
+
+ pthread_mutex_init(&cqi_freelist_lock, NULL);
+ cqi_freelist = NULL;
+
+ threads = malloc(sizeof(LIBEVENT_THREAD) * nthreads);
+ if (! threads) {
+ perror("Can't allocate thread descriptors");
+ exit(1);
+ }
+
+ threads[0].base = main_base;
+ threads[0].thread_id = pthread_self();
+
+ for (i = 0; i < nthreads; i++) {
+ int fds[2];
+ if (pipe(fds)) {
+ perror("Can't create notify pipe");
+ exit(1);
+ }
+
+ threads[i].notify_receive_fd = fds[0];
+ threads[i].notify_send_fd = fds[1];
+
+ setup_thread(&threads[i]);
+ }
+
+ /* Create threads after we've done all the libevent setup. */
+ for (i = 1; i < nthreads; i++) {
+ create_worker(worker_libevent, &threads[i]);
+ }
+
+ /* Wait for all the threads to set themselves up before returning. */
+ pthread_mutex_lock(&init_lock);
+ init_count++; // main thread
+ while (init_count < nthreads) {
+ pthread_cond_wait(&init_cond, &init_lock);
+ }
+ pthread_mutex_unlock(&init_lock);
+}
+
+#endif