Extensible command syntax
Dustin Sallings
dustin at spy.net
Tue Nov 13 06:23:15 UTC 2007
On Nov 8, 2007, at 2:35, Tomash Brechko wrote:
>> I fail to see what I'm missing. As far as I can tell, you're
>> describing what I already do. See my write up on client optimization
>> and let me know what I'm missing.
>>
>> http://bleu.west.spy.net/~dustin/projects/memcached/optimization.html
>
> According to this page, when several threads issue [a], [b],
> [a, b, c], [a], [d], you combine these requests into one,
> [a, b, c, d]. Alright, suppose you got the reply, [c, d]. How do
> you know what to reply to each thread? You have to compare keys on
> the client side to know what results you have got. But if the reply
> was [nil, nil, c, d], you'd only have to compare numbers: the first
> goes to t1, t3, t4; the second goes to t2, t3; etc.
In the binary protocol, you only use numbers (keys aren't returned).
In the text protocol, you do string compares on the results. In both
cases I use a hash table. If it ever bubbles up in my profiler, I
might try to make it more efficient, but in the meantime it doesn't
seem to matter much.
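For what it's worth, here's a rough sketch (Java, with made-up names;
this isn't the actual client code) of what that demultiplexing looks
like: duplicate keys from different threads are coalesced into one
outgoing request, and each result is routed back with a single hash
lookup.

// Hypothetical demultiplexer for batched gets; class and method
// names are illustrative only.
import java.util.*;
import java.util.concurrent.*;

class BatchDemuxer {
    // Each key maps to the futures of every thread waiting on it.
    private final Map<String, List<CompletableFuture<byte[]>>> waiters =
        new HashMap<>();

    // Threads register interest; duplicate keys are coalesced, but
    // every caller still gets its own future to block on.
    synchronized CompletableFuture<byte[]> enqueue(String key) {
        CompletableFuture<byte[]> f = new CompletableFuture<>();
        waiters.computeIfAbsent(key, k -> new ArrayList<>()).add(f);
        return f;
    }

    // The set of distinct keys to send as a single multi-get.
    synchronized Set<String> batchKeys() {
        return new HashSet<>(waiters.keySet());
    }

    // Called per result (or per known miss); one hash lookup routes
    // the value to every thread that asked for that key.
    synchronized void complete(String key, byte[] value) {
        List<CompletableFuture<byte[]>> fs = waiters.remove(key);
        if (fs == null) return;
        for (CompletableFuture<byte[]> f : fs) {
            f.complete(value);
        }
    }
}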
> What matters is _overall_ throughput, not solely server performance.
> If you optimize the server at the cost of additional work on all
> clients, it's not good.
If you optimize the server at the cost of additional work on the
clients, you're optimizing the more centralized resource at the cost
of the less centralized resource. The server absolutely has to be the
most optimal part of the whole thing for this very reason.
That's not to say the optimization of other components should be
ignored, but I haven't found the existing protocol to limit my
ability to optimize things.
>> In the text protocol, a get with several keys only returns hits and
>> an end marker. The idea is that if you're issuing that request,
>> you're probably going to return some kind of dictionary structure to
>> something.
>
> This "probably" comes from no where, and is a bad assumption for the
> generic design. Client might not need to have the dictionary, and
> currently it is forced to have it.
Does any client out there do a multi-get for a series of keys and not
return values mapped to those keys?
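Something like this is what I'd expect every client to end up doing
anyway (a minimal sketch of a bulk get; the getBulk() usage follows
spymemcached's API, but take the exact details as illustrative):

import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.Map;
import net.spy.memcached.MemcachedClient;

public class BulkGetExample {
    public static void main(String[] args) throws Exception {
        MemcachedClient client =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));
        // Only hits come back; missing keys simply aren't in the map,
        // so the caller naturally gets a key -> value dictionary.
        Map<String, Object> hits =
            client.getBulk(Arrays.asList("a", "b", "c", "d"));
        Object a = hits.get("a");   // null if "a" was a miss
        client.shutdown();
    }
}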
>> Ah, well in the general case, there's no processing to do for not
>> found keys.
>
> From your page it follows that t1, which requested [a], would have
> to wait until [d] is processed, while it could continue as soon as
> [nil] for [a] has been returned.
>
> On the page you should also describe the drawbacks of single
> I/O-thread approach: that threads effectively block each other. For
> instance, high-priority t5 asking for small data for [d] would be
> blocked by low-priority t1 asking for large data for [a].
I don't see that as a drawback. If you actually had different
priorities for different request types and determined that you were
running into a blocking condition that caused a latency problem, you
could just use different client instances and it'd go away
immediately. Alternatively, if anyone cared, I could implement a
priority concept for requests.
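To make the "different client instances" point concrete, a minimal
sketch (the class and field names here are illustrative, not anything
that exists in the client):

import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

class PrioritizedCaches {
    final MemcachedClient fast;   // small, latency-sensitive gets
    final MemcachedClient bulk;   // large or batched, low-priority gets

    PrioritizedCaches(InetSocketAddress addr) throws IOException {
        // Two independent connections to the same destination, so big
        // requests on one never queue in front of small ones on the
        // other.
        this.fast = new MemcachedClient(addr);
        this.bulk = new MemcachedClient(addr);
    }
}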
It had been suggested that I allow multiple connections per
destination per client to reduce latency. I experimented with a
branch doing that, but I wasn't able to measure a difference (I think
you need more computers than I have available for such a test).
Whichever way you look at it, it's only a drawback to the single IO
thread approach if you can demonstrate that my IO thread is somehow
limiting the throughput.
It seems like memcached has historically been single threaded for
most installations, and as far as I can tell, when it's multithreaded
it's not because one thread can't keep IO buffers full.
--
Dustin Sallings