New user questions
dustin at spy.net
Thu Apr 12 21:47:47 UTC 2007
On Apr 12, 2007, at 9:59, Cal Heldenbrand wrote:
> 1) is it better to have a large number of variables with small
> values, or a smaller number of variables with larger values? I ran
> a test of 300,000 variables each 26 bytes in length. A set() loop
> for the whole test took around 20 seconds. 8 variables at around
> 1MB apiece took 0.287 seconds. I realize that there might be some
> overhead during each iteration, but this is quite a time
> difference. (strlen() is called 2x for each iteration.) The
> performance consideration here was to create one large value with
> comma separated ID strings, insert them to memcache, then pull them
> back and run a big split on the string. This would still require
> some client side processing time, but it would be nice from a
> programming perspective to be able to add 300,000 variables in a
> quick amount of time.
> Is there some efficiency tweaking that I'm missing on the memcached
> server? (Side tangent question -- is it possible to increase the
> max value length of 1MB?)
It's got to do with processing the results, I believe. I'd consider
this terminal velocity:
dustintmb:/tmp 503% nc -v -w 1 localhost 11211 < mcsets > /dev/null
localhost [127.0.0.1] 11211 (?) open
0.014u 0.085s 0:06.46 1.3% 0+0k 0+0io 0pf+0w
For that, I generated a list of 300,000 sets with keys in the form
'k' + i. The -w 1 adds about a second to the end of the transaction,
so I'd say I loaded them in about five seconds.
Doing the same with my java API took me about 7 seconds to queue the
sets, but another 26s before the last set actually made it into the
server since I read and validate the results of each one
individually. Note that the netcat case pipelines writes in and
completely ignores store status (though I can check it with stats).
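For context, the mcsets file piped through nc above would just be pipelined text-protocol commands. A sketch of how such a file might be generated (the exact keys and the 26-byte value are assumptions matching the sizes discussed in question 1):

```python
# Hypothetical reconstruction of the mcsets input file: 300,000 pipelined
# "set" commands in the memcached text protocol, keys in the 'k' + i form
# mentioned above.  Responses are ignored, as in the netcat example.
value = b"x" * 26
with open("mcsets", "wb") as f:
    for i in range(300000):
        key = b"k%d" % i
        # text protocol: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
        f.write(b"set %s 0 0 %d\r\n%s\r\n" % (key, len(value), value))
```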
> 2) I'm still trying to get into the mindset that memcache is to
> be used as a volatile cache, not a long term session storage
> space. Still, it's an attractive idea -- has anyone created a
> mirrored cache system? I was thinking, if I have 30 web machines
> with 2GB of spare memory apiece, I could run two memcached procs @
> 1GB each, then create an API wrapper to write/read to the two
> separate clusters. The only consideration is the probability that
> the hashing algorithm might choose the two mirrored variables to
> store on one machine, killing the redundancy. This might be easier
> to implement in the daemon... or am I completely thinking down the
> wrong path on this one? Does the availability of cache data (hit/
> miss ratios) have a large effect on overall performance?
There are other tools out there more appropriate for long-term
storage. It sounds like you may be wanting something more like a
persistent DHT. As an interim, try treating memcached as a volatile
cache backing a centralized store and see how often you really end up
needing to hit the central point.
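A minimal sketch of that interim approach, with a counter to see how often the central point really gets hit (`cache` and `db` here are hypothetical stand-ins for a memcached client and the central store, not any particular API):

```python
# Sketch: memcached as a volatile cache in front of a central store.
# On a miss, fall through to the store and repopulate the cache; the
# miss counter shows how often the central point actually gets hit.
class ReadThroughCache:
    def __init__(self, cache, db, ttl=300):
        self.cache = cache
        self.db = db
        self.ttl = ttl
        self.misses = 0

    def get(self, key):
        value = self.cache.get(key)
        if value is None:              # cache miss: hit the central store
            self.misses += 1
            value = self.db.get(key)
            if value is not None:
                self.cache.set(key, value, self.ttl)
        return value
```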
> 3) I don't know if this is the right place to ask this -- I'm
> using libmemcache. The mc_set() prototype has a 'flags' parameter,
> but everywhere I see it set to 0 with no documentation. Anyone
> know what these are for, and any documentation on this?
This is the best guide for that:
Basically, flags mean whatever you want. I have a transcoder in my
java API that uses the flags to remember what the value stored for a
given key actually means. For example, I use half of the flags to
tell me what type of an object I stored (integer, string, byte array,
serialized java object, etc...) and the other half to set common
flags like whether I gzipped the data in the transcoder (so it'll
know to decompress it).
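Sketched concretely (the 16/16 split of the 32-bit flags field below is just one convention, not anything memcached interprets; the server treats the flags as opaque and simply hands them back with the value):

```python
import gzip
import pickle

# Hypothetical transcoder: low 16 bits of the flags record the stored
# type, high 16 bits record options such as gzip compression.
TYPE_STRING, TYPE_INT, TYPE_PICKLE = 1, 2, 3
OPT_GZIP = 1 << 16

def encode(obj, compress_over=1024):
    """Return (flags, data) to hand to a set() call."""
    if isinstance(obj, int):
        flags, data = TYPE_INT, str(obj).encode()
    elif isinstance(obj, str):
        flags, data = TYPE_STRING, obj.encode()
    else:
        flags, data = TYPE_PICKLE, pickle.dumps(obj)
    if len(data) > compress_over:
        flags |= OPT_GZIP
        data = gzip.compress(data)
    return flags, data

def decode(flags, data):
    """Reverse encode() using the flags returned with a get()."""
    if flags & OPT_GZIP:
        data = gzip.decompress(data)
    kind = flags & 0xFFFF
    if kind == TYPE_INT:
        return int(data)
    if kind == TYPE_STRING:
        return data.decode()
    return pickle.loads(data)
```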
> 4) I've been testing the event of the memcached servers being
> full. Initially I was thinking along the functionality of the -M
> parameter, to tell the client it's full and have some sort of
> contingency based on that... however I'm thinking this is in the
> mentality of #2, trying to save data that shouldn't be saved. I
> did notice that given a short expiration time on variables, the -M
> option didn't seem to actually delete them; it kept giving out of
> memory errors on successive set operations. Is this a bug or
> normal behavior? In any event, I decided it's probably best to
> leave the daemon at default behavior to clean up after itself, so
> this is just more of a curiosity.
The value of -M is never clear to me. I have some data with which I
need to do something. It may be processed already and sitting in my
memcached cluster, or I may just have to do some preprocessing on it
directly from the source (and store that in memcached). Having
memcached stop working just because it's full seems like it would
just cause me problems.