New user questions

Cal Heldenbrand cal at fbsdata.com
Fri Apr 13 14:24:15 UTC 2007


Dustin,

Thanks for all the info!  Do you think you could give me a small snippet of
that mcsets file you generated?  I've only skimmed the protocol so far,
but I'd like to run a few tests in my environment too.  I didn't realize I
could use netcat directly against the server -- that's pretty cool.  :-)

Thanks everyone!

--Cal

On 4/12/07, Dustin Sallings <dustin at spy.net> wrote:
>
>
> On Apr 12, 2007, at 9:59, Cal Heldenbrand wrote:
>
> 1)  Is it better to have a large number of variables with small values, or
> a smaller number of variables with larger values?  I ran a test of 300,000
> variables, each 26 bytes in length.  A set() loop over the whole test took
> around 20 seconds.  8 variables at around 1MB apiece took 0.287 seconds.
> I realize that there might be some overhead during each iteration, but this
> is quite a time difference.  (strlen() is called twice per iteration.)
> The performance consideration here was to create one large value of
> comma-separated ID strings, insert it into memcache, then pull it back and
> run a big split on the string.  This would still require some client-side
> processing time, but it would be nice from a programming perspective to be
> able to add 300,000 variables quickly.
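>
> As a rough sketch of that idea (mc is a hypothetical client handle here,
> and a single key ignores the 1MB value limit, so real code would have to
> chunk the string across several keys):
>
>     ids = [str(i) for i in range(300000)]    # the ID strings to cache
>     mc.set("all_ids", ",".join(ids))         # one large set
>     back = mc.get("all_ids").split(",")      # one get, one client-side split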
>
> Is there some efficiency tweak that I'm missing on the memcached
> server?  (Side tangent question -- is it possible to raise the 1MB maximum
> value size?)
>
>
> It's got to do with processing the results, I believe.  I'd consider this
> terminal velocity:
>
> dustintmb:/tmp 503% nc -v -w 1 localhost 11211 < mcsets > /dev/null
> localhost [127.0.0.1] 11211 (?) open
> 0.014u 0.085s 0:06.46 1.3%      0+0k 0+0io 0pf+0w
>
> For that, I generated a list of 300,000 keys of the form 'k' + i.  The -w 1
> adds about a second to the end of the transaction, so I'd say I loaded them
> in about five seconds.
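>
> A file like that, using the "set <key> <flags> <exptime> <bytes>" form from
> protocol.txt, would look roughly like this (the 26-byte filler value is
> illustrative, and every line is terminated with \r\n):
>
>     set k0 0 0 26
>     xxxxxxxxxxxxxxxxxxxxxxxxxx
>     set k1 0 0 26
>     xxxxxxxxxxxxxxxxxxxxxxxxxx
>     ...
>
> and a minimal Python sketch that would generate it (not the script I
> actually used):
>
>     # Write 300,000 pipelined set commands; keys are 'k' + i as above.
>     value = b"x" * 26                  # assumed 26-byte value, per Cal's test
>     with open("mcsets", "wb") as f:
>         for i in range(300000):
>             f.write(b"set k%d 0 0 %d\r\n" % (i, len(value)))
>             f.write(value + b"\r\n")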
>
> Doing the same with my Java API took about 7 seconds to queue the sets,
> but another 26s before the last set actually made it into the server, since
> I read and validate the result of each one individually.  Note that the
> netcat case pipelines the writes and completely ignores store status (though
> I can check it with stats).
>
> 2)  I'm still trying to get into the mindset that memcache is to be used
> as a volatile cache, not long-term session storage.  Still, it's an
> attractive idea -- has anyone created a mirrored cache system?  I was
> thinking, if I have 30 web machines with 2GB of spare memory apiece, I
> could run two memcached procs @ 1GB each, then create an API wrapper to
> write/read to the two separate clusters.  The only concern is the
> possibility that the hashing algorithm might place both mirrored copies on
> the same machine, killing the redundancy.  This might be easier to
> implement in the daemon...  or am I completely thinking down the wrong
> path on this one?   Does the availability of cache data (hit/miss ratios)
> have a large effect on overall performance?
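>
> As a sketch of the wrapper idea (Client is a hypothetical API; using two
> independent client instances with disjoint server lists means the hash can
> never land both copies on the same daemon):
>
>     # Each cluster gets its own daemon list; writing to both gives a mirror.
>     primary = Client(["web01:11211", "web02:11211"])   # illustrative hosts
>     mirror  = Client(["web01:11212", "web02:11212"])
>
>     def mirrored_set(key, value):
>         primary.set(key, value)
>         mirror.set(key, value)
>
>     def mirrored_get(key):
>         v = primary.get(key)
>         return v if v is not None else mirror.get(key)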
>
>
> There are other tools out there more appropriate for long-term storage.
> It sounds like you may want something more like a persistent DHT.  As an
> interim, try treating it as a volatile cache backing a centralized store
> and see how often you really end up needing to hit the central point.
>
> 3)  I don't know if this is the right place to ask this -- I'm using
> libmemcache.  The mc_set() prototype has a 'flags' parameter, but everywhere
> I look it's set to 0 with no documentation.  Does anyone know what the flags
> are for, and is there any documentation on them?
>
>
> This is the best guide for that:
>
> http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
>
> Basically, flags mean whatever you want.  I have a transcoder in my Java
> API that uses the flags to remember what the value stored under a given key
> actually means.  For example, I use half of the flags to tell me what type
> of object I stored (integer, string, byte array, serialized Java object,
> etc.) and the other half for common options, like whether the transcoder
> gzipped the data (so it'll know to decompress it).
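>
> As a sketch of that kind of split (the bit layout and names here are
> illustrative, not my actual transcoder):
>
>     # Low byte of the flags: what was stored; high byte: transcoder options.
>     TYPE_STRING, TYPE_INT, TYPE_BYTES, TYPE_SERIALIZED = 0, 1, 2, 3
>     FLAG_GZIP = 1 << 8
>
>     def encode_flags(type_id, gzipped=False):
>         return type_id | (FLAG_GZIP if gzipped else 0)
>
>     def decode_flags(flags):
>         return flags & 0xff, bool(flags & FLAG_GZIP)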
>
> 4)  I've been testing what happens when the memcached servers fill up.
> Initially I was thinking along the lines of the -M parameter: tell the
> client the cache is full and have some sort of contingency based on that...
> however, I think that falls into the mentality of #2, trying to save data
> that shouldn't be saved.  I did notice that, even with a short expiration
> time on variables, the -M option didn't seem to actually evict them; it
> kept giving out-of-memory errors on successive set operations.  Is this a
> bug or normal behavior?  In any event, I decided it's probably best to
> leave the daemon at its default behavior of cleaning up after itself, so
> this is just more of a curiosity.
>
>
> The value of -M has never been clear to me.  I have some data with which I
> need to do something.  It may already be processed and sitting in my
> memcached cluster, or I may have to do some preprocessing on it directly
> from the source (and store the result in memcached).  Having memcached stop
> accepting writes just because it's full seems like it would only cause me
> problems.
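>
> Roughly this pattern, in other words (a sketch; compute_from_source and the
> client handle mc are stand-ins):
>
>     def get_processed(key, expire=300):
>         v = mc.get(key)
>         if v is None:                    # miss: do the work ourselves
>             v = compute_from_source(key)
>             mc.set(key, v, expire)       # best effort; a full cache just
>         return v                         # means recomputing again later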
>
> --
> Dustin Sallings
>
>
>


-- 
Cal Heldenbrand
   FBS Data Systems
   E-mail:  cal at fbsdata.com