New user questions

Thu Apr 12 18:51:56 UTC 2007

On Apr 12, 2007, at 9:59, Cal Heldenbrand wrote:

> 1)  is it better to have a large number of variables with small  
> values, or a smaller amount of variables with larger values?  I ran  
> a test of 300,000 variables each 26 bytes in length.  A set() loop  
> for the whole test took around 20 seconds.  8 variables at around  
> 1MB a piece took 0.287 seconds.  I realize that there might be some  
> overhead during each iteration, but this is quite a time  
> difference.  (  strlen() is called 2x for each iteration)   The  
> performance consideration here was to create one large value with  
> comma separated ID strings, insert them to memcache, then pull them  
> back and run a big split on the string.  This would still require  
> some client side processing time, but it would be nice from a  
> programming perspective to be able to add 300,000 variables in a  
> quick amount of time.

Do the web servers do all the set()s?  If you need to set and get
tens of thousands of values per request, then yes, you definitely
need to aggregate them.

If you are doing the set()s in a separate process, then you are
probably using memcached as a database rather than a cache, and you
should rethink that design.

> 2)   I'm still trying to get into the mindset that memcache is to  
> be used as a volatile cache, not a long term session storage  
> space.  Still, it's an attractive idea -- has anyone created a  
> mirrored cache system?

MySQL Cluster/NDB.

> I was thinking, if I have 30 web machines with 2GB of spare memory  
> a piece, I could run two memcached procs @ 1GB each, then create an  
> API wrapper to write/read to the two separate clusters.  The only  
> consideration is the probability that the hashing algorithm might  
> choose the two mirrored variables to store on one machine, killing  
> the redundancy.

You might have to tweak your memcached client slightly, but it should  
be easy enough to test.
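
One way such a client tweak might look, as a sketch only: hash the key
to pick the primary machine, then always put the mirror copy on the
next machine in the list, so the two copies can never share a host.
The function name pick_servers, the host names, and the use of CRC32
are assumptions for illustration, not how any particular memcached
client works.

```python
import zlib


def pick_servers(key, machines):
    """Return (primary, mirror) hosts for a key; the two always differ."""
    if len(machines) < 2:
        raise ValueError("mirroring needs at least two machines")
    # crc32 gives the same result on every client, unlike Python's hash().
    i = zlib.crc32(key.encode("utf-8")) % len(machines)
    j = (i + 1) % len(machines)  # neighbour in the list, never i itself
    return machines[i], machines[j]


# Hypothetical host names for the 30 web machines.
machines = ["web%02d" % n for n in range(1, 31)]
primary, mirror = pick_servers("session:abc123", machines)
```

A wrapper would then write to both hosts and, on a miss or error from
the primary, fall back to the mirror.  Note that adding or removing a
machine changes the modulo result and so remaps most keys.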

> This might be easier to implement in the daemon...  or am I  
> completely thinking down the wrong path on this one?

Most users are happier once they figure out how to use it as a
cache.  One of the big issues is that if you don't treat the cache
as a cache, you will have a harder time keeping track of where the
real data is.  When the cache is just the cache, that comes more
naturally.

> Does the availability of cache data (hit/miss ratios) have a large  
> effect on overall performance?

That depends on how expensive a cache miss is.  :-)
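
To make that concrete, the average cost of a lookup can be estimated
as hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost.  The sketch
below uses made-up timings purely for illustration; they are not
measurements from any real system.

```python
def average_lookup_ms(hit_ratio, hit_ms, miss_ms):
    """Expected cost of one lookup, given the hit ratio and both costs."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Illustrative numbers only: the same 90% hit ratio with a cheap miss
# (5 ms) and with an expensive miss (500 ms).
cheap_miss = average_lookup_ms(0.90, 1.0, 5.0)     # roughly 1.4 ms
costly_miss = average_lookup_ms(0.90, 1.0, 500.0)  # roughly 50.9 ms
```

With a cheap miss the hit ratio barely matters; with an expensive one,
the misses dominate the average even at a 90% hit ratio.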

  - ask

-- 
http://develooper.com/ - http://askask.com/