New user questions

Cal Heldenbrand calzplace at gmail.com
Thu Apr 12 16:59:28 UTC 2007


Hi everyone,

I've been researching / experimenting with memcache for a few days now, and
I think it's the best thing since sliced bread!  One of those great ideas
that made me think, "Why didn't *I* think of that!?"

A bit of background on our environment -- FBS Data Systems ( fbsdata.com )
creates web-based application software for the real estate industry, and we
serve somewhere around 20 million hits a day.  The architecture is pretty
standard: 30 load-balanced web servers and 8 big DB2 servers.  Adding more
memory to our web servers is much cheaper than ~$12k per CPU for DB2.

I've set up a test environment and I have a few questions about
implementation.  (This might be a bit long, so thanks in advance for reading
and answering!)

I have two memcached processes running: one local to the client, and one
remote across gigabit Ethernet, connected roughly as shown below.
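
For reference, the test client is set up something like this (the addresses
are placeholders, and mc_server_add() is how I read the libmemcache headers,
so treat it as a sketch):

    #include <memcache.h>

    struct memcache *mc = mc_new();
    /* one daemon on the local box, one across the gig link */
    mc_server_add(mc, "127.0.0.1", "11211");
    mc_server_add(mc, "10.0.0.2",  "11211");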

1)  Is it better to have a large number of keys with small values, or a
smaller number of keys with larger values?  I ran a test of 300,000 keys,
each with a 26-byte value.  A set() loop over the whole set took around 20
seconds; 8 values at around 1MB apiece took 0.287 seconds.  I realize there
is some per-iteration overhead (strlen() is called twice per iteration), but
that's quite a time difference.  The performance idea here was to build one
large value of comma-separated ID strings, insert it into memcache, then
pull it back and run a big split on the string.  That still costs some
client-side processing time, but from a programming perspective it would be
nice to be able to store 300,000 items quickly.  Both loops are sketched
below.
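
Roughly what the two test loops look like (key names and sizes simplified;
mc_set()'s signature is as I read it from memcache.h, so this is a sketch
rather than the exact code):

    #include <stdio.h>
    #include <string.h>
    #include <memcache.h>

    /* 300,000 small items: one 26-byte value per key (~20s for me) */
    void set_small(struct memcache *mc)
    {
        char key[32];
        const char *val = "abcdefghijklmnopqrstuvwxyz";  /* 26 bytes */
        int i;
        for (i = 0; i < 300000; i++) {
            snprintf(key, sizeof(key), "small%d", i);
            /* strlen() is called twice per iteration, as noted above */
            mc_set(mc, key, strlen(key), val, strlen(val), 0, 0);
        }
    }

    /* the packed alternative: comma-separated IDs in one big value */
    void set_packed(struct memcache *mc, const char **ids, int n)
    {
        static char buf[1024 * 1024];  /* stay under the 1MB item limit */
        char key[] = "ids";
        size_t len = 0;
        int i;
        for (i = 0; i < n && len < sizeof(buf) - 32; i++)
            len += snprintf(buf + len, sizeof(buf) - len,
                            i ? ",%s" : "%s", ids[i]);
        mc_set(mc, key, sizeof(key) - 1, buf, len, 0, 0);
    }

Pulling the packed value back is then one mc_aget() plus a strtok() pass
over the buffer on the client side.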

Is there some efficiency tweaking that I'm missing on the memcached server?
(Side tangent question -- is it possible to increase the 1MB maximum value
length?)

2)  I'm still trying to get into the mindset that memcache is a volatile
cache, not long-term session storage.  Still, it's an attractive idea -- has
anyone built a mirrored cache system?  I was thinking that if I have 30 web
machines with 2GB of spare memory apiece, I could run two memcached
processes at 1GB each, then write an API wrapper that reads and writes the
two separate clusters (sketched below).  The only catch is the chance that
the hashing algorithm places both mirrored copies on the same machine,
killing the redundancy.  This might be easier to implement in the daemon...
or am I thinking down completely the wrong path here?  Does the availability
of cached data (hit/miss ratio) have a large effect on overall performance?
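
To make that concrete, the wrapper I had in mind keeps two completely
separate struct memcache handles with disjoint server lists, so a key hashed
in one cluster can never land on a box in the other (mc_aget() returning the
value on a hit and NULL on a miss is my assumption from the headers):

    struct mirrored_mc {
        struct memcache *primary;  /* cluster A: its own server list */
        struct memcache *mirror;   /* cluster B: disjoint server list */
    };

    /* write through to both clusters */
    void mirrored_set(struct mirrored_mc *m, char *key, size_t klen,
                      const void *val, size_t vlen, time_t expire)
    {
        mc_set(m->primary, key, klen, val, vlen, expire, 0);
        mc_set(m->mirror,  key, klen, val, vlen, expire, 0);
    }

    /* read from the primary, fall back to the mirror on a miss */
    void *mirrored_get(struct mirrored_mc *m, char *key, size_t klen)
    {
        void *v = mc_aget(m->primary, key, klen);
        return v ? v : mc_aget(m->mirror, key, klen);
    }

Because each handle only knows about its own cluster, the "both copies hash
to the same machine" problem goes away by construction; the cost is double
the write traffic.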

3)  I don't know if this is the right place to ask this -- I'm using
libmemcache.  The mc_set() prototype has a 'flags' parameter, but everywhere
I look it's set to 0, with no documentation.  Does anyone know what the
flags are for, and where they're documented?
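
My guess (and it is only a guess) is that the flags are an opaque per-item
value that the server stores verbatim and hands back on retrieval, so a
client can tag values with its own meanings, something like:

    /* flag bits of my own invention -- nothing memcached-defined */
    #define MYAPP_F_COMPRESSED  0x1
    #define MYAPP_F_SERIALIZED  0x2

    /* the flags come back with the value on a get, so the reader
       knows it has to decompress before using the data */
    mc_set(mc, key, strlen(key), zbuf, zlen, 0, MYAPP_F_COMPRESSED);

Is that the intended use?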

4)  I've been testing what happens when the memcached servers fill up.
Initially I was thinking along the lines of the -M parameter: tell the
client the cache is full and build some contingency around that... but I
suspect that's the same mentality as #2, trying to save data that shouldn't
be saved.  I did notice that even with a short expiration time on the items,
a daemon run with -M didn't seem to reclaim them; it kept returning
out-of-memory errors on successive set operations.  Is this a bug or normal
behavior?  In any event, I decided it's probably best to leave the daemon at
its default behavior and let it clean up after itself, so this is mostly
curiosity.  The fill loop I used is sketched below.
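
For reference, the test that produced the out-of-memory errors was
essentially this (value size and expiry are arbitrary, and I'm assuming
mc_set() returns non-zero on failure, which may not match the real API):

    /* against a daemon started with -M: fill it with short-lived items */
    char key[32], val[1024];
    int i;
    memset(val, 'x', sizeof(val));
    for (i = 0; ; i++) {
        snprintf(key, sizeof(key), "fill%d", i);
        /* 5-second expiry; once the daemon is full, successive sets
           keep failing with "out of memory" instead of the expired
           items being reused */
        if (mc_set(mc, key, strlen(key), val, sizeof(val), 5, 0) != 0)
            break;
    }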

Thanks, and I hope to add us to your list of users in the near future!

--Cal



-- 
Cal Heldenbrand
   FBS Data Systems
   E-mail:  cal at fbsdata.com

