Hi everyone,<br><br>I've been researching / experimenting with memcache for a few days now, and I think it's the best thing since sliced bread! One of those great ideas that made me think, "Why didn't *I* think of that!?"
<br><br>A bit of background on our environment -- FBS Data Systems ( <a href="http://fbsdata.com">fbsdata.com</a> ) creates web based application software for the Real Estate industry, and we do somewhere around 20 million hits a day. The architecture is pretty standard, 30 load balanced web servers, and 8 big DB2 servers. Adding more memory to our web servers is much cheaper than ~$12k per CPU for DB2.
<br><br>I've set up a test environment and I have a few questions on implementation. (This might be a bit long, so thanks in advance for reading / answering questions!)<br><br>I have two memcached procs running, one local to the client, and one remote across gig ethernet.
<br><br>1) is it better to have a large number of variables with small values, or a smaller amount of variables with larger values? I ran a test of 300,000 variables each 26 bytes in length. A set() loop for the whole test took around 20 seconds. 8 variables at around 1MB a piece took
0.287 seconds. I realize that there might be some overhead during each iteration, but this is quite a time difference. ( strlen() is called 2x for each iteration) The performance consideration here was to create one large value with comma separated ID strings, insert them to memcache, then pull them back and run a big split on the string. This would still require some client side processing time, but it would be nice from a programming perspective to be able to add 300,000 variables in a quick amount of time.
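<br><br>For concreteness, the batched approach I have in mind looks something like this. build_id_csv() and split_csv() are stand-ins for our own helper code, the key name and the 3600-second expiration are just examples, and the mc_set()/mc_aget() prototypes are from memory:
<pre>
/* Assumes string.h/stdlib.h and the mc handle from the first snippet.
 * Hypothetical helpers from our own code, not from libmemcache:       */
char  *build_id_csv(int *ids, int n);             /* -> "1001,1002,1003,..."   */
char **split_csv(char *blob, char sep, int *n);   /* the big client-side split */

void cache_listing_ids(int *ids, int n_ids)
{
    char  *csv = build_id_csv(ids, n_ids);
    size_t len = strlen(csv);

    /* One big set() instead of 300,000 small ones. */
    mc_set(mc, "listing_ids", strlen("listing_ids"), csv, len, 3600, 0);
    free(csv);
}

void use_listing_ids(void)
{
    char *blob = mc_aget(mc, "listing_ids", strlen("listing_ids"));
    if (blob != NULL) {
        int    count;
        char **ids = split_csv(blob, ',', &count);
        /* ... use ids ... */
        free(blob);
    }
}
</pre>
(Of course, 300,000 IDs at 26 bytes each is roughly 7.8MB in a single value, which runs straight into the 1MB item limit -- hence the side question below.)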
<br><br>Is there some efficiency tweaking that I'm missing on the memcached server? (Side tangent question -- is it possible to increase the max value length of 1MB?)<br><br>2) I'm still trying to get into the mindset that memcache is to be used as a volatile cache, not a long term session storage space. Still, it's an attractive idea -- has anyone created a mirrored cache system? I was thinking, if I have 30 web machines with 2GB of spare memory a piece, I could run two memcached procs @ 1GB each, then create an API wrapper to write/read to the two separate clusters. The only consideration is the probability that the hashing algorithm might choose the two mirrored variables to store on one machine, killing the redundancy. This might be easier to implement in the daemon... or am I completely thinking down the wrong path on this one? Does the availability of cache data (hit/miss ratios) have a large effect on overall performance?
<br><br>3) I don't know if this is the right place to ask this -- I'm using libmemcache. The mc_set() prototype has a 'flags' parameter, but everywhere I see it set to 0 with no documentation. Anyone know what these are for, and any documentation on this?
<br><br>4) I've been testing the event of the memcached servers being full. Initially I was thinking along the functionality of the -M parameter, to tell the client it's full and have some sort of contingency based on that... however I'm thinking this is in the mentality of #2, trying to save data that shouldn't be saved. I did notice that given a short expiration time on variables, the -M option didn't seem to actually delete them, it kept giving out of memory errors on successive set operations. Is this a bug or normal behavior? In any event, I decided it's probably best to leave the daemon at default behavior to clean up after itself, so this is just more of a curiosity.
<br><br>Thanks, and I hope to add us to your list of users in the near future!<br><br>--Cal<br><br><br clear="all"><br>-- <br>Cal Heldenbrand<br> FBS Data Systems<br> E-mail: <a href="mailto:cal@fbsdata.com">cal@fbsdata.com
</a>