Oh, I'm so bad. I forgot to mention that I cheat a bit in the example I gave. It doesn't demonstrate multi_get, which you should use in a situation like this. I don't use it this time because my GetArticle() function is able to cheat a bit: the piece of code that runs before this has already pulled down all the article information it needed, so a lot of it is already in a local hash. So it doesn't need to ask memcache for maybe 90% of the articles that generate the XML in this example.
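If I had used it, the loop in the example below would collapse into one round trip. Something like this (an untested sketch; assumes $cache is the usual Cache::Memcached object and the same "Article:aid=NNN" key naming as the rest of my code):

  # Rough sketch of the multi-get version of the loop further down.
  my @ids  = split(/,/, GetTopArticles($limit));
  my @keys = map { "Article:aid=$_" } @ids;

  # One round trip fetches every article that is already cached.
  my $found = $cache->get_multi(@keys);

  foreach my $id (@ids) {
      my $article = $found->{"Article:aid=$id"};
      $article = GetArticle($id) unless ($article);   # cache miss -> database
      # ... build the XML for $article as in the example below ...
  }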
Sorry for any confusion there... especially since I started off talking about my solution for caching paged data, and then gave you example code for something completely different. Oh well. You get what you pay for, eh?
On 9/21/07, Clint Webb <webb.clint@gmail.com> wrote:
Well, not really arrays in memory; lists in memcache. It works well because 99% of requests will always be within the first 100. That is why the 100 list exists, as opposed to just having 500, 1000, 1500, etc.

But let's say a user has browsed up to the 800th entry. When getting the key from memcache, it only gets the one (this one is called "Articles:top:limit=1000"), which contains all the IDs of the current articles in the order they would be displayed. I skip the first 800 entries in that list and display the next 15. The actual contents of each article are in another cache entry ("Article:aid=800", for example). Depending on how I was accessing this information, I would produce either XML or HTML, and would cache it under a key called "xml:Article-list:800-814" or "html:Article-list:800-814".
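In rough code, that lookup is something like this (a simplified sketch, not the real thing; RenderArticleHtml() is just a stand-in for whatever builds the markup):

  # Sketch of the paged lookup described above.
  my ($offset, $count) = (800, 15);
  my $key = sprintf("html:Article-list:%d-%d", $offset, $offset + $count - 1);

  my $html = $cache->get($key);
  unless ($html) {
      # The real code regenerates this list from the DB if it isn't cached.
      my @ids = split(/,/, $cache->get("Articles:top:limit=1000"));

      $html = "";
      foreach my $id (@ids[$offset .. $offset + $count - 1]) {  # skip 800, take 15
          next unless (defined $id);        # ran off the end of the list
          my $article = GetArticle($id);    # cache first, database on a miss
          $html .= RenderArticleHtml($article);
      }
      $cache->add($key, $html, 600);
  }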
This might not be the most optimal solution, but it has worked well enough so far that I haven't gone back to look at doing any optimisations.

One optimisation I did think of doing: the list of IDs in "Articles:top:limit=1000" is just a comma-delimited string. I was planning on either storing the IDs with fixed-width spacing, so that I could determine where in the string the 800th entry starts and skip straight to it, or storing a serialized array instead. But this piece of code has been quick enough so far that I haven't bothered.
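If I ever did the fixed-width version, it would look roughly like this (a sketch only; never written, never tested):

  # Each ID is stored as exactly 8 digits, so entry N starts at character
  # N * 8 and there is no need to split the whole string.
  my $WIDTH  = 8;
  my $packed = join("", map { sprintf("%0${WIDTH}d", $_) } @ids);  # when building

  # Later, to read entries 800..814 without touching the first 800:
  my @page = unpack("(A$WIDTH)*", substr($packed, 800 * $WIDTH, 15 * $WIDTH));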
As a sample for ya's... in Perl, if you have a comma-delimited string, it's fairly easy to drop it into an array. In fact, here's a piece of code that accesses part of it. The GetTopArticles(), GetArticle(), GetUsernameByUserID() and GetArticleScore() functions try to get the data from the cache and, failing that, get it from the database. This example doesn't illustrate the paging because it is for the public API, which doesn't do paging.
  &ConnectCache;
  my ($xml) = $cache->get("xml:Top:$limit");
  unless ($xml) {

      SiteLog("api:Articles - XML Top:$limit not found in cache. Adding.");

      $xml  = "<root>\n";
      $xml .= "\t<result>1</result>\n";

      # GetTopArticles() returns the comma-delimited ID list, from cache
      # or from the database if the key has expired.
      my ($list) = GetTopArticles($limit);
      my (@top)  = split(/,/, $list);
      foreach my $id (@top) {
          my ($article) = GetArticle($id);
          unless ($article) { SiteLog("api.cgi: Wasn't able to get article details for aid=$id"); }
          else {
              my ($d_userID)   = 0 + $article->{'userID'};
              my ($d_username) = GetUsernameByUserID($d_userID);
              my ($v_score)    = GetArticleScore($id);

              $xml .= "\t<article>\n";
              $xml .= "\t\t<id>$id</id>\n";
              $xml .= "\t\t<uid>$d_userID</uid>\n";
              $xml .= "\t\t<username>" . xml_quote($d_username) . "</username>\n";
              $xml .= "\t\t<title>" . xml_quote($article->{'title'}) . "</title>\n";
              $xml .= "\t\t<content>" . xml_quote($article->{'content'}) . "</content>\n";
              $xml .= "\t\t<url>" . xml_quote($article->{'url'}) . "</url>\n";
              $xml .= "\t\t<score>$v_score</score>\n";
              $xml .= "\t</article>\n";
          }
      }

      $xml .= "</root>\n";

      # Cache the rendered XML for ten minutes.
      $cache->add("xml:Top:$limit", "$xml", 600);
  }
  &cgiXML;
  print "$xml";

On 9/21/07, K J <sanbat@gmail.com> wrote:
Only you could answer that definitively, but I would guess it would be better to get the lot. It depends how often your data changes.
On my site, people see the first 15 entries, but I put the first 100 in one cache key, and the first 500 in a second cache key if needed. I get the first 15 out of the hundred, and if the user wants more, I iterate through it until I need more than 100. On the rare occasion that anyone gets past the 500 mark I just go straight to the database, and then add the results back to the cache.
I've split it up into 100 and 500 because most people only ever look at less than the first 100 entries. If they do manage to look past the first 100, then I have the first 500 cached in another key. Keep in mind, this is not the first 100 plus the next 500 for a total of 600 articles; the first 100 are duplicated in the 500 list. The 500-entry list is generated only the first time it is needed, and the exact same routine also creates the 1000-entry key if that is ever needed, and so on. There is no built-in limit; it could end up being a key for a 20000-entry list for all I know.
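The selection logic boils down to something like this (a sketch, not my actual code; BuildTopList() and GetTopArticlesFromDB() are stand-ins for the real routines, and in reality the tiers keep growing as needed):

  # Pick the smallest cached tier that covers the requested page.
  sub GetArticleIDs {
      my ($offset, $count) = @_;

      foreach my $tier (100, 500, 1000) {
          next if ($offset + $count > $tier);          # page falls past this tier
          my $list = $cache->get("Articles:top:limit=$tier");
          $list = BuildTopList($tier) unless ($list);  # regenerate and cache on a miss
          my @ids = split(/,/, $list);
          return @ids[$offset .. $offset + $count - 1];
      }

      # Past the biggest cached tier: straight to the database.
      return GetTopArticlesFromDB($offset, $count);
  }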
Every situation is different. I suggest you build some test cases, test under various situations, and see what works for you. There are some parts of my site that don't use memcache at all and simply go to the database directly every time, but I did it that way because for those particular problems a cached solution would be clunky; memcache just didn't fit well.

Apart from those special cases, though, I cache almost everything. I cache the little bits of data (such as a key for each IP address that hits the site; I increment a counter each time they hit and give it an expiry, roughly as sketched below), all the small elements of data, all the bigger elements made up of the smaller elements, all the rendered XML and some of the rendered HTML. My database is mostly idle :)
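The IP counter, for instance, is just an add followed by an incr, roughly (a sketch; the key name and window are made up):

  # add() only succeeds if the key doesn't exist yet, so the expiry is set
  # once per window; incr() then bumps the counter on every hit.
  my $key = "hits:ip=$ip";
  $cache->add($key, 0, 3600);     # create with a 1-hour expiry if absent
  my $hits = $cache->incr($key);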
I'm wondering about the 100, then the 500. Are you creating a new array at certain intervals? For instance, suppose a user keeps paging through the results and ends up at result 800. Would you then have three arrays like this?

- 100 array
- 500 array
- 1000 array
-- 
"Be excellent to each other"