Altering queries?

Clint Webb webb.clint at gmail.com
Fri Sep 21 08:52:35 UTC 2007


Oh, I'm so bad.  I forgot to mention that I cheat a bit in the example I
gave.  It doesn't demonstrate multi_get, which is what you should use in a
situation like this.  I don't use it this time because my GetArticle()
function is able to cheat a bit: the piece of code that runs before
this had already pulled down all the article information it needed,
so a lot of it is already in a local hash.  So it doesn't need
to ask memcache for maybe 90% of the articles that will generate XML in this
example.
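For what it's worth, a minimal sketch of the multi_get approach with
Cache::Memcached might look roughly like this (the key format and the
LoadArticleFromDatabase() fallback are placeholders for illustration, not my
actual code):

    use Cache::Memcached;

    my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

    # One key per article, fetched in a single round trip instead of a loop of get()s.
    my @ids   = (101, 102, 103);
    my @keys  = map { "Article:aid=$_" } @ids;
    my $found = $cache->get_multi(@keys);

    foreach my $id (@ids) {
        my $article = $found->{"Article:aid=$id"};
        # Anything that missed the cache still has to come from the database (placeholder helper).
        $article = LoadArticleFromDatabase($id) unless $article;
    }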

Sorry for any confusion there... especially since I started off talking
about my solution for caching paged data, and then I gave you some example
code for something completely different.  Oh well.  You get what you pay
for, eh?

On 9/21/07, Clint Webb <webb.clint at gmail.com> wrote:
>
> Well, not really arrays in memory.  Lists in memcache.  It works well
> because 99% of requests will always fall within the first 100.  That is why it
> exists, as opposed to just having a 500, 1000, 1500, etc.
>
> But let's say a user has browsed up to the 800th entry.  When getting the
> key from memcache, it only gets the one (this one is called
> "Articles:top:limit=1000"), which contains all the IDs of the current
> articles in the order that they would be displayed.  I skip the first 800
> entries in that list and display the next 15.  The actual content of the
> article itself is in another cache entry ("Article:aid=800", for example).
> Depending on how I was accessing this information, I would produce either
> XML or HTML and would generate a key called "xml:Article-list:800-814" or
> "html:Article-list:800-814".
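>
> Roughly, the slice looks something like this (a sketch only; it assumes the
> list key holds the comma-delimited IDs and reuses the GetArticle() helper
> mentioned below):
>
>     my $offset    = 800;
>     my $page_size = 15;
>
>     # The list key holds every current article ID, in display order, comma delimited.
>     my @ids  = split(/,/, $cache->get("Articles:top:limit=1000"));
>     my @page = @ids[$offset .. $offset + $page_size - 1];
>
>     foreach my $id (@page) {
>         next unless defined $id;          # in case the list is shorter than expected
>         my $article = GetArticle($id);    # cache first, database on a miss
>         # ... render $article into the XML or HTML fragment here
>     }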
>
> This might not be the most optimal solution, but it has worked well
> enough so far that I haven't gone back to look at doing any optimisations.
>
> One optimization I did think of doing: the list of IDs in
> "Articles:top:limit=1000" is just a comma-delimited string of IDs.  I was
> planning on either storing the IDs with fixed-width spacing, so that I could
> determine where in the string the 800th entry will be and just skip to that
> point, or storing a serialized array instead.  But this
> piece of code has been quick enough so far that I haven't bothered.
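>
> If I ever did the fixed-width version, it might look roughly like this (the
> eight-character width is just an assumption for illustration):
>
>     my $width = 8;
>
>     # Store: pad every ID to the same width so the Nth entry sits at a known offset.
>     my $packed = join('', map { sprintf("%0${width}d", $_) } @ids);
>
>     # Fetch: jump straight to the 800th entry without splitting the whole string.
>     my $nth = 800;
>     my $id  = 0 + substr($packed, $nth * $width, $width);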
>
> As a sample for you... in Perl, if you have a comma-delimited string, it's
> fairly easy to drop it into an array.  In fact, here's a piece of code that
> accesses part of it.  The GetTopArticles(), GetArticle(),
> GetUsernameByUserID() and GetArticleScore() functions try to get the data
> from the cache and, if it isn't there, get it from the database.  This example
> doesn't illustrate the paging because it is for the public API, which doesn't
> do paging.
>
>     # Serve the cached XML if it exists; otherwise build it and cache it.
>     &ConnectCache;
>     my ($xml) = $cache->get("xml:Top:$limit");
>     unless ($xml) {
>
>         SiteLog("api:Articles - XML Top:$limit not found in cache.  Adding.");
>
>         $xml = "<root>\n";
>         $xml .= "\t<result>1</result>\n";
>
>         my ($list) = GetTopArticles($limit);
>         my (@top) = split(/,/, $list);
>         foreach my $id (@top) {
>             my ($article) = GetArticle($id);
>             unless ($article) { SiteLog("api.cgi: Wasn't able to get article details for aid=$id"); }
>             else {
>
>                 my ($d_userID) = 0 + $article->{'userID'};
>                 my ($d_username) = GetUsernameByUserID($d_userID);
>                 my ($v_score) = GetArticleScore($id);
>
>                 $xml .= "\t<article>\n";
>                 $xml .= "\t\t<id>$id</id>\n";
>                 $xml .= "\t\t<uid>$d_userID</uid>\n";
>                 $xml .= "\t\t<username>".xml_quote($d_username)."</username>\n";
>                 $xml .= "\t\t<title>".xml_quote($article->{'title'})."</title>\n";
>                 $xml .= "\t\t<content>".xml_quote($article->{'content'})."</content>\n";
>                 $xml .= "\t\t<url>".xml_quote($article->{'url'})."</url>\n";
>                 $xml .= "\t\t<score>$v_score</score>\n";
>                 $xml .= "\t</article>\n";
>             }
>         }
>
>         $xml .= "</root>\n";
>         $cache->add("xml:Top:$limit", "$xml", 600);   # cache the rendered XML for ten minutes
>     }
>     &cgiXML;
>     print "$xml";
>
>
>
>
> On 9/21/07, K J <sanbat at gmail.com> wrote:
> >
> > > Only you could answer that definitively, but I would guess that it would
> > > be better to get the lot.  It depends how often your data changes.
> > >
> > > On my site, people see the first 15 entries, but I put the first 100
> > > in one cache key, and the first 500 in a second cache key if needed.  I get
> > > the first 15 out of the hundred, and if they want more, I iterate through it
> > > until I need more than 100.  On the rare occasion that anyone gets past the
> > > 500 mark I just go straight to the database, and then add back to the
> > > cache.
> > >
> > > I've split it up into 100 and 500 because most people would only ever
> > > look at fewer than the first 100 entries.  If they do manage to look past the
> > > first 100, then I have the first 500 cached in another key.  Keep in mind,
> > > this is not the first 100 plus another 500 to make a total of 600 articles.  The first
> > > 100 are also duplicated in the 500 list.  The 500-entry list is generated
> > > only the first time it is needed, and the exact same routine also creates
> > > the 1000-entry key if that is ever needed, and so on.  There is no built-in
> > > limit; it could end up being a key for a 20000-entry list for all I know.
> > >
> > > Every situation is different.  I suggest you build some test cases and
> > > test them under various situations and see what works for you.  There are some
> > > parts of my site that don't use memcache at all and simply go to the database
> > > directly every time, but I did it that way because for that particular
> > > problem a cached solution would be clunky, and memcache just didn't fit
> > > well.  But apart from those special cases, I cache almost everything.  I
> > > cache the little bits of data (such as a key for each IP address that hits the
> > > site; I increment a counter each time they hit, and give it an expiry), all
> > > the small elements of data, all the bigger elements made up of the smaller
> > > elements, all the rendered XML and some of the rendered HTML.  My database
> > > is mostly idle :)
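> > >
> > > Very roughly, the tier selection could be sketched like this (GetTopList()
> > > and FetchTopFromDatabase() are made-up names, just to illustrate the idea):
> > >
> > >     # Try the smallest cached list that covers the requested offset;
> > >     # past the largest tier, fall back to the database directly.
> > >     sub GetTopList {
> > >         my ($offset) = @_;
> > >         foreach my $limit (100, 500, 1000) {
> > >             next if $offset >= $limit;
> > >             my $list = $cache->get("Articles:top:limit=$limit");
> > >             unless ($list) {
> > >                 $list = FetchTopFromDatabase($limit);
> > >                 $cache->add("Articles:top:limit=$limit", $list, 600);
> > >             }
> > >             return $list;
> > >         }
> > >         return FetchTopFromDatabase($offset + 15);
> > >     }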
> >
> >
> > I'm wondering about the 100, then the 500.  Are you creating a new array
> > at certain intervals?  For instance, suppose a user keeps paging through the
> > results and ends up at result 800.  Would you then have 3 arrays like this?
> > - 100 array
> > - 500 array
> > - 1000 array
> >
> >
> >
>
>
>
> --
> "Be excellent to each other"
>



-- 
"Be excellent to each other"