Altering queries?

Clint Webb webb.clint at gmail.com
Fri Sep 21 08:38:19 UTC 2007


Well, not really arrays in memory.  Lists in memcache.  It works well
because 99% of requests will always fall within the first 100.  That is why
the 100 list exists, as opposed to just having 500, 1000, 1500, etc.
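
As a rough sketch of how a request might pick its bucket (BucketLimitFor()
is a made-up helper name; GetTopArticles() is the cache-or-database routine
used in the code further down):

    # Pick the smallest bucket (100, then 500, 1000, 1500, ...) that can
    # cover the requested entry; each bucket is cached under its own key.
    sub BucketLimitFor {
        my ($needed) = @_;
        return 100 if $needed <= 100;
        my $limit = 500;
        $limit += 500 while $limit < $needed;
        return $limit;
    }

    my $limit = BucketLimitFor(800);    # -> 1000
    my ($list) = GetTopArticles($limit);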

But let's say a user has browsed up to the 800th entry.  When getting the key
from memcache, it only gets the one (this one is called
"Articles:top:limit=1000"), which contains all the IDs of the current
articles in the order that they would be displayed.  I skip the first 800
entries in that list and display the next 15.  The actual content of the
article itself is in another cache entry ("Article:aid=800" for example).
Depending on how I was accessing this information, I would produce either
XML or HTML and would generate a key called "xml:Article-list:800-814" or
"html:Article-list:800-814".

This might not be the most optimal solution, but it has worked well enough
so far that I haven't gone back to look at doing any optimisations.

One optimization I did think of doing: the list of IDs in
"Articles:top:limit=1000" is just a comma-delimited string.  I was planning
on either storing the IDs with fixed-width spacing, so that I could
calculate where in the string the 800th entry starts and skip straight to
that spot, or storing a serialized array instead.  But this piece of code
has been quick enough so far that I haven't bothered.
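
For what it's worth, the fixed-width variant would only be a couple of
lines.  This sketch assumes the IDs were zero-padded to 8 digits plus a
trailing comma (the width here is made up):

    # With 8-digit zero-padded IDs plus a comma, every entry occupies 9
    # bytes, so the Nth ID is a substr() away instead of a full split.
    my $width = 9;                              # 8 digits + 1 comma
    my $n     = 800;                            # zero-based entry number
    my $id    = 0 + substr($list, $n * $width, $width - 1);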

As a sample for you... in Perl, if you have a comma-delimited string, it's
fairly easy to drop that into an array.  In fact, here's a piece of code
that accesses part of it.  The GetTopArticles(), GetArticle(),
GetUsernameByUserID() and GetArticleScore() functions try to get the data
from the cache, falling back to the database if it isn't there.  This
example doesn't illustrate the paging because it is for the public API,
which doesn't do paging.

    &ConnectCache;                      # make sure $cache is connected
    my ($xml) = $cache->get("xml:Top:$limit");
    unless ($xml) {

        SiteLog("api:Articles - XML Top:$limit not found in cache.  Adding.");

        # Cache miss: rebuild the XML document from the article list.
        $xml = "<root>\n";
        $xml .= "\t<result>1</result>\n";

        my ($list) = GetTopArticles($limit);
        my (@top) = split(/,/, $list);
        foreach my $id (@top) {
            my ($article) = GetArticle($id);
            unless ($article) { SiteLog("api.cgi: Wasn't able to get article details for aid=$id"); }
            else {

                my ($d_userID) = 0 + $article->{'userID'};
                my ($d_username) = GetUsernameByUserID($d_userID);
                my ($v_score) = GetArticleScore($id);

                $xml .= "\t<article>\n";
                $xml .= "\t\t<id>$id</id>\n";
                $xml .= "\t\t<uid>$d_userID</uid>\n";
                $xml .= "\t\t<username>".xml_quote($d_username)."</username>\n";
                $xml .= "\t\t<title>".xml_quote($article->{'title'})."</title>\n";
                $xml .= "\t\t<content>".xml_quote($article->{'content'})."</content>\n";
                $xml .= "\t\t<url>".xml_quote($article->{'url'})."</url>\n";
                $xml .= "\t\t<score>$v_score</score>\n";
                $xml .= "\t</article>\n";
            }
        }

        $xml .= "</root>\n";
        $cache->add("xml:Top:$limit", "$xml", 600);     # cache for 10 minutes
    }
    &cgiXML;                            # emit the Content-Type header
    print "$xml";




On 9/21/07, K J <sanbat at gmail.com> wrote:
>
> > Only you could answer that definitively, but I would guess that it would
> > be better to get the lot.  It depends on how often your data changes.
> >
> > On my site, people see the first 15 entries, but I put the first 100 in
> > one cache key, and the first 500 in a second cache key if needed.  I get the
> > first 15 out of the hundred, and if they want more, I iterate though it
> > until I need more than 100.  On the rare occasion that anyone gets past the
> > 500 mark I just go straight to the database, and then add back to the
> > cache.
> >
> > I've split it up into 100 and 500 because most people would only ever
> > look at less than the first 100 entries.  If they do manage to look past the
> > first 100, then I have the first 500 cached in another key.  Keep in mind,
> > this is not the first 100 plus the next 500 making a total of 600 articles;
> > the first 100 are also duplicated in the 500 list.  The 500-entry list is
> > generated only the first time it is needed, and the exact same routine also
> > creates the 1000-entry key if that is ever needed, and so on.  There is no
> > built-in limit; it could end up being a key for a 20000-entry list for all I know.
> >
> > Every situation is different.  I suggest you build some test cases and
> > test it under various situations and see what works for you.  There are some
> > parts of my site that don't use memcache at all and simply go to the database
> > directly every time, but I did it that way because for that particular
> > problem a cached solution would be clunky, and memcache just didn't fit
> > well.  But apart from those special cases, I cache almost everything.  I
> > cache the little bits of data (such as a key for each IP address that hits the
> > site, I increment a counter each time they hit, and give it an expiry), all
> > the small elements of data, all the bigger elements made up of the smaller
> > elements, all the rendered XML and some of the rendered HTML.  My database
> > is mostly idle :)
>
>
> I'm wondering about the 100, then the 500.  Are you creating a new array
> at certain intervals?  For instance, suppose a user keeps paging through
> the results and ends up at result 800.  Would you then have 3 arrays like
> this?
> - 100 array
> - 500 array
> - 1000 array
>
>
>
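
(As an aside, the per-IP hit counter mentioned in the quote above is only a
few lines with Cache::Memcached; the key name and expiry here are made up:)

    # Count hits per IP: incr() fails if the key doesn't exist yet, in
    # which case add() creates it with a one-hour expiry.
    my $hits = $cache->incr("hits:$ip");
    unless (defined $hits) {
        $cache->add("hits:$ip", 1, 3600);
        $hits = 1;
    }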



-- 
"Be excellent to each other"