Levels of Caching

Andy Bakun memcached at thwartedefforts.org
Thu Feb 1 22:31:22 UTC 2007


On Thu, 2007-02-01 at 20:51 +0200, Reinis Rozitis wrote:
> It fully depends on your website.
> 
> For example, if you run a site with a userbase and where users log in (some 
> social network) - you can't really make a static cache because every page is 
> different for every user (rewriting the cache on each pageview isn't really 
> useful). In this case even the squids are useless and just one extra layer. 
> You can of course cache some parts / variables / datasets of the output 
> (where memcached comes in).

I second the above.

I saw diminishing returns when trying to cache the _results_ of page
generation (the HTML or HTML fragments), as squid does, and saw
significant gains when caching the _inputs_ to page generation.
Caching the inputs also paid off more broadly, because _inputs_ to page
generation can be reused in other parts of pages, whereas not all HTML
fragments can be reused without significant forethought about proper
element nesting and stylesheet/class/id use.  This is pretty much what
the memcached site suggests you do, but on a query-by-query basis
(which, if you have performance metrics that say your database is the
slowest part, is really the best way to start speeding up your site).
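
To make the query-by-query idea concrete, here is a minimal sketch in
Python (not the PHP I was using).  Everything in it is an assumption
made for illustration: DictCache is an in-memory stand-in for a
memcached client, and cached_query and run_query are hypothetical
names, not part of any real library.

```python
import hashlib
import json


class DictCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_query(cache, run_query, sql, params=()):
    """Return the rows for (sql, params), asking the database only on a miss."""
    # Derive a short, fixed-length key from the query text and its parameters.
    raw = json.dumps([sql, list(params)])
    key = "q:" + hashlib.sha1(raw.encode("utf-8")).hexdigest()

    rows = cache.get(key)
    if rows is None:                 # miss: run the real query once
        rows = run_query(sql, params)
        cache.set(key, rows)
    return rows
```

The cached value is the _input_ (rows), so any page that needs those
rows can share the same entry regardless of how it renders them.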

If you have already written your site using some kind of MVC framework
(or, even if it is not strictly MVC, you have some kind of Model of the
database abstracted out), adding memcached as an additional layer is
trivial, even more so if you actually define your models as objects
(because with proper data hiding, the cache doesn't even need to store
the data in the same format as the data that comes out of the database).

For example, I had a class named "product", where the constructor would
take a primary key to the products table.  That primary key was used in
a query to pull data out of the database to populate fields on the
object.  To add a memcached layer, I just hit memcached before hitting
the database in the constructor.
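
A rough sketch of that constructor, in Python rather than PHP.  The
names are hypothetical: `cache` and `db` are plain dicts standing in
for a memcached client and the products table, and the "product:<id>"
key format is just an assumption.

```python
class Product:
    """Hypothetical model class; `cache` and `db` are plain dicts here."""

    def __init__(self, product_id, cache, db):
        self.product_id = product_id
        key = "product:%d" % product_id

        fields = cache.get(key)          # 1. try the cache first
        if fields is None:
            fields = db[product_id]      # 2. miss: stands in for the SQL query
            cache[key] = fields          # 3. store it for the next constructor call
        self._fields = fields

    def name(self):
        # Accessor: callers never learn whether the data came from the cache.
        return self._fields["name"]
```

Because callers only ever use the accessors, the caching stays hidden
inside the model and the rest of the code does not change.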

Specifically in PHP with Smarty, the Smarty templates were given
objects, and the templates would call the accessor methods directly.
Smarty, in the documentation, suggests that you pre-assign variables to
the template, effectively creating a code<->template contract, but this
revealed too much of the caching system to the template designers.

I had also implemented "delayed loading": the act of loading the data
(from memcached or the database) was delayed until accessor methods were
called for the individual product attributes/fields.  This was a big win
in terms of understandability of the code and data hiding because, for
example, the product search functionality (in the Controller) could
return a list of product objects (which know their primary key, but
don't have any data loaded yet) without needing to know how the View was
going to paginate the results.  If page 2 was being displayed, none of
the product data was retrieved for the products on page 1, even though
the entire list of products looked like it was available to the View.
This helped tremendously when things like the number-of-products-per-page
kept being tweaked by management (which ended up causing issues with the
layout), because the Model and Controller didn't need to be visited or
changed.  Of course, the result of the search query for "widgets", a
list of product primary keys, was cached in memcached so future searches
for "widgets" wouldn't need to hit the database either.  Every page of
the search results was cached after the first page was viewed, resulting
in huge gains on popular searches.
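
Roughly, the delayed loading and the cached search look like this
(again Python, not my actual PHP).  LazyProduct, search, page and
find_ids are hypothetical names, and `cache` and `db` are plain dicts
standing in for memcached and the database.

```python
class LazyProduct:
    """Knows its primary key; loads its fields only when an accessor is called."""

    def __init__(self, product_id, cache, db):
        self.product_id = product_id
        self._cache, self._db = cache, db
        self._fields = None              # nothing loaded yet

    def _load(self):
        if self._fields is None:
            key = "product:%d" % self.product_id
            fields = self._cache.get(key)
            if fields is None:           # cache miss: go to the database
                fields = self._db[self.product_id]
                self._cache[key] = fields
            self._fields = fields
        return self._fields

    def name(self):
        return self._load()["name"]


def search(term, cache, db, find_ids):
    """Controller side: return every match as an unloaded product object."""
    key = "search:" + term
    ids = cache.get(key)
    if ids is None:
        ids = find_ids(term)             # stands in for the search query
        cache[key] = ids                 # the whole ID list covers every page
    return [LazyProduct(i, cache, db) for i in ids]


def page(products, number, per_page):
    """View side: pick one page; only these objects ever get loaded."""
    start = (number - 1) * per_page
    return products[start:start + per_page]
```

Changing per_page only touches the View's call to page(); the search
and the product class stay as they are.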

Additionally, you can run background jobs to pre-populate the cache with
known values and _still_ have personalization.  If you know that users
are going to start searching for a specific set of keywords (perhaps
based on current events or purchased advertising), you can insert known
results into the cache before the first actual search even happens.
This is much easier because the _input_ data has a known format, whereas
the _output_ HTML could take a variety of forms.
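
A sketch of such a background job, under the same assumptions as
before: Python instead of PHP, a plain dict standing in for memcached,
and hypothetical names (warm_search_cache, search_ids, find_ids, and
the "search:<term>" key format).

```python
def warm_search_cache(cache, find_ids, terms):
    """Background job: store the ID list for each expected term up front."""
    for term in terms:
        cache["search:" + term] = find_ids(term)


def search_ids(term, cache, find_ids):
    """Normal read path: cache first, database only on a miss."""
    key = "search:" + term
    ids = cache.get(key)
    if ids is None:
        ids = find_ids(term)
        cache[key] = ids
    return ids


# Toy data standing in for the products table and its search query.
catalog = {"widgets": [3, 8, 21], "gadgets": [5]}
db_calls = []


def find_ids(term):
    db_calls.append(term)                # record each "database" hit
    return catalog.get(term, [])


cache = {}
warm_search_cache(cache, find_ids, ["widgets"])               # before any user searches
first_user_result = search_ids("widgets", cache, find_ids)    # served from the cache
```

The job only has to produce a list of primary keys; each user's page
is still rendered (and personalized) from that list at request time.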

As a side note, I had the Smarty-specific "compile dir" set, but turned
off its "template cache dir" because I didn't want the _results_ cached.
Caching the _output_ of the templates didn't work very well on a site
that had many personalization elements.  If one is using PHP, with
Smarty or not, performance can be increased by using a "PHP
Accelerator".  As someone else pointed out, that is different than the
Smarty "caching" and "compiled templates" functionality.

-- 
Andy Bakun <memcached at thwartedefforts.org>


