IDEA: Hierarchy of caches for high performance AND high capacity.
Perrin Harkins
perrin at elem.com
Thu Nov 2 20:38:44 UTC 2006
On Wed, 2006-11-01 at 14:53 -0800, Kevin Burton wrote:
> Yes......... but if the data isn't in the local cache it won't really
> slow down the system very much and for certain types of applications
> the speedup might be significant. Having benchmarks of the local
> cache is important to figure out if it's contributing to a performance
> boost.
The gist of it is that things that are fetched from the local cache will
be faster, things that are fetched from memcached will be slower (due to
looking in the local cache first), things that are fetched from the disk
cache will be faster (since they aren't coming from the database), and
things that come from the database will be slower, since they have to
wait for three cache fetches. Updates and inserts will all be much
slower since they have to be written to four places.
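To make that accounting concrete, here is a rough sketch that counts how many stores each read and write touches. It assumes three cache tiers (local, memcached, disk) in front of the database, and it uses plain dicts as stand-ins; the Tier class and the function names are made up for illustration and are not any real client API.

```python
class Tier:
    """One cache level, backed by a plain dict (stand-in for a real cache)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def hierarchical_get(key, tiers, database):
    """Check each tier in order, then the database.

    Returns (value, lookups), where lookups counts every store consulted.
    """
    lookups = 0
    for tier in tiers:
        lookups += 1
        if key in tier.data:
            return tier.data[key], lookups
    lookups += 1  # every tier missed, so we pay for the database as well
    return database.get(key), lookups


def hierarchical_set(key, value, tiers, database):
    """Write to every tier and the database; return the number of writes."""
    for tier in tiers:
        tier.data[key] = value
    database[key] = value
    return len(tiers) + 1


tiers = [Tier("local"), Tier("memcached"), Tier("disk")]
database = {"row": "from-db"}
tiers[0].data["hot"] = "from-local"
tiers[1].data["warm"] = "from-memcached"

hot = hierarchical_get("hot", tiers, database)    # found in the first tier
warm = hierarchical_get("warm", tiers, database)  # one wasted local lookup first
cold = hierarchical_get("row", tiers, database)   # three misses, then the database
writes = hierarchical_set("new", "value", tiers, database)
```

A memcached hit costs two lookups instead of one, a database read costs four, and every write goes to four places.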
I think that a better approach is to just use multiple caches for
different things, rather than the sort of hierarchical approach you
suggest. If you designate certain types of data for each cache and
don't write the same data to multiple ones, you avoid the mess of
talking to multiple caches for each get or set.
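A minimal sketch of that alternative, assuming each type of data is assigned to exactly one cache. The class name, the data types ("config", "session", "rendered_page") and the cache names are hypothetical, and dicts stand in for the real caches.

```python
class PartitionedCache:
    """Route each data type to a single cache, so a get or set touches one store."""

    def __init__(self, routes, default):
        self.routes = routes    # data type -> cache name
        self.default = default  # cache used for unlisted data types
        names = set(routes.values()) | {default}
        self.stores = {name: {} for name in names}

    def _store_for(self, data_type):
        return self.stores[self.routes.get(data_type, self.default)]

    def set(self, data_type, key, value):
        self._store_for(data_type)[key] = value

    def get(self, data_type, key):
        return self._store_for(data_type).get(key)


cache = PartitionedCache(
    routes={"config": "local", "session": "memcached", "rendered_page": "disk"},
    default="memcached",
)
cache.set("session", "user:1", {"name": "kevin"})
cache.set("config", "site_title", "Example")

session = cache.get("session", "user:1")
copies = sum("user:1" in store for store in cache.stores.values())
```

Here copies is 1: the session lives only in the memcached stand-in, so there is nothing to keep in sync across caches.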
> BigTable isn't really a distributed hash. It provides a complex data
> access API and is heavily oriented towards redundancy and failover.
> It's a closer cousin to MySQL Cluster than to Memcached.
>
> Sort of........ it's a cell/row-based mechanism so you can view it as a
> map/dictionary. There's no SQL or sorting indexes so I think you have
> to build that out on top....
The Wikipedia summary isn't bad: http://en.wikipedia.org/wiki/BigTable
It stores data in sorted order by row key. They have a custom language
for querying, called Sawzall. The biggest difference, though, is the
emphasis on redundancy. With memcached, the more servers you add to a
cluster, the more likely you are to experience data loss (more servers +
no redundancy = more failures), while BigTable works very hard to avoid
this by using multiple copies, commit logs, etc.
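The "more servers = more failures" arithmetic can be sketched as follows, under the simplifying assumption that each server fails independently with the same probability over some period. The function name and the 1% figure are illustrative only, not measurements.

```python
def p_any_failure(servers, p_single):
    """Probability that at least one of `servers` fails in a given period,
    assuming independent failures with probability `p_single` each."""
    return 1 - (1 - p_single) ** servers


# Illustrative only: a 1% chance per server per period.
for n in (1, 10, 50):
    print(n, round(p_any_failure(n, 0.01), 3))
```

With no redundancy, each such failure drops whatever keys were hashed to that server, so the chance of losing some cached data grows with cluster size even though the share lost per failure shrinks.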
- Perrin