Listing all keys on a server: why?

Reinis Rozitis roze at roze.lv
Thu Jul 27 23:49:06 UTC 2006


Simply put: you are using (trying to use) memcached for something it's not
really meant for. And I somehow don't like such ideas (like sorting and so on)
if their implementation affects the existing performance and speed of the
product.

I'll give an example of how we do things.
If you need sorting or persistent objects/lists, use a DB, or, as in our case,
an extra service we wrote which just holds item IDs that can be binary
sorted and split.
The service has a simple interface which, according to the given params (like
sorting field/offset), returns a list (a PHP array) which can be sent directly
to memcached to get the objects' full data.
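The ID-service pattern above can be sketched roughly like this (a Python sketch; the dict `cache`, the local `id_service` function, and the `item-<id>` key scheme are all stand-ins for illustration, not anything from the actual setup):

```python
# Sketch of "sorted ID list from a side service, object data from memcached".
# `cache` is an in-memory stand-in for a memcached client (get_multi/set).

cache = {}

def id_service(items, sort_field, offset, limit):
    """Stand-in for the separate service: return sorted, sliced item IDs."""
    ordered = sorted(items, key=lambda item: item[sort_field])
    return [item["id"] for item in ordered[offset:offset + limit]]

def fetch_page(items, sort_field, offset, limit):
    """Ask the ID service for the page, then multi-get full objects by ID."""
    ids = id_service(items, sort_field, offset, limit)
    found = {i: cache[f"item-{i}"] for i in ids if f"item-{i}" in cache}
    missing = [i for i in ids if i not in found]  # to be loaded from the DB
    return found, missing
```

Anything in `missing` would be fetched from the database and written back to memcached.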

> Ideally, I would like to see $mc->getKeys().  It would return every
> non-expired key with statistics on each (including the value, number
> of gets (if available or even remotely feasible), and the expiration
> time)

I can't really see a nice solution here. We currently have an mc server with
stats like these:

[uptime] => 4888949
[curr_items] => 12994392
[total_items] => 4532838543
[bytes] => 4179396213

Dump 13 million keys? Keep statistics for each hit? Seems a bit insane...

Still, you can try http://meta.wikimedia.org/wiki/Tugelacache which has some
of the mentioned feature requests and ideas...

rr

----- Original Message ----- 
From: Serhat Sakarya
To: Jacob Coby
Cc: memcached at lists.danga.com
Sent: Friday, July 28, 2006 12:52 AM
Subject: Re: Listing all keys on a server: why?


How would you deal with entries that depend indirectly on user jsmith?

A situation I'm trying to solve involves a gallery of up to 50k items that
can be sorted by a limited number of criteria. Memcached could store each
page (of 12 items) as a function of order type and page number. The problem is
that whenever any of these items changes, all relevant pages must be
invalidated (e.g. pushing an item to rank #1 will invalidate all following
pages).

One approach I'm considering involves (1) adding a timestamp to each cached
entry and (2) keeping track of the invalidation time for each group:

# cached entries as function of order and page
"xxx-rank-5"   --> { 1000s, array(...) }   // invalidated
"xxx-rank-20"  --> { 1080s, array(...) }   // valid
"xxx-rank-123" --> {  900s, array(...) }   // invalidated
"xxx-name-500" --> { 1040s, array(...) }   // valid
....etc

# meta-entries
"group-xxx-rank" --> { 1050s }
"group-xxx-name" --> { 1010s }

In this scenario, each page load would involve two memcache requests
followed by a timestamp delta check. If invalid, the entry can be replaced
with fresh values.

Such an approach might emulate the namespaces concept a bit...
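A minimal sketch of that scheme (Python, with a dict standing in for memcached and a logical counter standing in for wall-clock time; the function and key names are made up for illustration):

```python
import itertools

cache = {}                    # stand-in for memcached
_clock = itertools.count(1)   # logical clock; real code would use time.time()

def set_page(page_key, value):
    """Store a page together with its write timestamp."""
    cache[page_key] = (next(_clock), value)

def invalidate_group(group):
    """Bump the group's invalidation timestamp (e.g. group 'xxx-rank')."""
    cache[f"group-{group}"] = next(_clock)

def get_page(group, page_key):
    """Return the page only if it was written after the last invalidation."""
    entry = cache.get(page_key)
    if entry is None:
        return None
    stored_at, value = entry
    if stored_at < cache.get(f"group-{group}", 0):
        return None  # stale: group was invalidated after this was cached
    return value
```

Each page load is two cache lookups plus the delta check; a stale or missing entry falls through to regeneration, as described above.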

The general problem with this case is that when there are many changes (e.g.
lots of people voting), the effective TTL for each entry decreases and so
does performance.

Maybe this is a bit of an extreme case though... however, I'm somewhat
curious about the merits of writing a custom storage engine (ab)using the
memcache protocol to return dynamically generated data in response to "get"
requests. This may give many more possibilities for optimization, especially
in those cases where a full-featured RDBMS is not necessary (e.g. keeping
track of visitor activity).
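As a proof of concept of that idea, a tiny server only needs to speak enough of the memcached text protocol to answer "get" with generated data (a Python sketch; the "dynamic:" payload is a placeholder for whatever would really be computed per request):

```python
import socket
import socketserver
import threading

class DynamicGetHandler(socketserver.StreamRequestHandler):
    """Answers 'get <key>' with data generated on the fly, not stored."""
    def handle(self):
        line = self.rfile.readline().decode("ascii", "replace").strip()
        parts = line.split()
        if len(parts) == 2 and parts[0] == "get":
            payload = ("dynamic:" + parts[1]).encode()  # generated per request
            self.wfile.write(b"VALUE %s 0 %d\r\n%s\r\nEND\r\n"
                             % (parts[1].encode(), len(payload), payload))
        else:
            self.wfile.write(b"ERROR\r\n")

server = socketserver.TCPServer(("127.0.0.1", 0), DynamicGetHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any memcached client could issue a get; a raw socket shows the framing:
with socket.create_connection(server.server_address) as conn:
    conn.sendall(b"get visitors\r\n")
    reply = b""
    while b"END\r\n" not in reply:
        reply += conn.recv(4096)
server.shutdown()
```

A standard memcached client pointed at this port would see an ordinary "get" response, which is what makes the (ab)use transparent to the application.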

Regards,

Serhat


On 7/27/06, Jacob Coby <jcoby at listingbook.com> wrote:
I think a lot of that comes from the lack of namespaces in memcached. You
can't just say 'invalidate all cache entries for user jsmith.'  ...



