Large lists

Mon Nov 5 14:26:37 UTC 2007

>
> In a case like this, I don't trust a cache miss to indicate presence
> or absence of state. I prefer to keep separate key spaces: one for the
> positive case and one for the negative. Keep a representation of The
> Truth in media with a stronger guarantee of data integrity [1]; as
> states change, write through to memcached when recording The Truth and
> upon cache misses, consult The Truth.
>
>
> [1] This could be an RDBMS, a Berkeley DB or something else. Which one
> is correct depends on your scale, performance and concurrency
> requirements.

I'm already using a DB to store the members list.  If you're proposing to
use another layer of DB, wouldn't it be better to just do a DB query in the
first place?

On Nov 5, 2007 10:16 PM, Ian Kallen <spidaman.list at gmail.com> wrote:

> In a case like this, I don't trust a cache miss to indicate presence
> or absence of state. I prefer to keep separate key spaces: one for the
> positive case and one for the negative. Keep a representation of The
> Truth in media with a stronger guarantee of data integrity [1]; as
> states change, write through to memcached when recording The Truth and
> upon cache misses, consult The Truth.
>
>
> [1] This could be an RDBMS, a Berkeley DB or something else. Which one
> is correct depends on your scale, performance and concurrency
> requirements.
>
> On 11/5/07, K J <sanbat at gmail.com> wrote:
> >
> > > However, memcached semantics don't quite give you what you want.
> > > Depending on whether you can reasonably get a configuration to do what
> > > you want, it might be easier to think of memcached as a bloom filter
> > > than as a set in this case.  That is, if you negatively cache things
> > > that *aren't* part of your list, then the presence of a key will tell
> > > you for certain that a particular key is not a member, but the absence
> > > of a key would mean that you don't know (or perhaps memcached *did*
> > > know, but had since forgotten).
> >
> >
> > I had thought of doing a bloom filter on it as well.  The problem here
> is,
> > the membership list might change sometimes, and reading info on bloom
> > filters, it's not really well-suited for dynamically changing lists?
> >
> > > You could optionally preload objects that are likely to be used if
> > > you think the natural population wouldn't do it effectively (you can
> > > measure this with stats).
> > >
> >
> > Suppose I cache 10,000 recently-logged in members.  Also, suppose 50% of
> > traffic actually come from these users.  Then, this cache would have a
> high
> > hit ratio when testing for membership.
> >
> > However, what about the non-members?  For instance let's say 40% of the
> > traffic come from non-members.  This would mean there'd need to be a
> full
> > listing of members to check?
> >
> > Hmmm an interesting thought did just come across my mind.  Let me hear
> your
> > thoughts:
> >
> > Cache 10,000 most recently logged-in members
> > Bloom filter on the entire list
> > This way, you can test for negatives (bloom filter), and if there's a
> > positive, check the 10,000 most recently logged-in users.  If that still
> > yields nothing, then do a database query.  In effect, only a small
> minority
> > of checks would require a trip to the DB.
> >
> >
> >
> >
> >
> > >
> > >
> > >
> > > On Nov 4, 2007, at 23:32, J A wrote:
> > >
> > > > I have a fairly large members list that I want to keep in memcache.
> > > > What I do with this list is query it against particular user IDs to
> > > > see if they are a member of that list or not.  If they are they get
> > > > certain priviledges.
> > > >
> > > > The problem is, this list has gotten to the point of saturating the
> > > > PHP's memory when fetching the MySQL query the first time.
> > > >
> > > > Is there a way to do this more effectively, for instance,
> > > > partitioning the list into separate smaller lists, grouped by time
> > > > of login?  I'm thinking of this, as users who have logged in in the
> > > > past 3 months are more likely to be in the list anyway.
> > >
> > >
> > >        It'd be easier to not think of it as a list if you're just
> testing
> > > for membership.  All you want to know is if a particular object is an
> > > element of a particular set.  You could do this by key convention if
> > > you batch populate the records.
> > >
> > >        However, memcached semantics don't quite give you what you
> want.
> > > Depending on whether you can reasonably get a configuration to do what
> > > you want, it might be easier to think of memcached as a bloom filter
> > > than as a set in this case.  That is, if you negatively cache things
> > > that *aren't* part of your list, then the presence of a key will tell
> > > you for certain that a particular key is not a member, but the absence
> > > of a key would mean that you don't know (or perhaps memcached *did*
> > > know, but had since forgotten).
> > >
> > >        You could, of course, record the status either way so as to
> tell
> > the
> > > difference between not knowing and knowing whether it's a member or
> > > not.  This is probably best suited to your needs.
> > >
> > >        You could optionally preload objects that are likely to be used
> if
> > > you think the natural population wouldn't do it effectively (you can
> > > measure this with stats).
> > >
> > > --
> > > Dustin Sallings
> > >
> > >
> > >
> > >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.danga.com/pipermail/memcached/attachments/20071105/78ffd30f/attachment.html