Large lists

K J sanbat at gmail.com
Mon Nov 5 11:34:11 UTC 2007


>
> However, memcached semantics don't quite give you what you want.
> Depending on whether you can reasonably get a configuration to do what
> you want, it might be easier to think of memcached as a bloom filter
> than as a set in this case.  That is, if you negatively cache things
> that *aren't* part of your list, then the presence of a key will tell
> you for certain that a particular key is not a member, but the absence
> of a key would mean that you don't know (or perhaps memcached *did*
> know, but had since forgotten).


I had thought of doing a bloom filter on it as well.  The problem here is,
the membership list might change sometimes, and reading info on bloom
filters, it's not really well-suited for dynamically changing lists?


> You could optionally preload objects that are likely to be used if
> you think the natural population wouldn't do it effectively (you can
> measure this with stats).
>
Suppose I cache 10,000 recently-logged in members.  Also, suppose 50% of
traffic actually come from these users.  Then, this cache would have a high
hit ratio when testing for membership.

However, what about the non-members?  For instance let's say 40% of the
traffic come from non-members.  This would mean there'd need to be a full
listing of members to check?

Hmmm an interesting thought did just come across my mind.  Let me hear your
thoughts:

   - Cache 10,000 most recently logged-in members
   - Bloom filter on the entire list

This way, you can test for negatives (bloom filter), and if there's a
positive, check the 10,000 most recently logged-in users.  If that still
yields nothing, then do a database query.  In effect, only a small minority
of checks would require a trip to the DB.





>
> On Nov 4, 2007, at 23:32, J A wrote:
>
> > I have a fairly large members list that I want to keep in memcache.
> > What I do with this list is query it against particular user IDs to
> > see if they are a member of that list or not.  If they are they get
> > certain priviledges.
> >
> > The problem is, this list has gotten to the point of saturating the
> > PHP's memory when fetching the MySQL query the first time.
> >
> > Is there a way to do this more effectively, for instance,
> > partitioning the list into separate smaller lists, grouped by time
> > of login?  I'm thinking of this, as users who have logged in in the
> > past 3 months are more likely to be in the list anyway.
>
>
>        It'd be easier to not think of it as a list if you're just testing
> for membership.  All you want to know is if a particular object is an
> element of a particular set.  You could do this by key convention if
> you batch populate the records.
>
>        However, memcached semantics don't quite give you what you want.
> Depending on whether you can reasonably get a configuration to do what
> you want, it might be easier to think of memcached as a bloom filter
> than as a set in this case.  That is, if you negatively cache things
> that *aren't* part of your list, then the presence of a key will tell
> you for certain that a particular key is not a member, but the absence
> of a key would mean that you don't know (or perhaps memcached *did*
> know, but had since forgotten).
>
>        You could, of course, record the status either way so as to tell
> the
> difference between not knowing and knowing whether it's a member or
> not.  This is probably best suited to your needs.
>
>        You could optionally preload objects that are likely to be used if
> you think the natural population wouldn't do it effectively (you can
> measure this with stats).
>
> --
> Dustin Sallings
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.danga.com/pipermail/memcached/attachments/20071105/9312819f/attachment.html


More information about the memcached mailing list