<div>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">In a case like this, I don't trust a cache miss to indicate presence<br>or absence of state. I prefer to keep separate key spaces: one for the
<br>positive case and one for the negative. Keep a representation of The<br>Truth in media with a stronger guarantee of data integrity [1]; as<br>states change, write through to memcached when recording The Truth and<br>upon cache misses, consult The Truth.
<br><br><br>[1] This could be an RDBMS, a Berkeley DB or something else. Which one<br>is correct depends on your scale, performance and concurrency<br>requirements.</blockquote></div>
<div> </div>
<div>I'm already using a DB to store the members list. If you're proposing to use another layer of DB, wouldn't it be better to just do a DB query in the first place?</div>
<div><br><br> </div>
<div class="gmail_quote">On Nov 5, 2007 10:16 PM, Ian Kallen <<a href="mailto:spidaman.list@gmail.com">spidaman.list@gmail.com</a>> wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">In a case like this, I don't trust a cache miss to indicate presence<br>or absence of state. I prefer to keep separate key spaces: one for the
<br>positive case and one for the negative. Keep a representation of The<br>Truth in media with a stronger guarantee of data integrity [1]; as<br>states change, write through to memcached when recording The Truth and<br>upon cache misses, consult The Truth.
<br><br><br>[1] This could be an RDBMS, a Berkeley DB or something else. Which one<br>is correct depends on your scale, performance and concurrency<br>requirements.<br>
<div>
<div></div>
<div class="Wj3C7c"><br>On 11/5/07, K J <<a href="mailto:sanbat@gmail.com">sanbat@gmail.com</a>> wrote:<br>><br>> > However, memcached semantics don't quite give you what you want.<br>> > Depending on whether you can reasonably get a configuration to do what
<br>> > you want, it might be easier to think of memcached as a bloom filter<br>> > than as a set in this case. That is, if you negatively cache things<br>> > that *aren't* part of your list, then the presence of a key will tell
<br>> > you for certain that a particular key is not a member, but the absence<br>> > of a key would mean that you don't know (or perhaps memcached *did*<br>> > know, but had since forgotten).<br>>
<br>><br>> I had thought of doing a bloom filter on it as well. The problem here is,<br>> the membership list might change sometimes, and reading info on bloom<br>> filters, it's not really well-suited for dynamically changing lists?
<br>><br>> > You could optionally preload objects that are likely to be used if<br>> > you think the natural population wouldn't do it effectively (you can<br>> > measure this with stats).<br>> >
<br>><br>> Suppose I cache 10,000 recently-logged in members. Also, suppose 50% of<br>> traffic actually come from these users. Then, this cache would have a high<br>> hit ratio when testing for membership.<br>
><br>> However, what about the non-members? For instance let's say 40% of the<br>> traffic come from non-members. This would mean there'd need to be a full<br>> listing of members to check?<br>><br>
> Hmmm an interesting thought did just come across my mind. Let me hear your<br>> thoughts:<br>><br>> Cache 10,000 most recently logged-in members<br>> Bloom filter on the entire list<br>> This way, you can test for negatives (bloom filter), and if there's a
<br>> positive, check the 10,000 most recently logged-in users. If that still<br>> yields nothing, then do a database query. In effect, only a small minority<br>> of checks would require a trip to the DB.<br>>
<br>><br>><br>><br>><br>> ><br>> ><br>> ><br>> > On Nov 4, 2007, at 23:32, J A wrote:<br>> ><br>> > > I have a fairly large members list that I want to keep in memcache.<br>
> > > What I do with this list is query it against particular user IDs to<br>> > > see if they are a member of that list or not. If they are they get<br>> > > certain priviledges.<br>> > >
<br>> > > The problem is, this list has gotten to the point of saturating the<br>> > > PHP's memory when fetching the MySQL query the first time.<br>> > ><br>> > > Is there a way to do this more effectively, for instance,
<br>> > > partitioning the list into separate smaller lists, grouped by time<br>> > > of login? I'm thinking of this, as users who have logged in in the<br>> > > past 3 months are more likely to be in the list anyway.
<br>> ><br>> ><br>> > It'd be easier to not think of it as a list if you're just testing<br>> > for membership. All you want to know is if a particular object is an<br>> > element of a particular set. You could do this by key convention if
<br>> > you batch populate the records.<br>> ><br>> > However, memcached semantics don't quite give you what you want.<br>> > Depending on whether you can reasonably get a configuration to do what
<br>> > you want, it might be easier to think of memcached as a bloom filter<br>> > than as a set in this case. That is, if you negatively cache things<br>> > that *aren't* part of your list, then the presence of a key will tell
<br>> > you for certain that a particular key is not a member, but the absence<br>> > of a key would mean that you don't know (or perhaps memcached *did*<br>> > know, but had since forgotten).<br>> >
<br>> > You could, of course, record the status either way so as to tell<br>> the<br>> > difference between not knowing and knowing whether it's a member or<br>> > not. This is probably best suited to your needs.
<br>> ><br>> > You could optionally preload objects that are likely to be used if<br>> > you think the natural population wouldn't do it effectively (you can<br>> > measure this with stats).
<br>> ><br>> > --<br>> > Dustin Sallings<br>> ><br>> ><br>> ><br>> ><br>><br>><br></div></div></blockquote></div><br>