Few queries on atomicity of requests

Ryan LeCompte lecompte at gmail.com
Thu Jun 19 02:02:14 UTC 2008


For some approaches on how to avoid the "dog pile effect" on your
database, take a look at:

http://highscalability.com/strategy-break-memcache-dog-pile
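One common flavor of the approach discussed there is to keep serving a slightly stale value while a single client refreshes it. A rough Python sketch of that idea, using a toy in-memory stand-in for a memcached client (`ToyCache`, `get_with_soft_ttl`, and `SOFT_TTL` are all invented here for illustration, not any real API):

```python
import time

class ToyCache:
    """Toy in-memory stand-in for a memcached client (illustrative only)."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def set(self, key, value):
        self._d[key] = value
    def add(self, key, value):
        # memcached's 'add' semantics: store only if the key is absent.
        if key in self._d:
            return False
        self._d[key] = value
        return True
    def delete(self, key):
        self._d.pop(key, None)

SOFT_TTL = 60  # seconds before a cached value is considered stale

def get_with_soft_ttl(cache, key, recompute, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is None:
        value = recompute()
        cache.set(key, (value, now + SOFT_TTL))
        return value
    value, soft_expiry = entry
    if now >= soft_expiry:
        # Only the client that wins this 'add' recomputes; everyone
        # else keeps serving the stale value instead of dog-piling
        # onto the database.
        if cache.add(key + ":refreshing", True):
            value = recompute()
            cache.set(key, (value, now + SOFT_TTL))
            cache.delete(key + ":refreshing")
    return value
```

The point is that a cache miss or expiry triggers at most one recompute, not one per concurrent reader.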

Ryan


On Wed, Jun 18, 2008 at 8:43 PM, Dustin Sallings <dustin at spy.net> wrote:
>
> On Jun 18, 2008, at 14:29, Joyesh Mishra wrote:
>
> 1 function get_foo (int userid) {
> 2    result = memcached_fetch("userrow:" + userid);
> 3    if (!result) {
> 4        result = db_select("SELECT * FROM users WHERE userid = ?", userid);
> 5        memcached_add("userrow:" + userid,  result);
> 6    }
> 7    return result;
> 8}
>
> 9 function update_foo(int userid, string dbUpdateString) {
> 10    result = db_execute(dbUpdateString);
> 11    if (result) {
> 12        data = createUserDataFromDBString(dbUpdateString);
> 13        memcached_set("userrow:" + userid, data);
> 14    }
> 15}
>
> *******
>
> Imagine a table now getting queried on 2 columns say userid and username
>
> Q1:
> If we have 100 processes each executing the get_foo function, and let's say
> memcached does not have the key. As there would be a delay between executing
> Line 2 and Line 5, there would be at least dozens of processes querying the
> db and executing Line 5, creating more of a bottleneck on the memcached
> server - how does it scale then (imagine a million processes now getting
> triggered)? I understand it is the initial load factor, but how do you take
> this into account while starting up the memcached servers?
>
> The bottleneck isn't on the memcache server, it's on your DB.  In that case,
> sounds like you've got a really popular user.  :)
>
> You may have a bit of a thundering herd problem.  If it's too intense, you
> can create (or find) a locking mechanism to prevent the thundering herd from
> thundering too hard.
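One minimal sketch of such a locking mechanism is to use memcached's `add` as a mutex: on a cache miss, only the client that wins the `add` on a lock key queries the database, while the rest poll briefly for the filled value. `ToyCache` below is a toy in-memory stand-in for a real client, and the lock-key naming, retry count, and backoff are all arbitrary choices for illustration:

```python
import time

class ToyCache:
    """Toy in-memory stand-in for a memcached client (illustrative only)."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def set(self, key, value):
        self._d[key] = value
    def add(self, key, value):
        # memcached's 'add': store only if the key does not already exist.
        if key in self._d:
            return False
        self._d[key] = value
        return True
    def delete(self, key):
        self._d.pop(key, None)

def get_foo(cache, userid, db_select, retries=50, backoff=0.01):
    key = "userrow:%d" % userid
    result = cache.get(key)
    if result is not None:
        return result
    # Cache miss: only the client that wins this 'add' hits the database.
    if cache.add(key + ":lock", True):
        try:
            result = db_select(userid)
            cache.set(key, result)
        finally:
            cache.delete(key + ":lock")
        return result
    # Everyone else waits briefly for the winner to populate the cache.
    for _ in range(retries):
        time.sleep(backoff)
        result = cache.get(key)
        if result is not None:
            return result
    return db_select(userid)  # give up on the lock holder; query directly
```

In a real deployment the lock key would need an expiry so a crashed lock holder cannot block readers forever.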
>
> Q2:
> Now imagine you have 100 processes again querying the key, out of which 50
> execute get_foo() and 50 update_foo(). And let's say the key is not there on
> the memcached server. Imagine T1 doing a select operation followed by T2
> doing an update. T1 is in Line 4 doing the select and *GOING* to add the key
> to the cache, while T2 goes ahead and updates the DB and executes Line 13
> (i.e. updates the cache). Now if T1 executes Line 5, it would have stale
> results (in such a case memcached_add fails basically - but is it a
> sufficient guarantee that such a case would never arise?)
>
> I don't know what API you're using, but memcached's add fails if a value is
> already in the cache for the specified key.
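That behavior is exactly what resolves the T1/T2 race from Q2: if T2's `set` with the fresh row lands first, T1's later `add` with the stale row is rejected and the fresh value survives. A small replay of the race against a toy in-memory stand-in (`ToyCache` and the key/values are invented for illustration):

```python
class ToyCache:
    """Toy in-memory stand-in for a memcached client (illustrative only)."""
    def __init__(self):
        self._d = {}
    def get(self, key):
        return self._d.get(key)
    def set(self, key, value):
        self._d[key] = value
    def add(self, key, value):
        # 'add' is a no-op that reports failure when the key already exists.
        if key in self._d:
            return False
        self._d[key] = value
        return True

cache = ToyCache()
# T2 (the updater) wins the race: it sets the fresh row first (Line 13).
cache.set("userrow:42", {"name": "new"})
# T1 then reaches its memcached_add (Line 5) with the stale row it read.
ok = cache.add("userrow:42", {"name": "old"})
assert ok is False                                 # stale add is rejected
assert cache.get("userrow:42") == {"name": "new"}  # fresh value survives
```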
>
> Q3:
> Now we have 2 queries say:
> select * from users where userid = abc;
> select * from users where username = xyz;
>
> Users
> |userid|username|userinfo|
>
> and I want memcached to improve the query performance
>
> I had 2 approaches:
> 1. Cache1: Key=userid Value=User_Object
>    Cache2: Key=username Value=userid
>
> 2. Cache1: Key=userid Value=User_Object
>    Cache2: Key=username Value=User_Object
>
> Do you see potential flaws in either of these approaches? I tried to trace
> the flaws in the first one using various db calls, but would still ask
> whether you guys have seen it before.
>
> If you're really concerned about stale objects here, you can use CAS.  For
> most of these issues, `get || add' combinations give you a reasonable level
> of atomicity.  Most of the time, however, it really doesn't matter.
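A rough sketch of how CAS catches a stale write, using a toy in-memory client that mimics memcached's `gets`/`cas` protocol (version numbers stand in for real CAS tokens; `ToyCasCache` and the keys are invented for illustration):

```python
class ToyCasCache:
    """Toy in-memory client sketching memcached's gets/cas protocol."""
    def __init__(self):
        self._d = {}  # key -> (value, version)
    def gets(self, key):
        # Returns the value plus a token identifying this version.
        if key not in self._d:
            return None, None
        return self._d[key]
    def set(self, key, value):
        _, version = self._d.get(key, (None, 0))
        self._d[key] = (value, version + 1)
    def cas(self, key, value, token):
        # Store only if nobody changed the key since our 'gets'.
        if key not in self._d or self._d[key][1] != token:
            return False
        self._d[key] = (value, token + 1)
        return True

cache = ToyCasCache()
cache.set("userrow:7", {"visits": 1})
value, token = cache.gets("userrow:7")
# Another client updates the row behind our back...
cache.set("userrow:7", {"visits": 5})
# ...so our compare-and-swap with the stale token is refused.
assert cache.cas("userrow:7", {"visits": 2}, token) is False
# A fresh gets/cas round trip succeeds.
value, token = cache.gets("userrow:7")
assert cache.cas("userrow:7", {"visits": value["visits"] + 1}, token) is True
```

The usual pattern is a retry loop: `gets`, compute the new value, `cas`, and start over if the `cas` is refused.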
>
> I would like to know in detail how the memcached server handles queueing of
> these requests and their atomicity. If there are any posts/info on it,
> please let me know.
>
> There's no real queue other than connection management threads huddled
> around the storage mutex.  At the point where memcached says you've written,
> it's done.
>
> --
> Dustin Sallings
>


More information about the memcached mailing list