<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=756562212-22052007></SPAN><FONT face=Arial color=#0000ff
size=2><SPAN class=756562212-22052007>Nathan Nobbe,</SPAN></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><<SPAN
class=756562212-22052007> <<FONT face="Times New Roman" color=#000000
size=3>what i dont understand is when the data profile of a particular table
changes, and the variable in the memcache that represents that
</FONT></SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007><FONT face="Times New Roman" color=#000000
size=3><< table becomes stale, how can the variable be updated
efficiently? </FONT></SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT face="Times New Roman"
color=#000000 size=3><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT><FONT><FONT face=Arial color=#0000ff size=2><SPAN
class=756562212-22052007>This post is rather lengthy, I will answer your
question directly first, then provide details below, so you don't have to read
the whole thing unless you want to. </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT><FONT><FONT face=Arial color=#0000ff size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT><FONT><FONT face=Arial color=#0000ff size=2><SPAN
class=756562212-22052007>First, let me warn you not to use triggers for this,
they will hammer your performance. Some database systems, like
Oracle, write all changes to log files that can be read outside of running
queries against the database. You can have an object in a separate
thread in a background process which reads the log files. Or,
you can put operating system level events on the log files and implement event
handlers so that your cache managers are notified when the log file changes,
this is how I do it for Oracle. The other option is to poll the database
on some period using queries. Polling will only work if you take steps to
cut down what has to be looked at, and if the polling query only gets the data
which changed and nothing else. For this to be efficient you have to have
a way to first get the primary keys of what changed, then run a second
query to get the changes. The query which gets the primary keys has
to be simple. </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT><FONT><FONT face=Arial color=#0000ff size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT><FONT><FONT face=Arial color=#0000ff size=2><SPAN
class=756562212-22052007>I describe what I think is required to write such an
intelligent polling system below. I call this intelligent caching and
cache forwarding. Please note, that even when using cache forwarding,
there still must exist at the point where the cache touches the database a
process which is examining the database (polling) looking for changes. The
key is to make this polling mechanism very efficient. The cache managers
have to run in separate threads in a background
process.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007><FONT face=Arial color=#0000ff
size=2></FONT></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>There are a couple of concepts that help with managing
caches. The way I do it is I use the cache as a buffering mechanism so
that application code never goes directly to the RDBMS, it always goes to the
cache. I have cache managers which run in a separate thread in a
background process that synch the caches wih the RDBMS and with other
caches. The cache managers add an overhead to the entire system but the
end user does not feel this overhead because the end user is never waiting on
the RDBMS, the end user always goes directly to the
cache.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Building a caching system which ensures that the data
is only stale within acceptable limits is not a trivial task. It
requires modeling the characteristics of the data at a level that no modeling
tool that I know of supports. You then have to build processors that can
move the data around depending on the characteristics, liveness being the most
important characteristic.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>The characteristics which are important are:
liveness, concurrency, distribution, notification, and mutability.
Describing these characteristics for each data element makes it possible to
build very efficient polling mechanism because the poller will not have to
look at everytihng in the database and will not have to move everything
everywhere. </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Liveness describes how stale the data can be and still
be acceptable. Liveness is expressed in milliseconds, A value of 0 means
the data has to be realtime. By stale I mean that changes to the cache
have not yet been committed to the underlying database, or changes to the
underlying database have not yet been reflected in this cache. I
call the process of reflecting changes in the database with the cache
synchronization. Synchronization also describes updating one cache
with the changes in another cache. I have seen very few data
elements which had a liveness of 0 (meaning the data had to be
realtime).</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT><FONT face=Arial><FONT
color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Concurrency describes the likelihood that two or more
users are updating this specific data at the same time. Concurrency
applies to both the table level, the row level, and the column level. I
added the column level after seeing many cases where for a given row only
certain columns were every updated. Concurrency can be further described
by listing the user roles which are likely to be concurrent with each other or
for which users in that role are likely to be concurrant with each other.
Let's say that you have a bank account and you are updating your home address
online. The likelihood that another user is updating your home address is zero
(well, it better be zero or the bank has a big problem). The likelihood that a
bank employee is updating your home address is small, but not zero. And, I
don't have to worry about all bank employees because the bank employees
authorized to make this change will be playing certain roles.
The likelihood that another person who is authorized to use your account
is updating your home address is also small, but not zero. Knowing not
just that concurrency is likely, but also which users are likely to be
performing actions that are concurrent with each other opens up some interesting
possibilities. If I have marked a table as being concurrent and then I
define the roles which are concurrent, I can show the users which have these
roles what all the users in their role are doing in the cases where there are
potential concurrency conflicts (deadlocks). I can also design my
applications so that changes made by one user can be broadcast to another user
without needing to broadcast these changes to everyone. Back to the bank
example, I only have to worry about concurrency if certain bank employees are
online or if a user authorized to use your account is
online.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Distribution describes the necessity of pushing the
contents of one cache to caches in other tiers in the N-tier architecture.
Distribution also describes the tiers, what caches or persistent stores live on
each tier, and how the caches and persistent stores relate to each
other.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Notification answers the question, "if something does
change, when will a concurrent user care"? The answer is either while
committing updates to data to the underlying database, or while viewing
data. I have to use an example to explain this. Say you are on a web
page where results of a query are being displayed. This web page is not
realtime because at no point can you claim that what is on the web page is a
100% reflection of what is in the database. Now, let's say that you update
one row of data on the web page. You will care if that same row was
updated previously by another user when commiting your changes. Given
two users A and B, if A runs a query, then B comes along and updates one
of the rows A is looking at and commits the changes, then A updates that same
row, A will care about B's update at the time that A commits the A's
changes. In this case what I do is display a page which says "...saving
this data will overwrite changes made by B", I then show the original data, the
changes made by B, and the changes made by A and ask A what to do, the options
are - revert to original, keep Bs changes and discard As changes, or keep As
changes and discard Bs changes. In this case A cared about Bs
updates when A was committing updates to A's changes. A database level
transaction does not help here because no programmer in their right mind would
begin a transaction on a query because the user MIGHT do an update at some
undetermined time in the future. You would be surprised how many
people will say that starting a transaction when A executes the first query is
the answer to this problem.
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>The second case, that a user cares about changes to the
underlying data while viewing the data is far more problematic.
Let's say that your job is to ship products that sales people are selling on the
phone. A salesperson creates an order, you ship the order.
In front of you is a screen showing all the orders that need to be
shipped. Let's say further that this product is very time sensitive, not
shipping a placed order is a big deal, let's say it will cost you your
job. In this case you care if the Order table is updated at the time you
are viewing the Order data. So, the screen you are looking at needs to be
refreshed as updates to the underlying Order table occur. In this case the
programmer has no choice but to use a push model. Ouch. I have seen
this case in logistics management systems and order fulfillment systems.
But, I think this is the exception rather than the
rule.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>In all my years of building business applications I
have encountered data which truly had to be live very rarely.
Usually a lag time is acceptable. The length of the lag time depends on
the business processes that the data is used
in.</SPAN></FONT></FONT></FONT></DIV></SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Mutability describes the lifecycle of the data.
Mutability applies to both tables and rows. Some tables once populated
never change. Some tables are only changed by inserts, once inserted the
row never changes. Some tables have both inserts and row updates.
Mutability describes not just what kind of changes are made to tables and rows,
but how often the changes occur and how likely the changes are to occur.
Knowing this information is crucial in building an intelligent caching
system.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Based on having the characteristics of the underlying
database's tables fully defined it is quite easy to build an intelligent cache
system based on an efficient polling
mechanism. </SPAN></FONT></FONT></FONT></DIV>
<DIV></SPAN></FONT></FONT></FONT><FONT face=Arial><FONT color=#0000ff><FONT
size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>I use a cache forwarding mechanism. I always have
a cache at each tier in the N-tier architecture. There is one cache on the
same machine where the RDBMS lives, this cache is always a copy of what is in
the database. A cache manager runs queries in a separate thread in a
background process looking for changes. The queries are always ran against
a special timestamp table I created to find what changed, then if there are any
changes, a second query is run to actually get the changes. This cache
manager also updates the RDBMS with changes pushed to this cache.
</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>There are also caches at the app server tier and the
web server tier. For each cache, a cache manager runs in the
background which ensures that all changes to the cache are pushed to all
other caches. </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>I also have a cache in the web browser (not memcached,
just javascript arrays) which is populated with data that is related to the
user. Once a user logs in, and I have the user ID I prepopulate the cache
with all rows related to the user ID. For example, on a banking
application I worked on, once the customer logged in I used the cuystomer's
social security number to select all rows which were related either directly or
indirectly to that social security number. This query returned all the
customer info plus all the account info for that customer. The cache then
fulfills queries without needing to go to the
database.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Triggers are not a good idea because they will
significantly degrade the performance of your database. You can help
yourself a lot by the design of your database. Two rules that I always
follow in designing a database - 1) all tables have a sequential integer as
a primary key, a number that doesn't mean anything to anyone but me. I
usually call this column tablename_id; and, 2) I always have an intersection
table named timestamp_table_row_user which has five columns - table name,
primary key, username, timestamp, and timestamp_action. Timestamp_action
can be one of update, insert, or delete. I also create an archive
table for this table and periodically sweep everything over a certain age into
the archive table. </SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Some people prefer to add a timestamp column to
existing tables, I don't like this idea because you need multiple timestamp
columns, one for insert, update, delete and you cannot re-create a change
history because the update timestamp column gets overwritten everytime.
Also, I don't like having to modify every table schema with what is housekeeping
meta-data. Using an intersection table and custom drivers makes it
possible to add this approach to existing databases and existing applications
without changing them.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Using the timestamp table makes it very simple to
determine what has changed in the database and more importantly how it
changed. Instead of using triggers to modify the timestamp table I wrote
my own jdbc drivers which take any SQL statement being executed and create a
second SQL statement which makes an entry into the timestamp table to describe
what the original SQL statement did.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Using sequential integers for primary keys helps with
identifying new rows added to a table with a simpler mechanism than a
timestamp. All you have to do is obtain max(primarykey), store it,
then compare that to the next max(primarykey). Getting the new rows is
simple, just select * where primarykey > the first max(primarykey), or you
can build a more specific query by using the IN clause, as in SELECT * FROM
tablename WHERE primarykey IN (key1, key2, key3...). Or you can use
BETWEEN and give the stored max(primarykey) + 1 and the latest
max(primarykey).</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Obtaining updated rows requires a
timestamp.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>I hope this helps, feel free to ask any
questions.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Steve Donelow</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>President</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007>Object Works, Inc.</SPAN></FONT></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=756562212-22052007></SPAN></FONT></FONT></FONT> </DIV><FONT
face=Arial><FONT color=#0000ff><FONT size=2><SPAN class=756562212-22052007>
<DIV><BR> </SPAN></FONT></FONT></FONT><BR></DIV><FONT SIZE=3><BR>
<BR>
*************************************************************************<BR>
This communication, including attachments, is<BR>
for the exclusive use of addressee and may contain proprietary,<BR>
confidential and/or privileged information. If you are not the intended<BR>
recipient, any use, copying, disclosure, dissemination or distribution is<BR>
strictly prohibited. If you are not the intended recipient, please notify<BR>
the sender immediately by return e-mail, delete this communication and<BR>
destroy all copies.<BR>
*************************************************************************<BR>
</FONT>
</BODY></HTML>