Friday, August 21, 2009

Pushing changed data to Coherence

Problem Statement - How to capture data changed in an external data source and invalidate the Cache?
I attended a customer call where we discussed this very problem, and I think I can explain it better in a blog post than on a call. Before considering any solution, one thing must be understood: Coherence is a data source. It is not a relational data source and not a directory interface, but it is a data source - of a different kind. So the problem becomes more generic - how do we synchronize two discrete data sources? The solutions are similar to the ones you would use to synchronize an LDAP directory with a relational database. Have you done that?
Following are four architectural lines of attack. Depending on how much stale data an application can tolerate, some, all or none of them may work, so use your own judgment.

  • Invalidate the data from within.
  • Source of data change propagates the change.
  • Let an external mechanism do it.
  • Whoever changed the database should also change the Cache.
Invalidating the Cache Entries From Within
  • [Time to live] Coherence cache entries have a time-to-live (TTL) attribute that defines how long an entry should live in the cache. An appropriate expiration time can be set based on data access frequency and expected cache hits. If, over an hour, 90% of cache hits are expected in the first 15 minutes after an entry is put and accesses are infrequent afterwards, then a TTL of 15 minutes is a good invalidation parameter. A typical use case is to load the data, process it, and let it expire as soon as processing is done. This is helpful when the same entry is accessed multiple times during processing.
  • [Refresh Ahead Factor] This is based on a wonderful analogy of serving fries at McDonald's. If we ask for fries when the tray is about to run out, we get the last pieces, but the order also triggers an asynchronous load of fries from the oven. A refresh-ahead-factor set in the cache configuration triggers an asynchronous load (using a CacheLoader component, so the cache must follow the read-through pattern) when an entry is accessed after that fraction of its expiry time has elapsed but before it expires. For example, with a 60-second expiry and a factor of 0.5, an access to an entry older than 30 seconds returns the cached value and schedules a reload in the background. So if the data changed in the external data source, it is refreshed in the cache and the next access gets the latest value. Refresh-ahead assumes a continuous stream of data access with less frequent updates to the database (or any other external data source) behind the scenes.
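To make the interplay between expiry and the refresh-ahead factor concrete, here is a small sketch in plain Java. It only models the decision taken on each access; it is not Coherence's internal code, and the class, enum and method names are made up for illustration. The exact boundary handling is an assumption as well.

```java
// Sketch only: models the refresh-ahead decision, not Coherence internals.
// All names here (RefreshAheadSketch, Action, onAccess) are hypothetical.
class RefreshAheadSketch {
    enum Action { SERVE_CACHED, SERVE_CACHED_AND_RELOAD_ASYNC, RELOAD_SYNC }

    // ageMillis:    time since the entry was loaded into the cache
    // expiryMillis: the entry's time to live
    // factor:       refresh-ahead factor, a fraction between 0 and 1
    static Action onAccess(long ageMillis, long expiryMillis, double factor) {
        long refreshThreshold = (long) (expiryMillis * factor);
        if (ageMillis >= expiryMillis) {
            // Entry has expired: the caller has to wait for a fresh load.
            return Action.RELOAD_SYNC;
        }
        if (ageMillis > refreshThreshold) {
            // Still valid but close to expiry: serve it and reload in the background.
            return Action.SERVE_CACHED_AND_RELOAD_ASYNC;
        }
        // Fresh entry: no load at all.
        return Action.SERVE_CACHED;
    }

    public static void main(String[] args) {
        long expiry = 60_000; // 60 second TTL
        for (long age : new long[] {10_000, 45_000, 70_000}) {
            System.out.println((age / 1000) + "s old -> " + onAccess(age, expiry, 0.5));
        }
    }
}
```

The middle case is the interesting one: the caller is never blocked, yet a change made in the database shows up on a later access without waiting for the entry to expire.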
Propagation by the data source where the change occurred
  • If you are already on Oracle 11g, the Database Change Notification (DCN) mechanism can be used: applications register directly with the database for change events and, on receiving them, propagate the changes to Coherence.
  • Responsibility for propagating change events lies with the owner of the data source where the change occurred.
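What the application does once a change event arrives can be sketched as below. This is plain Java with a hypothetical, simplified event type (RowChange) standing in for whatever the database driver delivers, and a java.util.Map standing in for the cache (Coherence's NamedCache implements Map). The rowLoader function is also an assumption: it represents re-reading the current row from the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: RowChange, Op and onChange are hypothetical names, not a driver API.
class ChangeEventHandlerSketch {
    enum Op { INSERT, UPDATE, DELETE }

    // Simplified change event: which row changed and how.
    static final class RowChange {
        final String key;
        final Op op;
        RowChange(String key, Op op) { this.key = key; this.op = op; }
    }

    // Applies one change event to the cache.
    static void onChange(RowChange change, Map<String, String> cache,
                         Function<String, String> rowLoader) {
        if (change.op == Op.DELETE) {
            cache.remove(change.key);              // row is gone, drop the entry
            return;
        }
        String current = rowLoader.apply(change.key); // re-read the changed row
        if (current == null) {
            cache.remove(change.key);              // row vanished in the meantime
        } else {
            cache.put(change.key, current);        // push the fresh value
        }
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("42", "old value");
        onChange(new RowChange("42", Op.UPDATE), cache, key -> "new value");
        System.out.println(cache);
    }
}
```

A simpler variant is to remove the entry on every event and let a read-through cache load it on the next access, which avoids the extra database read inside the handler.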
An External Agent
  • Oracle Data Integrator has a Changed Data Capture feature that can be used to push changes from a database to Coherence.
  • Or a simple DB adapter: an external application polls for data changes at a regular interval, captures the changed data set and propagates the changes to the Coherence cache.
  • Oracle's BPEL PM has an inbuilt DBAdapter that can propagate the change to Coherence using an embedded Java Activity.
  • An external agent is simple and can be lightweight. Polling, however, can be heavy on the database and must be taken into account when provisioning database load.
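The polling variant can be sketched in a few lines. The sketch assumes every row carries a monotonically increasing version (or last-modified) column, which the original setup may or may not have; the in-memory list stands in for a query such as "all rows with version greater than the last one seen", and a java.util.Map stands in for the cache. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a watermark-based poller. Row and pollOnce are hypothetical names.
class PollingSyncSketch {
    static final class Row {
        final String key;
        final String value;
        final long version; // assumed: increases on every insert or update
        Row(String key, String value, long version) {
            this.key = key; this.value = value; this.version = version;
        }
    }

    private long lastSeenVersion = 0; // watermark kept between polls

    // One polling cycle: push rows changed since the last poll into the cache.
    // Returns the number of changes applied.
    int pollOnce(List<Row> table, Map<String, String> cache) {
        int applied = 0;
        long highest = lastSeenVersion;
        for (Row row : table) {
            if (row.version > lastSeenVersion) {   // changed since last poll
                cache.put(row.key, row.value);
                highest = Math.max(highest, row.version);
                applied++;
            }
        }
        lastSeenVersion = highest;                 // advance the watermark
        return applied;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row("42", "first", 1));
        Map<String, String> cache = new HashMap<>();
        PollingSyncSketch poller = new PollingSyncSketch();
        System.out.println(poller.pollOnce(table, cache) + " change(s), cache=" + cache);
    }
}
```

In a real agent pollOnce would run on a schedule (for example with a ScheduledExecutorService), and the polling interval is the knob that trades staleness against database load. Note that a watermark like this does not see deleted rows; those need a separate mechanism.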
After all, you did it
  • Coherence supports three heterogeneous platforms - Java, C++ and .NET. It is very likely that the application that changed the data in the external data source runs on one of them, so it can be extended to propagate the same changes to Coherence as well. Transactional success must be considered, so that a change is not propagated to the cache when it did not succeed in the database.
  • An ORM like TopLink has a SessionEventListener that can retrieve the changed data set upon a database commit, and this listener can propagate the changes to the Coherence cache(s).
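The point about transactional success boils down to ordering: commit to the database first and touch the cache only if the commit went through. A minimal sketch, assuming a hypothetical DatabaseWrite callback in place of real transaction handling and a java.util.Map in place of the cache:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: DatabaseWrite and update are hypothetical names.
class WriteThenCacheSketch {
    // Stand-in for a transactional database write; throws if the commit fails.
    interface DatabaseWrite {
        void commit(String key, String value) throws Exception;
    }

    // Returns true if both the database and the cache now hold the new value.
    static boolean update(String key, String value,
                          DatabaseWrite db, Map<String, String> cache) {
        try {
            db.commit(key, value);     // database first
        } catch (Exception e) {
            return false;              // commit failed: leave the cache untouched
        }
        cache.put(key, value);         // only reached after a successful commit
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        cache.put("42", "old");
        boolean ok = update("42", "new", (k, v) -> { /* pretend commit succeeded */ }, cache);
        System.out.println(ok + " " + cache);
    }
}
```

The remaining gap is a failure between the commit and the cache update; if that matters, one of the other approaches (a TTL, or change events from the database) can act as a safety net.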

5 comments:

Dave Felcey said...

Hi Ashish,

Another way of pushing changes from an Oracle database to Coherence - pre-11g - is via AQ. I have a complete example on my blog http://blogs.oracle.com/felcey. BTW your piece on AOP and Coherence is very interesting.

Dave

lnoto said...

... Oracle Data Integrator has a Changed Data Capture feature that can be used to push changes from a data base to Coherence ...

ODI -CDC-> Coherence
Has any of you ever tried to do it?

D said...

If we use Coherence with its default settings and a local scheme (no changes to high units or expiration delay), does it by default do any expiration or eviction of data?

We are seeing a behavior where in whatever changes we do to DB are auto reflected in front end but we never set or change the default coherence settings.

D said...

Does coherence have any default data refresh or expiration setting? Based on what I read, unless you set high units or expiration delay it should never expire any data.

We are using the default setting but we see our DB changes somehow get reflected in front end even though we have done nothing to update the cache when DB changes or expire cache. We use annotations for caching certain method calls response data

Ashish said...

Hi D - you may have refresh-ahead configured. Quite often there exist external processes that keep DB changes in sync with Coherence cache.