Problem Statement - How do we capture data changes in an external data source and invalidate the cache?
I attended a customer call where we discussed this very problem, and I think I can explain it better in a blog than on a call. Before we consider any solution, one thing must be understood: Coherence is a data source. It is not a relational data source and not a directory interface, but it is a data source of a different kind. So the problem becomes more generic: how do we synchronize two discrete data sources? The solutions are similar to those you would use to synchronize an LDAP directory with a relational database. Have you done that?
Following are four architectural lines of attack. Depending on how much stale data an application can tolerate, some, all, or none of these solutions may work, so use your own judgment.
- Invalidate the data from within.
- Source of data change propagates the change.
- Let an external mechanism do it.
- Whoever changed the database should also change the Cache.
- [Time to live] Coherence cache entries have a time-to-live (TTL) attribute, which defines how long an entry lives in the cache. An appropriate expiration time can be set based on data access frequency and expected cache hits. If, over an hour, 90% of cache hits are expected within the first 15 minutes after an entry is put, with less frequent access afterwards, then a TTL of 15 minutes is a good invalidation parameter. A typical use case is to load the data, process it, and invalidate it as soon as processing is done. This helps when the same entry is accessed multiple times during processing.
- [Refresh Ahead Factor] This is based on a wonderful analogy of serving fries at McDonald's. If we ask for fries when they are about to run out, we get the last pieces, but our order also triggers an asynchronous load of fresh fries. A refresh-ahead factor can be set in the cache configuration as a fraction of the entry's expiration time. If an entry is accessed after that fraction of its expiration time has elapsed but before it expires, the current value is returned and an asynchronous cache load is triggered (using a CacheLoader component, so the cache must follow the read-through pattern). For example, with a 60-second expiry and a factor of 0.5, an access between 30 and 60 seconds after the entry was loaded triggers the refresh. So if data changed in the external data source, it is refreshed in the cache and the next access gets the latest value. Refresh-ahead assumes a continuous stream of data access, with less frequent updates to the database (or any other external data source) behind the scenes.
- If you are already on Oracle 11g, the Database Change Notification (DCN) mechanism can be used: applications register directly with the database for change events. After receiving a change event, the application can propagate the change to Coherence.
- Responsibility for propagating change events lies with the owner of the data where the change occurred.
- Oracle Data Integrator has a Changed Data Capture feature that can be used to push changes from a database to Coherence.
- Or a simple DB adapter: an external application polls for data changes at a regular interval, captures the changed data set, and propagates the changes to the Coherence cache.
- Oracle's BPEL PM has a built-in DBAdapter, and the change can be propagated to Coherence using an embedded Java activity.
- This approach is simple and can be lightweight. Polling can be heavy, though, and must be accounted for when provisioning database load.
- Coherence supports three heterogeneous platforms: Java, C++ and .NET. It is very likely that the application that changed the data in the external data source runs on one of them. That application can be extended to propagate the same changes to Coherence as well. The transaction outcome must be considered, so that a change is not propagated to the cache when it did not succeed in the database.
- An ORM like TopLink has a SessionEventListener that can retrieve the changed data set upon a database commit, and this listener can propagate the changes to the Coherence cache(s).
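
To make the time-to-live and refresh-ahead options from the first approach a bit more concrete, here is a minimal sketch in plain Java of the decision a cache makes on each read. It is not the Coherence API. The class `RefreshAheadSketch`, the `Action` enum and the `decide` method are hypothetical names of my own, and I am assuming that a factor of 0 means refresh-ahead is switched off.

```java
// Hypothetical sketch in plain Java, not the Coherence API.
// It only models the read-time decision behind TTL plus refresh-ahead.
final class RefreshAheadSketch {

    /** What a read should do, given the age of the cached entry. */
    enum Action { SERVE_CACHED, SERVE_CACHED_AND_REFRESH_ASYNC, RELOAD_SYNC }

    /**
     * @param ageMillis          time since the entry was loaded into the cache
     * @param expiryMillis       the entry's time to live
     * @param refreshAheadFactor fraction of the expiry (0.0 to 1.0) after which
     *                           a read also schedules an asynchronous reload;
     *                           0 is treated here as "refresh-ahead disabled"
     */
    static Action decide(long ageMillis, long expiryMillis, double refreshAheadFactor) {
        if (expiryMillis <= 0) {
            throw new IllegalArgumentException("expiryMillis must be positive");
        }
        if (refreshAheadFactor < 0.0 || refreshAheadFactor > 1.0) {
            throw new IllegalArgumentException("refreshAheadFactor must be in [0, 1]");
        }
        // Past the TTL: the entry is gone, so the read must go to the data source.
        if (ageMillis >= expiryMillis) {
            return Action.RELOAD_SYNC;
        }
        // Inside the refresh-ahead window: serve the cached value now and
        // reload in the background so the next reader sees fresh data.
        if (refreshAheadFactor > 0.0 && ageMillis > refreshAheadFactor * expiryMillis) {
            return Action.SERVE_CACHED_AND_REFRESH_ASYNC;
        }
        // Still fresh: a plain cache hit.
        return Action.SERVE_CACHED;
    }

    public static void main(String[] args) {
        long expiryMillis = 60_000L; // 60-second TTL
        double factor = 0.5;         // refresh-ahead window starts at 30 seconds
        for (long age : new long[] {10_000L, 45_000L, 61_000L}) {
            System.out.println(age + " ms -> " + decide(age, expiryMillis, factor));
        }
    }
}
```

With a 60-second expiry and a factor of 0.5, a read at 10 seconds is a plain hit, a read at 45 seconds returns the cached value and schedules a background reload, and a read at 61 seconds has to wait for the data source. The smaller the factor, the earlier the background reload starts, at the cost of more load on the external data source.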