Sunday, May 24, 2009

Yes, deletion is not write-behind

Coherence, and any similar distributed mechanism, is built around a few critical words. Coherency, consistency and availability are some of them. Suppose you ask everyone in a room a question and get the same reply no matter whom you ask. Then you have a coherent system. The people don't have to be telling the truth. As long as they all give the same answer, the system is coherent. Consistency extends across the boundaries of a single data source. If a Coherence cluster gives the same answer that, say, a relational database gives at a given moment, the two systems are consistent with each other. Availability is the ability of applications to find the data despite failures and outages.

Like any database, a Coherence cache supports entry insert, update and delete operations. The cache itself is coherent. Entry insertion and update are about data availability, but deletion is a peculiar use case that is about data consistency. Coherence write-behind is a mechanism that allows asynchronous, delayed persistence of changes in the cache to an external data source such as a database. Inserts and updates can be delayed, but entry deletion is not write-behind. Why?
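The difference between the two paths can be illustrated with a toy, single-threaded model. This is a hypothetical sketch, not the Coherence API. ToyWriteBehindCache, its database map and its flush() method are invented names. flush() stands in for the write-behind thread waking up once the write delay expires. Updates wait in a set of dirty keys, while removals go straight through to the stand-in database.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, NOT the Coherence API. A HashMap stands in for the
// database. Inserts/updates are deferred (write-behind); removals are not.
class ToyWriteBehindCache {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();
    // Keys changed in the cache but not yet written to the "database".
    private final Set<String> dirtyKeys = new LinkedHashSet<>();

    void put(String key, String value) {
        cache.put(key, value);
        dirtyKeys.add(key); // persistence is deferred until flush()
    }

    void remove(String key) {
        cache.remove(key);
        dirtyKeys.remove(key); // a pending store for this key is now pointless
        database.remove(key);  // the delete is applied synchronously
    }

    // Simulates the write delay expiring: persist every dirty entry.
    void flush() {
        for (String key : dirtyKeys) {
            String value = cache.get(key);
            if (value != null) {
                database.put(key, value);
            }
        }
        dirtyKeys.clear();
    }
}

class ToyWriteBehindDemo {
    public static void main(String[] args) {
        ToyWriteBehindCache c = new ToyWriteBehindCache();
        c.database.put("bob", "Phoenix, AZ");
        c.cache.put("bob", "Phoenix, AZ");

        c.put("bob", "Los Angeles, CA");                   // update is queued
        System.out.println(c.database.get("bob"));          // prints "Phoenix, AZ"
        c.flush();                                           // write delay elapses
        System.out.println(c.database.get("bob"));          // prints "Los Angeles, CA"
        c.remove("bob");                                     // delete reaches the database at once
        System.out.println(c.database.containsKey("bob"));  // prints "false"
    }
}
```

Removing a key also drops any pending store for it. A later flush() therefore cannot resurrect a row that was just deleted.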
Applications quite often confuse data eviction with data deletion. A poor analogy for eviction is someone moving from Phoenix, AZ to LA. If someone lives in Arizona and decides to leave, that is similar to data eviction. An application looking at the AZ data would not find him, but there could be a way to find him if need be. (In real life it is not the same object living in two places, whereas across two systems it is.) Deletion is like that person dying. You cannot die in Arizona and live on in Los Angeles. If that happens, the LAPD probably has to get involved ;).

Data deletion has to be conservative and must be consistent across all the data sources. This is why the Coherence write-behind mechanism makes entry deletion a synchronous process even when a write delay is set. The good thing is that the API contract only extends as far as calling the erase() method of the configured CacheStore. It is not recommended, but an application may mandate that all CRUD operations have equal priority. In that case, erase() can use some timer service to offload the actual deletion to a separate thread that deletes the data after the write-behind delay.
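Roughly, the not-recommended timer approach might look like the following. This is a hedged sketch under stated assumptions. DelayedEraseStore is a hypothetical class that only mirrors the erase() step of a CacheStore and does not implement the Coherence interface. A ConcurrentHashMap stands in for the database. A java.util.concurrent ScheduledExecutorService plays the role of the "timer service".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the NOT recommended approach: erase() returns at once
// and a scheduler performs the real delete after a delay. Not Coherence API.
class DelayedEraseStore {
    final Map<Object, Object> database = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;

    DelayedEraseStore(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Called synchronously by the cache, but only schedules the delete;
    // the database row survives until the task fires.
    void erase(Object key) {
        timer.schedule(() -> { database.remove(key); }, delayMillis, TimeUnit.MILLISECONDS);
    }

    // shutdown() still runs already-scheduled tasks (the default policy of
    // ScheduledThreadPoolExecutor), so this waits for pending deletes.
    void shutdownAndWait() {
        timer.shutdown();
        try {
            timer.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class DelayedEraseDemo {
    public static void main(String[] args) {
        DelayedEraseStore store = new DelayedEraseStore(500);
        store.database.put("bob", "Phoenix, AZ");
        store.erase("bob");
        // Still present: the delete has only been scheduled.
        System.out.println(store.database.containsKey("bob"));
        store.shutdownAndWait();
        // Gone once the timer has fired.
        System.out.println(store.database.containsKey("bob"));
    }
}
```

Because erase() returns immediately, the cache treats the delete as done. The row, however, stays in the database for up to delayMillis. That window is the weakness discussed in the comments.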

2 comments:

Robert Varga said...

Ashish,

I did not really get the last part of this post related to calling erase() from a timer service.

If your call to erase() is delayed at all (e.g. by delegating to a timer service), then a subsequent get() to the same key may come earlier than the scheduled erase() and your cache will be inconsistent, because it will contain a non-dirty entry for which the database soon will not contain anything.

Best regards,

Robert

Ashish said...

And this is precisely why it is not recommended.
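The scenario Robert describes can be replayed deterministically in a single thread. Again, this is a hypothetical illustration, not Coherence code. DelayedEraseRace, remove(), get() and runPendingDeletes() are invented names. get() models read-through, loading from the database on a cache miss. runPendingDeletes() models the timer finally firing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded replay of a delayed erase racing a get().
class DelayedEraseRace {
    final Map<String, String> database = new HashMap<>();
    final Map<String, String> cache = new HashMap<>();
    // Deletes that erase() accepted but has not yet applied.
    final List<String> pendingDeletes = new ArrayList<>();

    // The cache drops the entry; the deferred erase() only records the key.
    void remove(String key) {
        cache.remove(key);
        pendingDeletes.add(key);
    }

    // Read-through: on a miss, load from the database and cache it as clean.
    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // The timer fires and the deferred deletes hit the database.
    void runPendingDeletes() {
        for (String key : pendingDeletes) {
            database.remove(key);
        }
        pendingDeletes.clear();
    }

    public static void main(String[] args) {
        DelayedEraseRace r = new DelayedEraseRace();
        r.database.put("bob", "Phoenix, AZ");
        r.cache.put("bob", "Phoenix, AZ");

        r.remove("bob");            // erase() scheduled, not executed
        String seen = r.get("bob"); // arrives before the timer and reloads the row
        r.runPendingDeletes();      // the database row is now gone

        System.out.println("get() returned " + seen);
        System.out.println("cache has bob: " + r.cache.containsKey("bob"));
        System.out.println("database has bob: " + r.database.containsKey("bob"));
    }
}
```

After the replay, the cache still holds bob as a clean, non-dirty entry while the database row is gone. This is the inconsistency Robert points out.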