Thursday, September 13, 2007

Coherence Contd. - Some more things

Oracle Coherence is all about fast access to coherent data. Having said that, let's dive a little deeper. I came across a funny definition - it is the costliest implementation of java.util.Map. Coherence is also a Data Grid - a clustered data solution. The nodes in a cluster partition the data among themselves and provide access to it. You can't pull that off with an off-the-shelf Map implementation, particularly not with a partitioned cache strategy. Why not? Take the fastest Map data structure there is - java.util.HashMap. To get data out of it you either need the key or you iterate over the whole thing. You cannot query it the way SQL queries a database table. It is also an event-less data structure - wouldn't you like to receive an event when data is inserted, updated or removed? HashMap can't do that. And then there are the concurrency issues of a clustered environment. Hmm, that sounds a lot like a database table - it isn't one physically, but logically it plays much the same role.
When we talk about data in a networked environment, we should also talk about latency. One of the strongest features of Java is the ability to move the code to where the data is. Anyone remember Java Applets? Instead of shipping the data to the processing code, the code can be shipped to where the data lives. What does that buy us? Ask the system administrators who manage hundreds of nodes spread across slow and unreliable networks - network latency kills applications.
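In Coherence, this "send the code to the data" idea surfaces as the InvocableMap interface, which you will see in the list below. Here is a minimal, hypothetical sketch - the cache name, the key and the UpperCaseProcessor class are mine, not part of any earlier example, and in a real cluster the processor has to be serializable so it can travel to the owning member:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Hypothetical entry processor: process() runs on the cluster member that
// owns the entry, so only this small object travels across the network.
public class UpperCaseProcessor extends AbstractProcessor
        implements java.io.Serializable {
    public Object process(InvocableMap.Entry entry) {
        String value = (String) entry.getValue();
        entry.setValue(value.toUpperCase());
        return value; // hand the old value back to the caller
    }
}

// Usage (names are only for illustration):
NamedCache cache = CacheFactory.getCache("Example");
Object oldValue = cache.invoke("some-key", new UpperCaseProcessor());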
Now let's talk about Coherence's most important APIs. If you have read my earlier Coherence posts (this and this), you already know how to use NamedCache. com.tangosol.net.NamedCache implements java.util.Map. Why? Because it's a cache, and you need the familiar get and put. But here is the difference - it also implements other interfaces:


  • com.tangosol.util.QueryMap

  • com.tangosol.util.ConcurrentMap

  • com.tangosol.util.ObservableMap and,

  • com.tangosol.util.InvocableMap.


Ring a bell yet? Aren't these exactly the features a clustered data grid needs? That's what makes Coherence's NamedCache the costliest implementation of java.util.Map - costly in the sense that these are features you would happily pay for. A quick sketch of the event and locking side follows below.
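Just to make that list concrete, here is a minimal sketch of the ObservableMap and ConcurrentMap sides. The cache name "Students" and the key below are only for illustration:

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MapListener;

NamedCache cache = CacheFactory.getCache("Students");

// ObservableMap: get told whenever an entry is inserted, updated or removed
cache.addMapListener(new MapListener() {
    public void entryInserted(MapEvent evt) {
        System.out.println("Inserted: " + evt.getKey());
    }
    public void entryUpdated(MapEvent evt) {
        System.out.println("Updated: " + evt.getKey() + " -> " + evt.getNewValue());
    }
    public void entryDeleted(MapEvent evt) {
        System.out.println("Removed: " + evt.getKey());
    }
});

// ConcurrentMap: a cluster-wide lock on a single key
cache.lock("some-key", -1);          // -1 means wait as long as it takes
try {
    cache.put("some-key", "updated safely");
} finally {
    cache.unlock("some-key");
}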
I tend to blabber a lot before getting to the real thing. So, you already know how to use the cache and query it, and if you have followed my earlier sample code you also know how to populate it. We have objects in the cache now, and we need to use them, don't we? What do you use data for? Just to access it? Yes, but not always. You often want to do computation on it - some mathematical operation over the objects or their attributes. Let's take the example of a student database. You may want to know the average age, height or weight of your students. And why not? Maybe you want to compete against the strongest football team in the league, and you had better know these stats before you sign up. For these computations, Coherence provides the Aggregator API. The concept is simple: hand a set (or subset) of your data to an Aggregator and let it do the computation. Oracle Coherence ships with a ton of off-the-shelf implementations of AbstractAggregator. The following sample uses one of them - an AbstractDoubleAggregator.

Let's say your serializable object is the following:

#SClass - Student's class
import com.tangosol.io.ExternalizableLite;
import com.tangosol.util.ExternalizableHelper;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class SClass implements ExternalizableLite {

    private int height;
    private String name;

    public SClass() {
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public String toString() {
        return name;
    }

    // -- ExternalizableLite: read the fields in the same order they are written
    public void readExternal(DataInput dataInput) throws IOException {
        height = ExternalizableHelper.readInt(dataInput);
        name = ExternalizableHelper.readSafeUTF(dataInput);
    }

    public void writeExternal(DataOutput dataOutput) throws IOException {
        ExternalizableHelper.writeInt(dataOutput, height);
        ExternalizableHelper.writeSafeUTF(dataOutput, name);
    }

    public void setHeight(int height) {
        this.height = height;
    }

    public int getHeight() {
        return height;
    }
}
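Before aggregating anything, the cache needs some students in it. Here is a quick sketch of populating it - the keys, names and heights below are made up purely for illustration:

NamedCache nCache = CacheFactory.getCache("Students");

SClass adam = new SClass();
adam.setName("Adam");
adam.setHeight(175);
nCache.put("1", adam);

SClass alice = new SClass();
alice.setName("Alice");
alice.setHeight(168);
nCache.put("2", alice);

SClass bob = new SClass();
bob.setName("Bob");
bob.setHeight(180);
nCache.put("3", bob);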

So, if you want the sum of the heights of all the students whose names start with the letter "A", what do you do? You know it - you create a Filter...

NamedCache nCache = CacheFactory.getCache("Students");
// -- Populate the data (see the sketch above)
...
Filter filter = new LikeFilter("getName", "A%");
Set keySet = nCache.keySet(filter);
// -- Now what? keySet holds the keys of the students whose names start with A
Double result = (Double) nCache.aggregate(keySet, new CacheAggregator());

Now, what is this CacheAggregator? It is an implementation of AbstractDoubleAggregator, and here is its code:

#CacheAggregator:
import com.tangosol.util.aggregator.AbstractDoubleAggregator;
import com.tangosol.util.extractor.IdentityExtractor;

public class CacheAggregator extends AbstractDoubleAggregator {

    public CacheAggregator() {
        // -- IdentityExtractor hands the whole cached object (an SClass) to process()
        super(IdentityExtractor.INSTANCE);
    }

    protected void init(boolean fFinal) {
        super.init(fFinal);
        m_dflResult = 0.0;
    }

    protected void process(Object object, boolean fFinal) {
        if (fFinal) {
            // -- What the heck is this? In the final pass "object" is a partial
            // -- Double result coming back from a storage member, so just add it in
            // -- (see the explanation below the code as well)
            m_dflResult += ((Double) object).doubleValue();
        } else {
            // -- Keep adding up the heights
            SClass sC = (SClass) object;
            m_dflResult += sC.getHeight();
        }
    }

    protected Object finalizeResult(boolean fFinal) {
        return new Double(m_dflResult);
    }
}

So, what does it do? When aggregate() is invoked on the cache, process() is called for every cache entry identified by the keySet. The example is a little tricky. Why? Usually the input and output types of an aggregator are the same - you provide a Student and you get a Student back. In this case you provide a Student, but what you get at the end is its height: the input is an SClass, the output is a Double.
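Incidentally, since all this custom aggregator really does is sum a single numeric attribute, I believe one of those off-the-shelf aggregators I mentioned - DoubleSum - would give the same answer with much less code. Here is a sketch, assuming the reflection-style "getHeight" extractor and the Filter-based aggregate() overload:

import com.tangosol.util.Filter;
import com.tangosol.util.filter.LikeFilter;
import com.tangosol.util.aggregator.DoubleSum;

// Sum the heights of every student whose name starts with "A",
// without writing a custom aggregator.
Filter filter = new LikeFilter("getName", "A%");
Double total = (Double) nCache.aggregate(filter, new DoubleSum("getHeight"));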

And there you go. Using this example as a starting point, you can build more complex computations on your data objects.
