Friday, March 08, 2013

Colocating data in Coherence and NoSQL

Key/value distributed systems that partition data by hashing keys can incur unnecessary network cost when storing and reading strongly associated data, because related entries may land on different nodes. They also make it harder to operate atomically across such a data set. Fortunately, both of Oracle's best-of-breed distributed data stores, Coherence and NoSQL, support co-locating associated entries and provide a rich set of APIs for atomic operations. In both systems the association is dictated by the map keys, though each does it differently.

Association in Coherence

Two entries are co-located in the same partition if:
  1. The key of the associated entry implements Coherence's KeyAssociation interface (or a KeyAssociator is defined and configured in the cache configuration).
  2. If the two entries are in different NamedCaches, those caches share the same cache service.
  3. When the cache service calculates the partition ID for an entry, it uses the associated key if the cache key implements the KeyAssociation interface.
  4. Coherence provides a set of APIs, often called the TransactionLite framework, for accessing associated entries from the backing map as a single unit of work. You can refer to the blog for an example.
  5. Because Coherence keys can be complex objects, Coherence is much more flexible for storing complex data structures.
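To make point 3 concrete, here is a minimal, self-contained simulation of how an associated key drives partition placement. It is only an illustration under stated assumptions: the `Associated` interface is a hypothetical stand-in for Coherence's `KeyAssociation`, `AddressKey`, `PartitionSim`, and `partitionFor()` are hypothetical names, and real Coherence hashes the serialized (binary) form of the key rather than calling `hashCode()`.

```java
import java.util.Objects;

// Hypothetical stand-in for Coherence's KeyAssociation interface.
interface Associated {
    Object getAssociatedKey();
}

// Hypothetical composite key for an address owned by a customer.
final class AddressKey implements Associated {
    private final String customerId;
    private final String addressType;

    AddressKey(String customerId, String addressType) {
        this.customerId = customerId;
        this.addressType = addressType;
    }

    @Override
    public Object getAssociatedKey() {
        return customerId; // place this entry with its customer
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AddressKey)) return false;
        AddressKey that = (AddressKey) o;
        return customerId.equals(that.customerId) && addressType.equals(that.addressType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, addressType);
    }
}

final class PartitionSim {
    // Coherence's default partition count is 257; used here only for illustration.
    static final int PARTITION_COUNT = 257;

    // Simplified placement rule: hash the associated key if there is one,
    // otherwise the key itself (real Coherence hashes the serialized key).
    static int partitionFor(Object key) {
        Object basis = (key instanceof Associated)
                ? ((Associated) key).getAssociatedKey()
                : key;
        return Math.floorMod(basis.hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        int customer = partitionFor("1");
        int home = partitionFor(new AddressKey("1", "HOME"));
        int work = partitionFor(new AddressKey("1", "WORK"));
        System.out.println(customer == home && home == work); // true
    }
}
```

Note that in this model `partitionFor(1)` (an Integer) and `partitionFor("1")` land in different partitions, which is why the customer's own cache key should have the same type and value as the associated key.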

Association in NoSQL

NoSQL has a much simpler design, but it is limited in that its keys must be Strings.
  1. NoSQL keys are structured as a major key path and a minor key path, each made of one or more String components.
  2. If we are storing People in NoSQL and want everyone with the same last name to be in the same partition, the keys can be structured as:
    1. /familylastname/-/myfirstname
    2. /familylastname/-/wifesfirstname
    3. /familylastname/-/kidsfirstname
  3. Although these are three separate map entries, they all share the same major key, /familylastname, so they all end up in the same partition.
  4. NoSQL then provides APIs such as multiGet() and multiDelete() to retrieve or delete multiple keys that share a major key as one atomic operation.
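The major/minor layout above can be modelled with a small in-memory sketch. This is not the Oracle NoSQL client API: `PathKey`, `MiniStore`, `getAllUnder()`, and `FamilyDemo` are hypothetical, and the partition count is arbitrary. The sketch only illustrates that placement depends on the major path alone, so every record under one major path sits in one partition and can be read back together, which is the property multiGet() relies on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a key made of a major path and a minor path.
final class PathKey {
    final List<String> major;
    final List<String> minor;

    PathKey(List<String> major, List<String> minor) {
        this.major = major;
        this.minor = minor;
    }

    // Renders the key in the "/major/-/minor" notation used in the text.
    String render() {
        return "/" + String.join("/", major) + "/-/" + String.join("/", minor);
    }
}

// Hypothetical store: one sorted map per partition, chosen by the major path only.
final class MiniStore {
    private final List<TreeMap<String, String>> partitions = new ArrayList<>();

    MiniStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new TreeMap<>());
        }
    }

    int partitionOf(List<String> major) {
        return Math.floorMod(major.hashCode(), partitions.size());
    }

    void put(PathKey key, String value) {
        partitions.get(partitionOf(key.major)).put(key.render(), value);
    }

    // multiGet-like read: only one partition needs to be consulted.
    Map<String, String> getAllUnder(List<String> major) {
        String prefix = "/" + String.join("/", major) + "/-/";
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e : partitions.get(partitionOf(major)).entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}

final class FamilyDemo {
    public static void main(String[] args) {
        MiniStore store = new MiniStore(16);
        List<String> family = Arrays.asList("familylastname");
        for (String first : Arrays.asList("myfirstname", "wifesfirstname", "kidsfirstname")) {
            store.put(new PathKey(family, Arrays.asList(first)), first);
        }
        // All three family members are returned from a single partition.
        System.out.println(store.getAllUnder(family).keySet());
    }
}
```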

Working with Coherence and NoSQL together

Oracle's Elastic Charging Engine is one product that uses Coherence and NoSQL together, with Coherence as the rating engine and NoSQL as a fast persistence store for the processed RatedEvents. More generally, this kind of integration is becoming very common, and the following code demonstrates some of its use cases.

Code Samples

Let's start with an interface:
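import com.tangosol.net.cache.KeyAssociation;

// A Coherence association key that also supplies the NoSQL minor key
public interface NoSQLKeyAssociation extends KeyAssociation {
    String getRootKey();
}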
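import com.tangosol.net.cache.AbstractCacheStore;
import com.tangosol.util.ExternalizableHelper;
import oracle.kv.Consistency;
import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;

import java.util.Collections;
import java.util.List;
import java.util.Map;

public abstract class PartitionAwareNoSQLCacheStore extends AbstractCacheStore {
    private final KVStore kvStore;

    public PartitionAwareNoSQLCacheStore(String kvServers) {
        // kvServers is a "host:port" helper address of the "addressStore" KV store
        KVStoreConfig kvStoreConfig = new KVStoreConfig("addressStore", kvServers);
        kvStoreConfig.setConsistency(Consistency.NONE_REQUIRED);
        kvStore = KVStoreFactory.getStore(kvStoreConfig);
    }

    @Override
    public void store(Object key, Object value) {
        storeAll(Collections.singletonMap(key, value));
    }

    @Override
    @SuppressWarnings("unchecked")
    public void storeAll(Map map) {
        Map<NoSQLKeyAssociation, Object> transformedMap = transformer(map);

        for (Map.Entry<NoSQLKeyAssociation, Object> entry : transformedMap.entrySet()) {
            NoSQLKeyAssociation associationKey = entry.getKey();

            // The associated key (e.g. customerId) becomes the major path, so all
            // associated entries share one NoSQL partition, as they do in Coherence.
            List<String> majorComponents =
                    Collections.singletonList((String) associationKey.getAssociatedKey());
            // The root key (e.g. addressType) tells entries under that major path apart.
            List<String> minorComponents =
                    Collections.singletonList(associationKey.getRootKey());

            Key key = Key.createKey(majorComponents, minorComponents);
            byte[] bytes = ExternalizableHelper.toByteArray(entry.getValue());
            kvStore.put(key, Value.createValue(bytes));
        }
    }

    // Returns the entries with every key converted to (or already) a NoSQLKeyAssociation
    public abstract Map transformer(Map entries);
}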
The transformer() method lets the concrete cache store either transform the map entries in some way or return the map as is when its keys are already of type NoSQLKeyAssociation. If they are not, this method is the place to convert each key into a NoSQLKeyAssociation.
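A concrete transformer() might look like the following sketch. To keep it runnable without Coherence on the classpath, `RootKeyed` is a hypothetical stand-in for NoSQLKeyAssociation, `SimpleAssociationKey` and `KeyTransformer` are hypothetical classes, and the "customerId:addressType" string format for incoming keys is purely an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for NoSQLKeyAssociation, with no Coherence dependency.
interface RootKeyed {
    Object getAssociatedKey();
    String getRootKey();
}

// Hypothetical key: customerId is the associated key, addressType the root key.
final class SimpleAssociationKey implements RootKeyed {
    private final String customerId;
    private final String addressType;

    SimpleAssociationKey(String customerId, String addressType) {
        this.customerId = customerId;
        this.addressType = addressType;
    }

    @Override public Object getAssociatedKey() { return customerId; }
    @Override public String getRootKey() { return addressType; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleAssociationKey)) return false;
        SimpleAssociationKey that = (SimpleAssociationKey) o;
        return customerId.equals(that.customerId) && addressType.equals(that.addressType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(customerId, addressType);
    }
}

final class KeyTransformer {
    // Passes keys that are already RootKeyed through unchanged, converts
    // "customerId:addressType" strings (an assumed format), rejects anything else.
    static Map<RootKeyed, Object> transformer(Map<?, ?> entries) {
        Map<RootKeyed, Object> out = new HashMap<>();
        for (Map.Entry<?, ?> e : entries.entrySet()) {
            Object key = e.getKey();
            if (key instanceof RootKeyed) {
                out.put((RootKeyed) key, e.getValue());
            } else if (key instanceof String && ((String) key).contains(":")) {
                String[] parts = ((String) key).split(":", 2);
                out.put(new SimpleAssociationKey(parts[0], parts[1]), e.getValue());
            } else {
                throw new IllegalArgumentException("Cannot build an association key from: " + key);
            }
        }
        return out;
    }
}
```

Failing fast on unrecognized keys is a design choice here: silently dropping them would lose writes, while passing them through would break the cast in storeAll().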

For example, to store a Customer and all of his Addresses, a key similar to the following can be used both in Coherence's NamedCaches (the CUSTOMER and ADDRESS caches) and in NoSQL, so that the data is co-located in each system.

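// Coherence cache keys must be serializable; java.io.Serializable is used here for simplicity
public class CustomerAddressAssociationKey implements NoSQLKeyAssociation, java.io.Serializable {
    private final String customerId;   // associated key: co-locates addresses with their customer
    private final String addressType;  // root key: e.g. HOME or WORK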

    public CustomerAddressAssociationKey(String customerId, String addressType) {
        this.customerId = customerId;
        this.addressType = addressType;
    }

    @Override
    public Object getAssociatedKey() {
        return customerId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        CustomerAddressAssociationKey that = (CustomerAddressAssociationKey) o;

        if (!addressType.equals(that.addressType)) return false;
        if (!customerId.equals(that.customerId)) return false;

        return true;
    }

    @Override
    public int hashCode() {
        int result = customerId.hashCode();
        result = 31 * result + addressType.hashCode();
        return result;
    }

    @Override
    public String getRootKey() {
        return addressType;
    }

}

If two addresses, HomeAddress (addressType = HOME) and WorkAddress (addressType = WORK), are created for a Customer with customerId = 1, they are stored as follows:

In Coherence, the Customer and his two addresses are co-located as:
CUSTOMER Cache:
  • <1, Customer>
ADDRESS Cache:
  • <CustomerAddressAssociationKey(1, HOME), HomeAddress>
  • <CustomerAddressAssociationKey(1, WORK), WorkAddress>
In NoSQL, the Customer's two addresses are co-located as:
/1/-/HOME, byte[] for HomeAddress
/1/-/WORK, byte[] for WorkAddress
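The NoSQL layout above follows from the key mapping in storeAll(): the associated key becomes the single major component and the root key the single minor component. The hypothetical `LayoutDemo` class below reproduces that mapping with plain lists (no NoSQL client) and checks that both addresses share a major path, which is what places them in the same partition.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LayoutDemo {
    // Mirrors storeAll(): [major path, minor path], one component each,
    // built from the associated key and the root key.
    static List<List<String>> toPaths(String associatedKey, String rootKey) {
        return Arrays.asList(
                Collections.singletonList(associatedKey),
                Collections.singletonList(rootKey));
    }

    // Renders a key in the "/major/-/minor" notation used above.
    static String render(List<List<String>> paths) {
        return "/" + String.join("/", paths.get(0)) + "/-/" + String.join("/", paths.get(1));
    }

    // Records with equal major paths are stored in the same NoSQL partition.
    static boolean sameMajorPath(List<List<String>> a, List<List<String>> b) {
        return a.get(0).equals(b.get(0));
    }

    public static void main(String[] args) {
        List<List<String>> home = toPaths("1", "HOME");
        List<List<String>> work = toPaths("1", "WORK");
        System.out.println(render(home));              // /1/-/HOME
        System.out.println(render(work));              // /1/-/WORK
        System.out.println(sameMajorPath(home, work)); // true
    }
}
```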

Enjoy!


