Saturday, October 31, 2009

Why measuring exact size in memory can be a futile exercise

Coherence being an in-memory data grid, it is important to provision the hardware right. Many factors play different roles - total RAM on the box, avoiding paging, providing linear scalability without running into Out of Memory errors, and so on. The problem is: how do we measure how many additional nodes (cache servers) we will need - and, as a result, how many new boxes - when we have to scale out? And if indexes are created, how do we measure the additional space required, and how do we do it right?
There are two ways to measure things - first, like measuring Gold, and second, like measuring Onions. Onions are always approximate. Coherence data sizing is like measuring Onions. It's not that you cannot measure it like Gold - accurate and precise - but in most cases it is not needed. Why? Because of the dynamic, auto-provisioning nature of the cluster, and because memory gets cheaper by the day. It is much easier to approximate the size and add new nodes or boxes to the cluster than to play mathematician and calculate the size in bytes.

If you are an operations person, you need quick and almost-correct formulas. If you are a Coherence enthusiast, you might already know them: on a 32-bit machine, about 1.2GB of RAM is needed to run a JVM with a 1GB heap. Of that 1GB heap, only about 375MB is available for primary data storage in a distributed scheme with a backup count of one, after keeping roughly 30% of scratch space free per JVM to keep the GC profile in check. What about indexes? That's easy too - account for roughly 30% overhead for each index added. So watch how many indexes are created, because their combined cost can easily exceed the size of the data itself.

Are these numbers accurate? Nope. They are not meant to be. Are they simple? Yes, and close enough to correct. After all, when it comes to provisioning a system like Coherence, it's okay to just measure it like Onions.
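To make the "onion math" concrete, here is a minimal back-of-the-envelope sketch using the rules of thumb above. The class name, method names, and example data size are made up for illustration; the constants (375MB of usable primary storage per 1GB-heap JVM with backup count one, ~1.2GB of physical RAM per such JVM on 32-bit, ~30% overhead per index) are the approximate figures quoted above, not exact measurements.

```java
// Rough Coherence capacity sketch - "measuring Onions", not Gold.
// All constants are the approximate rules of thumb from the post.
public final class OnionSizing {

    static final double USABLE_MB_PER_JVM = 375.0; // primary data per 1GB-heap JVM, backup count 1
    static final double RAM_GB_PER_JVM    = 1.2;   // physical RAM per 1GB-heap JVM on 32-bit
    static final double INDEX_OVERHEAD    = 0.30;  // extra space per index, relative to data size

    /** Approximate number of cache-server JVMs for the given primary data set. */
    static int jvmsNeeded(double primaryDataMb, int indexCount) {
        double totalMb = primaryDataMb * (1.0 + INDEX_OVERHEAD * indexCount);
        return (int) Math.ceil(totalMb / USABLE_MB_PER_JVM);
    }

    /** Approximate physical RAM (GB) to provision across the cluster. */
    static double ramNeededGb(double primaryDataMb, int indexCount) {
        return jvmsNeeded(primaryDataMb, indexCount) * RAM_GB_PER_JVM;
    }

    public static void main(String[] args) {
        // Hypothetical example: 10GB of primary data with 3 indexes.
        double dataMb = 10 * 1024;
        System.out.println("JVMs:   " + jvmsNeeded(dataMb, 3));
        System.out.println("RAM GB: " + ramNeededGb(dataMb, 3));
    }
}
```

Note how quickly the index overhead dominates: with three indexes the same data set needs almost twice the space, which is exactly why the post says to watch the index count.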
