NetFusion Reactor – what you see is not what you get

I am a big fan of the Freakonomics books by Dubner and Levitt. Get one if you haven’t read it yet. It is worth every penny you pay and every minute you spend reading it. Awesome! All Freakonomics stories boil down to a couple of ideas, something like ‘reality differs from initial perception’ and ‘don’t trust your eyes and instincts blindly’.

What you see is not what you get

A short introduction: a client of mine runs a memory-hungry legacy application at the end of its lifecycle on a cluster of Coldfusion servers behind a JK load balancer. Each server hosts a couple of Coldfusion 9.0 instances running on Tomcat 6. All instances are monitored with the standard NetFusion Reactor software. Its Enterprise dashboard is a mighty sight: a long flickering row of red, orange and blue cubes, each representing a single Coldfusion instance. Something like this:

Each cube is decorated with four meters for (M)emory, (C)PU, (R)equest time and (D)atabase response time. A cube changes its colour from blue to orange and then to red when any of these four parameters crosses its configured threshold. Let’s say our JVM memory thresholds are set to 85% and 95%. That means that an instance cube will turn orange when the JVM memory hits 85% and red when it climbs to 95%. At that very moment the garbage collector kicks in, and memory drops to something comfortable like 50-60%. Our cube turns blue again. Yay! What a great idea. It’s all visual and very fancy. Enjoy the ride.
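The colour rule for the memory meter can be written down in a few lines. This is only a sketch of the rule as described above, not Reactor's actual code: the function name cube_colour and its parameters are made up for illustration, and the 85% and 95% defaults are the example thresholds from this post.

```python
def cube_colour(memory_pct, warn=85.0, crit=95.0):
    """Return the cube colour for a JVM memory reading in percent.

    Hypothetical helper: the thresholds are the example values from the post,
    and only the memory meter is modelled, not CPU, request or database time.
    """
    if memory_pct >= crit:
        return "red"
    if memory_pct >= warn:
        return "orange"
    return "blue"


# A reading just after garbage collection, one in the warning band, one at the top.
for reading in (55.0, 88.0, 95.0):
    print(reading, cube_colour(reading))
```

Note that the colour depends only on the current reading. It says nothing about how fast memory is growing or how long it has been sitting at that level.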

And then the unexpected happens. You decide to double the cluster capacity by adding more instances. You add these instances to your Reactor’s Enterprise dashboard. You are proud to see another set of cubes flickering on your screen. And then you notice that JVM memory levels on all instances start creeping up to unusually high numbers. Your dashboard suddenly turns orange. You can’t find any errors in the logs. You check every theory you can think of and even take the new servers down to rule out interference. Nothing! Everything still works, but the feeling of uneasiness won’t go away. What could have gone wrong? Any ideas?

Then I noticed something unexpected: all instance cubes turned blue when the traffic peaked and slowly degraded to orange when the traffic eased. Bingo! There was nothing wrong with the cluster. There was nothing wrong with the JVM memory. We should have expected these results.

Let me try to explain what happens:

By adding new instances we spread the load over a larger number of servers, decreasing the load on each server. That makes the memory growth curve far less steep. In the old situation the JVM memory footprint grew fast, passing through the warning-coloured stages quickly. Then the garbage collector kicked in and everything turned blue, starting a new cycle.

In the new situation the memory growth curve is less steep, the garbage collector is called less often, and the memory levels can easily stay high for longer periods of time, turning our cubes orange and making us feel queasy. What you see is thus not what you get. Lesson learned.
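The effect is easy to reproduce with a toy model. The sketch below is a deliberate simplification and not a model of a real JVM: memory grows by a fixed amount per tick, a collection at 95% drops it back to 55%, and the cube counts as orange from 85% upwards. The function name and the growth rates are assumptions chosen for illustration; only the thresholds come from the example above.

```python
def longest_orange_streak(growth_per_tick, ticks=1000,
                          floor=55.0, warn=85.0, gc_at=95.0):
    """Longest run of consecutive ticks a cube stays orange.

    Toy model (assumption): memory grows linearly by growth_per_tick and a
    collection at gc_at resets it to floor. Real heaps are messier than this.
    """
    memory = floor
    longest = current = 0
    for _ in range(ticks):
        memory += growth_per_tick
        if memory >= gc_at:
            memory = floor          # collector kicks in, cube goes blue again
        if memory >= warn:
            current += 1            # still in the warning band
            longest = max(longest, current)
        else:
            current = 0
    return longest


busy = longest_orange_streak(growth_per_tick=5.0)    # old cluster: steep growth
quiet = longest_orange_streak(growth_per_tick=0.5)   # doubled cluster: gentle growth
print(busy, quiet)
```

With these made-up rates the lightly loaded instance stays orange ten times longer per cycle than the busy one, even though nothing is wrong with either of them.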