Over the last wee while I’ve been testing JBoss apps virtualised under RHEV, and this week I had a bizarre experience: my team-mates and I had been puzzling over the high standard deviations (and hence eccentric behaviour) of our web app, which wasn’t even using all the available JVM heap or virtual processors assigned to it. While I was off in meetings, the rest of the team doubled the number of vCPUs, and the SD improved significantly, but more importantly, the utilisation of each vCPU improved. This was odd, and, on the face of it, inexplicable. If you’re only half-to-three-quarters utilising 4 vCPUs, why would you get better utilisation when you doubled that number? And if you weren’t CPU-bound before, why would increasing the amount of virtual processors improve matters?
We threw around some hypotheses and worked up some lines of investigation, which boiled down to “more stats from the hypervisors, please” — and then I had a thought.
These symptoms tickled my memory banks: a few weeks ago I’d been reading about bizarre misbehaviour of large MySQL instances on modern x86 NUMA architectures, where the processes grew so large that they no longer fit within the bank of memory with affinity for a given processor. There are some write-ups here, but it boils down to this: if you don’t tell the kernel to ignore its normal best-guess behaviour about the penalties involved in the NUMA topology, you’ll see weird performance problems. So, for shits and giggles, I suggested we shrink the JVM and guest under the size of a single bank of memory: we almost halved the heap and guest sizes and, at the same time, took the number of vCPUs back to 4. Result? It ran faster, and with a substantially better standard deviation of results.
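If you want to try the same trick, the first step is finding out how big a single bank of memory actually is on your hypervisor hosts. A rough sketch of how you’d check (the node counts and sizes shown are illustrative, not from our actual hardware):

```shell
# Show the host's NUMA topology: which CPUs and how much memory
# belong to each node (memory bank).
numactl --hardware
# Typical output looks something like:
#   available: 2 nodes (0-1)
#   node 0 cpus: 0 1 2 3
#   node 0 size: 16384 MB
#   node 1 cpus: 4 5 6 7
#   node 1 size: 16384 MB
# The rule of thumb: keep the guest's memory (and the JVM heap inside
# it) comfortably under one node's size, so the whole process can live
# in memory local to the CPUs it runs on.
```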
(Of course, to confirm this theory the real test will be what happens if we use numactl hints to force the KVM process to behave as we want.)
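For what it’s worth, forcing that behaviour with numactl would look roughly like this — a hedged sketch, since the guest name, node number, and qemu-kvm flags here are illustrative assumptions, not our actual invocation:

```shell
# Bind the KVM process's CPUs and memory allocations to NUMA node 0,
# so the guest never straddles two memory banks.
numactl --cpunodebind=0 --membind=0 \
    qemu-kvm -name test-guest -m 8192 -smp 4
```

With --membind in place, allocations that would spill past node 0 fail rather than silently landing on a remote node — which is exactly the “ignore your best guess, do what I say” behaviour the MySQL write-ups describe.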
Less, it would appear, is more. And you need to understand what lies beneath your virtual layer.