Thursday, October 30, 2008

Linux: A (Less) Terrible Choice For Java Continuous Integration

When we last left our intrepid developer, he was floating in a sea of Bamboo+Perforce misery, blithely assuming that moving from Solaris 10 to Linux would solve all of his problems. Oh, what a blissful world he would live in! What joy he would have no longer having to deal with Solaris swap space reservation woes!

Since then, though, he's gotten access to two (almost) identical machines, one running Solaris 10 x86 and one running RHEL 5.2. And while he's vindicated, he's nowhere near vindicated enough for his liking.

Executive Summary: Runtime.exec() performance under Linux is superior to that of Solaris 10 x86, but nowhere near as superior as it should be.

Inspect the following micro-benchmark (and apologies that I don't have the nifty code viewing tools that other bloggers do):

import java.util.concurrent.*;

public class ForkTest
{
    public static void main(String[] args) throws InterruptedException {
        int nThreads = Integer.parseInt(args[0]);
        int nSlabs = Integer.parseInt(args[1]);
        // Fill the heap with 80MB slabs (0 slabs = "empty heap" test).
        byte[][] bytes = new byte[nSlabs][];
        for(int i = 0; i < nSlabs; i++) {
            bytes[i] = new byte[80 * 1024 * 1024];
        }
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        long start = System.currentTimeMillis();
        // Queue 1000 tasks, each forking /bin/false and waiting for it to exit.
        for(int i = 0; i < 1000; i++) {
            executor.execute(new Runnable() {
                public void run() {
                    try { Runtime.getRuntime().exec("/bin/false").waitFor(); }
                    catch (Throwable t) { t.printStackTrace(System.err); }
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(10L, TimeUnit.MINUTES);
        long end = System.currentTimeMillis();
        double secs = ((double)(end - start)) / 1000.0;
        System.out.println("" + nThreads + " - Forking 1000 times took " + secs + " secs");
    }
}

Essentially what the test is doing is:
  • Creating a fixed size (-Xms and -Xmx set to the same value) heap
  • Allocating some 80MB slabs of memory to fill up the heap (for the "empty heap" tests, the slab count was set to 0)
  • Creating an ExecutorService with a certain number of threads
  • Running through 1000 tasks, where each task runs /bin/false in a sub-process and waits for it to terminate

I figured this was the best way to test whether the behavior I believed was causing Bamboo to perform badly with Perforce repositories would also affect Linux. Turns out I'm half right; Linux will still suck, but suck 50% less.

General Parameters
Both machines were Sun X4100 (non-M2) servers with two dual-core Opterons (one a pair of 275s and one a pair of 285s) and 4GB of physical RAM. All tests were done on Java 1.6.0_10.

Empty Heap Comparison
Here's the first test: run through everything with no slabs allocated (empty heap) and see how fast we can go. The results are in this graph; the highlight is that Solaris was barely affected by the size of the heap, but was always slower than Linux, by roughly a factor of 2.


Full Heap Comparison
The next test was to fill up the heaps and try again. Here you can see that the heap size completely determines performance, but Linux is always better (a factor of 2 again).


Full/Empty Comparison
Here are just the Linux values plotted out, and it's pretty clear what's going on.


Interesting Observations
Note that the sweet spot here is two threads. No more, no fewer. On Linux, Solaris, empty, full, doesn't matter. You want to fork as fast as you can? Have two threads doing it. Admittedly, these tests were on 2-socket (4 cores total) machines, but when I repeated this on Solaris on one of our 8-socket X4600 machines (16 cores total), I ended up with the exact same thing: 2 threads was always ideal.
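If you want to check the thread-count sweet spot on your own hardware, here is a rough sketch that sweeps several pool sizes in a single JVM run. It is not the harness used for the graphs above: the class name ForkSweep, the list of thread counts, and the forks-per-second reporting are all my own assumptions, and it assumes Java 8 or later. Heap size is still controlled from the command line with -Xms and -Xmx.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sweep harness (not from the original benchmark).
class ForkSweep {

    // Forks `command` nForks times on a pool of nThreads and returns forks per second.
    static double forksPerSecond(int nThreads, int nForks, final String... command)
            throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        final AtomicInteger failures = new AtomicInteger();
        long start = System.nanoTime();
        for (int i = 0; i < nForks; i++) {
            executor.execute(() -> {
                try {
                    Process p = new ProcessBuilder(command).start();
                    try {
                        p.waitFor();
                    } finally {
                        // Release the three pipes promptly instead of waiting for GC.
                        p.getInputStream().close();
                        p.getErrorStream().close();
                        p.getOutputStream().close();
                    }
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        executor.shutdown();
        if (!executor.awaitTermination(10L, TimeUnit.MINUTES)) {
            throw new IllegalStateException("timed out waiting for forks");
        }
        double secs = (System.nanoTime() - start) / 1e9;
        if (failures.get() > 0) {
            throw new IllegalStateException(failures.get() + " forks failed");
        }
        return nForks / secs;
    }

    public static void main(String[] args) throws InterruptedException {
        int nForks = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        // Thread counts to try; adjust to taste.
        for (int n : new int[] {1, 2, 3, 4, 8, 16}) {
            System.out.printf("%2d threads: %.1f forks/sec%n",
                    n, forksPerSecond(n, nForks, "/bin/false"));
        }
    }
}
```

Run it as, say, java -Xms512m -Xmx512m ForkSweep 1000. Note that later pool sizes run in an already warmed-up JVM, so treat the numbers as indicative only.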

Uninteresting Observations
"Hey, Kirk, you just proved that forking an empty virtual space is faster than a full virtual space! Big whoop! You're such a Java-specific Moran that you forgot all that from your 31337 days!"

Well, no, not really. What I specifically established is that:
  • On neither Linux nor on Solaris are the Sun-provided JVMs using any of the fast-subprocess-spawn operations available to them.
  • This is a really clear thing for anyone working on CI systems to nag Sun or the OpenJDK crowd about; getting it changed and fully tested would be an easy win.
  • Linux is still a factor of 2 ahead of Solaris here. I would have hoped a fast-spawn implementation would be a factor of 10, but I'll take a factor of 2 gladly.
  • There is definitely something happening in a fork+exec pattern on Linux which is VM-specific, which means that our earlier suppositions that Linux is going to do optimistic copying aren't panning out: they don't eliminate the cost of a fork with a large amount of allocated memory.


Recommendation
So let's say you have a large, long-running Java server which benefits from having a relatively large heap (like, oh, I dunno, a Continuous Integration server), and you have to shell out constantly because a vendor refuses to support you well (speaking of which, I've actually formally asked Perforce to document the protocol).

If you really want to avoid the whole C++ thing, essentially you should be:
  • Forking to a second JVM instance to run a small Java application.
  • That small Java application should itself shell out to your command-line application (remember: on Linux with a 128m empty heap you can get up to 175 Runtime.exec()/sec, which is not too shabby)
  • Have that small Java application just pass stdin/stdout to the parent application.
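The steps above could be sketched roughly like this. It is only an illustration, not something I run in production: the class name ExecHelper and the line-based protocol (one whitespace-separated command per input line, one reply line of exit code, tab, escaped output) are assumptions I made up for the example, and it assumes Java 7 or later. Commands with quoted arguments or binary output would need a sturdier protocol.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: start it once in its own small-heap JVM,
// e.g. java -Xms128m -Xmx128m ExecHelper, and keep its stdin/stdout open.
class ExecHelper {

    // Runs one command and returns "<exit code>\t<output>", with the child's
    // stdout and stderr merged and backslashes/newlines escaped to fit on one line.
    static String runOnce(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        Process p = pb.start();
        p.getOutputStream().close(); // the child gets no stdin in this sketch
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = p.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        int exit = p.waitFor();
        // Decoded with the platform default charset.
        String text = out.toString().replace("\\", "\\\\").replace("\n", "\\n");
        return exit + "\t" + text;
    }

    // Reads one command per line until EOF and writes one reply line per command.
    static void serve(BufferedReader requests, PrintStream replies)
            throws IOException, InterruptedException {
        String line;
        while ((line = requests.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            replies.println(runOnce(Arrays.asList(line.split("\\s+"))));
            replies.flush();
        }
    }

    public static void main(String[] args) throws Exception {
        serve(new BufferedReader(new InputStreamReader(System.in)), System.out);
    }
}
```

The big-heap parent pays for exactly one expensive fork when it launches the helper, then writes command lines to the helper's stdin and reads replies from its stdout. Every subsequent fork happens from the small heap.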

Yes, this seems completely ridiculous. I can't believe I'm recommending it. But it would actually work as a consistent approach to the Perforce+Continuous Integration problem.