Stalwarts of Tech – An Interview with Aleksey Shipilёv – Oracle’s Java Performance Geek

About this series

This is the next instalment in our regular series of interviews with stalwarts of the technology industry. We want to highlight the unsung heroes, the people and projects that have made huge impacts on our lives as developers and technologists!

Aleksey Shipilёv – Oracle’s Java Performance Geek

We’re really pleased to have Aleksey Shipilёv from Oracle continue this series, as he has been a major force in bringing solid, scientific rigour to the Java performance space, in particular to that black box that is the JVM. Aleksey is behind JMH, JCStress and Java Object Layout (JOL), three popular tools which many of us in the performance space use to prove (or disprove!) performance characteristics of low-level libraries and their interactions with the JVM. So without further ado, let’s find out about Aleksey’s contributions, his history with performance tuning, the motivations behind his three tools JMH, JCStress and JOL, and some deep-dive details about the JVM!

1. Would you like to introduce yourself Aleksey?

I am a performance guy by trade. I started at Intel working on Apache Harmony performance, then shifted to Sun doing OpenJDK performance, and of course transitioned to Oracle during the takeover. My primary passion these days is exploring interesting areas and dark corners of Java and the JVM, and leaving a breadcrumb trail of tools after I’m done there. Benchmarking, concurrency, and other hardcore things are my primary focuses.

2. So, you have been called the Java benchmarking expert. What does writing a good benchmark entail?

First, a good benchmark must come with an understanding of its goals.

If you do a benchmark to prove a point in a holy war and/or marketing game, then the best benchmark is the one that gets you the data you like! SPEC has arguably succeeded in rigorous benchmarking by developing its benchmarks through consensus among fierce competitors. Other benchmarks, lacking clear-cut ways to determine the best configuration for every specimen, are heavily biased by construction, which greatly undermines their usefulness. This is why I am very suspicious of cross-language/cross-implementation benchmarks done by the developers of one of the implementations participating in the comparison.

Excellent benchmarks, in my view, are those highlighting something about the Universe in which we are living. The exact numbers don’t usually matter there; it is important to see through them and understand why those numbers are arranged in that particular way. We can call this constructing the performance model of a system, and for me it means deeply understanding the processes which cause your system to behave the way it does. Benchmarks, in this parlance, are the tools which isolate and quantify that behavior in a particular lab environment.

3. Is it true that most Java developers do not need to performance tune or heavily optimise specific libraries?

Yes. Most of the time developers should really concentrate on writing good, understandable code, and then focus on performance only if required. I think end-to-end performance tests, while useful for controlling the high-level behavior of the system, give you too little actual understanding of how things work.

In most cases, you do the end-to-end performance testing: you fire up profilers, identify the bottlenecks, fix them, and get an instant return on investment. Most users are perfectly happy with that cycle, and it helps to bring good-enough performance. This is what Kirk Pepperdine and I were showing at Devoxx 2012.

But when people start to crave more, they suddenly realize the law of diminishing returns kicks in, and you can only do so much to help the system without gaining a clear understanding of it. When you are cornered there, there is no way out other than to begin studying the system from the inside out, gaining the insight needed to make non-obvious changes which will improve performance.

Right there, you will need something that can run narrow-scoped experiments to understand how the small parts work, how they interact, and what performance effects are caused by libraries, the runtime, the OS, or the hardware itself. You can, of course, do the same with an end-to-end performance test, but without the magnification of narrow benchmarks you will waste lots of valuable time.

Library and runtime developers are stuck in a similar situation: unable to predict how their customers will use their products, they have to understand them deeply for themselves in order to convey best practices, document dangerous areas, nudge case studies in the right direction, etc. With a long turnaround time to understand customers’ performance issues, it is most advisable to avoid them in the first place.

4. Does your library, the Java Microbenchmark Harness (JMH), have many external users?

To my surprise, in addition to many internal Oracle users, there are lots of external users. Most of our early adopters are from the OpenJDK community. I frequently Google for usages of the JMH APIs in the wild to see how people abuse them (really, it tells us where we should do a better job documenting or forbidding things), and every time that search yields new results.

Another thing that surprises me is how eagerly people catch up with our findings. The JMH Samples seem to be a very good reference to send people to for gaining a better understanding of how benchmarking works. Instead of hand-waving and mile-long hypothetical arguments, we can now direct people to runnable examples so they can see the effects with their own eyes.
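For readers who have not yet tried it, here is a minimal sketch of what a JMH benchmark looks like. The class and workload below are our own illustration, not taken from the JMH Samples, and it assumes a recent JMH release where the annotation is @Benchmark:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConcatBenchmark {
    int i = 42;

    @Benchmark
    public String concat() {
        // Returning the result hands it to JMH's blackhole,
        // which prevents dead-code elimination of the workload.
        return "value = " + i;
    }
}

JMH generates the harness around this method, runs warmup and measurement iterations, and reports the statistics, which is exactly the boilerplate that hand-rolled benchmarks usually get wrong.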

It is a bit sad, though, when people believe JMH is more than a tool and magically solves all the problems. As one of my friends once sarcastically said: “This benchmark uses JMH, therefore it is correct.” We try to put that fire out before it spreads; my worst fear is people blindly believing stuff works, which is precisely the attitude you should not have in empirical studies. JMH certainly helps to alleviate the obvious pitfalls, letting users allocate their time to understanding the non-obvious ones.

5. Perhaps you would like to explain how the Java Concurrency Stress tests (jcstress) work?

That’s actually a funny story. I vividly remember the post on concurrency-interest about a funny volatile bug in C1, which disturbed me greatly.

This uneasy feeling of “oh, this bug is probably (silently) affecting lots of users” is very common among OpenJDK developers, because the scale and exposure of the project are very large. When you are facing an issue within the basic guarantees of the language, that feeling quadruples.

jcstress was born to tame my personal unease with that bug. We do actually have quite a few targeted concurrency tests, and there are lots of tests for java.util.concurrent.* which implicitly test the concurrency guarantees. Those tests, however, are probabilistic, and so to catch more unlucky outcomes you need to run them fast. This is when my performance brain switched into overdrive and started to think about a test infrastructure which could run concurrency tests fast and reliably.

In hindsight, the approach is obvious and is already used by Peter Sewell et al. in the litmus tests with which they study hardware memory models. You have a shared state, usually encapsulated in some object, and you run threads doing things over that shared state, sometimes exhibiting races. If each thread is able to record what it saw of the particular state, we can reason about their global behavior. For example, this is a test which checks whether volatile increment is atomic (we know it is not):

// Imports assume jcstress's standard packages; note that newer
// jcstress releases name the two-int result class II_Result.
import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.IntResult2;

@JCStressTest
@State
public class VolatileIncrementAtomicityTest {
    volatile int x;

    @Actor
    public void actor1(IntResult2 r) {
        r.r1 = ++x; // first thread records the value it observed
    }

    @Actor
    public void actor2(IntResult2 r) {
        r.r2 = ++x; // second thread records the value it observed
    }
}

If we run this test with two threads, each executing its own @Actor method over many instances of the @State class, and collect the (r1, r2) pairs, then observing (r1, r2) = (1, 1) signifies the case when both threads have pre-incremented the same volatile int and both got 1, which implies volatile increment is not atomic.

Surprisingly, many tests can be folded into this pattern. We have discovered quite a few nasty bugs in libraries and the VM, but after those were fixed, new serious bugs have not shown up yet, even though we poke the stick here, there, and everywhere. These days, I use the jcstress experience as a horror story for those holding updates back: legacy VMs lack important bugfixes, and those bugs are not ephemeral at all.

We now know quite a few people using jcstress to understand the JMM and concurrency better, since it helps to easily construct toy test cases to play with (e.g. JMM Under the Hood).

6. What kind of scenarios can your Java Object Layout (JOL) tool help us understand?

Most users, I think, use JOL to study the actual field layout in their environments, which helps to answer questions like: how many bytes does an instance of this particular class take up in memory? Sure, there is a set of rules which enables you to do the math yourself, e.g. figure out header sizes, estimate field sizes, alignments, hierarchical layouts, etc., but it is simpler to fire up a tool which will introspect it for you.
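As a quick illustration of that use case (our own sketch, not Aleksey’s; the Point class is hypothetical, and it assumes the jol-core library is on the classpath):

import org.openjdk.jol.info.ClassLayout;

public class LayoutDemo {
    // A hypothetical class whose layout we want to inspect.
    static class Point {
        int x;
        int y;
        boolean visible;
    }

    public static void main(String[] args) {
        // Prints the header size, field offsets, alignment gaps,
        // and total instance size as this particular VM lays it out.
        System.out.println(ClassLayout.parseClass(Point.class).toPrintable());
    }
}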

JOL has a funny history as well: I hacked it together to see if my @Contended changes really affected the VM’s field layout strategy. Only then did we realize it can be used to introspect object and heap layouts rigorously, without relying on heap dumps and whatever they miss out. I think we have collected quite a few interesting uses in the JOL Samples.

My largest use of JOL, barring @Contended, was to research whether redoing the field layout strategy helps real applications.

7. Should Java developers care about how objects are laid out in memory?

Well, no! I really wish for a perfect world where we could all just write code in high-level languages without any remorse about what it takes to support them. Alas, we are not in that perfect world, and in many cases we need to keep in mind the costs of those abstractions. When you start to optimize for memory, you have to know the actual sizes of the objects you have, how they are located in memory, whether there are duplicates, etc.

There are also corner cases where field and object layout matters for locality: whether you make predictable walks through memory, whether you induce false sharing when accessing the fields, etc. But hopefully these are very specific needs most programmers will never have.
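To make false sharing concrete, here is a small sketch of our own, not from the interview. Two hot fields updated by different threads can land on the same cache line, so the threads contend in hardware even though neither ever touches the other’s data; the @Contended annotation mentioned above asks the VM to pad such fields apart:

public class Counters {
    // Laid out back to back, these two longs will likely share a
    // cache line, so threads incrementing them independently still
    // contend in hardware (false sharing).
    volatile long a; // written only by thread 1
    volatile long b; // written only by thread 2

    // A padded variant would annotate the contended field, e.g.:
    //   @sun.misc.Contended                     (JDK 8)
    //   @jdk.internal.vm.annotation.Contended   (JDK 9 and later)
    //   volatile long b;
    // Note: outside JDK classes this only takes effect with
    // -XX:-RestrictContended.
}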

8. Is there anything else you would like to mention? Do you have a blog or Twitter account?

Yes, it’s @shipilev and http://shipilev.net.

=============

Once more we’d like to thank Aleksey for his detailed and thoughtful answers and encourage all of you to experiment with JMH, JCStress and JOL to increase your understanding of how parts of the JVM really work!

Cheers,
Martijn (CEO) and the jClarity Team!
