Shenandoah – A new low pause Garbage Collection algorithm for the Java Hotspot JVM


Over the past 18 months or so, Red Hat’s Roman Kennke and Christine Flood have been working on a new low pause Garbage Collector for the Hotspot JVM, code-named Shenandoah. It’s aimed at reducing pause times on large to extremely large heaps (which covers the “uber JVM on Bare Metal” use case).

In 2013 there were several posts by Roman on this new GC implementation:

  1. Initial Announcement
  2. An Overview
  3. Details on its Concurrent and Parallel Marking strategy
  4. Its use of Brooks Pointers

These posts created some interesting debates on GC related mailing lists (including our very own Friends of jClarity list).

The jClarity team participated in some of the debate and have been watching this new collector with interest, but it wasn’t until a formal JDK Enhancement Proposal (JEP 189) was put forward and Roman gave his recent talk at FOSDEM (we help run the Free Java room there every year) that we really started to take notice!

This blog post attempts to gather up the various threads about Shenandoah, and details where you can get the source code and how to build it.

Shenandoah in a Nutshell

**This section is paraphrased / cut down from the project’s official explanation**

Shenandoah is a region-based garbage collector with a heap structure similar to G1. It has two Garbage collection phases (marking and evacuation) which are both:

  • Concurrent – i.e. application/mutator threads can continue performing work while the GC runs.
  • Parallel – i.e. several threads can be allocated to perform GC work.

Marking Phase

Live objects are marked, starting from the usual GC roots (thread stacks, etc).
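
As a rough illustration of what "marking from the roots" means, here is a minimal single-threaded sketch. It is a toy model and not Shenandoah's code: the `Node` class and the `mark` method are hypothetical names made up for this example, and the real collector does this work concurrently and in parallel across several GC threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of heap marking; not Shenandoah's actual implementation.
final class MarkingSketch {

    // Hypothetical stand-in for a heap object holding outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Returns every node reachable from the roots, using an explicit work stack.
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: objects are distinguished by reference, not equals().
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n)) {      // first visit: mark it, then scan its references
                work.addAll(n.refs);
            }
        }
        return marked;
    }
}
```

Anything that is not in the returned set is unreachable from the roots and therefore garbage. Cycles are handled because an already marked node is never scanned twice.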

Evacuation Phase

Regions to collect are identified and the GC relocates live objects from those regions by copying them to new regions. During the next concurrent marking phase all references are updated to point to the evacuated objects, and at the end of this phase the evacuated regions may be reclaimed.

Brooks Forwarding Pointer usage

In order for this phase to be concurrent, both the application/mutator threads and GC threads need to know the correct location of an object. This is accomplished in Shenandoah by the use of a Brooks forwarding pointer. There are a couple of rules to make this all sane:

  1. All reads by the application/mutator threads go through a forwarding pointer.
  2. All writes to objects in regions targeted for evacuation must first copy the object and then write to the object in its new location.

Races between writers and GC threads to copy an object are resolved by updating the forwarding pointer using a CAS (Compare And Swap): only one copy can win and the other gets rolled back.
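
The two rules and the CAS race can be sketched in plain Java. This is a toy model under our own assumptions, not how Hotspot implements it: the class `Obj`, its `forwardee` field and the methods `read`, `write` and `evacuate` are hypothetical names, and the `inTargetedRegion` flag stands in for the collector's knowledge of which regions are being evacuated.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a Brooks forwarding pointer; all names are illustrative only.
final class BrooksSketch {

    static final class Obj {
        // Forwarding pointer: refers to the object itself until it is evacuated.
        final AtomicReference<Obj> forwardee = new AtomicReference<>();
        volatile int field;

        Obj(int field) {
            this.field = field;
            forwardee.set(this);
        }
    }

    // Rule 1: every read is resolved through the forwarding pointer.
    static int read(Obj ref) {
        return ref.forwardee.get().field;
    }

    // Copies the object and tries to install the copy with a CAS.
    // A thread that loses the race discards its copy and uses the winner's.
    static Obj evacuate(Obj ref) {
        Obj current = ref.forwardee.get();
        if (current != ref) {
            return current;                         // already evacuated by someone else
        }
        Obj copy = new Obj(ref.field);
        if (ref.forwardee.compareAndSet(ref, copy)) {
            return copy;                            // this thread won the race
        }
        return ref.forwardee.get();                 // lost: roll back and use the winner's copy
    }

    // Rule 2: a write to an object in a targeted region copies first,
    // then writes to the object in its new location.
    static void write(Obj ref, int value, boolean inTargetedRegion) {
        Obj target = inTargetedRegion ? evacuate(ref) : ref.forwardee.get();
        target.field = value;
    }
}
```

Whichever thread gets there first, application or GC, the CAS guarantees that exactly one copy becomes the canonical new location, and every later read or write through the forwarding pointer lands on that copy.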

Unknown copy cost?

There is of course a cost involved in rule 2: having to copy the object. Large objects or a high throughput of writes could lead to increased CPU time spent copying, as well as ballooning heap usage. Roman and Christine will be running some performance benchmarks on this in due course.

Unknown maintenance cost

There is also a maintenance cost involved in managing all of the forwarding pointers (although pointer swizzles themselves are fairly inexpensive).

Ignores the Weak/Young Generational Hypothesis

Shenandoah doesn’t differentiate between young and old generations. This is an interesting divergence from the accepted theory that most objects die young, with a long tail of longer lived objects. It would be interesting to see research on modern Java applications that shows whether this divergence is needed.

So now that you’ve caught up with the theory, time to build it!

Build it from source

Building it from source is like building a modern (8, 9) version of OpenJDK today. If you’re completely new to this world, then we highly recommend joining the Adopt OpenJDK programme that we help run!

These instructions currently only work on a modern 64-bit Linux O/S. You can get the latest source drop via the Mercurial source control tool, e.g.

hg clone shenandoah
cd shenandoah
chmod u+x
bash ./configure --disable-zip-debug-info --with-debug-level=slowdebug --with-jvm-variants=client

The last line deviates from a standard JVM build in that it creates a client-only (C1) VM with some extra debug info. Depending on the speed of your machine/VM, the build should take about 20-40 minutes. Once you’re done there will be an OpenJDK 8 binary with Shenandoah included.

Running Shenandoah

In order to activate Shenandoah as the GC of choice you’ll need to run your Java program using the newly built Shenandoah Java (build/linux-amd64-debug/bin/java) with the following options:

-XX:-UseCompressedOops -XX:+UseShenandoahGC -client -XX:-UseFastLocking -XX:-UseCRC32Intrinsics -XX:ParallelGCThreads=2 -XX:-UseTLAB
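
If you want a quick sanity check of which collectors a running JVM reports, a small program like the following may help. It only uses the standard `java.lang.management` API; the class name `WhichGc` is ours, and we have not verified what names (if any) this early Shenandoah build registers, so treat the output as informational only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Prints the names of the garbage collectors the running JVM exposes via JMX.
final class WhichGc {

    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // The exact names depend on the JVM build and the GC flags in use.
        System.out.println(collectorNames());
    }
}
```

Compile it with the newly built javac and run it with the options above, then compare the output against a run on your regular JDK.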

In our next blog post we’re going to run Shenandoah on a couple of well known Java projects (such as Eclipse, PCGen and Jenkins), as we know they are hungry monsters that typically perform a lot of object allocation.

Despite it being early days and Shenandoah needing to prove its capabilities with some peer reviewed benchmarks, we’re really excited to see new/old ideas being implemented and we hope that Shenandoah either directly or indirectly improves GC for Hotspot over the coming years.

Get involved!

Want to get involved in Shenandoah? Then we suggest you take a look at Roman’s latest blog post which details all of the resources for the project to date.

Martijn (CEO) and the jClarity Team!
