title |
---|
Modern JVM Garbage Collection |
- Some GC theory
- New GCs in OpenJDK HotSpot
- How to choose and tune GC in OpenJDK HotSpot
- GC in containers
- Tracks every object in JVM Heap
- Removes unused objects
⇒ Easy?
- Program throughput
- GC throughput
- Heap overhead
- Pause times
- Pause frequency
- Pause distribution
- Allocation performance
- Compaction
- Concurrency
- Scaling
- Tuning
- Warmup time
- Page release
- Portability
- Compatibility
⇒ Very hard, lots of tradeoffs
- Empirical observation
  - Weak: Most objects die young
  - Strong: The older the object, the less likely it is to die
- Notable exception: LRU caches
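
Why an LRU cache is an exception can be sketched with a small `LinkedHashMap`-based cache: the entry that has gone unused the longest is the one evicted next, so objects tend to die old rather than young. The class name `LruCache` and the capacity are illustrative assumptions, not taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache; the name LruCache is hypothetical.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: least recently used entry comes first
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache is over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, byte[]> cache = new LruCache<>(2);
        cache.put("a", new byte[1024]);
        cache.put("b", new byte[1024]);
        cache.put("c", new byte[1024]); // evicts "a", the least recently used entry
        System.out.println(cache.keySet());
    }
}
```

In a long-running service the evicted entries have usually survived several collections already, which is the opposite of what a generational collector is optimized for.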
- Throughput
  - % of time not spent in garbage collection
- Latency
  - Responsiveness of an application, affected by GC pauses
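
Both metrics can be derived from a list of measured pauses. The sketch below is a minimal illustration; the helper name `GcStats` and the sample numbers are made up, and it counts only pause time as GC time.

```java
import java.util.Arrays;

// Hypothetical helper: derives throughput and worst-case pause from measured pause durations.
class GcStats {
    /** Percentage of wall-clock time NOT spent in GC pauses. */
    static double throughputPercent(long[] pauseMillis, long totalMillis) {
        long gcMillis = Arrays.stream(pauseMillis).sum();
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    /** Longest single pause, a simple proxy for the worst-case latency impact. */
    static long maxPauseMillis(long[] pauseMillis) {
        return Arrays.stream(pauseMillis).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] pauses = {20, 35, 200, 45}; // made-up sample data, in milliseconds
        System.out.println("Throughput %: " + throughputPercent(pauses, 10_000));
        System.out.println("Max pause ms: " + maxPauseMillis(pauses));
    }
}
```

Two applications with the same throughput can behave very differently: many short pauses and one long pause add up to the same percentage but not to the same responsiveness.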
- STW (Stop-The-World)
  - All app threads are stopped during GC
- Concurrent
  - All app threads are running during GC
- Parallel
  - GC uses multiple threads in STW and/or concurrent phases
- JVM ergonomics selects GC if none specified 1
  - Serial GC if 1 CPU or < 1.75 GiB RAM 2
  - OpenJDK < 8u191 not fully aware of containers 3
- Each GC's ergonomics auto-tune low-level parameters
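
To check what ergonomics actually selected, the running JVM can be asked for its collector beans. This is a minimal sketch using the standard `java.lang.management` API; the class name `SelectedGc` is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: reports the inputs to GC ergonomics and the resulting collectors.
class SelectedGc {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("CPUs visible to the JVM: " + rt.availableProcessors());
        System.out.println("Max heap (MiB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Collectors: " + collectorNames());
    }
}
```

The bean names identify the collector indirectly, for example `Copy` and `MarkSweepCompact` for Serial, `PS Scavenge` and `PS MarkSweep` for Parallel, or `G1 Young Generation` and `G1 Old Generation` for G1.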
GC | Option | Comment |
---|---|---|
Serial | -XX:+UseSerialGC | default |
Parallel | -XX:+UseParallelGC | default |
CMS | -XX:+UseConcMarkSweepGC | |
G1 | -XX:+UseG1GC | |
Shenandoah | -XX:+UseShenandoahGC | non-mainline backport |

GC | Option | Comment |
---|---|---|
Serial | -XX:+UseSerialGC | default |
Parallel | -XX:+UseParallelGC | |
CMS | -XX:+UseConcMarkSweepGC | deprecated |
G1 | -XX:+UseG1GC | default |
Shenandoah | -XX:+UseShenandoahGC | non-mainline backport |
ZGC | -XX:+UseZGC | experimental, Linux x86_64 only |
Epsilon | -XX:+UseEpsilonGC | experimental |
- No-op GC, does not collect garbage at all
- Useful for:
  - Measurements
  - Short-lived processes
  - Garbage-free applications
GC | Young generation | Old generation: mark | Old generation: reclaim |
---|---|---|---|
Serial | Copy | Mark | Compact |
Parallel | Copy | Mark | Compact |
CMS | Copy | Conc Mark | Conc Sweep |
G1 | Copy | Conc Mark | Compact |
Shenandoah | — | Conc Mark | Conc Compact |
ZGC | — | Conc Mark | Conc Compact |

Legend: "Conc" = concurrent with application; all other phases are Stop-The-World
- All expensive GC phases run concurrently
- Coordination between GC & app through barriers
- STW pauses for root set scan and cleanup
- Pauses are short and predictable
- Better scalability for large heaps
- Machine code injected by JIT compiler
- Additional metadata needed for coordination
- Throughput reduction
  - Predictable
  - Can be offset with more resources
- Developed by Red Hat
- Named after Shenandoah National Park
- Originally based on G1 GC
- Concurrent marking with SATB like G1
- Pause times independent of heap and live-set size
- Single generation, multiple regions
- Multiple heuristics and failure modes
- Concurrent compaction:
  - v1: Brooks pointers, read and write barriers
  - v2: on-heap forwarding pointers, load barriers
- Barrier loop optimizations
Extra code when an object reference is loaded from the heap:

    Object obj2 = obj.field1;  // Loading an object reference from heap
                               // Load barrier needed here
    Object obj3 = obj2;        // No barrier, not a load from heap
    obj.doSomething();         // No barrier, not a load from heap
    int i = obj.field2;        // No barrier, not an object reference

Optimized for common case
Pseudocode:

    load_reference_barrier(addr):
        if in_evac_phase() and in_collection_set(addr) and !is_forwarded(addr):
            new_addr = copy_object(addr)
            if cas_fwd_pointer(addr, new_addr):
                return new_addr
            else:
                return get_fwd_pointer(addr)  # Another thread copied object
        return get_fwd_pointer(addr)  # Fast path: resolves to addr itself unless already copied

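
The copy-then-CAS race in the barrier can be modelled in plain Java. This is a toy sketch only: real barriers are machine code emitted by the JIT compiler, and the names `LoadBarrierSketch`, `Obj`, `forwardee` and `evacuationInProgress` are hypothetical stand-ins for heap state.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a load reference barrier; not the HotSpot implementation.
class LoadBarrierSketch {
    static final class Obj {
        final int payload;
        final boolean inCollectionSet;
        // null means "not forwarded yet" in this model
        final AtomicReference<Obj> forwardee = new AtomicReference<>();

        Obj(int payload, boolean inCollectionSet) {
            this.payload = payload;
            this.inCollectionSet = inCollectionSet;
        }
    }

    static volatile boolean evacuationInProgress;

    static Obj loadReferenceBarrier(Obj ref) {
        if (ref == null || !evacuationInProgress || !ref.inCollectionSet) {
            return ref; // common case: nothing to do
        }
        Obj forwarded = ref.forwardee.get();
        if (forwarded != null) {
            return forwarded; // someone already copied the object
        }
        Obj copy = new Obj(ref.payload, false);
        // Only one thread wins the CAS; losers drop their copy and use the winner's.
        return ref.forwardee.compareAndSet(null, copy) ? copy : ref.forwardee.get();
    }

    public static void main(String[] args) {
        evacuationInProgress = true;
        Obj old = new Obj(42, true);
        Obj first = loadReferenceBarrier(old);
        Obj second = loadReferenceBarrier(old);
        System.out.println("Same copy on every load: " + (first == second));
        System.out.println("Copy differs from original: " + (first != old));
    }
}
```

The CAS guarantees that every thread ends up with the same copy, no matter which one performed the evacuation.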
- Available in OpenJDK, except Oracle builds
- Fixes and updates backported to LTS JDKs
JDK | Support | Status |
---|---|---|
8 | LTS | Ready for production |
11 | LTS | Ready for production |
12-14 | STS | Experimental, discontinued |
15 | STS | Ready for production |
- Developed by Oracle, initially proprietary
- Inspired by patented Azul C4 GC
- Pause times independent of heap and live-set size
- Single generation, multiple regions
- Concurrent marking and compaction:
  - Load barriers and colored pointers
    - 64-bit only
    - No compressed oops
    - Off-heap forwarding pointers
- Failure modes not documented
- Windows and macOS support in JDK 14+
Object address size was changed from 42 to 44 bits in JDK 13 1
Pseudocode:

    load_reference_barrier(addr):
        if color(addr) is bad:
            return slow_path(addr)  # mark/relocate/remap, depends on GC phase
        return addr  # Fast path: good color

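
The color check can be illustrated with plain bit masking on a `long`. This is a toy sketch: the bit positions, the color names and the class `ColoredPointerSketch` are illustrative assumptions and do not reproduce ZGC's exact pointer layout, and the slow path here only re-colors the pointer instead of marking or relocating anything.

```java
// Toy model of colored pointers; bit positions are assumptions, not ZGC's exact layout.
class ColoredPointerSketch {
    static final int ADDRESS_BITS = 44;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0 = 1L << ADDRESS_BITS;
    static final long MARKED1 = 1L << (ADDRESS_BITS + 1);
    static final long REMAPPED = 1L << (ADDRESS_BITS + 2);
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    // The "good" color changes whenever the GC enters a new phase.
    static long goodColor = REMAPPED;

    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    static boolean isBad(long pointer) {
        return pointer != 0 && (pointer & COLOR_MASK) != goodColor;
    }

    static long loadBarrier(long pointer) {
        if (!isBad(pointer)) {
            return pointer; // fast path: one mask and compare
        }
        // Slow-path stand-in: a real collector would mark, relocate or remap here.
        return address(pointer) | goodColor;
    }

    public static void main(String[] args) {
        long stale = 0x1234L | MARKED0;
        long healed = loadBarrier(stale);
        System.out.println("Stale pointer is bad: " + isBad(stale));
        System.out.println("Healed pointer is bad: " + isBad(healed));
        System.out.println("Address preserved: " + (address(healed) == 0x1234L));
    }
}
```

Because the metadata lives in the upper bits of the pointer itself, this scheme needs full 64-bit references, which is why it cannot be combined with compressed oops.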
- Available in OpenJDK, including Oracle builds
- No backports ⇒ upgrade for fixes & new features
JDK | Support | Status |
---|---|---|
11 | LTS | Experimental, discontinued |
12-14 | STS | Experimental, discontinued |
15 | STS | Production ready |
Criteria | GC |
---|---|
Heap ≤ 100 MiB | Serial |
Single CPU, long pauses ok | Serial |
Maximum throughput, long pauses ok | Parallel |
Minimum latency, reduced throughput ok | Shenandoah or ZGC |
Minimum latency, JDK 8 or 11 LTS | Shenandoah |
Large heap, long pauses not ok | Shenandoah or ZGC |
Slow hardware, long pauses not ok | Shenandoah (or ZGC?) |
Balanced / otherwise | G1 |
GC | Throughput | Pause time |
---|---|---|
Serial | 99% | — |
Parallel | 99% | — |
G1 | 90% | 200 ms |
Shenandoah | 85% | 10 ms |
ZGC | 85% | 10 ms |
Epsilon | 100% | 0 ms |
List actual values: -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal
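
The same values can also be read from inside a running JVM. The sketch below assumes a HotSpot-based JDK, because it relies on the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`; the class name `VmFlags` and the choice of flags are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper: reads effective HotSpot flag values at runtime (HotSpot JVMs only).
class VmFlags {
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the flag does not exist in this JVM.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MaxHeapSize", "MaxGCPauseMillis", "GCTimeRatio"}) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

The printed values depend on the selected GC, so running this with different `-XX:+Use...GC` options shows how the defaults differ.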
- Set max heap size based on live-set size and allocation rate
- For Parallel and G1 with a specific requirement:
  - Set max pause-time or throughput goal
- Leave other options on default
- If goal isn't met, tune according to next slides
- If goal still isn't met, tune according to references
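
In HotSpot the pause-time goal is set with `-XX:MaxGCPauseMillis=<ms>` and the throughput goal with `-XX:GCTimeRatio=<N>`, where the targeted share of time spent in GC is 1 / (1 + N). The sketch below only does that arithmetic; the class name `ThroughputGoal` is illustrative.

```java
// Illustrative helper: converts a -XX:GCTimeRatio value into the throughput goal it implies.
class ThroughputGoal {
    /** Throughput goal in percent; the GC share of total time is 1 / (1 + gcTimeRatio). */
    static double throughputPercent(int gcTimeRatio) {
        return 100.0 * gcTimeRatio / (1.0 + gcTimeRatio);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {99, 19, 9}) {
            System.out.println("-XX:GCTimeRatio=" + ratio
                    + " targets " + throughputPercent(ratio) + "% throughput");
        }
    }
}
```

For example, a ratio of 99 targets 1% of time in GC, and a ratio of 9 targets 10%.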
- Throughput goal not met:
  - Increase max heap size
- Pauses too long:
  - Choose different GC
- Throughput goal not met:
  - Increase max heap size
  - Remove or raise max pause-time goal
- Pauses too long:
  - Set or lower max pause-time goal
- Throughput too low:
  - Increase max heap size
  - Raise max pause-time goal
- Pauses too long:
  - Lower max pause-time goal
  - Disable string deduplication
- Throughput too low:
  - Decrease number of GC threads
- Latency too high / allocation failures:
  - Increase max heap size
  - Increase GC threads
- Latency too high / allocation failures:
  - Increase max heap size
  - Change/tune heuristics to run GC sooner
  - Increase allocator thread pacing delay
- GCs provide detailed metrics in logs 1
- GCs provide partial metrics via MBeans 2 3
  - Each GC reports metrics differently
  - Compare with GC logs to find what is reported
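
What the MBeans expose can be seen with a few lines against the standard `GarbageCollectorMXBean` API. This is a minimal sketch; the class name `GcMBeanMetrics` is illustrative, and what each bean counts as a "collection" depends on the GC in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative helper: aggregates the counters every GarbageCollectorMXBean exposes.
class GcMBeanMetrics {
    /** Total collections over all collector beans; -1 (undefined) values are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds over all collector beans. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint, but usually enough to produce non-zero counters
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("Total: " + totalCollections() + " collections, "
                + totalCollectionMillis() + " ms");
    }
}
```

Running it next to a GC log of the same process is a quick way to see which phases a given collector includes in these numbers.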
- Use a tool like GCViewer to interpret logs 1
- Export metrics to Prometheus with Micrometer 2
  - Supports concurrent GCs in 1.6+ 3
  - Reports all GC phases as pauses
- Compare with non-GC metrics 4
- Same rules apply in containers since 8u191 1
  - Serial GC if 1 CPU or < 1792 MiB RAM
- Red Hat/fabric8 images use run-java.sh 2
  - Defaults to Parallel GC on JVM < 10
  - Sets max heap size by default
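
The selection rule can be written down as a small function, which makes it easy to check what a given container limit will produce. This is a sketch of the rule as stated on the slides, not of the JVM's actual code; the class name `DefaultGcRule` is illustrative, and it assumes G1 as the default from JDK 9 onward and Parallel before that.

```java
// Illustrative model of the default-GC rule; not the JVM's actual implementation.
class DefaultGcRule {
    static final long MIN_SERVER_MEMORY_BYTES = 1792L * 1024 * 1024;

    static String expectedDefault(int cpus, long memoryBytes, int jdkMajor) {
        boolean serverClass = cpus >= 2 && memoryBytes >= MIN_SERVER_MEMORY_BYTES;
        if (!serverClass) {
            return "Serial"; // 1 CPU or less than 1792 MiB of RAM
        }
        return jdkMajor >= 9 ? "G1" : "Parallel";
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        System.out.println(expectedDefault(1, 4 * gib, 11)); // CPU limit of 1
        System.out.println(expectedDefault(4, 1 * gib, 11)); // memory limit of 1 GiB
        System.out.println(expectedDefault(4, 4 * gib, 8));
        System.out.println(expectedDefault(4, 4 * gib, 11));
    }
}
```

A container limited to one CPU therefore ends up on Serial GC even when the host has many cores, unless a GC is selected explicitly.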
- Optimize performance 1:
  - Set min heap size = max heap size
  - Enable -XX:+AlwaysPreTouch
- Optimize cost:
- Community Edition: Serial GC
- Enterprise Edition: Serial GC, G1 GC (Linux only)
- Implemented in Java
- Native image not as efficient as JIT code
Finalization is deprecated. Instead use:
- Explicit clean-up method
- AutoCloseable
- PhantomReference and ReferenceQueue
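
The first two alternatives combine naturally: an explicit `close()` method on a type that implements `AutoCloseable`, used with try-with-resources. A minimal sketch; the class `NativeBuffer` and its methods are hypothetical stand-ins for any resource that needs deterministic clean-up.

```java
// Hypothetical resource that needs deterministic clean-up instead of a finalizer.
class NativeBuffer implements AutoCloseable {
    private boolean closed;

    void write(int value) {
        if (closed) {
            throw new IllegalStateException("buffer already closed");
        }
        // ... would write to the underlying resource here ...
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // idempotent: calling close() twice is harmless
    }

    public static void main(String[] args) {
        NativeBuffer buffer = new NativeBuffer();
        try (NativeBuffer resource = buffer) {
            resource.write(1);
        } // close() runs here, even if write() throws
        System.out.println("Closed after try block: " + buffer.isClosed());
    }
}
```

Unlike finalization, the clean-up happens at a known point in the program and does not depend on when, or whether, the GC processes the object.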