Skip to content

Latest commit

 

History

History
117 lines (74 loc) · 5.04 KB

benchmarks.md

File metadata and controls

117 lines (74 loc) · 5.04 KB
layout title short_title
default
Benchmarks
Benchmarks

{% include oopsla.md %}

This document will only present a few of the bechmarks:

  • macrobenchmarks - large and complex programs
  • microbenchmarks - small and focused benchmarks
  • bytecode size comparisons

Macrobenchmarks

Scala Collections

To evaluate the miniboxing plugin, we implemented a mock-up of the Scala collections linked list and benchmarked the performance. The result: 1.5x-4x speedup just by adding the @miniboxed annotation. And it's worth pointing out our mock-up included all the common patterns found in the library: Builder, Numeric, Traversable, Seq, closures, tuples etc.

The benchmark we ran is fitting a linear curve to a given set of points using the Least Squares method. Basically, we made a custom library and benchmarked this code:

{% highlight scala %} val xs: List[Double] = // list of x coordinates val ys: List[Double] = // list of y coordinates

// list of (x,y) coordinates: val xys = xs.zip(ys)

// intermediary sums: val sumx = xs.sum val sumy = ys.sum val sumxy = xys.map(mult).sum val sumsquare = xs.map(square).sum

// slope and intercept approximation: val slope = (size * sumxy - sumx * sumy) / (size * sumsquare - sumx * sumx) val intercept = (sumy * sumsquare - sumx * sumxy) / (size * sumsquare - sumx * sumx) {% endhighlight %}

With infinite heap memory (no garbage collection overhead)

We ran this code with two versions of the linked list: one with the plugin activated and one with generic classes. The numbers were obtained on an i7 server machine with 32GB of RAM, and we made sure no garbage collections occured (-Xmx16g, -Xms16g):

This shows miniboxed linked lists are 1.5x to 2x faster than generic collections, despite the fact that linked lists are not contiguous, thus reducing the benefits of miniboxing. We have also tested specialization, but it ran out of memory and we were unable to get any garbage collection-free runs above 1500000 elements (we suspect this is due to bug SI-3585 Specialized class should not have duplicate fields, but haven't examined in depth).

With limited heap memory

We also wanted to test how miniboxing copes with garbage collection cycles compared to the generic library. To do so, we limited the heap size to 2G (-Xmx2g, -Xms2g, -XX:+UseParallelGC):

To summarize, on linked lists, we can expect speedups between 1.5x and 4x, despite the non-contiguous nature of the linked list.

The full description of this experiment is available here.

Spire

A separate article describes the performance of the spire numeric abstraction library when using miniboxing.

Microbenchmarks

Another important benchmark is the ArrayBuffer.reverse. This is the most difficult benchmark to get right, since the miniboxing transformation interacts with the Java Virtual Machine optimization heuristics and, if the transformation is not done correctly, miniboxing can actually hurt performance (more details in the OOPSLA paper).

These are our current results:

Where:

  • generic is the generic code
  • miniboxing is the code generated by our plugin
  • specialization is the code generated by the @specialized annotation in Scala
  • monomorphic is the code specialized by hand

After removing generic:

These benchmarks are further described in a Miniboxing OOPSLA'13 paper.

Bytecode Size

When comparing the total bytecode size for spire we see a 4.5x bytecode reduction:

More Benchmarks

The OOPSLA'13 paper presents several other benchmarks:

  • performance microbenchmarks
    • on the HotSpot JVM with the Server compiler
    • on the HotSpot JVM with the Graal compiler
  • interpreter performance microbenchmarks
  • bytecode size
  • classloader overhead
    • performance impact
    • heap consumption

The SCALA'14 paper presents:

  • a high-level overview of patterns in Scala
  • benchmarks for a mock-up of the Scala collection linked list

Conclusion

In short, miniboxed code:

  • is up to 22x faster than generic code
  • it surpasses the performance of specialization
  • is marginally slower compared to monomorphic code, with an overhead of about 10%