layout | title | short_title |
---|---|---|
default |
Benchmarks |
Benchmarks |
{% include oopsla.md %}
This document will only present a few of the bechmarks:
- macrobenchmarks - large and complex programs
- microbenchmarks - small and focused benchmarks
- bytecode size comparisons
To evaluate the miniboxing plugin, we implemented a mock-up of the Scala collections linked list and benchmarked the performance. The result: 1.5x-4x speedup just by adding the @miniboxed
annotation. And it's worth pointing out our mock-up included all the common patterns found in the library: Builder
, Numeric
, Traversable
, Seq
, closures, tuples etc.
The benchmark we ran is fitting a linear curve to a given set of points using the Least Squares method. Basically, we made a custom library and benchmarked this code:
{% highlight scala %} val xs: List[Double] = // list of x coordinates val ys: List[Double] = // list of y coordinates
// list of (x,y) coordinates: val xys = xs.zip(ys)
// intermediary sums: val sumx = xs.sum val sumy = ys.sum val sumxy = xys.map(mult).sum val sumsquare = xs.map(square).sum
// slope and intercept approximation: val slope = (size * sumxy - sumx * sumy) / (size * sumsquare - sumx * sumx) val intercept = (sumy * sumsquare - sumx * sumxy) / (size * sumsquare - sumx * sumx) {% endhighlight %}
We ran this code with two versions of the linked list: one with the plugin activated and one with generic classes. The numbers were obtained on an i7 server machine with 32GB of RAM, and we made sure no garbage collections occured (-Xmx16g
, -Xms16g
):
This shows miniboxed linked lists are 1.5x to 2x faster than generic collections, despite the fact that linked lists are not contiguous, thus reducing the benefits of miniboxing. We have also tested specialization, but it ran out of memory and we were unable to get any garbage collection-free runs above 1500000 elements (we suspect this is due to bug SI-3585 Specialized class should not have duplicate fields, but haven't examined in depth).
We also wanted to test how miniboxing copes with garbage collection cycles compared to the generic library. To do so, we limited the heap size to 2G (-Xmx2g
, -Xms2g
, -XX:+UseParallelGC
):
To summarize, on linked lists, we can expect speedups between 1.5x and 4x, despite the non-contiguous nature of the linked list.
The full description of this experiment is available here.
A separate article describes the performance of the spire numeric abstraction library when using miniboxing.
Another important benchmark is the ArrayBuffer.reverse
. This is the most difficult benchmark to get right, since the
miniboxing transformation interacts with the Java Virtual Machine optimization heuristics and, if the transformation is
not done correctly, miniboxing can actually hurt performance (more details in the OOPSLA paper).
These are our current results:
Where:
generic
is the generic codeminiboxing
is the code generated by our pluginspecialization
is the code generated by the@specialized
annotation in Scalamonomorphic
is the code specialized by hand
After removing generic
:
These benchmarks are further described in a Miniboxing OOPSLA'13 paper.
When comparing the total bytecode size for spire we see a 4.5x bytecode reduction:
The OOPSLA'13 paper presents several other benchmarks:
- performance microbenchmarks
- on the HotSpot JVM with the Server compiler
- on the HotSpot JVM with the Graal compiler
- interpreter performance microbenchmarks
- bytecode size
- classloader overhead
- performance impact
- heap consumption
The SCALA'14 paper presents:
- a high-level overview of patterns in Scala
- benchmarks for a mock-up of the Scala collection linked list
In short, miniboxed code:
- is up to 22x faster than generic code
- it surpasses the performance of specialization
- is marginally slower compared to monomorphic code, with an overhead of about 10%