Please check the following guidelines for advice on how to get reliable benchmark results when using BenchExec. For more general guidelines about benchmarking, e.g., what to do when implementing a benchmarking tool, please read our paper Reliable Benchmarking: Requirements and Solutions. There are also many other valuable resources about benchmarking, for example this list of Benchmarking Crimes and these ACM SIGSOFT Empirical Standards for Benchmarking.
For correct interpretation of results, it is important that everything important
about the benchmarking is documented and archived together with the results.
BenchExec already adds lots of information (e.g., about the benchmarking host),
but it cannot know about everything (e.g., who did the benchmarking
and for which purpose, what was the version of the benchmark set, etc.).
To document such facts, write them into a text file
and pass it to benchexec
with the parameter --decription-file <file>
.
Furthermore, we recommend to use benchexec --commit
to automatically store BenchExec results in a git repository.
New versions of BenchExec might contain features or bug fixes that make benchmarking more precise.
The container mode should not be turned off because it isolates individual runs and makes benchmarking more reliable. Also make sure to configure the container (e.g., directory access) as restrictive as possible.
Without a fixed memory limit, the amount of memory available for benchmarking is non-deterministic.
In certain situations, BenchExec will issue a warning during benchmarking, e.g., if Turbo Boost is enabled, the system overheated etc. Check the output of BenchExec for such warnings and resolve them.
When multiple runs are executed in parallel by BenchExec, this can potentially influence their performance. The details on how large this effect is and whether this is acceptable depend on the characteristics of your hardware. In many cases, one parallel run per CPU is safe, but this is not guaranteed.
Runs with I/O suffer from non-deterministic performance.
If possible, ensure that I/O is reduced to a minimum.
For example, if a tool produces large outputs,
check whether this can be disabled or at least redirected to /dev/null
.
Some tools may require specific configuration for reliable benchmarking, for example for disabling unnecessary output, making random behavior deterministic by using a constant initial seed value, or for adjusting memory usage to the memory limit. So check the tool documentation for advice on this.
This is required for container mode, but in general a newer Linux version might also include improvements for the kernel features that BenchExec uses, and better support for your hardware. For example, BenchExec's memory measurements and limits have less overhead on Linux 4.14 and newer.
On LTS versions of Ubuntu, consider the LTS Enablement Stack.
As described in this paper on benchmarking, time measurements can be imprecise if the hardware clock is not fully precise and there is no time synchronization via NTP. Time-synchronization daemons such as ntpd, systemd-timesyncd, or chrony, do not only synchronize the machine's time with external sources (which is not necessary for measurements), but also apply corrections to the frequency of the local clock, which makes time measurements more precise. Note that other solutions without such a daemon, for example simple periodic time synchronizations, do not achieve this.