From 6319d5e96716496a5860e08301041c494929ea16 Mon Sep 17 00:00:00 2001 From: Tobias Meggendorfer <tobias@meggendorfer.de> Date: Sun, 24 Mar 2024 12:47:53 +0100 Subject: [PATCH] Draft documentation improvement --- README.md | 122 ++++++++++--------- doc/INDEX.md | 4 +- doc/benchexec.md | 298 ++++++++++++++++++++++++---------------------- doc/python-api.md | 39 ++++++ doc/runexec.md | 30 +---- 5 files changed, 263 insertions(+), 230 deletions(-) create mode 100644 doc/python-api.md diff --git a/README.md b/README.md index b219802c3..4576eea8c 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,7 @@ SPDX-License-Identifier: Apache-2.0 --> # BenchExec + ## A Framework for Reliable Benchmarking and Resource Measurement [](https://gitlab.com/sosy-lab/software/benchexec/pipelines) @@ -15,58 +16,71 @@ SPDX-License-Identifier: Apache-2.0 [](https://pypi.python.org/pypi/BenchExec) [](https://zenodo.org/badge/latestdoi/30758422) - **News and Updates**: -- BenchExec is part of [Google Summer of Code](https://summerofcode.withgoogle.com/) again! If you are interested in being paid by Google for contributing to BenchExec, check our [project ideas and instructions](https://www.sosy-lab.org/gsoc/) and [contact us](https://github.com/sosy-lab/benchexec/discussions)! + +- BenchExec is part of [Google Summer of Code](https://summerofcode.withgoogle.com/) again! + If you are interested in being paid by Google for contributing to BenchExec, check our [project ideas and instructions](https://www.sosy-lab.org/gsoc/) and [contact us](https://github.com/sosy-lab/benchexec/discussions)! - BenchExec 3.18 brings support for systems with cgroups v2! -- Linux kernel 5.11 finally [makes it possible](https://github.com/sosy-lab/benchexec/blob/main/doc/INSTALL.md#kernel-requirements) to use all BenchExec features on distributions other than Ubuntu! 
-- We now provide an [Ubuntu PPA](https://launchpad.net/~sosy-lab/+archive/ubuntu/benchmarking) that makes installing and upgrading BenchExec easier ([docs](https://github.com/sosy-lab/benchexec/blob/main/doc/INSTALL.md#debianubuntu)). +- Linux kernel 5.11 finally [makes it possible](doc/INSTALL.md#kernel-requirements) to use all BenchExec features on distributions other than Ubuntu! +- We provide an [Ubuntu PPA](https://launchpad.net/~sosy-lab/+archive/ubuntu/benchmarking) that makes installing and upgrading BenchExec easier ([docs](doc/INSTALL.md#debianubuntu)). - An extended version of our paper on BenchExec and its background was published as open access in the journal STTT, you can read [Reliable Benchmarking: Requirements and Solutions](https://doi.org/10.1007/s10009-017-0469-y) online. We also provide a set of [overview slides](https://www.sosy-lab.org/research/prs/Latest_ReliableBenchmarking.pdf). +## Features + BenchExec provides three major features: -- execution of arbitrary commands with precise and reliable measurement - and limitation of resource usage (e.g., CPU time and memory), - and isolation against other running processes -- an easy way to define benchmarks with specific tool configurations - and resource limits, - and automatically executing them on large sets of input files +- execution of arbitrary commands with precise and reliable measurement and limitation of resource usage (e.g., CPU time and memory), and isolation against other running processes +- an easy way to define benchmarks with specific tool configurations and resource limits, and automatically executing them on large sets of input files - generation of interactive tables and plots for the results - -Unlike other benchmarking frameworks, -BenchExec is able to reliably measure and limit resource usage -of the benchmarked tool even if the latter spawns subprocesses. 
-In order to achieve this, -it uses the [cgroups feature](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt) -of the Linux kernel to correctly handle groups of processes. -For proper isolation of the benchmarks, it uses (if available) -Linux [user namespaces](http://man7.org/linux/man-pages/man7/namespaces.7.html) -and an [overlay filesystem](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt) -to create a [container](https://github.com/sosy-lab/benchexec/blob/main/doc/container.md) -that restricts interference of the executed tool with the benchmarking host. +Unlike other benchmarking frameworks, BenchExec is able to reliably measure and limit resource usage of the benchmarked tool even if the latter spawns subprocesses. +In order to achieve this, it uses the [cgroups feature](https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt) of the Linux kernel to correctly handle groups of processes. +For proper isolation of the benchmarks, it uses (if available) Linux [user namespaces](http://man7.org/linux/man-pages/man7/namespaces.7.html) and an [overlay filesystem](https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt) to create a [container](doc/container.md) that restricts interference of the executed tool with the benchmarking host. BenchExec is intended for benchmarking non-interactive tools on Linux systems. -It measures CPU time, wall time, and memory usage of a tool, -and allows to specify limits for these resources. -It also allows to limit the CPU cores and (on NUMA systems) memory regions, -and the container mode allows to restrict filesystem and network access. -In addition to measuring resource usage, -BenchExec can verify that the result of the tool was as expected, -and extract further statistical data from the output. 
-Results from multiple runs can be combined into CSV and interactive HTML tables,
-of which the latter provide scatter and quantile plots
-(have a look at our [demo table](https://sosy-lab.github.io/benchexec/example-table/svcomp-simple-cbmc-cpachecker.table.html)).
-
-BenchExec works only on Linux and needs a one-time setup of cgroups by the machine's administrator.
+It measures CPU time, wall time, and memory usage of a tool, and allows to specify limits for these resources.
+It also allows to limit the CPU cores and (on NUMA systems) memory regions, and the container mode allows to restrict filesystem and network access.
+In addition to measuring resource usage, BenchExec can verify that the result of the tool was as expected, and extract further statistical data from the output.
+Results from multiple runs can be combined into CSV and interactive HTML tables, of which the latter provide scatter and quantile plots (have a look at our [demo table](https://sosy-lab.github.io/benchexec/example-table/svcomp-simple-cbmc-cpachecker.table.html)).
+
+## Usage
+
+BenchExec is more than just the tool `benchexec`:
+it offers two command-line utilities for executing benchmarks, `benchexec` and `runexec`, as well as a Python API, all of which are described in the following.
+
+Before benchmarking, please consult the [general guidelines for benchmarking](doc/benchmarking.md).
+
+### runexec: Benchmark a Single Command
+
+The tool `runexec` allows to run a single command with the isolation and resource-limitation features provided by BenchExec.
+It can be viewed as a (much more powerful) replacement for a combination of utilities like `taskset`, `time`, and `timeout`.
+For example,
+```
+ runexec --cores 0 --timelimit 5 --memlimit 1GB -- echo Test
+```
+executes the command `echo Test`, restricted to a single core, 1 GB of memory, and 5 seconds of CPU time, and displays precise measurements related to time, memory, I/O, etc.
+Use `runexec` to reliably measure anything from a single execution to a handful of executions.
+See [`runexec`'s documentation](doc/runexec.md) for further usage instructions.
+
+### benchexec: Run a Benchmark Suite
+
+The tool `benchexec` is an elaborate wrapper around the base functionality of `runexec`.
+This tool provides management for defining and executing a large number of individual invocations as well as checking and aggregating their results.
+We recommend using `benchexec` when executing a concrete benchmark suite, especially when multiple different tools are involved.
+See [`benchexec`'s documentation](doc/benchexec.md) for further details, e.g., how to define benchmark sets, run `benchexec`, view results, or integrate tools.
+
+### API
+
+BenchExec also provides a [Python API](doc/python-api.md) from which all features of `runexec` can be accessed directly.
+
+### Setup & Requirements
+
+BenchExec only works on Linux and needs a one-time setup of cgroups by the machine's administrator.
 The actual benchmarking can be done by any user and does not need root access.
+See the [installation guide](doc/INSTALL.md) for detailed setup steps and troubleshooting.
 
-BenchExec was originally developed for use with the software verification framework
-[CPAchecker](https://cpachecker.sosy-lab.org)
-and is now developed as an independent project
-at the [Software Systems Lab](https://www.sosy-lab.org)
-of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-muenchen.de).
+## Useful Resources
 
 ### Links
 
@@ -87,17 +101,19 @@ of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-m
 
 ### License and Copyright
 
-BenchExec is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0),
-copyright [Dirk Beyer](https://www.sosy-lab.org/people/beyer/).
-Exceptions are some tool-info modules
-and third-party code that is bundled in the HTML tables,
-which are available under several other free licenses
-(cf. 
[folder `LICENSES`](https://github.com/sosy-lab/benchexec/tree/main/LICENSES)). +BenchExec is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), copyright [Dirk Beyer](https://www.sosy-lab.org/people/beyer/). +Exceptions are some tool-info modules and third-party code that is bundled in the HTML tables, which are available under several other free licenses (cf. [folder `LICENSES`](https://github.com/sosy-lab/benchexec/tree/main/LICENSES)). + +## Developers + +BenchExec was originally developed for use with the software verification framework [CPAchecker](https://cpachecker.sosy-lab.org) and is now developed as an independent project at the [Software Systems Lab](https://www.sosy-lab.org) of the [Ludwig-Maximilians-Universität München (LMU Munich)](https://www.uni-muenchen.de). + +### Maintainer + +[Philipp Wendler](https://www.philippwendler.de) -### Authors -Maintainer: [Philipp Wendler](https://www.philippwendler.de) +### Contributors -Contributors: - [Aditya Arora](https://github.com/alohamora) - [Levente Bajczi](https://github.com/leventeBajczi) - [Dirk Beyer](https://www.sosy-lab.org/people/beyer/) @@ -126,14 +142,10 @@ Contributors: - [Ilja Zakharov](https://github.com/IljaZakharov) - and [lots of more people who integrated tools into BenchExec](https://github.com/sosy-lab/benchexec/graphs/contributors) -### Users of BenchExec +## Users of BenchExec -BenchExec was successfully used for benchmarking in all instances -of the international competitions on [Software Verification](https://sv-comp.sosy-lab.org) -and [Software Testing](https://test-comp.sosy-lab.org) -with a wide variety of benchmarked tools and hundreds of thousands benchmark runs. -It is integrated into the cluster-based logic-solving service -[StarExec](https://www.starexec.org/starexec/public/about.jsp) ([GitHub](https://github.com/StarExec/StarExec)). 
+BenchExec was successfully used for benchmarking in all instances of the international competitions on [Software Verification](https://sv-comp.sosy-lab.org) and [Software Testing](https://test-comp.sosy-lab.org) with a wide variety of benchmarked tools and hundreds of thousands of benchmark runs.
+It is integrated into the cluster-based logic-solving service [StarExec](https://www.starexec.org/starexec/public/about.jsp) ([GitHub](https://github.com/StarExec/StarExec)).
 
 The developers of the following tools use BenchExec:
 
diff --git a/doc/INDEX.md b/doc/INDEX.md
index 175a5486e..e7b490da2 100644
--- a/doc/INDEX.md
+++ b/doc/INDEX.md
@@ -72,11 +72,9 @@ BenchExec always uses the SI standard units:
   which consists of the options for a tool configuration
   and will be combined with a task to define a run.
 
-- **task**: A combination of a set of input files, a property file, and an expected verdict
-  that defines a problem for a tool to solve.
+- **task**: A complete definition of a problem for a tool to solve, e.g., through one or more input files, with an optional expected result.
   A task corresponds to exactly one row in the result tables.
   Depending on what the tool supports, the set of input files can be empty.
-  Properties and expected verdicts are optional.
 
 - **task definition**: A file in [this format](https://gitlab.com/sosy-lab/benchmarking/task-definition-format)
   that describes a set of tasks
diff --git a/doc/benchexec.md b/doc/benchexec.md
index 7e8939cf9..0f2ae4184 100644
--- a/doc/benchexec.md
+++ b/doc/benchexec.md
@@ -8,47 +8,141 @@ SPDX-License-Identifier: Apache-2.0
 -->
 
 # BenchExec: benchexec
+
 ## Benchmarking a Collection of Runs
 
-The program `benchexec` provides the possibility to easily benchmark
-multiple executions of a tool in one go.
+The program `benchexec` makes it easy to benchmark thousands of executions of a tool in one go.
+This document lays out the three major components of this process:
+
+ 1. Defining a set of runs that should be benchmarked
+ 2. Running `benchexec` to measure all these runs
+ 3. Gathering and displaying the results
 
 ### Input for benchexec
 
-`benchexec` uses as input an XML file that defines the command(s) to execute,
-the resource limits, and the tasks for which the command should be run.
-A complete definition of the input format can be found in the file
-[doc/benchmark.xml](benchmark.xml),
-and examples in [doc/benchmark-example-rand.xml](benchmark-example-rand.xml),
-[doc/benchmark-example-calculatepi.xml](benchmark-example-calculatepi.xml),
-and [doc/benchmark-example-cbmc.xml](benchmark-example-cbmc.xml).
-The document type of these files should be
+Effectively, a benchmark comprises a set of runs, where a run means to "execute tool `X` with configuration `Y` on task `Z`".
+In order to concisely specify a benchmark comprising a large set of runs, BenchExec provides a custom specification language in the form of XML files, which are described below.
+
+In a nutshell, these files reference a particular tool and define one or more different configurations of the tool together with a set of inputs for which the tool (in each of the defined configurations) should be executed.
+Every combination of tool configuration and task is then mapped to an actual invocation by a tool-specific module, called *tool-info module* (see [Tool Integration](tool-integration.md)).
+
+In most of the following specifications, BenchExec allows the use of variables, which are replaced by their corresponding values.
+All possible variables and their values are explained further below.
+
+#### Benchmark Files
+
+The top-level definition of a benchmark is an XML file that defines the command(s) to execute, the resource limits, and the inputs for which the command should be run.
+This benchmark-definition file consists of a root tag `<benchmark>` that defines the tool to use and the resource limits of the benchmark.
+For example,
+```
+<benchmark tool="cpachecker" timelimit="4s">
+...
+</benchmark>
+```
+declares that this benchmark should execute the tool `cpachecker` with a time limit of 4 seconds.
+(To reference your own tool, you need to add a [tool-info module](tool-integration.md), which is then associated with such a tool name.)
+
+Within the benchmark tag, different configurations of the tool as well as different sets of inputs can be defined, as described in the following.
+A complete definition of the input format can be found in the file [doc/benchmark.xml](benchmark.xml), and examples in
+ [doc/benchmark-example-rand.xml](benchmark-example-rand.xml),
+ [doc/benchmark-example-calculatepi.xml](benchmark-example-calculatepi.xml), and
+ [doc/benchmark-example-cbmc.xml](benchmark-example-cbmc.xml).
+The document type of these files should be
 ```XML
 <!DOCTYPE benchmark PUBLIC "+//IDN sosy-lab.org//DTD BenchExec benchmark 1.18//EN" "https://www.sosy-lab.org/benchexec/benchmark-1.18.dtd">
 ```
+A document-type definition with a formal specification of input files can be found in [doc/benchmark.dtd](benchmark.dtd).
+
+#### Tool Configuration
+
+Fixed command-line options for the tool can be specified with `<option>` tags directly inside the root tag.
+Repeated `<rundefinition>` tags allow to specify multiple different configurations of the tool, each of which is executed with the tasks.
+In other words, a rundefinition effectively is a "template" to be filled with task-specific details.
+As an example,
+```
+<benchmark tool="my_tool" ...>
+  <option name="--strict"/>
+  <rundefinition name="default"><option name="--mode">default</option></rundefinition>
+  <rundefinition name="precise"><option name="--mode">precise</option></rundefinition>
+
+  ...
+</benchmark>
+```
+would define two variants of the tool `my_tool`, `default` and `precise`.
+The former would execute the tool with `--strict --mode default`, the latter with `--strict --mode precise`.
+Both of these will be executed on each of the defined inputs.
+Note that you need to use a separate `<option>` tag for each argument; putting multiple arguments separated by spaces into a single tag will not have the desired effect.
+
+#### Task Definition
+
+The concrete tasks that (each variant of) the tool should solve are defined in `<tasks>` tags.
+One `<tasks>` tag defines a logical group of concrete tasks, e.g., all relevant tasks or all tasks of a particular kind.
+Typically (but not necessarily), tasks correspond to an input file for the benchmarked tool.
+BenchExec allows for several ways to declare input files.
+
+**Including**:
+The easiest way to specify tasks inside a `<tasks>` tag is with the `<include>` tag, which contains a file-name pattern.
+One task is created for each file matching the file-name pattern, and the respective file is given to the tool as input file.
+As an example,
+```
+<benchmark tool="my_tool" ...>
+  <tasks><include>benchmarks/*.in</include></tasks>
+</benchmark>
+```
+would run `my_tool` for each `.in` file in the folder `benchmarks/`.
+
+**Set files**:
+Instead of referring directly to input files, the `<tasks>` tag also supports `<includesfile>`, which contains a file-name pattern that points to *set files*.
+These files are expected to contain a file-name pattern on each line, and `benchexec` will treat these patterns as if they were specified with `<include>`.
+This allows sharing a set of input files across different benchmark definitions.
+
+**Excluding**:
+The tags `<exclude>` and `<excludesfile>` can be used to exclude tasks that would otherwise be used by `benchexec`.
+If the patterns given inside `<exclude>` or inside an exclude set file match a task, the task will be ignored.
+
+**Tasks without files**:
+A task that does not correspond to an input file can be defined with a `<withoutfile>` tag, giving the identifier of the task as tag content.
+This might be applicable when the benchmark is completely defined by the tool's options.
+
+**Task options**:
+The `<option>` tag can also be specified inside `<tasks>`, which sets the specified command-line options for all runs derived from this set of tasks.
+
+**`.yml` files**:
+If a file-name pattern points to `.yml` files, these files are instead interpreted in the *task-definition format* of BenchExec, described below.
+
+All of the above tags can be freely combined inside a `<tasks>` tag.
+Relative file names in these tags are interpreted as relative to the directory of the XML file, and relative file names inside set files are interpreted as relative to the directory of the set file.
+
+#### Task-Definition Files
+
+For more complex setups, more than a single input file might be required to fully define a task.
+To simplify this case, BenchExec supports the [task-definition format](https://gitlab.com/sosy-lab/benchmarking/task-definition-format) in versions 1 and 2 (cf. also our [example file doc/task-definition-example.yml](task-definition-example.yml)).
+
+A task-definition file may refer to one or more input files and define multiple *properties*.
+Each property references a [property file](properties.md), which specifies the property itself.
+Combining the given input files with any one of the defined properties yields a concrete task.
+The intention is that such task-definition files accompany the input files and compactly represent every interesting problem related or applicable to the input files.
+
+As such, when referencing a task-definition file within `<tasks>`, BenchExec does *not* create one task for each property definition.
+Instead, a `<tasks>` definition must specify which property files should be selected.
+To this end, specify the tag `<propertyfile>` with a file path (see the referenced examples).
+This instructs BenchExec to select all tasks defined by referenced task-definition files which point to this property file (listed as `property_file` in the task-definition files), following symlinks.
+<!-- If the `<propertyfile>` tag has an attribute `expectedverdict` with one of the values `true`, `false`, `unknown`, or `false(subproperty)` for some `subproperty`, `benchexec` will ignore tasks where the expected verdict that is declared in the task-definition file does not match. 
--> +If instead no `<propertyfile>` tag is given in the `<tasks>` definition, one task is created for each task-definition file using the input files defined therein, and any information on properties and expected results is ignored. + +The task-definition file can also define `options`, which are also passed to the tool-info modules as is, without further checks. + +For creating task-definition files for existing tasks that use the legacy way of encoding expected verdicts in the file name we provide a [helper script](../contrib/create_yaml_files.py). + +#### Result Files + +The tag `<resultfiles>` inside the `<benchmark>` tag specifies [which files should be copied to the output directory](container.md#retrieving-result-files) (only supported if [container mode](container.md) is not turned off). + +#### Variable Replacement + +Inside most tags, some variables can be used that will be expanded by BenchExec. +The following variables are supported: ${benchmark_name} Name of benchmark execution ${benchmark_date} Timestamp of benchmark execution @@ -59,6 +153,7 @@ that will be expanded by BenchExec. The following variables are supported: ${logfile_path_abs} Directory where tool-output files will be stored (absolute path) ${rundefinition_name} Name of current run definition + If task-definition files are used, the following variables are defined: ${taskdef_name} Name of current task-definition file (without path) @@ -78,83 +173,15 @@ that is in the same directory as each input file, use <option name="-f">${inputfile_path}/additional-file.txt</option> ``` -The tag `<resultfiles>` inside the `<benchmark>` tag specifies -[which files should be copied to the output directory](container.md#retrieving-result-files) -(only supported if [container mode](container.md) is not turned off). - -### Defining Tasks for BenchExec -Typically, tasks for `benchexec` correspond to an input file of the benchmarked tool. 
-The easiest way to specify tasks inside a `<tasks>` tag is with the `<include>` tag, -which contains a file-name pattern. -If the file-name patterns point to `.yml` files in the task-definition format of BenchExec, -these are parsed by `benchexec` as explained in the following section. -Otherwise, one task is created for each file matching the file-name pattern, -and the respective file is given to the tool as input file. - -Inside a `<tasks>` tag there can also exist `<includesfile>` tags, -which contain a file-name pattern that points to so-called "set" files. -These set files are expected to contain a file-name pattern on each line, -and `benchexec` will treat these patterns as specified with `<include>`. - -The tags `<exclude>` and `<excludesfile>` can be used to exclude tasks -that would otherwise be used by `benchexec`. -If the patterns given inside `<exclude>` or inside an exclude set file match a task, -the task will be ignored. - -A task that does not directly correspond to an input file can be defined -with a `<withoutfile>` tag within a `<tasks>` tag, -giving the identifier of the task as tag content. - -All of these tags can be freely combined inside a `<tasks>` tag. -Relative file names in these tags are interpreted as relative to the directory of the XML file, -and relative file names inside set files are interpreted as relative to the directory of the set file. - -### Task-Definition Files -Such files can be used to specify more complex tasks, -such as tasks with several input files -or tasks where BenchExec should compare the produced tool output against an expected result. -The files need to be in [YAML format](http://yaml.org/) (which is a superset of JSON) -and their structure needs to adhere to the -[task-definition format](https://gitlab.com/sosy-lab/benchmarking/task-definition-format) -(cf. also our [example file doc/task-definition-example.yml](task-definition-example.yml)). -BenchExec supports versions 1.0 and 2.0 of the format. 
-For creating task-definition files for existing tasks -that use the legacy way of encoding expected verdicts in the file name -we provide a [helper script](../contrib/create_yaml_files.py). - -If no property file is given in the benchmark XML definition, -one task is created for each task-definition file using the input files defined therein, -and any information on properties and expected results is ignored. - -If a [property file](properties.md) is given in the benchmark XML definition with the `<propertyfile>` tag, -`benchexec` looks for an item in the `properties` entry of the task definition -that has the same property file listed as `property_file` (symlinks are allowed). -If none is found, the task defined by this task definition is ignored. -Otherwise a task is created using the set of input files from the task definition -and the given property, also using an expected verdict if given for that property. -All other properties defined in the task definition are ignored. -If the `<propertyfile>` tag has an attribute `expectedverdict` -with one of the values `true`, `false`, `unknown`, -or `false(subproperty)` for some `subproperty`, -`benchexec` will ignore tasks where the expected verdict -that is declared in the task-definition file does not match. - -The `options` dictionary can optionally contain parameters or information about the task -in an application-specified format. -BenchExec passes the dictionary to tool-info modules as is, without further checks. - -### Starting benchexec +### Running benchexec + To use `benchexec`, simply call it with an XML file with a benchmark definition: benchexec doc/benchmark-example-rand.xml Command-line arguments to `benchexec` allow to override the defined resource limits. -If one wants to execute only a subset of the defined benchmark runs, -the name of the `<rundefinition>` and/or `<tasks>` tags -that should be executed can also be given on the command line. 
-To start multiple executions of the benchmarked tool in parallel -(if the local machine has enough resources), -use the parameter `--numOfThreads`. +If one wants to execute only a subset of the defined benchmark runs, the name of the `<rundefinition>` and/or `<tasks>` tags that should be executed can also be given on the command line. +To start multiple executions of the benchmarked tool in parallel (if the local machine has enough resources), use the parameter `--numOfThreads`. Example: benchexec doc/benchmark-example-rand.xml --tasks "XML files" --limitCores 1 --timelimit 10s --numOfThreads 4 @@ -162,11 +189,9 @@ Example: The full set of available parameters can be seen with `benchexec -h`. For explanation of the parameters for containers, please see [container mode](container.md). -Command-line arguments can additionally be read from a file, -if the file name prefixed with `@` is given as argument. +Command-line arguments can additionally be read from a file, if the file name prefixed with `@` is given as argument. The file needs to contain one argument per line. -If parameter name and value are on the same line, -they need to be separated with `=` (not with a space). +If parameter name and value are on the same line, they need to be separated with `=` (not with a space). Alternatively, each of them can be put on a separate line. For example, if the file `benchexec.cfg` has the content @@ -183,51 +208,36 @@ the following command-line is equivalent to the one above: benchexec doc/benchmark-example-rand.xml @benchexec.cfg -### BenchExec Results -`benchexec` produces as output the results and resource measurements -of all the individual tool executions in (compressed) XML files -from which tables can be created using `table-generator`. -There is one file per run definition/tool configuration, -and additional files for each subset of tasks -(all by default in directory `./result/`). 
-A document-type definition with a formal specification of such result files can be found in
-[doc/result.dtd](result.dtd), and a description under [Run Results](run-results.md).
-
-The output of the tool executions is stored in separate log files
-in a ZIP archive beside the XML files.
-Storing the log files in an archive avoids producing large amounts of small individual files,
-which can slow down some file systems significantly.
+### Viewing Results
+
+`benchexec` produces as output the results and resource measurements of all the individual tool executions in (compressed) XML files from which tables can be created using `table-generator`.
+There is one file per run definition/tool configuration, and additional files for each subset of tasks (all by default in directory `./result/`).
+A document-type definition with a formal specification of such result files can be found in [doc/result.dtd](result.dtd), and a description under [Run Results](run-results.md).
+
+The output of the tool executions is stored in separate log files in a ZIP archive beside the XML files.
+Storing the log files in an archive avoids producing large amounts of small individual files, which can slow down some file systems significantly.
 Furthermore, tool outputs can typically be compressed significantly.
-If you prefer uncompressed results, you can pass `--no-compress-results` to `benchexec`,
-this will let XML files be uncompressed and the log files be stored as regular files in a directory.
-Alternatively, you can simply uncompress the results with `bzip2 -d ...results.xml.bz2`
-and `unzip -x ...logfiles.zip`.
+If you prefer uncompressed results, you can pass `--no-compress-results` to `benchexec`; this will keep the XML files uncompressed and store the log files as regular files in a directory.
+Alternatively, you can simply uncompress the results with `bzip2 -d ...results.xml.bz2` and `unzip -x ...logfiles.zip`. 
The post-processing of results with `table-generator` supports both compressed and uncompressed files. -If the target directory for the output files (specified with `--outputpath`) -is a git repository without uncommitted changes and the option `--commit` -is specified, `benchexec` will add and commit all created files to the git repository. -One can use this to create a reliable archive of experimental results. +If the target directory for the output files (specified with `--outputpath`) is a git repository without uncommitted changes and the option `--commit` is specified, `benchexec` will add and commit all created files to the git repository. +This can be used to create a reliable archive of experimental results. ### Resource Handling -`benchexec` automatically tries to allocate the available hardware resources -in the best possible way. -More information on what should be considered when allocating hardware resources such as CPU cores -can be found in our paper -[Reliable Benchmarking: Requirements and Solutions](https://www.sosy-lab.org/research/pub/2019-STTT.Reliable_Benchmarking_Requirements_and_Solutions.pdf). + +`benchexec` automatically tries to allocate the available hardware resources in the best possible way. +More information on what should be considered when allocating hardware resources such as CPU cores can be found in our paper [Reliable Benchmarking: Requirements and Solutions](https://www.sosy-lab.org/research/pub/2019-STTT.Reliable_Benchmarking_Requirements_and_Solutions.pdf). Some additional technical information is also present in the documentation on [resource handling](resources.md). ### Extending BenchExec + +<!-- TODO Should this go into API? --> + BenchExec executes all runs on the local machine. -In some cases, it might be desired to use for example some cloud service -to execute the commands, and BenchExec should only handle the benchmark definition -and aggregate the results. 
-This can be done by replacing the module `benchexec.localexecution`,
-which is responsible for executing a collection of runs, by something else.
-To do so, inherit from the BenchExec main class `benchexec.BenchExec`
-and override the necessary methods such as `load_executor`
-(which by default returns the `benchexec.localexecution` module),
-`create_argument_parser` (to add your own command-line arguments) etc.
+In some cases, it might be desirable to use, for example, a cloud service to execute the commands, while BenchExec only handles the benchmark definition and aggregates the results.
+This can be done by replacing the module `benchexec.localexecution`, which is responsible for executing a collection of runs, with something else.
+To do so, inherit from the BenchExec main class `benchexec.BenchExec` and override the necessary methods such as `load_executor` (which by default returns the `benchexec.localexecution` module), `create_argument_parser` (to add your own command-line arguments), etc.
diff --git a/doc/python-api.md b/doc/python-api.md
new file mode 100644
index 000000000..4b3c9ddb2
--- /dev/null
+++ b/doc/python-api.md
@@ -0,0 +1,39 @@
+<!--
+This file is part of BenchExec, a framework for reliable benchmarking:
+https://github.com/sosy-lab/benchexec
+
+SPDX-FileCopyrightText: 2007-2020 Dirk Beyer <https://www.sosy-lab.org>
+
+SPDX-License-Identifier: Apache-2.0
+-->
+
+# BenchExec: Python API
+
+From within Python, BenchExec can be used to execute a command as in the following example:
+
+```python
+from benchexec.runexecutor import RunExecutor
+executor = RunExecutor()
+result = executor.execute_run(args=[<TOOL_CMD>], ...)
+```
+
+Further parameters for `execute_run` can be used to specify resource limits (cf. [runexecutor.py](../benchexec/runexecutor.py)).
+The result is a dictionary with the same information about the run that is printed to stdout by the `runexec` command-line tool (cf. [Run Results](run-results.md)).
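For instance, such a result dictionary could be summarized in one line as in the following sketch. Note that the keys and values used here (`walltime`, `cputime`, and `memory` in bytes) are assumptions for illustration; see [Run Results](run-results.md) for the authoritative list of keys and their units.

```python
# Sketch: summarizing a run result dictionary as returned by execute_run().
# The keys and values below are assumptions for illustration;
# run-results.md documents the actual keys and units.

def summarize(result):
    """Render selected measurements of a run result as one line."""
    parts = []
    if "walltime" in result:
        parts.append(f"walltime={result['walltime']:.2f}s")
    if "cputime" in result:
        parts.append(f"cputime={result['cputime']:.2f}s")
    if "memory" in result:  # assumed to be in bytes
        parts.append(f"memory={result['memory'] / (1024 * 1024):.1f}MiB")
    return ", ".join(parts)

example = {"walltime": 1.51, "cputime": 1.49, "memory": 7 * 1024 * 1024}
print(summarize(example))  # walltime=1.51s, cputime=1.49s, memory=7.0MiB
```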
+
+If `RunExecutor` is used on the main thread, caution must be taken to avoid `KeyboardInterrupt`, e.g., like this:
+
+```python
+import signal
+from benchexec.runexecutor import RunExecutor
+executor = RunExecutor()
+
+def stop_run(signum, frame):
+    executor.stop()
+
+signal.signal(signal.SIGINT, stop_run)
+
+result = executor.execute_run(args=[<TOOL_CMD>], ...)
+```
diff --git a/doc/runexec.md b/doc/runexec.md
index 6c8fd3cc2..81768d3ef 100644
--- a/doc/runexec.md
+++ b/doc/runexec.md
@@ -57,6 +57,7 @@ For explanation of the parameters for containers, please see [container mode](co
 Command-line parameters can additionally be read from a file
 as [described for benchexec](benchexec.md#starting-benchexec).
 
+
 ## Integration into other Benchmarking Frameworks
 
 BenchExec can be used inside other benchmarking frameworks
@@ -72,31 +73,4 @@ You can also execute `runexec` directly from the `.whl` file with the following
 
     PYTHONPATH=path/to/BenchExec.whl python3 -m benchexec.runexecutor ...
 
-From within Python, BenchExec can be used to execute a command as in the following example:
-
-```python
-from benchexec.runexecutor import RunExecutor
-executor = RunExecutor()
-result = executor.execute_run(args=[<TOOL_CMD>], ...)
-```
-
-Further parameters for `execute_run` can be used to specify resource limits
-(c.f. [runexecutor.py](../benchexec/runexecutor.py)).
-The result is a dictionary with the same information about the run
-that is printed to stdout by the `runexec` command-line tool (cf. [Run Results](run-results.md)).
-
-If `RunExecutor` is used on the main thread,
-caution must be taken to avoid `KeyboardInterrupt`, e.g., like this:
-
-```python
-import signal
-from benchexec.runexecutor import RunExecutor
-executor = RunExecutor()
-
-def stop_run(signum, frame):
-    executor.stop()
-
-signal.signal(signal.SIGINT, stop_run)
-
-result = executor.execute_run(args=[<TOOL_CMD>], ...)
-```
+BenchExec also provides a [simple Python API](python-api.md) to ease integration.
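When `runexec` is instead invoked as a command-line tool from another framework, its measurements have to be read back from stdout. The following is a minimal sketch of such a parser; the `key=value` line format and the sample keys shown are assumptions for illustration, and the actual output format is described in [Run Results](run-results.md).

```python
# Sketch: parsing runexec's textual output from within another framework.
# Assumes (hypothetically) that each measurement appears on its own line
# as "key=value"; see run-results.md for the actual output format.

def parse_runexec_output(output):
    """Collect key=value lines from runexec's stdout into a dict."""
    measurements = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:  # keep only well-formed "key=value" lines
            measurements[key.strip()] = value.strip()
    return measurements

sample = "returnvalue=0\nwalltime=1.3456s\ncputime=1.2993s\nmemory=669466624B"
print(parse_runexec_output(sample)["walltime"])  # 1.3456s
```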