Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize & Partition Verification #229

Merged
merged 10 commits into from
Dec 19, 2024

Conversation

carolynzech
Copy link

@carolynzech carolynzech commented Dec 16, 2024

Shard Kani verification workflow across multiple runners by:

  1. Running kani list --format json, which outputs a JSON file like this:
{
  "kani-version": "0.56.0",
  "file-version": "0.1",
  "standard-harnesses": {
    "src/lib.rs": [
      "proof3",
      "verify::proof2"
    ],
    "src/test.rs": [
      "test::proof4",
      "test::proof5"
    ]
  },
  "contract-harnesses": {
    "src/lib.rs": [
      "proof"
    ]
  },
  "contracts": [
    {
      "function": "bar",
      "file": "src/lib.rs",
      "harnesses": [
        "proof"
      ]
    }
  ],
  "totals": {
    "standard-harnesses": 4,
    "contract-harnesses": 1,
    "functions-under-contract": 1
  }
}
  1. Extracting the harnesses inside "standard-harnesses" and "contract-harnesses" into an array called ALL_HARNESSES and the length of that array into HARNESS_COUNT. (So in this example, ALL_HARNESSES = [proof3, verify::proof2, test::proof4, test::proof5, proof] and HARNESS_COUNT=5).
  2. Dividing the harnesses evenly between four workers. For example, if worker 1's harnesses are proof3 and verify::proof2, then we call kani verify-std --harness proof3 --harness verify::proof2 --exact. The --exact makes Kani look for the exact harness name. This is important so that we don't match on partial patterns, e.g., if there is a harness called "foo" and a harness called "foo_bar", passing --harness foo without --exact would match against both harnesses, and then foo_bar would run twice.
  3. Also parallelize verification within a single runner by passing -j to Kani.

I chose four workers somewhat arbitrarily--it makes each worker take about 45 minutes to an hour to finish. I thought it was good to have a balance between too few workers (which still makes us wait a while) and too many workers (which makes you look through more logs to find where a given harness is being verified). But happy to play with this number if people have opinions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@carolynzech carolynzech requested a review from a team as a code owner December 16, 2024 23:11
@carolynzech
Copy link
Author

Note to reviewers: I would recommend reading the logs for each job to get a better idea of how the parallelization works--I tried to be fairly liberal with debug output.

scripts/run-kani.sh Outdated Show resolved Hide resolved
scripts/run-kani.sh Outdated Show resolved Hide resolved
@carolynzech
Copy link
Author

carolynzech commented Dec 18, 2024

Note: once this is approved by a second reviewer, I'll go ahead and change our settings to require the Verify std library (partition) jobs instead of the regular Verify std library ones.

The -j makes each runner go from about 45 minutes to 30 minutes. However, it also makes it harder to map verification results (and times) to each harness, since the "Checking harness ...." and verification results get printed separately, so they're all intertwined with each other. I'm not sure this tradeoff is worth it, but I suppose one could argue that this PR workflow is really just meant as a sanity check and people can debug locally.

@tautschnig
Copy link
Member

However, it also makes it harder to map verification results (and times) to each harness, since the "Checking harness ...." and verification results get printed separately, so they're all intertwined with each other. I'm not sure this tradeoff is worth it, but I suppose one could argue that this PR workflow is really just meant as a sanity check and people can debug locally.

Perhaps that's an issue we should just address in Kani? For example, when thread pools are enabled, print a thread id as prefix to each line of output?

@carolynzech carolynzech changed the title Parallelize Verification Parallelize & Partition Verification Dec 18, 2024
kani -j with no arguments defaults to the number of CPUs, so no need for a fixed number
@carolynzech
Copy link
Author

carolynzech commented Dec 18, 2024

Perhaps that's an issue we should just address in Kani? For example, when thread pools are enabled, print a thread id as prefix to each line of output?

Edit: implemented in #3790

scripts/run-kani.sh Outdated Show resolved Hide resolved
scripts/run-kani.sh Outdated Show resolved Hide resolved
scripts/run-kani.sh Show resolved Hide resolved
Copy link

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@carolynzech carolynzech added this pull request to the merge queue Dec 19, 2024
Merged via the queue into model-checking:main with commit 67e2f1c Dec 19, 2024
14 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Dec 19, 2024
This harness takes between 22-25 minutes in CI. Change the harnesses's
macro to use kissat solver instead, which brings verification time down
to ~7 seconds.

As an aside, I noticed this problem because the partition 1 runner in
#229 is taking ~55 minutes, where the other partitions are taking ~30
minutes. This harness accounts for almost the entire difference. One
nice consequence of partitioning verification is that it will make
performance issues like this more noticeable.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
github-merge-queue bot pushed a commit that referenced this pull request Dec 22, 2024
Fix a typo introduced in #229 that prevents `kani-args` from getting
passed to the `kani` command.

This was discovered by @tengjiang (see
#231).

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants