
Improvements to DynamicPPLBenchmarks #346

Draft · wants to merge 33 commits into base: main
Conversation

torfjelde
Member

Produces results such as those shown here: #309 (comment)

@torfjelde torfjelde marked this pull request as draft December 3, 2021 00:43
@yebai
Member

yebai commented Dec 16, 2021

This might be helpful for running benchmarks via CI - https://github.com/tkf/BenchmarkCI.jl

@yebai
Member

yebai commented Aug 29, 2022

@torfjelde should we improve this PR by incorporating TuringBenchmarks? Alternatively, we can move all the benchmarking code here into TuringBenchmarks. I am happy with either option, but ideally these benchmarking utilities should live in only one place to minimise confusion.

Also, https://github.com/TuringLang/TuringExamples contains some very old benchmarking code.

cc @xukai92 @devmotion

@coveralls

coveralls commented Feb 2, 2023

Pull Request Test Coverage Report for Build 13093265728

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 51 unchanged lines in 11 files lost coverage.
  • Overall coverage remained the same at 86.259%

| Files with Coverage Reduction | New Missed Lines | % |
|-------------------------------|------------------|--------|
| src/varnamedvector.jl | 1 | 88.25% |
| src/sampler.jl | 1 | 94.03% |
| src/utils.jl | 2 | 73.2% |
| src/contexts.jl | 3 | 30.21% |
| src/values_as_in_model.jl | 3 | 69.23% |
| src/distribution_wrappers.jl | 4 | 41.67% |
| src/model.jl | 5 | 80.0% |
| src/varinfo.jl | 6 | 84.17% |
| src/simple_varinfo.jl | 6 | 81.96% |
| src/compiler.jl | 8 | 86.58% |
Totals Coverage Status
Change from base Build 12993040441: 0.0%
Covered Lines: 3710
Relevant Lines: 4301

💛 - Coveralls

@codecov

codecov bot commented Feb 2, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.60%. Comparing base (6fe46ee) to head (ad4175a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #346   +/-   ##
=======================================
  Coverage   84.60%   84.60%           
=======================================
  Files          34       34           
  Lines        3832     3832           
=======================================
  Hits         3242     3242           
  Misses        590      590           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot left a comment

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

[JuliaFormatter] reported by reviewdog 🐶
@shravanngoswamii
Member

shravanngoswamii commented Feb 1, 2025

> Could you summarise the work you've done so far? I see you've made a thing that produces some nice tables, is that built on top of this PR? Is the code on some public branch yet?

Hello @mhauru, I have updated this branch itself and added a Julia script that generates the Markdown tables and also stores the benchmarking report as Markdown and JSON in the results directory. Locally, the generated output looks like this:

```
>> Running benchmarks for model: demo1
  0.013535 seconds (7.19 k allocations: 495.211 KiB, 99.84% compilation time)

>> Running benchmarks for model: demo2
  0.006908 seconds (5.35 k allocations: 361.320 KiB, 99.67% compilation time)
```

## DynamicPPL Benchmark Results (benchmarks_2025-02-02_04-36-46)

### Execution Environment
- Julia version: 1.10.5
- DynamicPPL version: 0.32.2
- Benchmark date: 2025-02-02T04:37:00.205

| Model | Evaluation Type                           |       Time |    Memory | Allocs | Samples |
|-------|-------------------------------------------|------------|-----------|--------|---------|
| demo1 | evaluation typed                          | 191.000 ns | 160 bytes |      3 |   10000 |
| demo1 | evaluation untyped                        |   1.029 μs |  1.52 KiB |     32 |   10000 |
| demo1 | evaluation simple varinfo dict            | 709.000 ns | 704 bytes |     26 |   10000 |
| demo1 | evaluation simple varinfo nt              |  43.000 ns |   0 bytes |      0 |   10000 |
| demo1 | evaluation simple varinfo dict from nt    |  49.000 ns |   0 bytes |      0 |   10000 |
| demo1 | evaluation simple varinfo componentarrays |  42.000 ns |   0 bytes |      0 |   10000 |
| demo2 | evaluation typed                          | 273.000 ns | 160 bytes |      3 |   10000 |
| demo2 | evaluation untyped                        |   2.570 μs |  3.47 KiB |     67 |   10000 |
| demo2 | evaluation simple varinfo dict            |   2.169 μs |  1.42 KiB |     60 |   10000 |
| demo2 | evaluation simple varinfo nt              | 136.000 ns |   0 bytes |      0 |   10000 |
| demo2 | evaluation simple varinfo dict from nt    | 122.000 ns |   0 bytes |      0 |   10000 |
| demo2 | evaluation simple varinfo componentarrays | 137.000 ns |   0 bytes |      0 |   10000 |


Benchmark results saved to: results/benchmarks_2025-02-02_04-36-46
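For a sense of how such a table can be produced, here is a minimal sketch (the helper function and its signature are hypothetical, not the script on this branch), assuming BenchmarkTools results are already available:

```julia
# Hypothetical helper: turn BenchmarkTools results into a Markdown table
# like the one above. Not the actual script on this branch.
using BenchmarkTools
using Printf

function results_to_markdown(results)
    io = IOBuffer()
    println(io, "| Model | Evaluation Type | Time | Memory | Allocs | Samples |")
    println(io, "|-------|-----------------|------|--------|--------|---------|")
    for (model, kind, trial) in results
        t_ns = minimum(trial).time  # fastest run, in nanoseconds
        println(io, @sprintf("| %s | %s | %.3f ns | %d bytes | %d | %d |",
            model, kind, t_ns, memory(trial), allocs(trial), length(trial.times)))
    end
    return String(take!(io))
end

# Usage with a trivial stand-in benchmark instead of a model evaluation:
trial = @benchmark sum(rand(100))
println(results_to_markdown([("demo1", "evaluation typed", trial)]))
```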

> I think it would be great to get some basic benchmarks running on GHA. Results will be variable because who knows what sort of resources the GHA runner has available, and we can't run anything very heavy, but that's okay. Just getting a GitHub comment with a table like the one you made would be helpful to spot any horrible regressions where suddenly we are allocating a lot and runtime has gone up tenfold because of some type instability.

We can just print the generated REPORT.md in comments!

> Would help keep the table concise.

Do you want me to create a web interface for DynamicPPL benchmarks where we can compare multiple benchmark reports or simply see all the other benchmarks there?

@mhauru
Member

mhauru commented Feb 6, 2025

Sorry for the slow response, I've been a bit on-and-off work this week.

I'll have a look at the code. Would you also be up for talking about this over Zoom? Could be easier. My first thought is that the table looks good and we could be close to having the first version of this done by just making those tables be autoposted on PRs. I do wonder about the accumulation of these REPORT.md files, it's nice to be able to see old results for comparison, but we might soon end up with dozens and dozens of these in the repo. Maybe there could be one file in the repo for the latest results on that branch, and you can see how benchmarks develop by checking the git history of that file? I might check what @willtebbutt has done for this in Mooncake.

> Do you want me to create a web interface for DynamicPPL benchmarks where we can compare multiple benchmark reports or simply see all the other benchmarks there?

Maybe at some point, but for now I think we can focus on getting a first version of this in, where it starts posting comments on PRs and helps us catch any horrible regressions, worry about fancier setups later.

@shravanngoswamii
Member

> Sorry for the slow response, I've been a bit on-and-off work this week.

No worries at all! I’ve also been a bit slow, between exams and some hackathons recently.

> Would you also be up for talking about this over Zoom?

Sure! Just let me know when you're available. I’m free anytime after 1:30 PM UTC on regular days, and anytime on Friday, Saturday, and Sunday.

> My first thought is that the table looks good and we could be close to having the first version of this done by just making those tables be autoposted on PRs. I do wonder about the accumulation of these REPORT.md files, it's nice to be able to see old results for comparison, but we might soon end up with dozens and dozens of these in the repo.

Okay, so I will set up benchmarking CI for PRs. How about generating one REPORT.md for each version of DPPL? Or maybe appending the reports for each version to a single REPORT.md.

@torfjelde
Member Author

A drive-by comment: I don't think the models currently tested are that useful. These days, benchmarks should be performed with TuringBenchmarking.jl so you can track the gradient timings properly 👍

@mhauru
Member

mhauru commented Feb 13, 2025

Agreed that using TuringBenchmarking.jl would be good.

Some further thoughts:

  1. Weave is unmaintained, and we no longer use it for our docs. I think we should try to move away from it. If switching to Quarto is trivial we could do that. However, this leads to the next question:
  2. What's the value of having the results in notebooks? Could we cut code complexity and our dependencies by simply outputting JSON and/or plain text?
  3. I think having a historical record of benchmark results from various versions isn't very valuable as long as we don't have a standardised piece of hardware and environment to run them in. And I don't think that's happening any time soon. Thus, I would see two uses for benchmarks:
    • Having very crude benchmark results posted on GitHub in PR comments. Just a table like the one @shravanngoswamii posted above. The sort of benchmark where you should pay no attention to any differences that are less than ~50%, but that just alerts you to any horrible failures where either compilation or runtime has gone up in a qualitative jump. These should be lightweight enough to finish in a few minutes, to run on GHA.
    • Having utilities for running more extensive benchmarks locally. If you want to compare two versions you'll have to run both of them yourself, but at least then you know you're doing a fair comparison. These can take longer, but should be runnable on a laptop in preferably substantially less than an hour.
  4. Mooncake has a nice setup for posting comments in PRs: https://github.com/compintell/Mooncake.jl/blob/6c66347bbc50aa92959d34f3ad66b534a1e25442/.github/workflows/CI.yml#L145 We could mimic that. It would allow us to not keep any result files in the repo, which I think would be preferable.

@yebai
Member

yebai commented Feb 14, 2025

I agree with @mhauru's suggestions.

@penelopeysm showed some nice examples in #806 (comment). I'd suggest that we turn that into a CI workflow. It is also a good idea to keep these benchmarks aimed at DynamicPPL developers rather than a general audience.

@mhauru
Member

mhauru commented Feb 14, 2025

@shravanngoswamii and I just had a call to discuss this. He helped me understand how the current code works, and we decided on the following action items:

  • Switch the use of @benchmarkable within DynamicPPLBenchmarks.jl to a suitable function call from TuringBenchmarking.jl. I think make_turing_suite is the function we need (see the sketch after this list).
  • Remove everything related to Weave documents. Let's make this work with plain text tables first and consider fancier Quarto things later if we feel like it. If there's something in the functions and files that we delete that we may want to come back to using later, maybe make a note of it so we know to dig it up from git history when needed.
  • Set up CI to post a table of benchmark results to GitHub PRs without storing any files in the repo, mimicking Mooncake.
  • Add functionality to benchmarks.jl to choose combinations of model, AD backend, and varinfo type to run benchmarks on. Note that we don't want to test all models on all backends and all varinfos (too many benchmarks), so we need to be able to manually pick the combinations we want.
  • Curate a list of model / AD backend / varinfo combinations that we want to benchmark on.
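
A rough sketch of what the first item could look like, assuming make_turing_suite accepts an adbackends keyword (the keyword names and the demo model here are assumptions to be checked against the TuringBenchmarking.jl docs, not the final code):

```julia
using BenchmarkTools
using DynamicPPL, Distributions
using TuringBenchmarking
using ADTypes: AutoForwardDiff, AutoReverseDiff

# A toy model standing in for whatever ends up in the curated list.
@model function demo(x)
    m ~ Normal()
    x ~ Normal(m, 1)
end

model = demo(1.0)

# make_turing_suite builds a BenchmarkGroup that times both model evaluation
# and gradient computation for the requested AD backends.
suite = make_turing_suite(model; adbackends=[AutoForwardDiff(), AutoReverseDiff()])
results = run(suite; verbose=true)
```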

I'll take the last item of that list; @shravanngoswamii will take on the others, and I'm available to help whenever needed.

The goal is to have a small set of quick, crude benchmarks that you can run locally and get output as plain text (or maybe JSON if we feel like it) and that runs automatically on GHA and posts comments with a results table on PRs. We can then later add more features if/when we want them, such as

  • A standardised set of more comprehensive benchmarks that one can run locally.
  • Quarto output.

@shravanngoswamii
Member

@mhauru, I don't know whether the current approach I used is correct or not; please have a look at it and let me know what changes are required!

Member

@mhauru mhauru left a comment


Thanks @shravanngoswamii, the overall structure and approach here looks good. I had one bug fix to propose, and then some style points and simplifications.

I'll also start making a list of models to test. Would you like for me to push changes to the models to this same PR, or make a PR into this PR?

Contributor

github-actions bot commented Feb 27, 2025

Computer Information

```
Julia Version 1.11.3
Commit d63adeda50d (2025-01-21 19:42 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
```

Benchmark Report

| Model | AD Backend | VarInfo Type | Linked | Eval Time / Ref Time | AD Time / Eval Time |
|-------|------------|--------------|--------|----------------------|---------------------|
| Simple assume observe | forwarddiff | typed | false | 8.5 | 1.5 |
| Smorgasbord | forwarddiff | typed | false | 1469.3 | 30.3 |
| Smorgasbord | forwarddiff | simple_namedtuple | true | 878.1 | 42.7 |
| Smorgasbord | forwarddiff | untyped | true | 2395.4 | 20.6 |
| Smorgasbord | forwarddiff | simple_dict | true | 1794.9 | 29.7 |
| Smorgasbord | reversediff | typed | true | 1914.9 | 24.8 |
| Loop univariate 1k | reversediff | typed | true | 5643.4 | 47.6 |
| Multivariate 1k | reversediff | typed | true | 1116.9 | 68.2 |
| Loop univariate 10k | reversediff | typed | true | 63274.1 | 45.2 |
| Multivariate 10k | reversediff | typed | true | 8939.1 | 85.7 |
| Dynamic | reversediff | typed | true | 127.7 | 35.9 |
| Submodel | reversediff | typed | true | 26.4 | 12.3 |
| LDA | reversediff | typed | true | 377.1 | 5.8 |

@shravanngoswamii
Member

shravanngoswamii commented Feb 27, 2025

> the overall structure and approach here looks good. I had one bug fix to propose, and then some style points and simplifications.

Thanks for the suggestions @mhauru, I have updated the code with all your suggestions and also added CI for commenting on PRs; the comment above was generated by it.

Let me know if I should change/add anything else in it!

> I'll also start making a list of models to test. Would you like for me to push changes to the models to this same PR, or make a PR into this PR?

Feel free to push changes to the models in this same PR itself!


@mhauru mhauru mentioned this pull request Mar 3, 2025
@mhauru
Member

mhauru commented Mar 3, 2025

@shravanngoswamii, nice, the automated comment looks really good. I actually did end up putting my changes in a PR into this PR, because I changed a few small things other than just the models. Please review #826 and let me know what you think.

@mhauru
Member

mhauru commented Mar 3, 2025

Hmm, something is going wrong with the GitHub action: you can see that it's installing the latest compatible DPPL version rather than the current version from this PR.

* Update models to benchmark plus small style changes

* Make benchmark times relative. Add benchmark documentation.

* Choose whether to show linked or unlinked benchmark times

* Make table header more concise
@shravanngoswamii
Member

shravanngoswamii commented Mar 3, 2025

> Hmm, something is going wrong with the GitHub action: you can see that it's installing the latest compatible DPPL version rather than the current version from this PR.

https://github.com/TuringLang/DynamicPPL.jl/blob/ad4175a1bdfd5d78f05fc049a2f4a1452473599f/.github/workflows/Benchmarking.yml#L21C1-L22C75

Should this not use the current version? How do we use the current version from this PR?

@mhauru
Member

mhauru commented Mar 3, 2025

That line will install the dependencies from benchmarks/Project.toml, one of which is DPPL, but it'll just pull it from the Julia package repository. I think we need something like this line in Mooncake to make it use the DPPL from the current folder instead: https://github.com/compintell/Mooncake.jl/blob/6c66347bbc50aa92959d34f3ad66b534a1e25442/bench/run_benchmarks.jl#L2
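
For reference, a minimal sketch of that approach at the top of the benchmark script (the relative path is an assumption about where the script lives relative to the package root):

```julia
using Pkg

# Make the benchmark environment use the DynamicPPL checkout from this PR
# rather than the released version from the registry.
Pkg.develop(PackageSpec(; path=joinpath(@__DIR__, "..")))
Pkg.instantiate()
```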

The way we'll know we are using the local version of DPPL is that we'll see the CI job crash with a complaint that there are incompatible version constraints. :) That's because currently there isn't a version of TuringBenchmarking that works with the latest DPPL version, I'll need to fix that.

This will actually be a bit annoying in general: Every time we make a new breaking version bump of DPPL the benchmarks will stop working until we also make a release of TuringBenchmarking that supports the new DPPL version. It's similar to the problem we had earlier with Turing.jl integration tests. @penelopeysm, we didn't come up with a nice solution for this did we? I can't really see a way out other than to make a TuringBenchmarking release in tandem with the DPPL release, or to not use TuringBenchmarking, which would be a shame.

@shravanngoswamii
Member

> I think we need something like this line in Mooncake to make it use the DPPL from the current folder instead: https://github.com/compintell/Mooncake.jl/blob/6c66347bbc50aa92959d34f3ad66b534a1e25442/bench/run_benchmarks.jl#L2

Okay, got it!

> The way we'll know we are using the local version of DPPL is that we'll see the CI job crash with a complaint that there are incompatible version constraints. :) That's because currently there isn't a version of TuringBenchmarking that works with the latest DPPL version, I'll need to fix that.

Let me know how I can help further!

@penelopeysm
Member

> we didn't come up with a nice solution for this did we

Nope. If you want to use TuringBenchmarking.jl (or any reverse dependency of DynamicPPL), then the benchmarks will stop working whenever you have a minor release of DynamicPPL.

You can get around this by bumping compat versions on TuringBenchmarking whenever DynamicPPL is released. The issue with this is that you can't do this until the minor version has been released, which means that if you want to e.g. open a PR that proposes to bump a minor version, you can't run benchmarks on that branch. And this is bad news because this is perhaps the most important scenario in which you do want to run benchmarks.

I would personally recommend not using reverse dependencies of DynamicPPL as far as possible. In practice, this probably means copying the relevant code from TuringBenchmarking into benchmarks/src. If the amount of code duplication needed isn't very large, I would even consider putting it into DynamicPPL.jl itself, maybe as a benchmarking extension.

@penelopeysm
Member

penelopeysm commented Mar 3, 2025

There's a similar option, which I proposed a while ago in the context of testing, but we didn't do it: basically, split the DPPL GitHub repository into multiple Julia packages, one called DynamicPPL.jl and one called DynamicPPLBenchmarks.jl. Then move the TuringBenchmarking code into DynamicPPLBenchmarks.jl.

This keeps everything within the same repo so that when you have a new minor version of DPPL you can also just update the compat entry in DPPLB 😄

Benefits of this:

  • don't shove everything into DynamicPPL.jl the package
  • but keep everything in the same repo so that CI can be satisfied

Downsides of this:

  • might be a bit annoying to maintain
