Improvements to DynamicPPLBenchmarks #346
… for downstream tasks
This might be helpful for running benchmarks via CI - https://github.com/tkf/BenchmarkCI.jl
@torfjelde, should we improve this PR by incorporating it? Also, https://github.com/TuringLang/TuringExamples contains some very old benchmarking code.
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main     #346   +/-   ##
=======================================
  Coverage   84.60%   84.60%
=======================================
  Files          34       34
  Lines        3832     3832
=======================================
  Hits         3242     3242
  Misses        590      590
```

View the full report in Codecov by Sentry.
Remaining comments which could not be posted as review comments (to avoid the GitHub rate limit): [JuliaFormatter] reported by reviewdog 🐶 on DynamicPPL.jl/benchmarks/results/release-0.30.1/benchmarks.md, lines 714–715, 865–866, 1003–1005, 1021–1022, and 1158–1160 in 0291c2f.
…into tor/benchmark-update
Hello @mhauru, I have updated this branch and added a Julia script that generates the Markdown tables and also stores the benchmarking report as a Markdown file and as JSON in the results directory. Locally, the generated output looks like this:

```
>> Running benchmarks for model: demo1
  0.013535 seconds (7.19 k allocations: 495.211 KiB, 99.84% compilation time)
>> Running benchmarks for model: demo2
  0.006908 seconds (5.35 k allocations: 361.320 KiB, 99.67% compilation time)
```
## DynamicPPL Benchmark Results (benchmarks_2025-02-02_04-36-46)
### Execution Environment
- Julia version: 1.10.5
- DynamicPPL version: 0.32.2
- Benchmark date: 2025-02-02T04:37:00.205
| Model | Evaluation Type | Time | Memory | Allocs | Samples |
|-------|-------------------------------------------|------------|-----------|--------|---------|
| demo1 | evaluation typed | 191.000 ns | 160 bytes | 3 | 10000 |
| demo1 | evaluation untyped | 1.029 μs | 1.52 KiB | 32 | 10000 |
| demo1 | evaluation simple varinfo dict | 709.000 ns | 704 bytes | 26 | 10000 |
| demo1 | evaluation simple varinfo nt | 43.000 ns | 0 bytes | 0 | 10000 |
| demo1 | evaluation simple varinfo dict from nt | 49.000 ns | 0 bytes | 0 | 10000 |
| demo1 | evaluation simple varinfo componentarrays | 42.000 ns | 0 bytes | 0 | 10000 |
| demo2 | evaluation typed | 273.000 ns | 160 bytes | 3 | 10000 |
| demo2 | evaluation untyped | 2.570 μs | 3.47 KiB | 67 | 10000 |
| demo2 | evaluation simple varinfo dict | 2.169 μs | 1.42 KiB | 60 | 10000 |
| demo2 | evaluation simple varinfo nt | 136.000 ns | 0 bytes | 0 | 10000 |
| demo2 | evaluation simple varinfo dict from nt | 122.000 ns | 0 bytes | 0 | 10000 |
| demo2 | evaluation simple varinfo componentarrays | 137.000 ns | 0 bytes | 0 | 10000 |
Benchmark results saved to: results/benchmarks_2025-02-02_04-36-46
We can just print the generated REPORT.md in comments!
Do you want me to create a web interface for DynamicPPL benchmarks where we can compare multiple benchmark reports, or simply browse all the other benchmarks?
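For illustration, here is a minimal sketch of the kind of script being described, assuming BenchmarkTools.jl for the measurements; the model, the helper code, and the exact table layout are assumptions rather than the PR's actual script:

```julia
# A minimal sketch (not the PR's actual script): benchmark one model evaluation
# with BenchmarkTools and emit a row of a Markdown results table.
using BenchmarkTools, Distributions, DynamicPPL, Printf

@model function demo1()
    m ~ Normal()
    x ~ Normal(m, 1)
end

model = demo1()
vi = VarInfo(model)              # typed VarInfo used for evaluation
trial = @benchmark $model($vi)   # repeatedly evaluate the model
est = minimum(trial)             # best-case estimate of the trial

println("| Model | Evaluation Type | Time | Memory | Allocs | Samples |")
println("|-------|-----------------|------|--------|--------|---------|")
@printf("| %s | %s | %.3f ns | %d bytes | %d | %d |\n",
        "demo1", "evaluation typed", est.time, est.memory, est.allocs,
        length(trial))
```

The actual script additionally loops over models and evaluation types and stores the report as Markdown and JSON, as described above.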
Sorry for the slow response, I've been a bit on-and-off with work this week. I'll have a look at the code. Would you also be up for talking about this over Zoom? That could be easier.

My first thought is that the table looks good, and we could be close to having the first version of this done just by making those tables be autoposted on PRs. I do wonder about the accumulation of these REPORT.md files: it's nice to be able to see old results for comparison, but we might soon end up with dozens and dozens of them in the repo. Maybe there could be one file in the repo for the latest results on that branch, and you could see how benchmarks develop by checking the git history of that file? I might check what @willtebbutt has done for this in Mooncake.
Maybe at some point, but for now I think we can focus on getting a first version of this in, where it starts posting comments on PRs and helps us catch any horrible regressions, and worry about fancier setups later.
No worries at all! I’ve also been a bit slow, between exams and some hackathons recently.
Sure! Just let me know when you're available. I’m free anytime after 1:30 PM UTC on regular days, and anytime on Friday, Saturday, and Sunday.
Okay, so I will set up benchmarking CI for PRs. How about generating one REPORT.md for each version of DPPL? Or maybe appending the reports for each version to a single REPORT.md.
A drive-by comment: I don't think the models currently tested are that useful. These days, benchmarks should be performed with TuringBenchmarking.jl so you can track the gradient timings properly 👍
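For reference, a hedged sketch of what using TuringBenchmarking.jl might look like; `make_turing_suite` and the `adbackends` keyword are taken from the package README as I recall it, so treat the exact names as assumptions:

```julia
# A hedged sketch of timing both model evaluation and gradient computation via
# TuringBenchmarking.jl. Keyword names (e.g. `adbackends`) are assumptions and
# may differ between package versions.
using TuringBenchmarking, DynamicPPL, Distributions
using ADTypes: AutoForwardDiff, AutoReverseDiff

@model function demo()
    m ~ Normal()
    x ~ Normal(m, 1)
end

suite = make_turing_suite(demo(); adbackends=[AutoForwardDiff(), AutoReverseDiff()])
results = run(suite; verbose=true)  # a BenchmarkGroup containing both
show(results)                       # evaluation and gradient timings
```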
Agreed that using TuringBenchmarking.jl would be good. Some further thoughts:
I agree with @mhauru's suggestions. @penelopeysm showed some nice examples in #806 (comment); I'd suggest that we turn those into a CI workflow. It is also a good idea to keep these benchmarks useful for DynamicPPL developers rather than for a general audience.
@shravanngoswamii and I just had a call to discuss this. He helped me understand how the current code works, and we decided on the following action items:
I'll take the last item of that list, @shravanngoswamii will take on the others, and I'm available to help whenever needed. The goal is to have a small set of quick, crude benchmarks that you can run locally to get output as plain text (or maybe JSON if we feel like it), and that run automatically on GHA and post comments with a results table on PRs. We can then later add more features if/when we want them, such as
…into tor/benchmark-update
@mhauru I don't know if the current approach I used is correct or not; just have a look at it and let me know what changes are required!
Thanks @shravanngoswamii, the overall structure and approach here look good. I had one bug fix to propose, and then some style points and simplifications.
I'll also start making a list of models to test. Would you like me to push changes to the models to this same PR, or make a PR into this PR?
[Automated benchmark comment with collapsed "Computer Information" and "Benchmark Report" sections.]
Thanks for the suggestions @mhauru, I have updated the code with all your suggestions and also added CI for commenting on PRs; the comment above was generated by it. Let me know if I should change or add anything else!
Feel free to push changes to the models in this same PR itself!
• Rest of the passed kwargs will be passed on to Weave.weave.

```
julia --project=benchmarks benchmarks/benchmarks.jl
```
[JuliaFormatter] reported by reviewdog 🐶: a formatting suggestion on the quoted code fence.
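For context on the kwargs pass-through mentioned in the quoted docstring, here is a hypothetical sketch of the pattern; the helper name `weave_benchmarks` and the source file name are illustrative, not the actual API in this PR:

```julia
# A hypothetical sketch of forwarding extra keyword arguments to Weave.weave.
using Weave

function weave_benchmarks(source="benchmarks.jmd"; doctype="github", kwargs...)
    # Any remaining kwargs (out_path, fig_path, ...) go straight to Weave.weave.
    Weave.weave(source; doctype=doctype, kwargs...)
end

# Example usage: override the output directory via the pass-through.
weave_benchmarks("benchmarks.jmd"; out_path="results/")
```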
@shravanngoswamii, nice, the automated comment looks really good. I actually did end up putting my changes in a PR into this PR, because I changed a few small things other than just the models. Please review #826 and let me know what you think.
Hmm, something is going wrong with the GitHub action: you can see that it's installing the latest compatible DPPL version rather than the current version from this PR.
* Update models to benchmark plus small style changes
* Make benchmark times relative. Add benchmark documentation.
* Choose whether to show linked or unlinked benchmark times
* Make table header more concise
Should this not use the current version? How do we use the current version from this PR?
That line will install the dependencies from … The way we'll know we are using the local version of DPPL is that we'll see the CI job crash with a complaint that there are incompatible version constraints. :) That's because currently there isn't a version of TuringBenchmarking that works with the latest DPPL version; I'll need to fix that.

This will actually be a bit annoying in general: every time we make a new breaking version bump of DPPL, the benchmarks will stop working until we also make a release of TuringBenchmarking that supports the new DPPL version. It's similar to the problem we had earlier with Turing.jl integration tests. @penelopeysm, we didn't come up with a nice solution for this, did we? I can't really see a way out other than to make a TuringBenchmarking release in tandem with the DPPL release, or to not use TuringBenchmarking, which would be a shame.
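For reference, one common way to point a benchmarks environment at the local DPPL checkout instead of the registry release is to `Pkg.develop` it; a minimal sketch, assuming it runs from the repository root (and noting that resolution can still fail on reverse-dependency compat bounds, which is exactly the crash described above):

```julia
# A minimal sketch, assuming it is run from the repository root: make the
# benchmarks environment use the DynamicPPL source from this checkout rather
# than the latest registered release.
using Pkg

Pkg.activate("benchmarks")
Pkg.develop(PackageSpec(path="."))  # dev the local DynamicPPL checkout
Pkg.instantiate()
# Resolution may still error if a dependency such as TuringBenchmarking has a
# compat bound that excludes this DynamicPPL version.
```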
Okay, got it!
Let me know how I can help further!
Nope. If you want to use TuringBenchmarking.jl (or any reverse dependency of DynamicPPL), then the benchmarks will stop working whenever you make a minor release of DynamicPPL. You can get around this by bumping compat versions on TuringBenchmarking whenever DynamicPPL is released. The issue with this is that you can't do it until the minor version has been released, which means that if you want to, e.g., open a PR that proposes to bump a minor version, you can't run benchmarks on that branch. And this is bad news because this is perhaps the most important scenario in which you do want to run benchmarks.

I would personally recommend not using reverse dependencies of DynamicPPL as far as possible. In practice, this probably means copying the relevant code from TuringBenchmarking into …
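To illustrate what dropping the reverse dependency could look like, here is a rough sketch that benchmarks evaluation and gradients through the LogDensityProblems interface directly; none of this is the PR's actual code, and the demo model is made up:

```julia
# A rough sketch of benchmarking gradient evaluation without TuringBenchmarking,
# going through the LogDensityProblems interface that DynamicPPL implements.
using BenchmarkTools, Distributions, DynamicPPL
using LogDensityProblems, LogDensityProblemsAD, ForwardDiff

@model function demo()
    m ~ Normal()
    x ~ Normal(m, 1)
end

ℓ = DynamicPPL.LogDensityFunction(demo())
∇ℓ = ADgradient(:ForwardDiff, ℓ)              # AD-backed log density
θ = randn(LogDensityProblems.dimension(ℓ))

@btime LogDensityProblems.logdensity($ℓ, $θ)                # plain evaluation
@btime LogDensityProblems.logdensity_and_gradient($∇ℓ, $θ)  # gradient timing
```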
There's a similar option, which I proposed a while ago in the context of testing, but we didn't do it: basically, split the DPPL GitHub repository into multiple Julia packages, one called DynamicPPL.jl and one called DynamicPPLBenchmarks.jl, and then move the TuringBenchmarking code into DynamicPPLBenchmarks.jl. This keeps everything within the same repo, so that when you have a new minor version of DPPL you can also just update the compat entry in DPPLB 😄

Benefits of this:

Downsides of this:
Produces results such as the ones seen here: #309 (comment)