[ENH] Performance/ Benchmarking/ Scaling #12
Comments
This is a good question and request. I did some small-scale benchmarking for the DQC paper here. However, it only tests small molecules (the largest one was C6H8O6 with density fitting).
If you'd like to do an extensive benchmark, that would be great! We can list the bottlenecks here and then start working on improving the performance.
This is a great start, thanks for the link to the notebook, I hadn't seen that yet!
Ah ok - thanks for the heads up - I will look out for that.
Yeah, I'm up for trying. Maybe let's start by listing what you'd like to see in the benchmarking and deciding which tools we want to use. I have used airspeed velocity (asv) for previous benchmarking efforts. Have you used asv? Do you have a strong preference for another benchmarking tool? There's a blog post I really like about using asv for continuous benchmarking in CI, which is maybe more advanced than what we need right now, but it is nice to see how asv can scale to more advanced applications if needed. I could imagine creating a folder of asv-compatible tests that might look like this one from the scikit-image benchmarks. Another nice feature of asv is its syntax for parameterized and multi-parameter benchmarks, which could be useful for addressing the scaling questions I mentioned above. Let me know what you recommend as a best next step.
I've always wanted to use asv but never had the time (or enough motivation) to actually learn and use it, so it is a happy coincidence!
Great! I have begun work with a very simple proof of concept PR at #13.
I think I will probably wait on this stuff if that's ok. Maybe we get a set of benchmarks that we like and can run locally first and then worry about CI integration / automatically updating results.
Ok yeah, we can include those either as different parameters or as different benchmarks. It might depend a little on how we end up structuring the benchmarks.
This makes a lot of sense - I might need help specifying that - I didn't see anywhere obvious in the API where I could control that.
Is your feature request related to a problem? Please describe.
This is maybe more of a question than a feature request - but it could turn into one.
I saw you've got one or two files in the repo around benchmarks - like time_forward.py - but I was wondering if you have done or are interested in doing more widespread and systematic benchmarking - particularly with respect to how performance scales with variables like the number of atoms, for different basis sets, on CPU and GPU.
I'm wondering how you expect performance to compare to a library like PySCF, just knowing the architecture of dqc? I'm also curious whether performance / scale were things you were interested in or even motivated by when developing dqc.
I haven't tried any comparisons yet and don't really know enough about the actual implementations to have any expectations one way or another. There is a little bit of PySCF benchmark data that could be compared to, or we could compute our own.
Ideally I'd like to push to as large systems as possible, but I am so new to this space that I'm really not sure what is possible. If I could do things on the scale of amino acids (~20 atoms) that would be a nice start - getting to ~100 atoms would be even better, and so it continues!
Describe the solution you'd like
I could imagine developing a benchmarking suite covering both basic calculations and calculations of properties (like IR spectra) on a set of molecules of increasing size, and then measuring calculation time for a variety of basis sets and hardware configurations (CPU / GPU). The goal would be to assess how performance changes as molecule size increases.
Describe alternatives you've considered
I could do ad-hoc testing using basic timing functionality to try to build up an intuitive feel for scaling performance.
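As a rough illustration of that ad-hoc route, the sketch below times a function at several sizes with `time.perf_counter` and estimates a scaling exponent from a least-squares fit of log(time) against log(size). The `toy_calculation` function and the sizes are made up for illustration; a real test would time an actual calculation instead.

```python
import math
import time


def toy_calculation(n):
    """Stand-in for a real calculation; cost grows roughly as n**2."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def best_time(func, n, repeats=3):
    """Return the fastest of several wall-clock timings of func(n)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(n)
        timings.append(time.perf_counter() - start)
    return min(timings)


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    sizes = [100, 200, 400, 800]
    times = [best_time(toy_calculation, n) for n in sizes]
    for n, t in zip(sizes, times):
        print(f"n={n:4d}  time={t:.4f} s")
    print(f"estimated exponent: {scaling_exponent(sizes, times):.2f}")
```

Taking the minimum over repeats reduces noise from other processes, but the fitted exponent is still only indicative for small sizes, where fixed overheads dominate.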
Additional context
If this was something you were interested in I'd appreciate your help in designing the best approach to benchmarking as I am so new to this space.
If benchmarking is successful, one can then start profiling to try to identify performance gaps and ultimately work on improving performance.
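For the profiling step, one possible starting point is Python's built-in `cProfile`, sketched below on a made-up workload. The functions `build_matrix`, `contract`, and `toy_calculation` are hypothetical placeholders; note also that `cProfile` only sees Python-level calls, so time spent inside compiled or GPU kernels would need a different tool.

```python
import cProfile
import io
import pstats


def build_matrix(n):
    """Placeholder 'setup' stage: build an n x n matrix as nested lists."""
    return [[1.0 / (1 + i + j) for j in range(n)] for i in range(n)]


def contract(matrix):
    """Placeholder 'compute' stage: sum every element."""
    return sum(sum(row) for row in matrix)


def toy_calculation(n):
    return contract(build_matrix(n))


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(toy_calculation, 300)
    print(report)
```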