-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabling of parallelization of analysis.atomicdistances.AtomicDistances
#4822
base: develop
Are you sure you want to change the base?
Enabling of parallelization of analysis.atomicdistances.AtomicDistances
#4822
Conversation
changed to be applicable with Results
adjusted pytestst
removed unnecessary results
Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-12-12 23:12:15 UTC |
added documentation and adjusted output
removed the .distances
adjusted changelog
pep fixes
pep fixes
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4822 +/- ##
===========================================
- Coverage 93.68% 93.66% -0.03%
===========================================
Files 177 189 +12
Lines 21743 22813 +1070
Branches 3055 3055
===========================================
+ Hits 20370 21367 +997
- Misses 927 1000 +73
Partials 446 446 ☔ View full report in Codecov by Sentry. |
|
||
def _conclude(self): | ||
# adjust self.results to self.results.distances | ||
self.results = self.results.distances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm just confused but it looks to me like the tests haven't actually changed yet (just formatting changes as far as I can tell?) and this just restored the original behavior of self.results
storing a NumPy array again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, first I modified it to have the output would be self.results.distances
, but then I thought that for the convenience to not modify the docs, where it uses my_dists.results
to my_dists.results.distances
I could define self.results
as self.results.distances
to not have to change the docs and still be able to implement the parallelization of the class.
As for the pytest
changes, first I had the self.results.distances
output and thus the tests needed to have also the .distances
. The Idea to have self.results
came after creating the PR 🙈, but yes I will then remove the mention of the pytest
modification, was quite late when I finished up the PR, so didn't think about that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a hack!
If this the way we want to go then you have to write more commentary to say what you're doing and why, with references to issue and a note that this needs to be changed for 3.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused because the PR supposedly fixes #4819 but it does not.
Instead it provides a workaround for the problem identified in PR #4808 that we cannot currently parallelize AtomicDistances.
I would be ok with this hack to make parallelization work for 2.9.0 and then make the switch (fix for #4819) before 3.0. I'd like this PR to be clearer about what it does.
I'd also like to hear some other opinions, e.g., @tylerjereddy @marinegor @IAlibay .
package/CHANGELOG
Outdated
@@ -23,6 +24,8 @@ Fixes | |||
Enhancements | |||
* Add check and warning for empty (all zero) coordinates in RDKit converter (PR #4824) | |||
* Added `precision` for XYZWriter (Issue #4775, PR #4771) | |||
* Incorporation of `Results` into `analysis.atomicdistances.AtomicDistances` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a Breaking Change as it breaks code. If we put it in 2.9.0 then it has to go under Fix, though.
For right now, please put under Fixes and state clearly how it breaks existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... assuming a breaking change.
If this PR is done without breaking changes (ie keeping self.results
as array at the end) then do not reference the issue (which says that we need to have self.results.distances
)
Implementation of `Results` into the class | ||
for application in parallelization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to be clearer about what was done:
Implementation of `Results` into the class | |
for application in parallelization. | |
Changed storage of results from :attr:`results` to | |
:attr:`results.distances`. This fix **breaks old | |
code** that expects to find the data array in | |
:attr:`results`. |
... assuming a breaking change.
@@ -145,6 +146,10 @@ class AtomicDistances(AnalysisBase): | |||
|
|||
|
|||
.. versionadded:: 2.5.0 | |||
|
|||
.. versionchanged:: 2.9.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the docs above!!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... assuming a breaking change.
|
||
def _conclude(self): | ||
# adjust self.results to self.results.distances | ||
self.results = self.results.distances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a hack!
If this the way we want to go then you have to write more commentary to say what you're doing and why, with references to issue and a note that this needs to be changed for 3.0.
Yes my bad, I thought that it would fix #4819 but I think I missunderstood it, since I thought having it work with parallelization would fix the Issue, since it would have something that the |
I will then rename the PR more like enabling the parallelization of |
Results
into analysis.atomicdistances.AtomicDistances
analysis.atomicdistances.AtomicDistances
renamed change
changed docs
Yes, this is why I went silent on my review activity--I couldn't decide if I was just too out of the loop on the parallel business or if this didn't do what the targeted ticket asked for. I couldn't work out how the changes addressed the original problem so figured I may have just not understood. My original understanding was that there was supposed to be some kind of results-related container object that may contain "slots" for specific results, and it was being incorrectly used to store a specific result itself. If I understand this correctly, is it really that much more work to do it the "right" way? Or just a compatibility concern? |
It would be easy to do it right, just as you say — just not add the line that you commented on ( My opinion (as stated in #4819 (comment) ) is that I'd rather fix the API ("doing it right") and call the break a fix. |
I second Oliver on this -- I'd rather fix the non-compliant class and add the new functionality later, than add some temporary hacks. If it's urgent to add the distances parallelization right now (which I doubt), it's always possible to add it as an mdakit which you'll deprecate when the upstream fixes the issue. |
Just wanted to say that I see the ping here but I haven't read anything and won't until the latter half of next week - apologies it's a busy end of year period. |
While giving Irfan more time, let me briefly reply to @marinegor #4822 (comment) :
If we decide to fix the non-compliant class (#4819 ) then the parallelization is already free — this PR already does it. With the hack in this PR we could also have the parallelization and keep the old behavior of the class and then remove the one line that preserves the old API when we're ready for 3.0.
It's not urgent but it's also hard to communicate to the outside world why something as simple as distance calculations cannot be immediately parallelized — it just looks a bit amateurish. As a sidenote: MDAKits are not "free", they still require maintenance by someone and especially importing MDAKits in MDA as a transitional measure is particularly annoying because of package dependency issues. We are trying to get away from that pattern as quickly as possible. |
oh, that makes sense. Though I guess the explanation about API consistency could do most of the explaining. Also, a side note: it could be relatively low priority to parallelize specifically distances, since I guess it's relatively fast already, and reading from the disk would be a bottleneck for it, rather than the computation itself. |
Changes made in this Pull Request:
self.results
andself.results.distances
inanalysis.atomicdistances.AtomicDistances
for the use ofResults
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4822.org.readthedocs.build/en/4822/