-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance by avoiding loading the GMT library repeatedly #2930
Conversation
CodSpeed Performance ReportMerging #2930 will not alter performanceComparing Summary
|
1f3d922
to
80f1d79
Compare
80f1d79
to
51f3943
Compare
I'm a liitle confused why we don't see some improvements in the benchmark report above. |
If you click into the link - https://codspeed.io/GenericMappingTools/pygmt/branches/clib/load-libgmt, it shows that most tests improve by 1-2%. Since most unit tests only call a few GMT commands, they shouldn't show too much of a gain, but still, I think it's worth it. Maybe wait for #2924? There are a few clib/session related tests in that PR which might show a clearer improvement. |
Just tried to benchmark locally. In the main branch:
And in this branch:
So, the |
I've added a unit test in commit f589ff8. The unit test passes in this branch and fails if in the main branch:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This solution may help address issue #217, but it's unclear what exactly we should do and it definitely needs big refactors, so we can explore the solution later.
I can see the value of not loading GMT repeatedly in terms of speed, but I'm still a little unsure about making this a global variable. There are also issues like #1242 and #1582 where there has been flakiness with using a single GMT session (though the changes here aren't at the session level, but on the libgmt loading level), and we'll need to come up with some really good unit tests to make sure that there isn't any side effects from this change.
Could we make a unit test based on #217 (comment), and test to see if the change in this PR allows for this multiprocessing/parallel code to work:
import pygmt
import multiprocessing as mp
def gmt_func(n):
fig = pygmt.Figure()
fig.coast(...)
fig.show()
# gmt_func(n=5) #calls the function without multiprocessing - uncomment as desired to see that the function works on its own
with mp.Pool(2) as p: #calls the process with multiprocessing, using 2 cores
a = p.map(gmt_func, [x for x in range(0,2)])
@@ -208,6 +209,45 @@ def test_brokenlib_brokenlib_workinglib(self): | |||
assert check_libgmt(load_libgmt(lib_fullnames=lib_fullnames)) is None | |||
|
|||
|
|||
class TestLibgmtCount: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to put this unit test in test_session_management.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to keep it in test_clib_loading.py
because this unit test is actually not related to session management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I kinda wanted to move the test since test_clib_loading.py
has 300+ lines of code, while test_session_management.py
has <100 lines, and we're kinda checking that Session()
doesn't reload libgmt here, but up to you 🙂
No, it definitely won't work. To allow multiprocessing for PyGMT, we need to call |
Hmm you're right. Is it ok for each process to have its own 'global' libgmt though? I suppose the changes here are only affecting things on the initial 'load GMT library' library, and not the 'GMT session' level, so it's probably ok? |
Co-authored-by: Wei Ji <[email protected]>
970e4bb
to
48c00ae
Compare
In 48c00ae, I've added a test to make sure that the workaround in #217 still works after this change. But, as mentioned in #217 (comment), the workaround doesn't work for Windows, so it's marked as |
I've decided to remove this multiprocessing test from this PR, and will add it in PR #2938 and also fix the Windows issue. |
@@ -208,6 +209,45 @@ def test_brokenlib_brokenlib_workinglib(self): | |||
assert check_libgmt(load_libgmt(lib_fullnames=lib_fullnames)) is None | |||
|
|||
|
|||
class TestLibgmtCount: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I kinda wanted to move the test since test_clib_loading.py
has 300+ lines of code, while test_session_management.py
has <100 lines, and we're kinda checking that Session()
doesn't reload libgmt here, but up to you 🙂
Description of proposed changes
The
load_libgmt
function can search for the GMT library and load it so that we can call GMT API functions in PyGMT. The function is called in theSession
class'sget_libgmt_func
function.Although the
get_libgmt_func
function is called multiple times inSession
's different methods, the GMT library is only loaded once for each session, as shown below:pygmt/pygmt/clib/session.py
Lines 310 to 311 in bb82345
However, when wrapping GMT modules, we always create a new session and then call the GMT module like this
It means the GMT library is searched and loaded multiple times, as reported in #867.
To verify if the GMT library is loaded multiple times, we just need to add a simple print statement like
print("Loading GMT library")
at the start of theload_libgmt
function, and then call PyGMT. Here is what you'll see. It's clear that the GMT library are loaded multiple times:It's unnecessary to search for and load the GMT library multiple times, which is actually time-consuming (120 ms for me on macOS).
This PR fixes the issue by moving the
load_libgmt
call outside theSession
class, so that the GMT library is only searched for and loaded once when importing pygmt.This PR addresses the following comment in PR #867 so it fixes #867:
#867 also proposed another alternative solution:
This solution may help address issue #217, but it's unclear what exactly we should do and it definitely needs big refactors, so we can explore the solution later.
It's worth noting that the small changes in this PR speed up our Tests significantly! For example, the "Run tests" step now only takes 85 seconds (on Linux) compared to 250 seconds in the main branch.