Optimize GCLDAModel with numba and Parallel #744

base: main

Conversation
Codecov Report: Base: 88.56% // Head: 87.68% // Decreases project coverage by -0.88%.

Additional details and impacted files:

@@ Coverage Diff @@
##             main     #744      +/-   ##
==========================================
- Coverage   88.56%   87.68%   -0.88%
==========================================
  Files          38       35       -3
  Lines        4371     3913     -458
==========================================
- Hits         3871     3431     -440
+ Misses        500      482      -18

☔ View full report at Codecov.
With the changes to the two update methods, it seems like they could be converted to functions instead. Is there any benefit to using staticmethod?

You're right. The two update methods can be converted to functions. I don't see any benefit in using staticmethod.
nimare/annotate/gclda.py (Outdated)

# Float precision issue in Numba: https://github.com/numba/numba/issues/3426
probs_pdf = np.trunc(probs_pdf * (10**12)) / (10**12)
This line will need to be removed in the future. It ensures the sum never goes over 1.0, so that we don't get this error in numba, but the truncation is causing the maps in the example to look different from the ones produced by previous versions.
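To make the trade-off concrete, here is a minimal self-contained sketch (variable names hypothetical) of the failure mode and the guard: after normalization, floating-point rounding can leave a probability vector summing to slightly more than 1.0, which numba's multinomial rejects; truncating at 12 decimal places forces the sum back under 1.0, at the cost of slightly distorting the distribution.

import numpy as np

probs = np.random.rand(1000)
probs = probs / probs.sum()
# Rounding means this is not guaranteed to be exactly 1.0:
print(probs.sum())

# The guard from the line above: truncation can only shrink each entry,
# so the sum can no longer exceed 1.0, but the vector also no longer sums
# to exactly 1, which is why the example maps shift.
probs_trunc = np.trunc(probs * (10**12)) / (10**12)
print(probs_trunc.sum())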
I was running some tests with Neurosynth and Neuroquery, and got the error again:
File "<...>/lib/python3.9/site-packages/nimare/annotate/gclda.py", line 622, in fit
self._update(loglikely_freq=loglikely_freq)
File "<...>/lib/python3.9/site-packages/nimare/annotate/gclda.py", line 681, in _update
) = _update_peak_assignments(
File "<...>/lib/python3.9/site-packages/numba/cpython/randomimpl.py", line 833, in binomial_impl
raise ValueError("binomial(): p outside of [0, 1]")
ValueError: binomial(): p outside of [0, 1]
Would np.clip help here? During normalization, you could do this:
probs_pdf = (probs_pdf / np.sum(probs_pdf)).clip(0, 1)
I tried using clip, but it still raised the same error. I think it has to do with the way numba has the sum implemented.
Given these limitations (the precision issue with sum, and codecov coverage of jitted functions), do you think we should try Cython for these two functions?
I think @jdkent or @adelavega would be better able to answer that.
nimare/annotate/gclda.py (Outdated)

@@ -14,6 +15,247 @@
LGR = logging.getLogger(__name__)

@jit(nopython=True, cache=True)
Would codecov track this function if nopython=False?
I'm not sure. Let me try this out. @adelavega
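One option worth noting (an assumption about what could work here, not something confirmed in this thread): numba honors the NUMBA_DISABLE_JIT environment variable, which makes @jit a no-op so the pure-Python bodies execute and coverage tools can trace them. A sketch of the test-side setup:

# conftest.py (hypothetical): disable numba compilation during the test
# run so coverage sees the pure-Python function bodies.
import os

# Must be set before numba compiles the decorated functions.
os.environ["NUMBA_DISABLE_JIT"] = "1"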
nimare/annotate/gclda.py (Outdated)

# from the sampling distribution
# Returns a binary [1 x R*T] vector with a '1' in location that was sampled
# and zeros everywhere else
assignment_vec = np.random.multinomial(1, probs_pdf)
The documentation says to "use the multinomial method of a default_rng() instance instead". If we're lucky, that might also be faster?
I have tried different combinations with:
rng = np.random.default_rng()
assignment_vec = rng.multinomial(1, probs_pdf)
and everything has failed in numba.
Using the correction proposed in this issue (https://github.com/numba/numba/issues/3426), the current assignment_vec = np.random.multinomial(1, probs_pdf) is working fine without truncating probs_pdf.
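For reference, a sketch in the spirit of that workaround (the helper name and the exact guard are assumptions; the correction actually adopted in the PR may differ): renormalize, then recompute the last entry so the total cannot exceed 1.0 before numba's multinomial validates it.

import numpy as np
from numba import jit

@jit(nopython=True)
def _safe_multinomial_draw(probs_pdf):
    # Renormalize, then let the last entry absorb any floating-point
    # residue so the sum cannot exceed 1.0.
    probs_pdf = probs_pdf / np.sum(probs_pdf)
    probs_pdf[-1] = max(1.0 - np.sum(probs_pdf[:-1]), 0.0)
    return np.random.multinomial(1, probs_pdf)

print(_safe_multinomial_draw(np.random.rand(50)))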
Ah, that's good to hear. I'd love to hear what the speedup is using numba on a large-ish dataset. When you get a chance to run some benchmarks, ping me.
We then need to decide if it's worth the "risk" of forking numba; that is unfortunately a potential maintenance problem.
@adelavega Running times per iteration for 200 topics:
Typically we have 10,000 iterations, so to train GCLDA on Neurosynth with
The latest changes showed improvement in creating chunks. However, this is only scalable to a small number of CPUs: I got small linear scalability for only <= 4 CPUs on a local machine and <= 8 CPUs in an HPC. What do you all think? Should we:
I think number 1 is good, since using numba for
@JulioAPeraza awesome! Q: why is it only scalable to a small number of CPUs? I don't quite understand that. But given that limitation, I agree with your reasoning and think #1 is a good option. That's an order of magnitude, so not bad.
@adelavega
Ah, disk i/o would certainly explain it. I wonder if using "threads" would also help: https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism
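A sketch of what the thread-based variant could look like (the worker below is a stand-in, not the PR's function; with the real jitted worker, the decorator would also need nogil=True for threads to run it concurrently):

from joblib import Parallel, delayed
import numpy as np

def _update_peak_assignments(peak_probs, seeds):
    # Stand-in for the real worker. With numba, decorating it as
    # @jit(nopython=True, nogil=True, cache=True) lets threads execute
    # it in parallel despite the GIL.
    return peak_probs.sum(axis=0), seeds

chunks = [
    dict(peak_probs=np.random.rand(100, 2), seeds=np.arange(100))
    for _ in range(4)
]
# prefer="threads" avoids the pickling and temp-file traffic (disk i/o)
# of the default process-based backend.
results = Parallel(n_jobs=4, prefer="threads")(
    delayed(_update_peak_assignments)(**chunk) for chunk in chunks
)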
I tested the scalability of the function with more CPUs in the HPC and it looks good. I also ran some tests. This PR is ready for review. Thanks!
Do you know if explicitly setting parallel=True would help? https://numba.readthedocs.io/en/stable/user/parallel.html It looks like the main concern would be that the same matrices are being accessed by the document loop, but it looks like it would be different parts of the arrays?
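For reference, a hypothetical sketch of that numba-level parallelism (not what this PR does): prange parallelizes the loop, but it does not make element updates atomic, so it is only safe if no two iterations write to the same element of the shared count matrix, which is exactly the concern above.

import numpy as np
from numba import jit, prange

@jit(nopython=True, parallel=True)
def update_counts(doc_idx, topics, counts):
    # Safe only if each (doc, topic) element is touched by at most one
    # iteration; prange provides no locking.
    for i in prange(doc_idx.shape[0]):
        counts[doc_idx[i], topics[i]] += 1
    return counts

counts = np.zeros((10, 5), dtype=np.int64)
print(update_counts(np.arange(10), np.arange(10) % 5, counts))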
Thanks, @adelavega. I tried
Looks good, just a minor readability suggestion from me.
nimare/annotate/gclda.py (Outdated)

[
    self.data["ptoken_doc_idx"][chunk_i : chunk_i + chunk_size],
    old_regions[chunk_i : chunk_i + chunk_size],
    old_topics[chunk_i : chunk_i + chunk_size],
    peak_probs[chunk_i : chunk_i + chunk_size, :, :],
    seeds[chunk_i : chunk_i + chunk_size],
]
Can this be a dictionary, so that the arguments being sent to _update_peak_assignments are explicit?
nimare/annotate/gclda.py (Outdated)

results = Parallel(n_jobs=self.n_cores, max_nbytes=None)(
    delayed(_update_peak_assignments)(
        *chunk,
With the above, this would now be **chunk.
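Putting the two suggestions together, the fragment might become something like this (a sketch against the surrounding method; the keyword names are guesses and would have to match the signature of _update_peak_assignments):

chunks = [
    dict(
        ptoken_doc_idx=self.data["ptoken_doc_idx"][chunk_i : chunk_i + chunk_size],
        peak_regions=old_regions[chunk_i : chunk_i + chunk_size],
        peak_topics=old_topics[chunk_i : chunk_i + chunk_size],
        peak_probs=peak_probs[chunk_i : chunk_i + chunk_size, :, :],
        seeds=seeds[chunk_i : chunk_i + chunk_size],
    )
    for chunk_i in range(0, len(seeds), chunk_size)
]
results = Parallel(n_jobs=self.n_cores, max_nbytes=None)(
    delayed(_update_peak_assignments)(**chunk) for chunk in chunks
)

Numba dispatchers accept keyword arguments, so unpacking with **chunk should also work when the worker is jitted.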
nimare/annotate/gclda.py
Outdated
self.topics["n_peak_tokens_region_by_topic"], | ||
self.topics["n_peak_tokens_doc_by_topic"], | ||
self.topics["n_word_tokens_doc_by_topic"], | ||
self.params["n_regions"], | ||
self.params["n_topics"], | ||
self.params["alpha"], | ||
self.params["delta"], | ||
self.params["gamma"], |
A bit redundant, but now the arguments here should be named too (e.g., gamma=self.params["gamma"]).
Just want to make note of the current gclda example and the one in this pull request:

current decoding ROI: [image]
pull request decoding ROI: [image]

I suppose I do not know how stochastic the results are on this test dataset, since it is very small.

EDIT: consistent since the first change in this pull request: https://nimare--744.org.readthedocs.build/en/744/auto_examples/03_annotation/04_plot_gclda.html#sphx-glr-auto-examples-03-annotation-04-plot-gclda-py
nimare/annotate/gclda.py (Outdated)

len(self.vocabulary),
self.params["beta"],
self.params["gamma"],
self.seed,
I noticed that earlier in the code self.seed = 0 rather than self.seed = self.params["seed_init"]. Could this be part of the issue?
That probably explains the difference you saw, because we are testing with a small number of iterations. I think self.params["seed_init"] is used to initialize the peak_topic and peak_subregion assignments, whereas self.seed is used to perform the sampling in _update().
nimare/annotate/gclda.py
Outdated
Updated number of word-tokens assigned to each topic (T) per document (D). | ||
""" | ||
# --- Seed random number generator | ||
np.random.seed(randseed) |
It's likely jit does something slightly different here, since the results change if I comment out @jit above.
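That matches numba's documented behavior: in nopython mode, np.random.seed seeds numba's internal generator, which is kept separate from NumPy's global one. A small sketch of the separation:

import numpy as np
from numba import jit

@jit(nopython=True)
def jitted_seed_and_draw(seed):
    np.random.seed(seed)  # seeds numba's internal RNG, not NumPy's
    return np.random.random()

np.random.seed(0)
first = np.random.random()

np.random.seed(0)
jitted_seed_and_draw(12345)  # does not touch NumPy's global state
second = np.random.random()

print(first == second)  # True: the two generators keep separate state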
Closes None.

Currently, the GCLDA model is slow, mainly when the vocabulary is large (for example, for the 3228 terms from Neurosynth). _update_word_topic_assignments takes ~73 seconds per iteration and _update_peak_assignments takes ~52 seconds per iteration, for a total of ~125 seconds per iteration in _update(). Compiling and running these functions with numba reduces these times to 2, 4, and 6 seconds per iteration, respectively.

Changes proposed in this pull request:

- Convert _update_word_topic_assignments and _update_peak_assignments to static methods, and add a decorator to compile and run them with numba in nopython mode.
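Roughly, that pattern looks like the following sketch (a simplified stand-in body, not the PR's actual sampler):

import numpy as np
from numba import jit

class GCLDAModel:
    @staticmethod
    @jit(nopython=True, cache=True)
    def _update_word_topic_assignments(counts, beta, randseed):
        # Stand-in body: seed numba's RNG, normalize counts into
        # probabilities, apply the precision guard discussed above,
        # and draw one sample.
        np.random.seed(randseed)
        probs = (counts + beta) / np.sum(counts + beta)
        probs = np.trunc(probs * (10**12)) / (10**12)
        return np.random.multinomial(1, probs)

print(GCLDAModel._update_word_topic_assignments(np.ones(10), 0.01, 42))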