Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue causing large memory consumption in pred_dist() #344

Merged

Conversation

mesenrj
Copy link
Contributor

@mesenrj mesenrj commented Jan 23, 2024

Hi,

I was recently working on a project and when inspecting resource usage realized memory consumption was way too high for .pred_dist() when passing a max_iter value other than None.

When inspecting the code, I realized that the issue stemmed from using .staged_pred_dist() but only returning the last value. This resulted in unnecessary memory allocation (in my case hundreds of GBs).

if (
    max_iter is not None
):  # get prediction at a particular iteration if asked for
    dist = self.staged_pred_dist(X, max_iter=max_iter)[-1]   <-- Only using last Dist obj, but allocates memory for max_iter * Dist objects

Also, the conditional statement wasn't really needed. All it was checking for is if max_iter is None or not, but pred_param() can already handle that case.

This PR fixes the aforementioned issues.

Thanks!

@alejandroschuler alejandroschuler self-assigned this Jan 24, 2024
@ryan-wolbeck ryan-wolbeck self-requested a review January 29, 2024 18:46
Copy link
Collaborator

@ryan-wolbeck ryan-wolbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max_iter parameter is used in a conditional statement within a loop over self.base_models, self.scalings, and self.col_idxs. If max_iter is None, the condition max_iter and i == max_iter will always evaluate to False because None is considered False in a boolean context in Python.

This means that if max_iter is None, the loop will not break prematurely and will instead run for all models, scalings, and column indices. This is equivalent to having no limit on the number of iterations, which is the typical behavior when max_iter is None.

So, it seems that we can safely replace the conditional statement with a direct call to pred_param(), passing max_iter as an argument, whether it's None or an integer.

Thanks for the PR!

@ryan-wolbeck ryan-wolbeck merged commit ffbd359 into stanfordmlgroup:master Jan 31, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants