[LogisticRegression] Support standardization for dense vectors #565

Merged: 9 commits, Feb 17, 2024

Conversation

Collaborator

@lijinf2 commented Feb 14, 2024

When standardization is on, sparse vectors are converted to dense vectors.

This PR also removes UVM from the nlp20news dataset test, because UVM was found to cause a CUDA memset error in the vars calculation.
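As a rough illustration of the densify-then-standardize idea described above, here is a minimal NumPy sketch. It is not the PR's implementation: the helper names `densify` and `standardize`, the `(size, indices, values)` sparse layout, and the choice to center each column and scale by the sample standard deviation are all assumptions made for this example. One plausible motivation is that centering turns zeros into non-zeros, so a sparse representation stops paying off once features are standardized.

```python
import numpy as np


def densify(size, indices, values):
    """Expand a (size, indices, values) sparse vector into a dense 1-D array.

    Hypothetical helper for illustration; not part of spark_rapids_ml.
    """
    dense = np.zeros(size, dtype=np.float64)
    dense[np.asarray(indices, dtype=np.int64)] = values
    return dense


def standardize(X):
    """Center each column and scale it to unit sample standard deviation.

    Assumption: the thread does not state whether the PR centers the data.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # Leave constant columns unscaled to avoid division by zero.
    std = np.where(std == 0.0, 1.0, std)
    return (X - mean) / std


# Three sparse rows with four features each (made-up data).
rows = [
    (4, [0, 2], [1.0, 3.0]),
    (4, [1], [2.0]),
    (4, [0, 3], [5.0, 4.0]),
]
X = np.vstack([densify(*row) for row in rows])
X_std = standardize(X)
```

After standardization every column of `X_std` has zero mean, and entries that were zero in the sparse input are generally non-zero.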

Collaborator Author

@lijinf2 commented Feb 14, 2024

build

python/src/spark_rapids_ml/classification.py
    init_parameters["fit_intercept"] is True
    and len(intercept_array) > 1
):
    intercept_mean = sum(intercept_array) / len(intercept_array)
Collaborator

Is there a 'mean' method that can be called here? Also, why does this not change the model output?

Collaborator Author

Yes, revised the code to use np.mean and cp.mean.
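On the question of why the model output is unchanged: the snippet only shows the mean being computed, so the following is a sketch under the assumption that the mean is then subtracted from every class intercept. In the multinomial case (more than one intercept), softmax probabilities are invariant to adding the same constant to all class logits, so such a shift leaves predictions untouched. The data, shapes, and variable names below are made up for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 samples, 3 features (hypothetical)
coef = rng.normal(size=(4, 3))   # 4 classes
intercept = rng.normal(size=4)

# Assumed centering step: subtract the mean intercept from every class.
centered = intercept - np.mean(intercept)

p_before = softmax(X @ coef.T + intercept)
p_after = softmax(X @ coef.T + centered)
```

Here `p_before` and `p_after` agree up to floating-point error, while `centered` sums to zero.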

    convert_to_sparse: bool = False,
) -> LogisticRegression:
    assert (
        standardization is False
    ), "standardization=True is not supported due to testing with single-GPU LogisticRegression"
Collaborator

Can we use the cuML StandardScaler preprocessor?

Collaborator Author

Revised the code to remove the standardization option from the argument list. This test was for standardization=False only.

python/tests/test_logistic_regression.py
Collaborator Author

@lijinf2 commented Feb 17, 2024

build

Collaborator

@eordentlich left a comment

Nice. Just one more question.

python/src/spark_rapids_ml/classification.py
Collaborator

@eordentlich left a comment

👍

@lijinf2 merged commit c3ea178 into NVIDIA:branch-24.02 on Feb 17, 2024
2 checks passed
@lijinf2 deleted the lr_standardization branch on April 4, 2024, 19:04
2 participants