[LogisticRegression] Support standardization for dense vectors #565

lijinf2 · 2024-02-14T23:39:23Z

Sparse vectors are densified into dense vectors when standardization is enabled.
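A likely reason sparse inputs must be densified: if standardization centers the data as (x - mean) / std, every implicit zero in a sparse row becomes -mean / std, which is generally non-zero. The following is a minimal NumPy sketch that assumes centered standardization. It uses a hypothetical helper `densify_and_standardize` and does not reproduce the spark-rapids-ml code path.

```python
import numpy as np


def densify_and_standardize(size, indices, values, mean, std):
    """Hypothetical helper: expand a sparse row given as (size, indices, values)
    into a dense array, then standardize it as (x - mean) / std."""
    dense = np.zeros(size, dtype=np.float64)
    dense[np.asarray(indices)] = values  # implicit zeros are now materialized
    return (dense - mean) / std


# A 5-feature sparse row with only two stored non-zeros (illustrative data).
size, indices, values = 5, [1, 3], [2.0, 4.0]
mean = np.array([0.5, 1.0, 0.2, 2.0, 0.1])
std = np.array([1.0, 2.0, 0.5, 1.0, 0.25])

row = densify_and_standardize(size, indices, values, mean, std)
# Centering turns every former implicit zero into -mean / std.
print(np.count_nonzero(row))  # 5
```

Skipping the mean subtraction and only scaling by std would preserve sparsity. The sketch assumes centering because densification is what the PR describes.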

This PR also removes UVM from the nlp20news dataset test, because UVM was found to cause a CUDA memset error in the vars calculation.

Signed-off-by: Jinfeng <[email protected]>


lijinf2 · 2024-02-14T23:43:19Z

build


eordentlich · 2024-02-15T23:30:24Z

python/src/spark_rapids_ml/classification.py

+                    init_parameters["fit_intercept"] is True
+                    and len(intercept_array) > 1
+                ):
+                    intercept_mean = sum(intercept_array) / len(intercept_array)


Is there a 'mean' method that can be called? Also, how does this not change the model output?

Yeah, revised the code to use np.mean and cp.mean
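This relates to the question of why centering the intercepts does not change the model output. In multinomial logistic regression, softmax is invariant to adding the same constant to every class logit, so subtracting the mean of the intercept vector leaves all predicted probabilities unchanged. The `len(intercept_array) > 1` guard in the diff appears to limit centering to this multi-intercept case. That reading is an assumption. With a single sigmoid intercept, the same shift would change predictions. The NumPy sketch below uses random illustrative parameters and a (features, classes) coefficient layout, which is an assumption and not the library's internal layout.

```python
import numpy as np


def softmax(z):
    # Subtract the per-row max for numerical stability (itself a constant shift).
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))     # 4 samples, 3 features
coef = rng.normal(size=(3, 5))  # 3 features x 5 classes (illustrative layout)
intercept = rng.normal(size=5)

# Center the intercepts as in the review: subtract their mean.
centered = intercept - np.mean(intercept)

p_orig = softmax(X @ coef + intercept)
p_centered = softmax(X @ coef + centered)
print(np.allclose(p_orig, p_centered))  # True
```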

eordentlich · 2024-02-15T23:42:58Z

python/tests/test_logistic_regression.py

    convert_to_sparse: bool = False,
 ) -> LogisticRegression:
+    assert (
+        standardization is False
+    ), "standardization=True is not supported due to testing with single-GPU LogisticRegression"


Can we use the cuML StandardScaler preprocessor?

Revised the code to remove the standardization option from the argument list.
This test was for standardization=False only.
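The reviewer's suggestion is to pre-standardize the data with a scaler and fit the single-GPU model on the result. This approach depends on the fact that a model fit on z = (x - mean) / std can be mapped back to the original feature space with w = w_std / std and b = b_std - w_std · (mean / std). The sketch below uses NumPy in place of cuML's StandardScaler. The helper name `unstandardize` and the (classes, features) coefficient layout are hypothetical. Regularization and the training procedure are ignored, so the sketch only checks that the mapped parameters reproduce the same logits.

```python
import numpy as np


def unstandardize(coef_std, intercept_std, mean, std):
    """Hypothetical helper: map multinomial LR parameters fitted on
    (x - mean) / std back to the original feature space.
    coef_std has shape (n_classes, n_features)."""
    coef = coef_std / std  # divide each feature column by its std
    intercept = intercept_std - coef_std @ (mean / std)
    return coef, intercept


rng = np.random.default_rng(42)
X = rng.normal(loc=3.0, scale=2.0, size=(6, 4))  # illustrative raw features
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std  # what a StandardScaler-style preprocessor would produce

# Stand-in for parameters learned on the standardized data.
coef_std = rng.normal(size=(3, 4))
intercept_std = rng.normal(size=3)

coef, intercept = unstandardize(coef_std, intercept_std, mean, std)

logits_std = Z @ coef_std.T + intercept_std
logits_orig = X @ coef.T + intercept
print(np.allclose(logits_std, logits_orig))  # True
```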

python/tests/test_logistic_regression.py

lijinf2 · 2024-02-17T00:12:18Z

build

eordentlich

Nice. Just one more question.

python/src/spark_rapids_ml/classification.py

eordentlich

👍

lijinf2 added 8 commits February 14, 2024 15:41

add test_compat_standardization in progress

8bc958f

Signed-off-by: Jinfeng <[email protected]>

support standardization and add tests

f01f7a7

rebase latest

f3722d8

test standardization on sparse example and nlp20news

9afa688

Signed-off-by: Jinfeng <[email protected]>

try setting enable_sparse_data_optim in progress

9910acc

densification in progress with padding zeroes

54920f3

remove uvm for nlp20news gets padding 0 working, still need to get all tests working

3cee060

confirms cudaMemSet error at vars is due to uvm instead of the out-of-memory access bug in sum kernal

b0d772f

lijinf2 force-pushed the lr_standardization branch from f713a77 to b0d772f February 14, 2024 23:41

eordentlich reviewed Feb 15, 2024


revise per comments

16e9e15

eordentlich reviewed Feb 17, 2024


python/src/spark_rapids_ml/classification.py

eordentlich approved these changes Feb 17, 2024


lijinf2 merged commit c3ea178 into NVIDIA:branch-24.02 Feb 17, 2024
2 checks passed

lijinf2 deleted the lr_standardization branch April 4, 2024 19:04
