Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TRACKER] Ensure that estimator classes do not perform parameter validation on construction #6105

Open
29 tasks
csadorf opened this issue Oct 10, 2024 · 2 comments
Labels
cuml-cpu improvement Improvement / enhancement to an existing function Tech Debt Issues related to debt Tracker For epoch-level tracking of work that encapsulates many stories

Comments

@csadorf
Copy link
Contributor

csadorf commented Oct 10, 2024

As required by cuml's development guide, estimator classes must adhere to the API and implementation guidelines put forth by scikit-learn to maximize compatibility.

Those guidelines require that estimators are initialized without any parameter validation or mutation:

There should be no logic, not even input validation, and the parameters should not be changed. The corresponding logic should be put where the parameters are used, typically in fit.

Some of cuml's estimators do not adhere to that standard. This issue tracks checking all of cuml's estimator classes and potentially fixing them.

The following estimators have been checked and are confirmed to not perform any parameter validation or mutation during construction:

  • DBSCAN from cuml.cluster.dbscan
  • DBSCANMG from cuml.cluster.dbscan_mg
  • HDBSCAN from cuml.cluster.hdbscan.hdbscan
  • KMeans from cuml.cluster.kmeans
  • KMeansMG from cuml.cluster.kmeans_mg
  • IncrementalPCA from cuml.decomposition.incremental_pca
  • PCA from cuml.decomposition.pca
  • PCAMG from cuml.decomposition.pca_mg
  • TruncatedSVD from cuml.decomposition.tsvd
  • TSVDMG from cuml.decomposition.tsvd_mg
  • ForestInference from cuml.experimental.fil.fil
  • ElasticNet from cuml.linear_model.elastic_net
  • Lasso from cuml.linear_model.lasso
  • LinearRegression from cuml.linear_model.linear_regression
  • LinearRegressionMG from cuml.linear_model.linear_regression_mg
  • LogisticRegression from cuml.linear_model.logistic_regression
  • LogisticRegressionMG from cuml.linear_model.logistic_regression_mg
  • Ridge from cuml.linear_model.ridge
  • RidgeMG from cuml.linear_model.ridge_mg
  • TSNE from cuml.manifold.t_sne
  • UMAP from cuml.manifold.umap
  • KNeighborsClassifier from cuml.neighbors.kneighbors_classifier
  • KNeighborsClassifierMG from cuml.neighbors.kneighbors_classifier_mg
  • KNeighborsRegressor from cuml.neighbors.kneighbors_regressor
  • KNeighborsRegressorMG from cuml.neighbors.kneighbors_regressor_mg
  • NearestNeighbors from cuml.neighbors.nearest_neighbors
  • NearestNeighborsMG from cuml.neighbors.nearest_neighbors_mg

The following estimators are identified to either validate and/or mutilate init parameters:

  • Do not validate/mutilate parameters during construction in LinearRegression from cuml.linear_model.linear_regression
  • Do not validate/mutilate parameters during construction in LogisticRegression from cuml.linear_model.logistic_regression
@csadorf csadorf added improvement Improvement / enhancement to an existing function Tech Debt Issues related to debt Tracker For epoch-level tracking of work that encapsulates many stories labels Oct 10, 2024
@csadorf csadorf self-assigned this Oct 10, 2024
@betatim
Copy link
Member

betatim commented Oct 11, 2024

LogisticRegression is another class that performs parameter validation in the constructor.

@csadorf
Copy link
Contributor Author

csadorf commented Oct 11, 2024

LogisticRegression is another class that performs parameter validation in the constructor.

Updated the issue description. I've structured the task lists such that it would be easy to create a sub-task if desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuml-cpu improvement Improvement / enhancement to an existing function Tech Debt Issues related to debt Tracker For epoch-level tracking of work that encapsulates many stories
Projects
None yet
Development

No branches or pull requests

3 participants