WIP: Cross-validation support for multiple covariate matching #64

mnarayan · 2021-02-12T22:23:28Z

No description provided.

arokem

Great!

arokem · 2021-02-12T23:28:34Z

afqinsight/cross_validate.py

+def stratified_crossvalidate(
+    covariate_df,
+    pipeline_estimator=None,
+    cv_splitter=StratifiedShuffleSplit(n_splits=1, test_size=0.25),


I'd set this default to None and initialize the splitter inside the function body, just to be on the safe side.
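One reading of this suggestion: Python evaluates default arguments once, when the function is defined, so a splitter instance written into the signature is shared by every call. Below is a minimal sketch of the `None` pattern. The helper name `resolve_cv_splitter` is hypothetical and not part of the PR; only the `StratifiedShuffleSplit` arguments come from the diff.

```python
from sklearn.model_selection import StratifiedShuffleSplit


def resolve_cv_splitter(cv_splitter=None):
    """Return the given splitter, or a fresh default one if none was passed.

    Hypothetical helper for illustration; not part of the PR.
    """
    if cv_splitter is None:
        # Built on every call, so separate calls never share one instance.
        cv_splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.25)
    return cv_splitter
```

Inside `stratified_crossvalidate`, the same `if cv_splitter is None:` check could sit next to the existing `pipeline_estimator is None` check.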

arokem · 2021-02-12T23:29:59Z

afqinsight/cross_validate.py

+
+    # Check covariate_df is a dataframe
+    if not isinstance(covariate_df, pd.DataFrame):
+        raise TypeError


Suggested change

-        raise TypeError
+        raise TypeError("`covariate_df` input needs to be a DataFrame object")

Is requiring a DataFrame crucial, by the way? Why not accept arrays too?

arokem · 2021-02-12T23:30:54Z

afqinsight/cross_validate.py

+    if not isinstance(covariate_df, pd.DataFrame):
+        raise TypeError
+
+    X = covariate_df.to_numpy()


I guess you can check whether it's a DataFrame and, if it isn't, skip this line.
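A sketch of what accepting both input types might look like. The helper `as_feature_array` and its 2-D check are assumptions made for illustration; the PR itself only shows the `isinstance` check and the `to_numpy()` call.

```python
import numpy as np
import pandas as pd


def as_feature_array(covariates):
    """Convert a DataFrame or array-like of covariates to a 2-D NumPy array.

    Hypothetical helper for illustration; not part of the PR.
    """
    if isinstance(covariates, pd.DataFrame):
        X = covariates.to_numpy()
    else:
        # Lists and arrays pass through np.asarray without the DataFrame step.
        X = np.asarray(covariates)
    if X.ndim != 2:
        raise ValueError("covariates must be 2-D: (n_samples, n_covariates)")
    return X
```

With this shape, the `TypeError` branch goes away and the informative error moves to the dimensionality check instead.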

arokem · 2021-02-12T23:31:46Z

afqinsight/cross_validate.py

+    if pipeline_estimator is None:
+        pipeline_estimator = make_pipeline(
+            KNeighborsTransformer(n_neighbors=10, mode="distance"),
+            Isomap(n_components=5, neighbors_algorithm="auto"),


I might expose n_neighbors and n_components as keyword arguments for the whole function, set to these values as defaults.

Maybe not. If users want to customize, they can just pass their own pipeline_estimator.
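The two options can be compared with a small sketch. The factory `default_pipeline` is a hypothetical name; its steps and default values mirror the diff above. The block only constructs the pipelines and does not fit them.

```python
from sklearn.manifold import Isomap
from sklearn.neighbors import KNeighborsTransformer
from sklearn.pipeline import make_pipeline


def default_pipeline(n_neighbors=10, n_components=5):
    """Build the default pipeline from the diff with tunable parameters.

    Hypothetical factory for illustration; not part of the PR.
    """
    return make_pipeline(
        KNeighborsTransformer(n_neighbors=n_neighbors, mode="distance"),
        Isomap(n_components=n_components, neighbors_algorithm="auto"),
    )


# Option 1: keyword arguments exposed by the function.
tuned = default_pipeline(n_neighbors=15, n_components=3)

# Option 2: the caller builds a pipeline and passes it as pipeline_estimator.
custom = default_pipeline().set_params(isomap__n_components=2)
```

Option 2 keeps the function signature short, which seems to be the point of the follow-up comment.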

arokem · 2021-02-12T23:32:15Z

afqinsight/cross_validate.py

+        )
+
+    X_reduced = pipeline_estimator.fit_transform(X)
+    cluster_stratify = KMeans(n_clusters=5).fit_predict(X_reduced)


Same here: expose n_clusters.
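A minimal sketch of exposing `n_clusters` and using the cluster labels as the stratification target. The function name, the `random_state` and `n_init` arguments, and the random demo data are assumptions. Unlike the PR, this sketch clusters the raw array directly instead of a reduced embedding, to keep it short.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import StratifiedShuffleSplit


def cluster_stratified_split(X, n_clusters=5, test_size=0.25, random_state=0):
    """Cluster the samples, then split so each cluster appears in both sets.

    Hypothetical function for illustration; not part of the PR.
    """
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(X)
    splitter = StratifiedShuffleSplit(
        n_splits=1, test_size=test_size, random_state=random_state
    )
    # The cluster labels play the role of class labels for stratification.
    train_index, test_index = next(splitter.split(X, labels))
    return labels, train_index, test_index


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))  # synthetic covariates, demo only
labels, train_index, test_index = cluster_stratified_split(X_demo, n_clusters=4)
```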

arokem · 2021-02-12T23:34:33Z

afqinsight/cross_validate.py

+    for foldno, (train_index, test_index) in enumerate(
+        cv_splitter.split(X, cluster_stratify)
+    ):
+        print("Fold No: {}".format(foldno))


Maybe use logging instead of print?
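A minimal sketch of the logging variant, using only the standard library. The function name `report_fold` is hypothetical; the message text comes from the `print` call in the diff.

```python
import logging

# Module-level logger, the usual convention for library code.
logger = logging.getLogger(__name__)


def report_fold(foldno):
    """Log the current fold number instead of printing it."""
    # %-style arguments are only formatted if INFO is actually enabled.
    logger.info("Fold No: %d", foldno)
```

Callers who want to see the messages can opt in with `logging.basicConfig(level=logging.INFO)`; everyone else gets a quiet library.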

arokem · 2021-02-12T23:35:21Z

afqinsight/cross_validate.py

+        A scikit-learn object for splitting the dataset into
+        train/test set such that splits are stratified
+
+    Yields


I think this function returns these rather than yielding them, so the docstring section should be Returns, not Yields.
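For reference, a small sketch of the distinction in numpydoc-style docstrings: `Yields` documents a generator, `Returns` documents a function that hands back its result in one piece. Both functions and their names are hypothetical.

```python
import inspect


def iter_folds(n_folds):
    """Generate fold numbers lazily.

    Yields
    ------
    foldno : int
        The index of the next fold.
    """
    for foldno in range(n_folds):
        yield foldno


def list_folds(n_folds):
    """Collect all fold numbers at once.

    Returns
    -------
    folds : list of int
        The indices of all folds.
    """
    return list(range(n_folds))


# Only the function containing `yield` is a generator function.
is_generator = inspect.isgeneratorfunction(iter_folds)
```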

arokem · 2021-02-12T23:35:46Z

afqinsight/cross_validate.py

+        print(train_clusters)
+        print(train_freq)
+        print(test_clusters)
+        print(test_freq)


Again, logging?
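A sketch of how these four values might be reported through a logger. It assumes the cluster and frequency arrays come from `np.unique(..., return_counts=True)`, which the excerpt does not show; both function names are hypothetical.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def cluster_frequencies(labels):
    """Return the distinct clusters and their relative frequencies."""
    clusters, counts = np.unique(labels, return_counts=True)
    return clusters, counts / counts.sum()


def log_fold_balance(labels, train_index, test_index):
    """Log per-cluster frequencies for the train and test parts of a fold."""
    labels = np.asarray(labels)
    for name, index in (("train", train_index), ("test", test_index)):
        clusters, freq = cluster_frequencies(labels[index])
        # DEBUG level: a diagnostic detail that is hidden by default.
        logger.debug("%s clusters: %s, frequencies: %s", name, clusters, freq)
```

Similar frequencies in the train and test lines indicate that the split kept the cluster proportions.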

arokem · 2021-02-22T03:37:44Z

@mnarayan: any progress on this one? It looks like the build failures are just minor linting issues.

Would you mind rebasing and adding tests?

arokem · 2022-11-06T22:40:37Z

I believe this was superseded by #107

Added new function (skeleton) for covariate matching

bd85772

mnarayan linked an issue Feb 12, 2021 that may be closed by this pull request

Implement stratified cross-validation for multiple covariates #63

Open

mnarayan added 3 commits February 12, 2021 15:15

covariate matching functionality added to cross_validate

2919f34

Re-added missing import

e70897e

Added pandas import

05f94f7

arokem reviewed Feb 12, 2021

arokem closed this Nov 6, 2022
