Implement a transform method in CrossValCurate #6

Open
sumanthprabhu opened this issue May 9, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@sumanthprabhu
Owner

Is your feature request related to a problem? Please describe.
Currently, curate_feature_extractor supports only a limited set of options (TfidfVectorizer, CountVectorizer, SentenceTransformer), and curate_model is limited to Sklearn models that implement predict_proba. Domain-specific feature extraction methods and SOTA models are not supported.

Describe the solution you'd like
To experiment with a wider set of feature extraction methods and classification models, it would help to decouple the cross-validation-based training from the label quality assessment. In addition to the fit_transform method, we would implement a transform method that accepts the results of cross-validation-based training as input and performs the label quality checks. The following snippet shows how transform could work:

# User-defined cross-validation training that returns per-sample prediction probabilities
crossval_pred_probability_matrix = CustomCrossValidationTraining(CustomModel, data_with_noisy_labels)
cvc = CrossValCurate(random_state=seed, correctness_threshold=0.0)
# Proposed method: runs only the label quality checks on the precomputed probabilities
train_data_modified = cvc.transform(crossval_pred_probability_matrix, train_data, y_col_name="label")
  • CustomCrossValidationTraining is a custom trainer and CustomModel is a custom model, both defined by the user outside DQC Toolkit's scope.
  • crossval_pred_probability_matrix contains the cross-validation-based prediction probability of each label for each sample.
  • train_data_modified is a result similar to the one returned by CrossValCurate.fit_transform.
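
To make the idea concrete, here is a minimal sketch of the kind of check a transform step could run on a precomputed probability matrix. It is not DQC Toolkit's implementation: the function name assess_labels, the output column names, the assumption that matrix columns follow the sorted unique labels, and the way correctness_threshold is applied are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def assess_labels(pred_proba, data, y_col_name="label", classes=None,
                  correctness_threshold=0.0):
    """Hypothetical stand-in for the proposed transform step.

    pred_proba: array of shape (n_samples, n_classes) with out-of-fold
        prediction probabilities; column j corresponds to classes[j].
    data: DataFrame holding the (possibly noisy) labels in y_col_name.
    """
    pred_proba = np.asarray(pred_proba, dtype=float)
    if classes is None:
        # Assumption: columns follow the sorted unique labels.
        classes = sorted(data[y_col_name].unique())
    classes = np.asarray(classes)
    if pred_proba.shape != (len(data), len(classes)):
        raise ValueError("pred_proba must have shape (n_samples, n_classes)")

    # Column index of each sample's given label.
    class_to_col = {c: j for j, c in enumerate(classes)}
    given_idx = data[y_col_name].map(class_to_col).to_numpy()
    rows = np.arange(len(data))

    out = data.copy()
    # Probability the cross-validated model assigned to the given label.
    out["given_label_probability"] = pred_proba[rows, given_idx]
    # Most likely label according to the out-of-fold predictions.
    out["predicted_label"] = classes[pred_proba.argmax(axis=1)]
    out["prediction_probability"] = pred_proba.max(axis=1)
    # Assumed rule: the given label is accepted when the prediction agrees
    # with it and is at least as confident as the threshold.
    out["label_agrees"] = (
        (out["predicted_label"] == out[y_col_name])
        & (out["prediction_probability"] >= correctness_threshold)
    )
    return out


train_data = pd.DataFrame({
    "text": ["good film", "awful plot", "loved it", "boring"],
    "label": ["pos", "neg", "neg", "neg"],
})
# Hand-written probabilities; columns are ordered as ["neg", "pos"].
proba = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
result = assess_labels(proba, train_data, y_col_name="label")
print(result[["label", "predicted_label", "label_agrees"]])
```

One way to produce such a matrix for any estimator that exposes predict_proba is scikit-learn's cross_val_predict with method="predict_proba", which returns the columns in sorted class order. A fully custom training loop would need to document its own column order, which is why the sketch accepts an explicit classes argument.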
@sumanthprabhu sumanthprabhu added the enhancement New feature or request label May 9, 2024