Is your feature request related to a problem? Please describe.
Currently, we support a limited set of options for curate_feature_extractor (TfidfVectorizer, CountVectorizer, SentenceTransformer) and curate_model (scikit-learn models that implement predict_proba). Domain-specific feature extraction methods and state-of-the-art models are not supported.
Describe the solution you'd like
If we would like to experiment with a wider set of feature extraction methods / classification models, decoupling the cross-validation-based training from the label quality assessment would be helpful. Essentially, in addition to the fit_transform method, we would implement a transform method that accepts the results of cross-validation-based training as input and performs the label quality checks. The following is an example snippet of how transform could work:
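A minimal, self-contained sketch of the decoupled label quality step, using plain Python lists in place of a DataFrame. The function name assess_label_quality, the output keys, and the rule that a label counts as correct when it matches the highest-probability class are all illustrative assumptions. They are not DQC Toolkit's existing API or its actual scoring logic.

```python
from typing import Dict, List, Sequence


def assess_label_quality(
    labels: Sequence[str],
    crossval_pred_probability_matrix: Sequence[Sequence[float]],
    classes: Sequence[str],
) -> List[Dict[str, object]]:
    """Hypothetical stand-in for the proposed transform step.

    Takes out-of-fold prediction probabilities computed elsewhere
    (one row per sample, one column per class) and flags labels
    that disagree with the most probable class.
    """
    if len(labels) != len(crossval_pred_probability_matrix):
        raise ValueError("labels and probability matrix must have the same length")

    class_index = {name: i for i, name in enumerate(classes)}
    results = []
    for label, probs in zip(labels, crossval_pred_probability_matrix):
        if len(probs) != len(classes):
            raise ValueError("each probability row needs one entry per class")
        if label not in class_index:
            raise ValueError(f"unknown label: {label!r}")

        # Index of the class with the highest out-of-fold probability.
        best = max(range(len(classes)), key=lambda i: probs[i])
        predicted = classes[best]
        results.append(
            {
                "label": label,
                "predicted_label": predicted,
                "prediction_probability": probs[best],
                # Assumption: score = probability assigned to the given label.
                "label_correctness_score": probs[class_index[label]],
                "is_label_correct": predicted == label,
            }
        )
    return results


# Probabilities would come from a user-defined cross-validation trainer.
classes = ["negative", "positive"]
labels = ["positive", "negative", "positive"]
crossval_pred_probability_matrix = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]

train_data_modified = assess_label_quality(
    labels, crossval_pred_probability_matrix, classes
)
for row in train_data_modified:
    print(row)
```

In this toy input the third sample is labelled positive but the out-of-fold probabilities favour negative, so it is the only row flagged as potentially mislabelled.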
CustomCrossValidationTraining is a custom trainer and CustomModel is a custom model, both defined by the user outside of DQC Toolkit's scope.
crossval_pred_probability_matrix contains the cross-validation-based prediction probabilities for each label for each sample.
train_data_modified is the result, similar to what is returned by CrossValCurate.fit_transform.