How do I adjust parameters when the recall of a particular label is zero in a classification task? #252

Open
000namc opened this issue Jul 27, 2023 · 0 comments
Labels
enhancement New feature or request

000namc commented Jul 27, 2023

Thank you for making this excellent model available. I am working on a category classification problem and have run into some difficulties with parameter tuning. How should I adjust the parameters when the recall of particular labels is zero in a classification task?

Let me provide a brief explanation of the problem I am currently tackling.

I have approximately 3,000,000 rows of training data, divided into a total of 4300 categories. The number of rows per category is roughly normally distributed. When I first applied PECOS, I achieved a top-1 accuracy of 87%, which is a great score. With a bit more improvement, it would be suitable for use in our service, so I am currently attempting parameter tuning.

After examining the classification results, I found that 3900 of the 4300 categories have a top-1 accuracy of 95-99%, while the remaining 400 categories have a recall of 0 (on both the training and validation sets). I want to adjust the parameters so that the model also makes predictions for these 400 categories, but I'm unsure of the best approach. How should I go about this?
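For reference, a per-category breakdown like the one above can be computed roughly as follows. This is a minimal sketch using only NumPy, not the exact script used for the numbers in this issue; the function name `per_label_recall` and the toy arrays are hypothetical, and it assumes the true labels and top-1 predictions are available as integer arrays.

```python
import numpy as np

def per_label_recall(y_true, y_pred, num_labels):
    """Top-1 recall per label: correct predictions / true occurrences."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Number of true instances of each label.
    support = np.bincount(y_true, minlength=num_labels)
    # Number of instances of each label that were predicted correctly.
    hits = np.bincount(y_true[y_true == y_pred], minlength=num_labels)
    # Labels that never occur get NaN instead of a misleading 0.
    recall = np.full(num_labels, np.nan)
    seen = support > 0
    recall[seen] = hits[seen] / support[seen]
    return recall, support

# Toy data (hypothetical): label 2 is never predicted correctly, label 4 never occurs.
y_true = np.array([0, 0, 1, 1, 2, 2, 3])
y_pred = np.array([0, 0, 1, 0, 0, 1, 3])

recall, support = per_label_recall(y_true, y_pred, num_labels=5)
zero_recall = np.flatnonzero((support > 0) & (recall == 0))
print("labels with zero recall:", zero_recall.tolist())
print("their support:", support[zero_recall].tolist())
```

Printing the support next to the zero-recall labels makes it easy to check whether those categories are simply rare or are being missed despite having plenty of training rows.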

The methods I have attempted are as follows:

  • refined the clustering parameters "refined_indexer_params"
  • changed train_params.matcher_params_chain.model_shortcut to use "https://huggingface.co/monologg/kobert" (because the data is in Korean)
    (((
    I am feeding embeddings that were computed independently of PECOS (X = np.hstack((textual_embed_x, normalized_visual_embed))) into the training process (both the main model and the clustering). I'm wondering whether they are still being used during the label embedding step.
    )))
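To make the feature construction above concrete, here is a minimal sketch of how such an X could be assembled. The array shapes, the random placeholder data, and the `l2_normalize` helper are assumptions for illustration only; the actual embeddings come from separate text and image models.

```python
import numpy as np

def l2_normalize(mat, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against all-zero rows)."""
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return mat / np.maximum(norms, eps)

# Placeholder data (hypothetical shapes): 4 rows, 8-dim text and 6-dim visual embeddings.
rng = np.random.default_rng(0)
textual_embed_x = rng.normal(size=(4, 8)).astype(np.float32)
visual_embed = rng.normal(size=(4, 6)).astype(np.float32)

normalized_visual_embed = l2_normalize(visual_embed)

# Concatenate the two blocks column-wise into one dense feature matrix.
X = np.hstack((textual_embed_x, normalized_visual_embed)).astype(np.float32)
print(X.shape)
```

Note that in this sketch only the visual block is normalized, mirroring the expression quoted above, so the two blocks can end up on different scales.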

And now I am trying:

  • removing the TF-IDF sparse vectors, since I'm wondering whether they are causing overfitting.

I would be extremely grateful for your help.

000namc added the enhancement (New feature or request) label Jul 27, 2023