fix: classifier accuracy #8

Open
vancauwe opened this issue Dec 13, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@vancauwe
Contributor

The classifier's predictions are currently much less accurate than the results reported in the article describing it. Further investigation is needed to understand this drop in accuracy.

@vancauwe vancauwe added the bug Something isn't working label Dec 13, 2024
@rmm-ch
Collaborator

rmm-ch commented Dec 13, 2024

Test script added, along with some outputs from one run (the model predictions aren't deterministic).

Terrible performance :'(

Mean load time: 0.035 +- 0.02 s
Mean classify time: 1.079 +- 0.06 s
Accuracy: correct with top prediction: 2 | any of top 3 correct: 5.000 (of total 100)
Which classes are predicted?
pred_0
pantropic_spotted_dolphin    29
gray_whale                   26
white_sided_dolphin          13
beluga                       10
dusky_dolphin                 7
pygmy_killer_whale            6
long_finned_pilot_whale       3
spotted_dolphin               2
common_dolphin                2
false_killer_whale            1
melon_headed_whale            1
Name: count, dtype: int64

# and then check most popular target classes:
print(df_results.target.value_counts().head())

bottlenose_dolphin         18
humpback_whale             18
beluga                     14
blue_whale                 10
melon_headed_whale          7

rmm-ch added a commit that referenced this issue Dec 13, 2024
@vancauwe
Contributor Author

vancauwe commented Feb 14, 2025

Accuracy evaluation over all 51K images of the training set highlights a clear model problem (see results below). Here, "ok" means that the model's top prediction matched the label, and "any" means that one of its top 3 predictions matched the label.

Our accuracy is heterogeneous across species, but we do not see the same per-species tendencies as those reported in the original paper presenting the model (https://doi.org/10.1111/2041-210X.14167, Figure 6).

We think this comes from the image preprocessing carried out before model classification, as this part was adapted from the WhaleDataset object in the original repository.

[Two images: accuracy results]
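For reference, the "ok" and "any" metrics described above could be computed along these lines. This is only a minimal sketch in plain Python, not the actual test script: the `evaluate` and `summarise` helpers and the toy records are hypothetical, and it assumes each result is available as a label plus a ranked list of predicted class names.

```python
from collections import defaultdict


def evaluate(records, k=3):
    """Count top-1 ("ok") and top-k ("any") hits per target class.

    records: iterable of (target, ranked_predictions) pairs, where
    ranked_predictions lists class names from most to least likely.
    """
    per_class = defaultdict(lambda: {"n": 0, "ok": 0, "any": 0})
    for target, preds in records:
        stats = per_class[target]
        stats["n"] += 1
        # "ok": the top prediction matches the label
        stats["ok"] += int(bool(preds) and preds[0] == target)
        # "any": the label appears among the first k predictions
        stats["any"] += int(target in preds[:k])
    return dict(per_class)


def summarise(per_class):
    """Aggregate per-class counts into overall accuracy fractions."""
    n = sum(s["n"] for s in per_class.values())
    if n == 0:
        raise ValueError("no records to summarise")
    ok = sum(s["ok"] for s in per_class.values())
    any_hits = sum(s["any"] for s in per_class.values())
    return {"n": n, "ok": ok / n, "any": any_hits / n}


# Hypothetical toy data, only to show the expected input shape.
records = [
    ("beluga", ["beluga", "gray_whale", "dusky_dolphin"]),
    ("beluga", ["gray_whale", "beluga", "dusky_dolphin"]),
    ("blue_whale", ["gray_whale", "beluga", "dusky_dolphin"]),
    ("humpback_whale", ["humpback_whale", "beluga", "gray_whale"]),
]
per_class = evaluate(records)
overall = summarise(per_class)
print(overall)
```

Keeping the counts per target class makes it straightforward to compare per-species accuracy against the figure in the paper, rather than only the overall numbers.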

@vancauwe
Contributor Author

After comparing the preprocessing in the current classifier (cetacean-classifier in the Saving-Willy Hugging Face space) with the preprocessing performed in the WhaleDataset object from the original repository, we will explore the following:

Next steps:

  • Adding bounding-box cropping to centre the image on the animal (a big addition to the preprocessing)
  • Checking the effect of the RandomResize transform (it may be applied at random and may need to be removed)
  • Adding the Normalisation transform (currently omitted)

First, we will inspect the transforms on one image and decide which are beneficial. Then, we will test on a subset of 100 images from the training set to see whether accuracy improves.
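As a rough illustration of two of the steps listed above (bounding-box cropping and normalisation), here is a minimal NumPy sketch. It is not the WhaleDataset code: the function names are hypothetical, the bounding box is assumed to come from some external annotation or detector, and the mean/std values are the commonly used ImageNet statistics, which may differ from what the original model was trained with. Resizing is left out here.

```python
import numpy as np

# Assumption: ImageNet channel statistics; the original pipeline may use others.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def crop_to_bbox(image, bbox):
    """Crop an HxWxC image to (x_min, y_min, x_max, y_max), clipped to the image."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = bbox
    x0, x1 = max(0, int(x0)), min(w, int(x1))
    y0, y1 = max(0, int(y0)), min(h, int(y1))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("empty bounding box")
    return image[y0:y1, x0:x1]


def normalise(image_uint8):
    """Scale uint8 pixels to [0, 1], then standardise each channel."""
    scaled = image_uint8.astype(np.float32) / 255.0
    return (scaled - MEAN) / STD


def preprocess(image, bbox=None, do_normalise=True):
    """Apply each step only if enabled, so steps can be compared one at a time."""
    out = crop_to_bbox(image, bbox) if bbox is not None else image
    return normalise(out) if do_normalise else out


# Hypothetical input: a uniform grey 100x200 RGB image and a made-up box.
image = np.full((100, 200, 3), 128, dtype=np.uint8)
out = preprocess(image, bbox=(50, 20, 150, 80))
print(out.shape, out.dtype)
```

Having each step behind a switch makes it easy to run the 100-image subset with and without a given transform and compare the resulting accuracy.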
