Acquiring more labelled training images #81

Open
davidwagner opened this issue Dec 18, 2020 · 2 comments

Comments

@davidwagner

Currently the 0.0.4 dataset provides 125 training images of each class. If we want to train on more images, are there any resources to make it easier to acquire more labelled images that are valid and unambiguous, or do we need to re-implement the tasker evaluation ourselves?

If we use the IDs in bird-or-bicycle/bird_or_bicycle/metadata/0.0.4/, it looks like we can get close to 1000 more images of birds that have been verified by taskers, but no additional images of bicycles are available for training from there. Is there anything else I am missing?

@carlini
Collaborator

carlini commented Dec 18, 2020

I don't think we've collected any more high-quality labeled examples in train. The extra dataset has something like 27k more images that we've found helpful for training a classifier. I've been able to train a single linear layer on top of ImageNet features using the extra dataset to get ~99% test accuracy. But as you say, those images are not filtered correctly.

@davidwagner
Author

Thank you. It seems like getting more images of bicycles might take the most work. In my first random sample of bicycles from extras/, 1/20 (5%) looked to me like they met the requirements; in a second random sample, 4/34 (12%) looked to me like they met the requirements; yet I see from tasker_labels_0.0.4.csv that about 289/1322 (22%) met the requirements. I'm not sure why those three estimates vary so much (perhaps you all did some filtering before feeding images to taskers, or perhaps I just got unlucky in my random samples?). So if we filter extras, I'm guessing we might be able to obtain ~10000 good training images of birds and between 800 and 3000 good training images of bicycles, but this will require us to do the filtering ourselves. Thanks for the information.
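
One way to check whether the gap between these three estimates could be plain sampling noise is to put a confidence interval around each proportion. The sketch below is an illustration only, not something from the repository: `wilson_interval` is a hypothetical helper name, and it assumes each sample was drawn at random from the same pool of bicycle images. It uses a 95% Wilson score interval, which behaves reasonably for small samples.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts quoted in this thread:
# (label, images meeting the requirements, images inspected)
samples = [
    ("first manual sample", 1, 20),
    ("second manual sample", 4, 34),
    ("tasker_labels_0.0.4.csv", 289, 1322),
]

for label, good, total in samples:
    low, high = wilson_interval(good, total)
    print(
        f"{label}: {good}/{total} = {good / total:.1%}, "
        f"95% interval {low:.1%} to {high:.1%}"
    )
```

With these counts, each manual sample's interval on its own is wide enough to include the roughly 22% rate from the tasker labels, so samples of 20 or 34 images are too small to tell sampling noise apart from a real difference such as pre-filtering.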
