
DAL query methods do not choose from the whole unlabelled set #1

erik-werner opened this issue Nov 8, 2018 · 9 comments

@erik-werner

unlabeled_idx = np.random.choice(unlabeled_idx, np.min([labeled_idx.shape[0]*10, unlabeled_idx.size]), replace=False)

I suppose the reason for this line is that you want a somewhat balanced training set for the discriminator. But when you do the selection, I assume that you want to use the whole unlabeled set. That's how it's described in the paper.
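For concreteness, something like the following is what I had in mind - a minimal sketch, assuming a hypothetical train_discriminator helper (the names here are illustrative, not the actual functions in this repo): subsample only when fitting the discriminator, then rank the entire unlabeled pool when choosing queries.

```python
import numpy as np

def select_queries(train_discriminator, X, labeled_idx, unlabeled_idx, query_size):
    # Subsample the unlabeled pool only to get a manageable training set
    # for the binary labeled-vs-unlabeled discriminator.
    sub_idx = np.random.choice(
        unlabeled_idx,
        np.min([labeled_idx.shape[0] * 10, unlabeled_idx.size]),
        replace=False)
    model = train_discriminator(X[labeled_idx], X[sub_idx])

    # ...but score the WHOLE unlabeled set when selecting points to label.
    # Assumes the model outputs P(example is "unlabeled") as a single column.
    p_unlabeled = model.predict(X[unlabeled_idx]).ravel()
    return unlabeled_idx[np.argsort(-p_unlabeled)[:query_size]]
```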

I looked into this because I found it hard to believe Fig. 3c of your paper. That the two selection algorithms should be perfectly independent seems inconceivable to me. When I tried to reproduce your results (after fixing the issue), I instead found a small anti-correlation between DAL and entropy selection. I have not tested this thoroughly, so it could be a mistake on my part. But if it holds up, I think this is even more interesting than no correlation.

@dsgissin
Owner

Hey Erik, I'm sorry but for some reason I only saw your issue now... Thanks for looking into this.
Figure 3 was produced using different code which I didn't upload here, and the problem you mentioned isn't present there. I tried re-running these experiments following your comments and was unable to reproduce a result which has an anti-correlation (the results remained the same as in the figure).

I'd like to think that my experiments were correct, but if you happen to still have the code you ran to reproduce this experiment, I would be very happy to look into it.

Thanks

@d12306

d12306 commented Sep 17, 2019

@erik-werner, @dsgissin, hi, thank you for raising this question. But I am wondering how you fixed the issue. Training a discriminator on such an unbalanced dataset will lead to significantly biased predictions: the discriminator is prone to assigning all samples to one class (the class with far more samples) in order to drive the loss down. What do you think about this issue?
Hopefully you can help me with it.

Thanks,

@dsgissin
Owner

What I did originally was to weight the classes according to the number of examples in the dataset (so the under-represented labeled class gets a proportionally larger weight), and use a large batch size to make sure that every batch has examples from the class that is under-represented.
I also sub-sample the unlabeled set so that the under-representation of the labeled set wouldn't be too large.

These are relatively simple ways of dealing with the problem of unbalanced classes, but they were enough to reach the results in the paper.
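Roughly, the weighting looks something like this - a minimal Keras-style sketch under the assumptions above (the architecture and names are illustrative, not the exact code in this repo):

```python
import numpy as np
from tensorflow.keras import layers, models

def train_binary_discriminator(X_labeled, X_unlabeled_sub, batch_size=1024, epochs=100):
    # Binary task: labeled set -> class 1, (sub-sampled) unlabeled set -> class 0.
    X = np.concatenate([X_labeled, X_unlabeled_sub])
    y = np.concatenate([np.ones(len(X_labeled)), np.zeros(len(X_unlabeled_sub))])

    model = models.Sequential([
        layers.Dense(256, activation='relu', input_shape=(X.shape[1],)),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Up-weight the small "labeled" class so it isn't drowned out by the unlabeled class.
    class_weight = {0: 1.0, 1: len(X_unlabeled_sub) / max(len(X_labeled), 1)}

    # Large batches make it very likely that every batch contains labeled examples.
    model.fit(X, y, batch_size=batch_size, epochs=epochs,
              class_weight=class_weight, verbose=0)
    return model
```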

@d12306

d12306 commented Sep 17, 2019

Hi @dsgissin, thanks, but what I mean is about the discriminator. The number of images in the labeled set is too small compared to that of the unlabeled set - say 1000 labeled and 40000 unlabeled - so the discriminator is biased. Also, in the code you randomly sample 10 times as many unlabeled images as labeled images for training the discriminator. So how do you weight the classes? We do not know the labels of the unlabeled images, after all. Actually, I am not quite sure what "under-representation" means here.
Maybe I am asking a naive question; I am sorry if I misunderstood you.

Thanks,

@dsgissin
Owner

No problem, I'll try to explain more clearly.

What I do in the code (which is only one option that happened to work OK and can probably be improved) is to subsample the unlabeled set so that it only has ten times as many examples as the labeled set. This is still very unbalanced, but it is manageable.
Then, I weight the labeled examples ten times as much as the unlabeled examples (the loss is ten times as large for them).
Finally, I use large batch sizes to ensure that the labeled examples appear in all of the batches with high probability, so I don't get optimization issues.

Some clarifications:

  1. The discriminator doesn't care what the actual labels of the unlabeled set are - it simply tries to discriminate between the labeled set (labeled as "1", for example) and the unlabeled set (labeled as "0", for example). So in this case, the weight of the "1" class is ten times that of the "0" class. The upside of this algorithm is in fact that you are agnostic to the ML task you are running active learning for - you don't care about the actual labels of the data, only that the labeled set will resemble the unlabeled set as much as possible.
  2. Even with all the things I do to balance out the binary classification task, it's true that the discriminator is probably biased towards the unlabeled set. However, that shouldn't hurt us too badly, since we are assuming it will still be most confident for examples which are most different from the labeled set. We only care about taking the top-K examples from the current unlabeled batch for labeling, and since we use early stopping in the training, we should expect the discriminator to be most confident about the examples which are easiest to differentiate from the labeled examples - these are the examples most different from the current labeled set, which diversify it the most and hopefully make the labeled set similar to the unlabeled set (see the small example below).
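To make the top-K rule concrete, here is a tiny toy example with made-up numbers (not output from the actual code): the discriminator gives each unlabeled example a probability of belonging to the labeled set, and we query the K examples it is most confident are not in it.

```python
import numpy as np

# Hypothetical discriminator outputs P("labeled") for five unlabeled examples.
p_labeled = np.array([0.90, 0.05, 0.40, 0.02, 0.70])

K = 2
# Query the K examples that look least like the labeled set
# (equivalently, the highest P("unlabeled") = 1 - P("labeled")).
query_idx = np.argsort(p_labeled)[:K]
print(query_idx)  # -> [3 1]
```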

I hope that's more clear,
Daniel

@d12306

d12306 commented Sep 17, 2019

Thanks, I got what you mean, @dsgissin. One more quick question: it seems like the training of the discriminator will gradually become adversarial-training-flavored. For example, there are some dogs in the labeled set, and there are more dogs in the unlabeled set as the labeled set grows. The representations are intuitively similar, but we still try to distinguish between them. How do you think that will affect the model?

@dsgissin
Owner

I'm not quite sure what you mean.
In real data we don't expect the representations to be really similar for the entire unlabeled set - there should always be examples that are different enough to be singled out by the discriminator. If the representations really are similar, then you're in good shape, because the labeled set is similar to the unlabeled set (which is what we wanted).

@d12306

d12306 commented Sep 18, 2019

Thanks for your clarification.

@d12306

d12306 commented Jan 9, 2021

@dsgissin, hi, sorry for bothering you again, but how do you select the hyperparameters, such as the number of iterations, the number of starting samples, etc., in the experiments? It seems like the validation dataset is only used while training the model, to decide which checkpoint to use, but the other hyperparameters are chosen arbitrarily?

Would it be better to use the same validation set for tuning these hyperparameters?

Thanks,
