Reproducing Results (Table 1) #3
Dear nil123532,

To analyze the tradeoff between labeling costs and performance, we vary the number of available labels in our experiments. To ensure an equal number of labeled samples per class, we divide the total number of labels by the number of classes and randomly select the required number of samples for each class. You can find the corresponding implementation in lines 103–118 of Semi-Supervised/datasets/cifar.py.

Table 1 in our paper provides an overview of the quality of the generated artificial expert labels. Specifically, we compute the F-0.5 score between the artificial expert labels and the ground-truth expert labels on the training set. This metric helps assess how well the artificial expert labels approximate the true expert labels.

Let me know if you have any further questions!

Best,
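For readers reproducing this, the balanced selection described above can be sketched roughly as follows. This is only an illustration of the idea, not the exact code from cifar.py; labels is assumed to be the array of training-set class indices, and n_labeled is assumed divisible by n_classes.

import numpy as np

def sample_balanced_labels(labels, n_labeled, n_classes, seed=0):
    """Randomly pick n_labeled indices with an equal number of samples per class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    per_class = n_labeled // n_classes
    picked = []
    for c in range(n_classes):
        class_idx = np.where(labels == c)[0]
        picked.extend(rng.choice(class_idx, per_class, replace=False))
    return np.array(picked)

# illustrative usage with dummy labels
labels = np.random.randint(0, 10, size=5000)
labeled_idx = sample_balanced_labels(labels, n_labeled=100, n_classes=10)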
Thank you for your prompt response both here and via email. From my understanding, once training is complete, you generate the expert labels from the binary output and then evaluate using the F0.5 score. Could you please let me know if you have code available for that evaluation? Also, could you clarify which dataset you use for the F0.5 score evaluation: the test set, or a combination of the labeled and unlabeled training sets?

Best,
Hi Nilesh,

That's correct. We evaluate the artificial expert labels by computing the F0.5 score between the artificial and true expert labels on the test set of CIFAR-100. While we do not have dedicated code for this in our repository, computing the F0.5 score is straightforward; for example, you can use the fbeta_score function from the sklearn library.

Let me know if you need further clarification!

Best,
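For example (a minimal sketch; y_true would hold the binary ground-truth expert labels and y_pred the artificial expert labels on the CIFAR-100 test set, shown here with illustrative values):

import numpy as np
from sklearn.metrics import fbeta_score

y_true = np.array([1, 0, 1, 1, 0, 1])   # ground-truth expert labels (illustrative values)
y_pred = np.array([1, 0, 0, 1, 0, 1])   # artificial expert labels (illustrative values)
print(fbeta_score(y_true, y_pred, beta=0.5))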
Thank you,
Hi, I have a few additional questions for clarification. Thanks.
Hello, I hope you're doing well.

I noticed that in the embedding model's learning rate graph, the LR quickly decays to 8e-4, which might be due to the scheduler step being called after every mini-batch (on line 99). As a result, the schedule reaches the milestone of 160 steps almost immediately. It might be more appropriate to call the scheduler's step() method once per epoch, so the learning rate decays at the intended intervals. Because of this rapid LR decay, the model only reached around 63% accuracy on CIFAR, which is relatively low for an EfficientNet-based approach. I made a small change to increment the scheduler's epoch counter only after each epoch, instead of after every batch.

Separately, regarding the training of the "expertise" model (a linear model), I noticed that loss_x decreases while loss_u increases and then remains at that level for the rest of the 50 epochs. I would appreciate any guidance on how best to address this issue. At the moment, I suspect it's because the embedding model might not have been trained properly, and that could explain why I'm seeing an F0.5 score of 74% (for n_labelled = 120) instead of 84% for Embedding-FixMatch.

Any advice or suggestions would be greatly appreciated. Thank you, and I look forward to your response.
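For context, the scheduler change I mention above amounts to roughly the following. This is only a sketch, not the repository's actual training loop; the model, gamma value, and train_loader are placeholders assumed to be defined as in the real script.

import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(128, 10)                     # stand-in for the embedding model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[160], gamma=0.1)

for epoch in range(200):
    for x, y in train_loader:                        # train_loader assumed to exist
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        # scheduler.step()  # original placement: the 160-step milestone is reached within the first epoch
    scheduler.step()         # stepping once per epoch makes the milestone refer to epoch 160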
def evaluate_f0_5_sklearn(model, ema_model, emb_model, dataloader, beta=0.5):
Here's my script to calculate the F0.5 score. Note that the dataloader will give items containing binary expert labels.
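A minimal sketch of what a function like this might look like is below. It is not the original script; it assumes the expertise model consumes embeddings produced by emb_model and outputs two logits per sample, that each batch yields images together with binary expert labels, and that ema_model is accepted but unused.

import numpy as np
import torch
from sklearn.metrics import fbeta_score

@torch.no_grad()
def evaluate_f0_5_sklearn(model, ema_model, emb_model, dataloader, beta=0.5, device="cuda"):
    # Sketch under the assumptions stated above; ema_model could be swapped in for model.
    model.eval()
    emb_model.eval()
    y_true, y_pred = [], []
    for images, expert_labels in dataloader:   # expert_labels: 1 if the expert is correct, else 0
        feats = emb_model(images.to(device))   # embeddings fed to the expertise model
        preds = model(feats).argmax(dim=1)     # binary prediction of expert correctness
        y_pred.append(preds.cpu().numpy())
        y_true.append(np.asarray(expert_labels))
    return fbeta_score(np.concatenate(y_true), np.concatenate(y_pred), beta=beta)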
Dear Authors,
I have a question regarding your evaluation methodology. In the paper, you mention that “for each l, we draw instances randomly from the training data while ensuring balanced class proportions.” However, from the code, it appears that the evaluation is performed using dlval, which is built from data that is disjoint from the training set. Could you please clarify how this is intended to work and how one should reproduce Table 1 in your paper?