This repository contains an implementation of the loss function proposed in the paper "Combining Distance to Class Centroids and Outlier Discounting for Improved Learning with Noisy Labels". The loss function can be used to train deep neural networks, especially in the presence of noisy labels.
To cite our work:
@misc{https://doi.org/10.48550/arxiv.2303.09470,
doi = {10.48550/ARXIV.2303.09470},
url = {https://arxiv.org/abs/2303.09470},
author = {Wani, Farooq Ahmad and Bucarelli, Maria Sofia and Silvestri, Fabrizio},
keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Machine Learning (stat.ML), FOS: Computer and information sciences, FOS: Computer and information sciences},
title = {Combining Distance to Class Centroids and Outlier Discounting for Improved Learning with Noisy Labels},
publisher = {arXiv},
year = {2023},
copyright = {arXiv.org perpetual, non-exclusive license}
}
To use the loss function, follow these instructions:
Declare the following variables outside the class:
- `mean`: The initialization mean for the parameter `u`
- `std`: The initialization standard deviation for the parameter `u`
- `encoder_features`: The number of features in ϕ(x_i), the output of the second-to-last layer of the network
- `total_epochs`: The total number of epochs the network is trained for
It is preferable to pass the above four variables using the config file. For simplicity, they have been declared global in this implementation.
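One possible way to keep these four values in a config object rather than as bare globals is sketched below. The `NCODConfig` name and all numeric values are placeholders chosen for illustration; they are not taken from the paper or from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NCODConfig:
    """Hypothetical container for the four settings listed above."""
    mean: float              # initialization mean for the parameter u
    std: float               # initialization standard deviation for u
    encoder_features: int    # size of the second-to-last layer output
    total_epochs: int        # number of training epochs


# Placeholder values only; take the real ones from the paper or your setup.
config = NCODConfig(mean=1e-8, std=1e-9, encoder_features=512, total_epochs=150)

# Expose them under the names used in this README.
mean, std = config.mean, config.std
encoder_features, total_epochs = config.encoder_features, config.total_epochs
```

Freezing the dataclass guards against accidental changes to these values during training.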
Initialize the `ncodLoss` class with the following parameters:
- `sample_labels`: The list of training labels for all samples (should not be one-hot encoded)
- `num_examp`: The total number of training samples
- `num_classes`: The total number of training classes
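If your labels are stored one-hot encoded, they need to be converted to integer class labels before being passed as `sample_labels`. The sketch below shows one way to derive the three constructor arguments from toy data. The helper `to_class_indices` and the example labels are hypothetical, and the keyword-style constructor call in the final comment is an assumption about the class signature.

```python
def to_class_indices(one_hot_rows):
    """Convert one-hot rows such as [0, 1, 0] to integer class labels."""
    return [row.index(max(row)) for row in one_hot_rows]


# Toy labels for four samples and three classes (illustrative data only).
one_hot_labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]

sample_labels = to_class_indices(one_hot_labels)   # plain integer labels
num_examp = len(sample_labels)                     # total training samples
num_classes = len(one_hot_labels[0])               # total training classes

# Assumed usage (check the class signature before relying on it):
# criterion = ncodLoss(sample_labels=sample_labels,
#                      num_examp=num_examp, num_classes=num_classes)
```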
If you want to use the two additional regularization terms (the consistency regularizer LC and the class-balance regularizer LB), pass the following parameters:
- `ratio_consistency`: The weight given to LC. Check our paper for its value. Default is zero.
- `ratio_balance`: The weight given to LB. Check our paper for its value. Default is zero.
Additional information for step 2:
- `self.beginning`: It is set to `True` because the average class latent representations are saved from the beginning of training. Set it to `False` at the time of saving the average latent representation of each class, so that the function turns it on again after the first run.
Call the `forward` method of `ncodLoss` with the following parameters:
- `index`: The index of each training sample in the current batch.
- `outputs`: The output generated by the model (M).
- `labels`: The one-hot encoded representation of each sample in the batch.
- `out`: The encoded feature representation of each sample in the batch.
- `flag`: The batch number or `batch_id` for each batch.
- `epoch`: The number of the current epoch.
Note that in the case of the two-network ensemble architecture, the size of `outputs` and `out` will be twice the number of indices, because the second chunk contains the results for the augmented versions of the data in the first chunk.
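To illustrate the shapes involved, the following sketch concatenates a batch with an augmented copy and passes it through a stand-in model. `ToyModel`, the tensor sizes, and the noise-based "augmentation" are all assumptions made for illustration; only the fact that the model returns both `outputs` and `out` comes from the description above.

```python
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Stand-in network returning (logits, second-to-last layer features)."""

    def __init__(self, in_dim=8, encoder_features=16, num_classes=3):
        super().__init__()
        self.encoder = nn.Linear(in_dim, encoder_features)
        self.classifier = nn.Linear(encoder_features, num_classes)

    def forward(self, x):
        out = torch.relu(self.encoder(x))   # encoded features
        outputs = self.classifier(out)      # class logits
        return outputs, out


batch_size = 4
index = torch.arange(batch_size)            # one index per original sample
actual = torch.randn(batch_size, 8)
augmented = actual + 0.1 * torch.randn_like(actual)  # toy "augmentation"

outputs, out = ToyModel()(torch.cat([actual, augmented]))

# First chunk: original samples; second chunk: their augmented versions.
outputs_actual, outputs_augmented = torch.chunk(outputs, 2)
```

Here `outputs` and `out` each have `2 * batch_size` rows, while `index` has only `batch_size` entries.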
Create an object of the above class `ncodLoss` and get the weights of `u`. Assign a learning rate to `u` (see the paper for the value) with zero weight decay, and create a separate optimizer for `u`. We used SGD. Let us call it `optimizer_u`.
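A minimal sketch of this setup is shown below. Because the real class is not reproduced here, `ToyLoss` is a hypothetical stand-in that assumes `u` is exposed as an `nn.Parameter` attribute with one value per training sample. The learning rates and other hyperparameters are placeholders, not the values from the paper.

```python
import torch
import torch.nn as nn


class ToyLoss(nn.Module):
    """Stand-in for ncodLoss: holds one learnable value of u per sample."""

    def __init__(self, num_examp, mean=1e-8, std=1e-9):
        super().__init__()
        init = torch.empty(num_examp, 1).normal_(mean=mean, std=std)
        self.u = nn.Parameter(init)


criterion = ToyLoss(num_examp=100)
model = nn.Linear(8, 3)  # placeholder network

# Separate SGD optimizer for u, with zero weight decay.
# lr=0.1 is a placeholder; use the learning rate given in the paper.
optimizer_u = torch.optim.SGD([criterion.u], lr=0.1, weight_decay=0.0)

# The network keeps its own optimizer and hyperparameters (placeholders).
optimizer = torch.optim.SGD(model.parameters(), lr=0.02, momentum=0.9,
                            weight_decay=5e-4)
```

Keeping `u` in its own optimizer lets its learning rate and weight decay be set independently of the network's.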
During training of the network, optimize the weights of `u` as well. For example:
    # Assumes torch is imported and that device, your_model, yourdataloader,
    # optimizer_u and your_own_optimizer are already defined.
    for epoch in range(total_epochs):
        for batch_number, (Actualdata, Augmented, label, indexs) in enumerate(yourdataloader):
            Actualdata, label = Actualdata.to(device), label.long().to(device)
            # One-hot encode the integer labels
            target = torch.zeros(len(label), your_number_of_classes, device=device).scatter_(1, label.view(-1, 1), 1)
            # Append the augmented batch only when the LC and LB regularizers are used
            if ratio_consistency > 0 or ratio_balance > 0:
                Augmented = Augmented.to(device)
                data = torch.cat([Actualdata, Augmented])
            else:
                data = Actualdata
            output, out = your_model(data)
            loss = object_of_ncodLoss(indexs, output, target, out, batch_number, epoch)
            # Reset gradients for both optimizers, backpropagate, then update both
            optimizer_u.zero_grad()
            your_own_optimizer.zero_grad()
            loss.backward()
            optimizer_u.step()
            your_own_optimizer.step()
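The `target` line in the loop above builds one-hot labels with `scatter_`. The small check below, run on made-up labels, suggests that it produces the same result as PyTorch's built-in `torch.nn.functional.one_hot` helper, which may be easier to read.

```python
import torch
import torch.nn.functional as F

num_classes = 3
label = torch.tensor([0, 2, 1, 2])  # made-up integer class labels

# One-hot encoding as written in the training loop above.
target = torch.zeros(len(label), num_classes).scatter_(1, label.view(-1, 1), 1)

# Equivalent encoding using the built-in helper, cast to float to match.
target_builtin = F.one_hot(label, num_classes=num_classes).float()
```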