-- can be useful for novelty detection
-- an interesting problem statement
http://openaccess.thecvf.com/content_cvpr_2018/papers/Sung_Learning_to_Compare_CVPR_2018_paper.pdf
http://openaccess.thecvf.com/content_cvpr_2018/papers/Cao_HashGAN_Deep_Learning_CVPR_2018_paper.pdf
-- works with partial pairwise data (x_i, x_j, s_ij)
-- but it does not try to infer the missing relations; instead it generates more examples for augmentation purposes (rough sketch of the partial-pair idea below)
-- the problem that I am trying to solve still remains open
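-- a minimal sketch (not the paper's method) of what training on partially observed pairwise labels could look like; the inner-product similarity and the logistic pairwise loss are my own assumptions:

import torch
import torch.nn.functional as F

def masked_pairwise_loss(features, S, observed_mask):
    """Pairwise loss that uses only the observed entries of S.

    features: (N, d) continuous codes (e.g. pre-binarization hash logits)
    S: (N, N) with s_ij = 1 for similar pairs, 0 for dissimilar pairs
    observed_mask: (N, N) boolean, True where s_ij is actually known
    """
    # inner products act as similarity logits for every pair
    logits = features @ features.t()
    # pairwise logistic loss: log(1 + exp(logit)) - s_ij * logit
    loss = F.softplus(logits) - S * logits
    # average only over the pairs whose relation s_ij was given
    return loss[observed_mask].mean()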
http://openaccess.thecvf.com/content_cvpr_2018/papers/He_Triplet-Center_Loss_for_CVPR_2018_paper.pdf
-- triplet loss + center loss
-- center loss takes care of intra-class variance but says nothing about inter-class separation
-- each feature should be close to its own class center and far away from the nearest center of any other class (the nearest negative center)
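-- a rough re-implementation guess of that loss (not the authors' code; they use squared distances and loss weights which I am not reproducing here):

import torch
import torch.nn.functional as F

def triplet_center_loss(features, labels, centers, margin=1.0):
    """features: (N, d), labels: (N,), centers: (C, d) learnable class centers."""
    # distance of every sample to every class center
    dists = torch.cdist(features, centers)                   # (N, C)
    pos = dists.gather(1, labels.view(-1, 1)).squeeze(1)     # distance to own center
    # mask out the own-class column, then take the nearest negative center
    neg = dists.masked_fill(
        F.one_hot(labels, centers.size(0)).bool(), float('inf')
    ).min(dim=1).values
    # hinge: own center should be closer than the nearest other center by a margin
    return F.relu(pos + margin - neg).mean()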
-- a very interesting problem statement where there are noisy labels
-- but a noisy label does not necessarily mean the sample belongs to one of the classes in the dataset; it can in fact be an outlier
-- as an example, suppose you have cat and dog images with clean and noisy labels, but you also have images of elephants labelled as dogs; obviously the elephant images should not even be there. basically a kind of open-set setting
-- definitely useful for my own work and Supritim's work
-- quick thought: suppose you use some outlier detection algorithm and train only on the in-distribution samples; does the performance generally increase? (rough sketch below)
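-- one cheap way to try the quick thought, assuming features from some pretrained network; IsolationForest and the contamination value are arbitrary choices here:

from sklearn.ensemble import IsolationForest

def keep_inliers(features, labels, contamination=0.1):
    """Drop suspected out-of-distribution samples (e.g. the elephant-labelled-as-dog
    case) before training the classifier on the remaining in-distribution data."""
    detector = IsolationForest(contamination=contamination, random_state=0)
    inlier_mask = detector.fit_predict(features) == 1   # +1 = inlier, -1 = outlier
    return features[inlier_mask], labels[inlier_mask]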
-- an interesting way to model the conditional GAN discriminator architecture
-- basically it computes a kind of triplet loss by feeding the discriminator positive pairs and negative pairs ... and the generated examples are themselves used when forming those pairs (my reading is sketched below)
-- not sure how applicable it is to attributes ... as they are obviously using transformations which are only applicable to images
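-- how I read the pairing idea, written as a matching-aware style discriminator loss; the callable D(sample, condition) and the 0.5 weights are my assumptions, not the paper's architecture:

import torch
import torch.nn.functional as F

def paired_discriminator_loss(D, real, cond, wrong_cond, fake):
    """D scores (sample, condition) pairs: one positive pair and two kinds of
    negative pairs, one of which uses the generated sample."""
    pos = D(real, cond)                 # real sample with its matching condition
    neg_cond = D(real, wrong_cond)      # real sample with a mismatched condition
    neg_fake = D(fake.detach(), cond)   # generated sample paired with the condition
    ones, zeros = torch.ones_like(pos), torch.zeros_like(pos)
    return (F.binary_cross_entropy_with_logits(pos, ones)
            + 0.5 * F.binary_cross_entropy_with_logits(neg_cond, zeros)
            + 0.5 * F.binary_cross_entropy_with_logits(neg_fake, zeros))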
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Deep_Mutual_Learning_CVPR_2018_paper.pdf
-- an interesting setup: instead of a fixed, pre-trained teacher distilling into a smaller student, the networks are trained together and teach each other, and the smaller network can end up matching or even beating the larger one
-- this deals with the fact that a smaller network might be able to reach the same accuracy as a larger network but is simply harder to train ... plain supervised loss might not be enough to find those parameters, so a different kind of approach is needed for the smaller network to learn (minimal sketch of the mutual objective below)
-- distillation sounds like a fascinating technique to learn
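-- a minimal sketch of the mutual-learning objective as I understand it (two peer networks; detaching the partner's predictions is my simplification):

import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, targets):
    """Each network keeps its usual supervised loss and adds a KL term that
    pulls its predictions toward the other network's current predictions."""
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    # KL(p_b || p_a): network A mimics B's soft predictions, and vice versa
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=1),
                    F.softmax(logits_b, dim=1).detach(), reduction='batchmean')
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=1),
                    F.softmax(logits_a, dim=1).detach(), reduction='batchmean')
    return ce_a + kl_a, ce_b + kl_b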
http://openaccess.thecvf.com/content_cvpr_2018/papers/Niu_Learning_From_Noisy_CVPR_2018_paper.pdf
-- Learning from web data is increasingly popular due to abundant free web resources. However, the performance gap between webly supervised learning and traditional supervised learning is still very large, due to the label noise of web data as well as the domain shift between web data and test data. To fill this gap, most existing methods propose to purify or augment web data using instance-level supervision, which generally requires heavy annotation. Instead, we propose to address the label noise and domain shift by using more accessible category-level supervision. In particular, we build our deep probabilistic framework upon variational autoencoder (VAE), in which classification network and VAE can jointly leverage category-level hybrid information. Then, we extend our method for domain adaptation followed by our low-rank refinement strategy. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our proposed method.
-- useful for supritim's work
-- how to handle web-annotated data (noisily labeled data)
-- It is unsurprising that learning from web images becomes increasingly popular because of the large amount of freely available web data. However, the labels of web images crawled from public website are very noisy and often inaccurate. Moreover, the data distributions between web data (i.e., source domain) and test data (i.e., target domain) are quite different, which is known as domain shift. Therefore, when applying the classifier learnt on the noisy web training images to the test images, the performance will be significantly degraded.
-- a very interesting loss function which is helpful for supervised classification
-- basically the feature matrix of each individual class should be low rank while the feature matrix over all classes should be high rank ... this corresponds to small intra-class variance and large inter-class variance (rough sketch after these notes)
-- a deep-learning way to do this
-- shows quite a bit of promise when added to the standard softmax loss
-- code is available also: https://github.com/jlezama/OrthogonalLowrankEmbedding
-- another interesting loss used with softmax is mentioned there as well (Large-Margin Softmax Loss for Convolutional Neural Networks, ICML): https://github.com/wy1iu/LargeMargin_Softmax_Loss
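-- rough sketch of the low-rank idea via nuclear norms (my paraphrase, not the repo code; the delta floor and the exact form are assumptions):

import torch

def low_rank_embedding_loss(features, labels, delta=1.0):
    """Nuclear norm (sum of singular values) of each class's features should be
    small, while the nuclear norm of the whole batch should be large; the
    max(., delta) floor keeps per-class features from collapsing to zero."""
    per_class = 0.0
    for c in labels.unique():
        class_feats = features[labels == c]
        per_class = per_class + torch.clamp(
            torch.linalg.matrix_norm(class_feats, ord='nuc'), min=delta)
    # subtracting the overall nuclear norm pushes different classes to span
    # different (ideally orthogonal) subspaces
    return per_class - torch.linalg.matrix_norm(features, ord='nuc')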
-- an interesting paper where they have tried to match two domains without having pairwise correspondences
-- basically they deal with two domains of data X and Y and transformations Wx, Wy that project them into the common-space representations Cx and Cy. once projected, we should be able to recover the original information via Vx Cx = X and Vy Cy = Y
-- so an adversarial network (a domain classifier) is trained to identify which domain a common-space code comes from, eqns (4) and (5), while the projections Wx and Wy are trained to make it misclassify (like the ACMR paper: Adversarial Cross-Modal Retrieval). reconstruction losses for Vx Cx = X and Vy Cy = Y are also there, as is the cycle-consistency loss in eqn (8) (rough sketch of the pieces below)
-- eqns (9) and (10) are additional adversarial losses
-- code not available yet.
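-- a rough sketch of how the pieces could fit together, using the notation from my notes; the linear maps, the MSE reconstruction, and the form of the cycle term are my assumptions since the code is not out:

import torch
import torch.nn as nn
import torch.nn.functional as F

dx, dy, dc = 512, 300, 128           # assumed feature sizes for the two domains
Wx, Wy = nn.Linear(dx, dc), nn.Linear(dy, dc)   # project into the common space
Vx, Vy = nn.Linear(dc, dx), nn.Linear(dc, dy)   # map back to the original spaces
D = nn.Linear(dc, 2)                 # domain classifier on the common space

def losses(x, y):
    cx, cy = Wx(x), Wy(y)
    # reconstruction: Vx Cx should recover X, Vy Cy should recover Y
    recon = F.mse_loss(Vx(cx), x) + F.mse_loss(Vy(cy), y)
    # cycle-style term: map back to the input space and project again
    cycle = F.mse_loss(Wx(Vx(cx)), cx) + F.mse_loss(Wy(Vy(cy)), cy)
    dom = torch.cat([torch.zeros(len(x)), torch.ones(len(y))]).long()
    # D is trained to tell the two domains apart ...
    d_loss = F.cross_entropy(D(torch.cat([cx, cy]).detach()), dom)
    # ... while Wx, Wy are trained to make D misclassify (flipped labels);
    # in practice only the projections would be updated with this term
    adv_loss = F.cross_entropy(D(torch.cat([cx, cy])), 1 - dom)
    return recon, cycle, d_loss, adv_loss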
-- an interesting idea to generate hard negatives for learning either contrastive embedding or triplet loss embedding
-- the logic is that hard negatives are few in number, so can we generate more negative samples near the positive samples using the easy negative samples? this is done with an adversarial loss
-- the generator tries to produce negative samples which (1) look like the positive sample, (2) look like the easy negative samples, and (3) are still misclassified by the current metric (the three terms are sketched below)
-- seamlessly works under different kinds of losses
-- code is not provided yet
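-- my guess at the generator objective from the three properties above; the MSE forms, the equal weights, and the assumed distance callable metric(a, b) are all mine since the code is not out:

import torch
import torch.nn.functional as F

def negative_generator_loss(gen_neg, anchor, pos, easy_neg, metric, margin=0.5):
    """gen_neg is the synthetic negative produced from an easy negative."""
    hard = F.mse_loss(gen_neg, pos)        # (1) resemble the positive sample
    reg = F.mse_loss(gen_neg, easy_neg)    # (2) stay close to the real easy negative
    # (3) adversarial: the current metric should place the generated negative
    # closer to the anchor than the true positive, violating the triplet constraint
    adv = F.relu(metric(anchor, gen_neg) - metric(anchor, pos) + margin)
    return hard + reg + adv.mean()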