https://openreview.net/pdf?id=ByS1VpgRZ
https://github.com/pfnet-research/sngan_projection
-- this paper looks interesting and the code is provided
-- the first paper where the condition is injected not at the input of the discriminator but at its output, via an inner product between a class embedding and the features (sketch below)
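A minimal sketch of the projection head in PyTorch (the linked repo is in Chainer; `feat_dim` and the backbone producing `h` are placeholders):

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional score
        self.embed = nn.Embedding(num_classes, feat_dim)  # class embeddings V

    def forward(self, h: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # h: (B, feat_dim) features phi(x) from the shared backbone.
        # score = psi(phi(x)) + <V_y, phi(x)>  -- the condition enters at the output.
        return self.psi(h).squeeze(1) + (self.embed(y) * h).sum(dim=1)
```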
https://openreview.net/pdf?id=B1QRgziT-
https://github.com/pfnet-research/sngan_projection
-- this stabilises the GAN training
-- even Ian Goodfellow has said this is wonderful work and that everybody using generative models should have a look at it
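The core trick is a cheap power-iteration estimate of each weight matrix's largest singular value, by which the weight is divided, bounding the layer's Lipschitz constant. A minimal sketch (PyTorch also ships a built-in wrapper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def estimate_sigma(W: torch.Tensor, u: torch.Tensor, n_iter: int = 1):
    # One power-iteration step per training step is enough in practice.
    for _ in range(n_iter):
        v = F.normalize(W.t() @ u, dim=0)
        u = F.normalize(W @ v, dim=0)
    return u @ W @ v, u  # sigma (spectral norm estimate), updated vector u

layer = nn.utils.spectral_norm(nn.Linear(128, 64))  # PyTorch's built-in wrapper
```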
https://arxiv.org/pdf/1801.09195.pdf
-- a way to avoid mode collapse
-- the interesting part is how they use an autoencoder to produce an extra feature that is concatenated with the discriminator's output features when deciding real/fake (rough sketch below)
-- possibly use it for novelty detection? could we extend it to condition the generator on attributes? might be useful for Supritam's work
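A rough sketch of the idea as I read it (this is my guess at the wiring, not the paper's exact architecture): derive a feature from the autoencoder, here the reconstruction error, and concatenate it with the discriminator's penultimate features before the final real/fake logit.

```python
import torch
import torch.nn as nn

class DiscWithAEFeature(nn.Module):
    def __init__(self, disc_body: nn.Module, autoencoder: nn.Module, feat_dim: int):
        super().__init__()
        self.body, self.ae = disc_body, autoencoder
        self.fc = nn.Linear(feat_dim + 1, 1)  # +1 for the autoencoder feature

    def forward(self, x):
        h = self.body(x)  # (B, feat_dim) discriminator features
        recon = self.ae(x)
        # Per-sample reconstruction error as the extra feature, shape (B, 1).
        err = (x - recon).flatten(1).pow(2).mean(dim=1, keepdim=True)
        return self.fc(torch.cat([h, err], dim=1))  # real/fake logit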
http://data.bit.uni-bonn.de/publications/ICML2018.pdf
https://github.com/lukasruff/Deep-SVDD-PyTorch
-- a nice paper for novelty detection or anomaly detection
-- basically constructs a hypersphere of radius R such that most points fall inside it ... whatever falls outside is an outlier
-- can this be used to learn a better classifier? e.g. do it per class -- (c1, r1); (c2, r2); ... -- and force each class's samples to lie within the radius (margin) around its own centre (see the sketch after this list)
-- may be useful for Supritam's work or even Ayyappa's work
-- learn to generate new categories without forgetting
-- potentially solving the problem of incremental learning
-- possible extensions to few-shot learning / zero-shot learning / etc.
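A minimal sketch of the one-class Deep SVDD objective, plus the per-class variant speculated about above (the per-class loss is our idea, not the paper's):

```python
import torch

def deep_svdd_loss(feats, center):
    # One-class Deep SVDD: mean squared distance of features to a fixed center.
    return ((feats - center) ** 2).sum(dim=1).mean()

def per_class_svdd_loss(feats, labels, centers, radii):
    # Hypothetical per-class extension: hinge on the distance of each sample
    # to its own class centre, with the class radius acting as the margin.
    d2 = ((feats - centers[labels]) ** 2).sum(dim=1)
    return torch.clamp(d2 - radii[labels] ** 2, min=0).mean()
```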
http://papers.nips.cc/paper/7471-generalized-zero-shot-learning-with-deep-calibration-network.pdf
-- uses entropy, but it requires knowing the category names of the unseen categories already during training
-- entropy used during training (in our case it is different)
https://github.com/bhanML/Co-teaching
-- an interesting paper for dealing with label noise
-- basically two networks: in each mini-batch, each network selects the samples with the smallest prediction losses (the likely-clean ones), and these selections are cross-used to update the other network's parameters (sketch below)
-- might be useful for our own work
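A sketch of one co-teaching update step (hypothetical helper, PyTorch): each network ranks the mini-batch by loss, keeps the small-loss portion, and its peer is updated on that selection.

```python
import torch
import torch.nn.functional as F

def coteaching_step(net1, net2, opt1, opt2, x, y, keep_ratio: float):
    n_keep = int(keep_ratio * x.size(0))
    with torch.no_grad():
        # Indices of the small-loss (likely clean) samples under each network.
        idx1 = F.cross_entropy(net1(x), y, reduction="none").argsort()[:n_keep]
        idx2 = F.cross_entropy(net2(x), y, reduction="none").argsort()[:n_keep]
    # Cross-update: net1 trains on the samples net2 judged clean, and vice versa.
    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
    opt1.step()
    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
    opt2.step()
```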
https://papers.nips.cc/paper/7825-masking-a-new-perspective-of-noisy-supervision.pdf
-- an interesting idea: enforce some structure, via a mask, on the label-noise transition matrix
-- how do they determine the mask? could we use word2vec similarity between the categories to do that? (sketch below)
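A sketch of the word2vec idea floated above (our speculation, not the paper's method): permit label transitions only between semantically close categories by thresholding cosine similarity of the class-name embeddings.

```python
import numpy as np

def build_mask(class_vecs: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    # class_vecs: (C, d) word2vec embeddings of the C category names.
    norm = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    sim = norm @ norm.T                          # pairwise cosine similarity
    mask = (sim > thresh).astype(np.float32)     # 1 = transition permitted
    np.fill_diagonal(mask, 1.0)                  # always allow the true label
    return mask
```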
http://papers.nips.cc/paper/7798-to-trust-or-not-to-trust-a-classifier.pdf
-- an interesting problem: checking whether or not to trust the classifier
-- check the concept of trust score
-- code available at: https://github.com/google/TrustScore
-- a metric to replace entropy?
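As I understand it, the trust score is the ratio between the distance to the nearest training sample of any other class and the distance to the nearest sample of the predicted class; higher means more trustworthy. A bare-bones sketch (the official code adds density filtering that is skipped here):

```python
import numpy as np

def trust_score(train_x, train_y, test_x, pred_y):
    scores = []
    for x, c in zip(test_x, pred_y):
        d = np.linalg.norm(train_x - x, axis=1)
        d_pred = d[train_y == c].min()    # distance to the predicted class
        d_other = d[train_y != c].min()   # distance to the closest other class
        scores.append(d_other / (d_pred + 1e-12))
    return np.array(scores)
```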
-- discussed by Sivaram: an exciting work in few-shot learning that tries to mimic the variations in the given classes and uses the learned variations to augment the few-shot example data
http://papers.nips.cc/paper/7344-maximum-entropy-fine-grained-classification.pdf
-- uses entropy to argue that for fine-grained classification it is better if the network is not too confident
-- shown to be an effective fine-tuning mechanism, with large improvements over the generic baselines even on tough datasets such as CUB (sketch below)
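The regularizer is simple: subtract a scaled prediction entropy from the cross-entropy loss, so confident (low-entropy) predictions are penalized. A minimal sketch (`gamma` is a hypothetical weight):

```python
import torch
import torch.nn.functional as F

def max_entropy_loss(logits, targets, gamma: float = 0.1):
    log_p = F.log_softmax(logits, dim=1)
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()  # mean prediction entropy
    return F.cross_entropy(logits, targets) - gamma * entropy
```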
https://nips.cc/media/Slides/nips/2018/220e(06-15-30)-06-16-55-12761-Generalized_Cro.pdf
-- discusses the concept of symmetric loss functions and why MAE is better than CCE for noisily labeled samples (refers to Sastry sir's AAAI 2018 paper)
-- CCE converges fast but overfits to noise, whereas MAE converges very slowly (it often does not reach CCE's performance, but is much more robust to noisy labels); this is why the authors tread a middle ground
-- defines a new loss, the generalized (L_q) cross-entropy loss, together with a truncated variant, to deal with the above problems (sketch below)
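A sketch of the L_q loss, assuming I am reading the slides right: q → 0 recovers CCE and q = 1 gives MAE, so an intermediate q trades CCE's fast convergence against MAE's noise robustness; the truncated variant is omitted here.

```python
import torch
import torch.nn.functional as F

def lq_loss(logits, targets, q: float = 0.7):
    # L_q(f(x), y) = (1 - f_y(x)^q) / q, averaged over the batch.
    p = F.softmax(logits, dim=1)
    p_true = p.gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_true.pow(q)) / q).mean()
```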
-- a paper which uses a small set of gold-standard (i.e. cleanly labeled) data to correct other samples' labels under severe noise conditions
-- this work uses the property that the noisy label and the clean label are conditionally independent given the original data ... in other words, the noisy label depends not on the clean label but on the data sample itself
-- from the paper "The conditional independence assumption is reasonable, as it is usually the case that noisy labeling processes do not have access to the true label." This actually makes a lot of sense.
-- very good results, but you do need the set of trusted labels (sketch of the estimation step below)
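A sketch of the estimation step as I understand it: average the noisy-trained model's softmax outputs over the trusted samples of each true class to get a corruption matrix, which is then used to correct the training loss (the correction itself is omitted).

```python
import numpy as np

def estimate_corruption(probs_trusted: np.ndarray, true_y: np.ndarray, n_cls: int):
    # probs_trusted: (N, C) softmax outputs of a model trained on noisy labels,
    # evaluated on the trusted samples; true_y: their verified clean labels.
    C = np.zeros((n_cls, n_cls))
    for i in range(n_cls):
        C[i] = probs_trusted[true_y == i].mean(axis=0)
    return C  # row i: how class i tends to get (mis)labeled
```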
http://papers.nips.cc/paper/7830-dual-swap-disentangling.pdf
-- a nice idea using autoencoders to disentangle representations
-- also used to incorporate both the labeled and unlabeled samples
-- the best idea is the swapping of the latent factors (sketch below)
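A minimal sketch of the swap operation (names and the split point are illustrative; the paper's full dual-swap training for unlabeled pairs is omitted): encode two inputs, exchange a designated chunk of their latent codes, and decode.

```python
import torch

def swap_decode(encoder, decoder, x1, x2, split: int):
    # Encode both inputs, then swap the first `split` latent dimensions,
    # which are meant to carry the shared (labeled) factor.
    z1, z2 = encoder(x1), encoder(x2)
    z1_swapped = torch.cat([z2[:, :split], z1[:, split:]], dim=1)
    z2_swapped = torch.cat([z1[:, :split], z2[:, split:]], dim=1)
    return decoder(z1_swapped), decoder(z2_swapped)
```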