This repository implements the paper "A Closer Look at Weak Label Learning for Audio Events". In the paper, we study the challenges of large-scale Audio Event Detection (AED) with weakly labeled data through a CNN-based framework. Our network architecture handles variable-length recordings, and its design provides a way to control the segment size of the adjustable secondary outputs; together, these features eliminate the need for additional preprocessing steps. We examine how label density and label corruption affect performance, and we compare mined web data with manually labeled AudioSet data as training data. We believe our work offers an approach to understanding the challenges of weakly labeled learning, and that future AED work will benefit from our exploration.
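To illustrate how a recording of any length can be reduced to a single recording-level prediction, here is a minimal sketch in plain Python. It is not the code in this repository: the function names `segment_starts` and `recording_level_scores`, the hop parameter, and the choice of mean pooling over segments are all assumptions made for illustration.

```python
from typing import List


def segment_starts(num_frames: int, segment_frames: int, hop_frames: int) -> List[int]:
    """Return start indices of fixed-size segments over a recording.

    Hypothetical helper: the segment size and hop are free parameters here,
    standing in for the adjustable segment size described above.
    """
    if num_frames <= segment_frames:
        # A recording shorter than one segment yields a single segment.
        return [0]
    return list(range(0, num_frames - segment_frames + 1, hop_frames))


def recording_level_scores(segment_scores: List[List[float]]) -> List[float]:
    """Average per-class segment scores into one score per class.

    Mean pooling is an assumption; other poolings (e.g. max) are possible.
    """
    num_segments = len(segment_scores)
    num_classes = len(segment_scores[0])
    return [
        sum(segment[c] for segment in segment_scores) / num_segments
        for c in range(num_classes)
    ]


# Two recordings of different lengths produce different numbers of segments,
# but the pooled output always has one score per class.
short_recording = segment_starts(num_frames=300, segment_frames=128, hop_frames=64)
long_recording = segment_starts(num_frames=1000, segment_frames=128, hop_frames=64)
pooled = recording_level_scores([[0.25, 1.0], [0.75, 0.0]])
```

Because pooling happens over however many segments the recording produces, no padding or trimming to a fixed length is needed in this sketch.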
We provide the AudioSet data (the list of files used in our experiments) for reproducibility.
If you have any questions, please contact Ankit Shah ([email protected]) or Anurag Kumar ([email protected]).
If you use this repository (WALNet, weak label analysis) in your research, please cite our paper:
@article{shah2018closer,
title={A Closer Look at Weak Label Learning for Audio Events},
author={Shah, Ankit and Kumar, Anurag and Hauptmann, Alexander G and Raj, Bhiksha},
journal={arXiv preprint arXiv:1804.09288},
year={2018}
}
Training Set | MAP |
---|---|
AudioSet - 10 | 22.87 |
AudioSetAt30 | 22.42 |
AudioSetAt60 | 22.42 |
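
For readers unfamiliar with the MAP column, the following sketch shows one common way to compute mean average precision: average precision is computed per class from ranked scores, then averaged over classes. The function names and toy data are hypothetical, and whether the numbers in the table were computed in exactly this way (tie handling, scaling to a percentage) is an assumption.

```python
from typing import List


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Average precision for one class: mean of precision at each positive."""
    # Rank recordings from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    # A class with no positive examples contributes 0.0 in this sketch.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    labels_by_class: List[List[int]], scores_by_class: List[List[float]]
) -> float:
    """Mean of the per-class average precisions."""
    per_class = [
        average_precision(labels, scores)
        for labels, scores in zip(labels_by_class, scores_by_class)
    ]
    return sum(per_class) / len(per_class)


# Toy example with two classes (made-up labels and scores).
toy_labels = [[1, 0, 1, 0], [0, 1]]
toy_scores = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.9]]
toy_map = mean_average_precision(toy_labels, toy_scores)
```

In the toy example, the first class has positives at ranks 1 and 3, giving an average precision of (1 + 2/3) / 2, and the second class ranks its only positive first, giving 1.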