DCASE2018 Task2 - General-purpose audio tagging of Freesound content with AudioSet labels
Kaggle - Freesound General-Purpose Audio Tagging Challenge
- A framework for audio tagging / audio classification based on PyTorch.
- Audio data processing and feature extraction methods.
- Encapsulation of multiple models for audio data.
The data come from the Kaggle competition: https://www.kaggle.com/c/freesound-audio-tagging/data
python 3.6
pytorch 0.4.0
cuda 9.1
librosa 0.5.1
torchvision 0.2.1
python data_transform.py
This code can extract three types of features by selecting different functions:
- Wave
- Log-Mel
- MFCC
Note: to extract a different feature type, set the corresponding parameters in the config.
To speed up extraction, the script uses parallel computing; you can adjust the number of worker processes to match your machine.
For log-mel and MFCC, the delta and acceleration (delta-delta) features are also computed. The base feature is then stacked with its delta and acceleration to form a 3 x 64 x N matrix, where N depends on the length of the audio file.
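For reference, here is a minimal sketch of this log-mel pipeline using librosa and multiprocessing; the sampling rate, FFT parameters, worker count, and directory name are illustrative assumptions, not the values fixed in data_transform.py's config.

```python
import glob
from multiprocessing import Pool

import librosa
import numpy as np

def extract_logmel(path, sr=22050, n_mels=64):
    """Return a 3 x 64 x N array: log-mel spectrogram, its delta, and its acceleration."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=512, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)               # 64 x N
    delta = librosa.feature.delta(logmel)           # first-order difference
    accel = librosa.feature.delta(logmel, order=2)  # second-order difference (acceleration)
    return np.stack([logmel, delta, accel])         # 3 x 64 x N

if __name__ == '__main__':
    files = glob.glob('audio_train/*.wav')          # hypothetical input directory
    with Pool(processes=4) as pool:                 # adjust the worker count to your machine
        features = pool.map(extract_logmel, files)
```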
python train_on_wave.py
Trains the network directly on raw waveforms.
Before running it, instantiate the Config class to set the parameters (data directory, learning rate, batch size, number of epochs, etc.). Make sure the data you use are the wave features extracted earlier.
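The exact Config fields are defined in the training scripts; the snippet below is only a sketch of what instantiating it might look like, with assumed attribute names and values.

```python
class Config:
    """Illustrative config; the real class in the repo may use different field names."""
    def __init__(self, data_dir, lr=1e-3, batch_size=64, epochs=50):
        self.data_dir = data_dir      # directory containing the extracted features
        self.lr = lr                  # learning rate
        self.batch_size = batch_size
        self.epochs = epochs

config = Config(data_dir='features/wave', lr=3e-4, batch_size=32, epochs=40)
```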
python train_on_logmel.py
Trains the network on the log-mel features.
Make sure the data you use are the log-mel features extracted earlier.
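For orientation, a self-contained PyTorch 0.4-style training step on 3 x 64 x N log-mel inputs might look like the sketch below; the small CNN and the random batch are placeholders for the repo's networks and DataLoader, and the 41 output classes match the challenge's label set.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Tiny stand-in model; the real networks live in the repo's network_*.py files."""
    def __init__(self, num_classes=41):
        super(SmallCNN, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):                        # x: (batch, 3, 64, N)
        return self.fc(self.conv(x).view(x.size(0), -1))

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SmallCNN().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a random batch, standing in for a DataLoader over the
# extracted log-mel features.
features = torch.randn(8, 3, 64, 128, device=device)
labels = torch.randint(0, 41, (8,), dtype=torch.long, device=device)
optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()
```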
python train_on_logmel.py
Trains the network on the MFCC features with the same script; just make sure the data you use are the MFCC features extracted earlier.
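Assuming a config like the hypothetical one sketched above, the only change is where the features are loaded from, e.g.:

```python
# Point the (hypothetical) config at the MFCC features instead of the log-mel ones.
config = Config(data_dir='features/mfcc', lr=3e-4, batch_size=32, epochs=40)
```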
Several deep learning networks for audio data are encapsulated in the network_*.py files, including:
- ResNet
- ResNeXt
- SE-ResNeXt
- DPN
- Xception
Also, you can find useful pretrained models in this repository.
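Because the stacked features already have 3 channels, an ImageNet-pretrained backbone from torchvision can be repurposed by swapping in a new classifier head; the sketch below is one way to do this and is not necessarily how the repo's network_*.py wrappers are written.

```python
import torch.nn as nn
import torchvision.models as models

# Start from an ImageNet-pretrained ResNet-50 and replace the final fully
# connected layer with one sized for the 41 challenge classes.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 41)
```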
- More efficient and higher-performance models to be designed.
- Currently, the models are trained on a single GPU; multiple GPUs could be used for data-parallel training to speed up learning (see the sketch below).
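A minimal sketch of what this could look like with PyTorch's built-in nn.DataParallel, assuming `model` is one of the networks already constructed from network_*.py:

```python
import torch
import torch.nn as nn

# Split each batch across all visible GPUs; the rest of the training loop is unchanged.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()
```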