
Research project (July 2018 - Sept 2018): remote internship under Dr. Rajiv Ratn Shah, Assistant Professor, IIITD.


Bagging-With-Strong-Learners-for-Emotion-Recognition-from-Speech

Speech emotion recognition, a highly promising and exciting problem in the field of Human-Computer Interaction, has been studied and analyzed for several years, and concerns the task of recognizing a speaker's emotions from their speech recordings. Recognizing emotions from speech can go a long way in determining a person's physical and psychological state of well-being, and also provides valuable linguistic information. In this work we analyzed three corpora: the Berlin EmoDB, the Indian Institute of Technology Kharagpur Simulated Emotion Hindi Speech Corpus (IITKGP-SEHSC), and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). From these we extracted spectral features, which were further processed and reduced to the required feature set using Boruta, a wrapper-based feature selection method. We propose a bagged ensemble of Support Vector Machines with a Gaussian kernel as a viable algorithm for the problem at hand, and report the results obtained on the three datasets above.

A paper on this work is currently in preparation.

An early copy of the paper is available.

Results

[Results figure: see the image in the repository]

Note: All of the above code was designed for the IITKGP-SEHSC database; the same code can be used for other databases with slight modifications to speech_emotion_model.py. I cannot share the IITKGP-SEHSC database because it is the property of IIT Kharagpur; to obtain the database, please contact Prof. K. S. Rao.

Download the EmoDB database and the RAVDESS database.

Libraries required

  • SciPy
  • NumPy
  • python_speech_features
  • boruta
  • imblearn
  • scikit-learn (sklearn)

MFCCs and their delta and double-delta features are calculated for each frame (13 + 13 + 13 = 39).

The spectral sub-band centroids are calculated next, 26 for each frame.

For each coefficient, seven statistics are then computed over all frames of an audio file: the mean, variance, maximum, minimum, skewness, kurtosis, and inter-quartile range. This yields a feature vector of dimension (13 + 13 + 13 + 26) × 7 = 455 for each sample.
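As a rough illustration, the sketch below strings these steps together with python_speech_features and SciPy. The function name extract_features and the default windowing parameters are illustrative assumptions; the actual extraction lives in the read_* scripts listed below.

```python
# Hedged sketch of the per-file feature extraction described above.
import numpy as np
from scipy.io import wavfile
from scipy.stats import skew, kurtosis, iqr
from python_speech_features import mfcc, delta, ssc

def extract_features(wav_path):
    rate, signal = wavfile.read(wav_path)
    m = mfcc(signal, samplerate=rate, numcep=13)    # 13 MFCCs per frame
    d = delta(m, 2)                                 # 13 delta features
    dd = delta(d, 2)                                # 13 double-delta features
    s = ssc(signal, samplerate=rate, nfilt=26)      # 26 spectral sub-band centroids
    frames = np.hstack([m, d, dd, s])               # shape: (num_frames, 65)
    # Seven statistics per coefficient over all frames -> 65 * 7 = 455 features
    stats = (np.mean, np.var, np.max, np.min, skew, kurtosis, iqr)
    return np.concatenate([f(frames, axis=0) for f in stats])
```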

read_iitkgp.py -> extracts features from the IITKGP-SEHSC dataset. The output pickle file iitkgp.pkl can be downloaded.

readdataemodb.py -> extracts features from the EmoDB dataset. The output pickle file emodb.pkl can be downloaded.

readdataravdess.py -> extracts features from the RAVDESS dataset.
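The exact structure stored in these pickle files isn't documented here; assuming each file holds the extracted features, it can be inspected along these lines:

```python
# Hedged sketch: the contents of the .pkl files are an assumption;
# see the read_* scripts above for the actual format.
import pickle

with open('iitkgp.pkl', 'rb') as f:
    features = pickle.load(f)
print(type(features))  # inspect what the file actually contains
```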

Our base estimator was a Support Vector Machine with a Gaussian kernel, penalty term C = 100, and kernel coefficient gamma = 0.1. We combined 20 of these in a bagging ensemble with bootstrap_features set to True, so that each base estimator sees training samples drawn with replacement and, additionally, a random subset of features drawn with replacement. This entire procedure was carried out using the scikit-learn Python package.
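A minimal sketch of this ensemble in scikit-learn follows; the names X_train and y_train for the Boruta-selected features and emotion labels are assumptions, and recent scikit-learn versions use the estimator= keyword while older ones use base_estimator=.

```python
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier

# Gaussian-kernel SVM: penalty term C=100, kernel coefficient gamma=0.1
base_svm = SVC(kernel='rbf', C=100, gamma=0.1)

# 20 base SVMs; bootstrap=True draws training samples with replacement,
# bootstrap_features=True additionally draws features with replacement.
model = BaggingClassifier(estimator=base_svm,
                          n_estimators=20,
                          bootstrap=True,
                          bootstrap_features=True)

# X_train, y_train: Boruta-selected features and labels (assumed names)
# model.fit(X_train, y_train)
# model.score(X_test, y_test)
```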

speech_emotion_model.py -> trains and tests the model.

To learn more about feature selection using the Boruta package:

How to perform feature selection (i.e. pick important variables) using Boruta Package in R, by Analytics Vidhya
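Since the repository lists the Python boruta package, a minimal sketch of wrapper-based selection with BorutaPy might look like the following; X (the 455-dimensional feature matrix) and y (the emotion labels) are assumed to be NumPy arrays from the extraction step above.

```python
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier

def select_features(X, y):
    # Random forest as the estimator Boruta shadows features against
    rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
    selector = BorutaPy(rf, n_estimators='auto', random_state=42)
    selector.fit(X, y)  # iteratively confirms or rejects each feature
    # Return the reduced matrix and the boolean mask of kept features
    return selector.transform(X), selector.support_
```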
