Code for our thesis (CSE400), where we explored multimodal emotion recognition from speech and text features using an ensemble of different classifiers on the IEMOCAP dataset. We compared several heterogeneous ensemble learning techniques to find the best-performing ensemble. We covered the following techniques:
- Hard Voting
- Soft Voting
- Stacking
- Blending
We have 6 models trained on speech data and 6 models trained on text data. The predictions of all 12 models are combined using the ensemble methods above. The results can be found in `main.ipynb`.
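As a rough illustration of how the four schemes combine base-model outputs, here is a minimal sketch in NumPy/scikit-learn. The arrays (`probs`, `labels`) and their shapes are hypothetical placeholders, not this repo's actual interfaces; stacking and blending are shown as a single generic meta-learner fit, with the difference between them noted in the comments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 12 base models, 100 samples, 4 emotion classes.
probs = rng.dirichlet(np.ones(4), size=(12, 100))  # (12, 100, 4) predicted class probabilities
labels = rng.integers(0, 4, size=100)              # dummy ground-truth labels

# Hard voting: each model casts one vote (its argmax class); majority wins.
votes = probs.argmax(axis=2)                       # (12, 100)
hard_pred = np.array([np.bincount(v, minlength=4).argmax() for v in votes.T])

# Soft voting: average the 12 probability distributions, then take argmax.
soft_pred = probs.mean(axis=0).argmax(axis=1)

# Stacking / blending: fit a meta-learner on the base models' outputs.
# Stacking typically builds the meta-features from cross-validated predictions
# on the training set; blending uses a held-out split instead. Either way the
# final step is a fit like this one:
meta_X = probs.transpose(1, 0, 2).reshape(100, -1)  # (100, 12 * 4)
meta_model = LogisticRegression(max_iter=1000).fit(meta_X, labels)
stacked_pred = meta_model.predict(meta_X)
```

See `main.ipynb` for the actual ensemble runs and results.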
This is the code used for our research work presented in the paper:
Multimodal Emotion Recognition Using Heterogeneous Ensemble Techniques
DOI: 10.1109/ICCIT57492.2022.10054720
Conference: 2022 25th International Conference on Computer and Information Technology (ICCIT)
Publisher: IEEE
- Clone the repository
- `cd` to the codebase, then create a virtual environment and activate it

  ```
  cd emotion-recognition-ensemble-learning
  ```

  - For Linux

    ```
    python3 -m venv env
    source env/bin/activate
    ```

  - For Windows

    ```
    python -m venv env
    env\Scripts\activate.bat
    ```
- Install necessary libraries

  ```
  pip install -r requirements.txt
  pip install PyAudio-0.2.11-cp37-cp37m-win_amd64.whl  # (OPTIONAL) depends on your Python version (37, 38 or 39)
  ```
- Download the IEMOCAP dataset by submitting a request from here. It will take 1-3 days for them to email you.
- Make a folder named `data` in the project directory and put the dataset there. Rename the dataset folder to `IEMOCAP_dataset`.
- Process the dataset and extract features into the `data` folder

  ```
  python3 -m process_dataset.speech_features
  python3 -m process_dataset.text_features
  ```
- Run `main.ipynb`

To test the performance of each model individually, run its respective file as a module. For example:

```
python3 -m speech_models.speech_logistic_regression
```
IEMOCAP metadata was obtained from here.