fused-multimodal-emotion

About

Multimodal emotion recognition using speech and text. The proposed model is inspired by the encoder-decoder architecture used in Neural Machine Translation (NMT). In our multimodal model, we use a main modality (text) and an auxiliary modality (speech): the main modality performs the classification given its own input and the intermediate representations generated by the auxiliary modality.
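
The following is a minimal sketch of the fusion idea, written with tf.keras as shipped with TensorFlow 1.13. It is not the repository's implementation; all layer sizes, feature dimensions and names are illustrative.

```python
import tensorflow as tf

# Illustrative dimensions (not the values used in the project)
NUM_CLASSES = 4                        # emotion categories
AUDIO_FRAMES, AUDIO_FEATS = 750, 39    # audio frames x features per frame
MAX_TEXT_LEN, VOCAB_SIZE = 128, 10000
HIDDEN = 128

# Auxiliary modality: speech encoder producing an intermediate representation
audio_in = tf.keras.layers.Input(shape=(AUDIO_FRAMES, AUDIO_FEATS), name='audio')
audio_state = tf.keras.layers.GRU(HIDDEN, name='audio_encoder')(audio_in)

# Main modality: text classifier conditioned on the audio representation,
# analogous to an NMT decoder initialized with the encoder state
text_in = tf.keras.layers.Input(shape=(MAX_TEXT_LEN,), name='text')
text_emb = tf.keras.layers.Embedding(VOCAB_SIZE, HIDDEN)(text_in)
text_state = tf.keras.layers.GRU(HIDDEN, name='text_decoder')(
    text_emb, initial_state=audio_state)

# Classification head over the fused representation
probs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(text_state)
model = tf.keras.Model(inputs=[audio_in, text_in], outputs=probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```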

This repository was developed as part of my semester project at the Chair for Mathematical Information Science at ETH Zürich in Spring 2019. The full report with a detailed description of our approach is available here.

Requirements

  • Python 3 (tested with Python 3.7)
  • Tensorflow (tested with version 1.13.1)
  • An exhaustive list of requirements can be found in requirements.txt
  • A GPU is not required, although it significantly accelerates training (tested with CUDA 10.0)

Project structure

The project comprises the directories data, model, parameters and preprocessing. After training a model, two additional folders are created: graphs and pretrained-models.

data

Contains all the data used, in both its raw and processed stages. The dataset used is the IEMOCAP dataset, which is available under a license agreement at the previous link. The data actually used by our model is preprocessed from the raw dataset. Our preprocessed data can be made available upon request, provided that the requester already has access to the IEMOCAP dataset.

model

Contains all of the implemented models. All the models have three main files:

  • process_[model]_data.py
  • train_[model].py
  • evaluate_[model].py,

where [model] is the model of interest (text, audio, multimodal or multimodal_attention). process_[model]_data.py contains the data input pipeline of the model of interest: loading, batching and splitting the data into training, validation and test sets. train_[model].py contains the main training loop. evaluate_[model].py evaluates the performance of the model on the validation and test sets and creates the confusion matrix for the test set.
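
As a rough illustration of what such an input pipeline does, the sketch below shuffles, splits and batches a dataset. The function names and split fractions are hypothetical, not taken from process_[model]_data.py; the actual proportions are set in the repository's code.

```python
import numpy as np

def split_data(features, labels, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle and split a dataset into train/validation/test sets (illustrative)."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(labels))
    n_train = int(train_frac * len(labels))
    n_val = int(val_frac * len(labels))
    train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])
    return [(features[i], labels[i]) for i in (train_idx, val_idx, test_idx)]

def iterate_batches(features, labels, batch_size=64):
    """Yield consecutive mini-batches (illustrative batching helper)."""
    for start in range(0, len(labels), batch_size):
        yield features[start:start + batch_size], labels[start:start + batch_size]
```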

parameters

The file parameters.py contains all of the relevant parameters for the simulation, divided into sections.
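
A hypothetical excerpt of how such a sectioned parameters file is laid out (the real names and values are in /parameters/parameters.py and will differ):

```python
# --- General ---
num_classes = 4
batch_size = 64

# --- Text model ---
vocab_size = 10000
text_hidden_dim = 128

# --- Audio model ---
audio_num_features = 39
audio_hidden_dim = 128

# --- Multimodal model ---
learning_rate = 1e-3
num_epochs = 50
```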

preprocessing

The file prepare_raw_audio.py reads all the raw audio files used by our model, truncates and zero-pads them, and saves them to the expected directory within the /data folder.
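
The core operation amounts to something like the sketch below (NumPy only; the target length, file reading and output paths used by prepare_raw_audio.py are not shown and may differ).

```python
import numpy as np

def fix_length(signal, target_len):
    """Truncate or zero-pad a 1-D audio signal to exactly target_len samples."""
    if len(signal) >= target_len:
        return signal[:target_len]                 # truncate long recordings
    padding = np.zeros(target_len - len(signal), dtype=signal.dtype)
    return np.concatenate([signal, padding])       # zero-pad short recordings
```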

graphs

Created once the model starts training. Contains two folders, /graph_train and /graph_val, with the information that can be visualized in TensorBoard, including the model's graph and the training and validation accuracies and losses.
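
The training scripts create these writers internally; the TF 1.x snippet below is only a generic illustration of how such folders are populated and inspected, not the repository's code.

```python
import tensorflow as tf

# Generic TF 1.x summary setup (illustrative)
loss_value = tf.placeholder(tf.float32, name='loss_value')
loss_summary = tf.summary.scalar('loss', loss_value)

with tf.Session() as sess:
    train_writer = tf.summary.FileWriter('graphs/graph_train', sess.graph)
    val_writer = tf.summary.FileWriter('graphs/graph_val')
    summary = sess.run(loss_summary, feed_dict={loss_value: 0.5})
    train_writer.add_summary(summary, global_step=0)
    train_writer.close()
    val_writer.close()

# Inspect with: tensorboard --logdir graphs
```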

pretrained-models

Created after the model is trained. Stores the whole model (graph and weights), which can be used for inference at a later stage.
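
Assuming the model is saved as a TF 1.x tf.train.Saver checkpoint (the repository may use a different saving mechanism), restoring it for inference looks roughly like this; the directory and tensor names are placeholders.

```python
import tensorflow as tf

checkpoint_dir = 'pretrained-models'  # placeholder path

with tf.Session() as sess:
    latest = tf.train.latest_checkpoint(checkpoint_dir)
    saver = tf.train.import_meta_graph(latest + '.meta')  # restore the graph
    saver.restore(sess, latest)                           # restore the weights
    graph = tf.get_default_graph()
    # Tensor names depend on how the graph was built, e.g.:
    # inputs = graph.get_tensor_by_name('text_input:0')
    # predictions = graph.get_tensor_by_name('predictions:0')
    # sess.run(predictions, feed_dict={inputs: ...})
```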

Running

To train a model, the first step is obtaining all the data. If you already have all the preprocessed data in the correct folder within the /data directory, you are good to go! If you would like to truncate the raw audio files differently, you can edit /preprocessing/prepare_raw_audio.py accordingly and run it once. The preprocessed raw audio files will be saved to the correct directory.

Once you have all the data correctly placed, you can edit the model's parameters in /parameters/parameters.py. The parameters for all the models are in this single file, but they are organized in sections, so it is important that you edit the parameters in the correct section.

With the correct parameters, it is time to train! The proportion of the data used to build the training, validation and test sets is hardcoded in train_[model].py. To train a model, run train_[model].py, where [model] is one of text, audio, multimodal or multimodal_attention. The model is evaluated on the validation set every 50 batches, but that can be changed inside the training loop in train_[model].py.
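
The periodic validation follows the usual pattern sketched below; everything here is a schematic stand-in for the real loop in train_[model].py.

```python
EVAL_EVERY = 50  # validate every 50 batches, as in the default setting

def train_step(batch):
    """Stand-in for one optimization step."""
    pass

def evaluate_on_validation():
    """Stand-in for the validation pass; returns an accuracy."""
    return 0.0

train_batches = range(200)  # stand-in for the real batch iterator

for step, batch in enumerate(train_batches):
    train_step(batch)
    if step % EVAL_EVERY == 0:
        print('step %d: validation accuracy %.3f' % (step, evaluate_on_validation()))
```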

After training, the model is evaluated on the test set and the full model is saved to /pretrained-models.

References

A significant part of the code for this project builds on the code from multimodal-speech-emotion.
