Emotion Audio Analysis
This project analyzes audio files containing different human emotions and classifies each recording by the emotion it expresses. The main components of the project, each illustrated with a short code sketch after this list, are:
Data Visualization: Display audio data as spectrograms and waveforms to visually analyze the structure of the audio files.
Data Augmentation: Apply various transformations to the audio data to increase the dataset size and improve model performance.
Feature Extraction: Extract Mel-frequency cepstral coefficients (MFCCs) as features from the audio files.
Model Training: Implement and train Long Short-Term Memory (LSTM), Bidirectional LSTM, and Convolutional Neural Network (CNN) models for emotion classification.
Model Comparison: Compare the performance of the different models using various metrics.
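As a concrete illustration of the visualization step, the sketch below plots a waveform and a mel spectrogram for a single file. The use of librosa and matplotlib, and the file path, are assumptions; the project description does not name specific libraries.

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load one emotion-labeled audio file (path is a placeholder).
y, sr = librosa.load("data/happy_001.wav", sr=None)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6))

# Waveform: amplitude over time.
librosa.display.waveshow(y, sr=sr, ax=ax1)
ax1.set_title("Waveform")

# Mel spectrogram on a decibel scale.
S = librosa.feature.melspectrogram(y=y, sr=sr)
S_db = librosa.power_to_db(S, ref=np.max)
img = librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="mel", ax=ax2)
ax2.set_title("Mel spectrogram")
fig.colorbar(img, ax=ax2, format="%+2.0f dB")

plt.tight_layout()
plt.show()
```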
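For the augmentation step, a minimal sketch of three common audio transformations (noise injection, time stretching, and pitch shifting) follows, again assuming librosa; the specific transformations and parameter values are illustrative, not taken from the project.

```python
import librosa
import numpy as np

def add_noise(y, noise_factor=0.005):
    """Inject Gaussian noise scaled by noise_factor."""
    return y + noise_factor * np.random.randn(len(y))

def time_stretch(y, rate=0.9):
    """Slow down (rate < 1) or speed up (rate > 1) without changing pitch."""
    return librosa.effects.time_stretch(y, rate=rate)

def pitch_shift(y, sr, n_steps=2):
    """Shift pitch by n_steps semitones without changing duration."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

# Each transformation yields a new training example from the same source clip.
y, sr = librosa.load("data/happy_001.wav", sr=None)
augmented = [add_noise(y), time_stretch(y), pitch_shift(y, sr)]
```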
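Feature extraction with MFCCs might look like the following sketch, which averages each coefficient over time to produce one fixed-length vector per file; the choice of 40 coefficients and the use of librosa are assumptions.

```python
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=40):
    """Load an audio file and return its mean MFCC vector over time."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return np.mean(mfcc, axis=1)  # one fixed-length feature vector per file

features = extract_mfcc("data/happy_001.wav")
print(features.shape)  # (40,)
```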
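A sketch of the three model architectures in Keras is shown below, assuming TensorFlow/Keras, mean-MFCC input vectors of length 40 reshaped to (40, 1), and 8 emotion classes; the layer sizes are illustrative defaults, not the project's actual configuration.

```python
from tensorflow.keras import layers, models

N_MFCC, N_CLASSES = 40, 8  # assumed feature size and number of emotion labels

def build_lstm():
    return models.Sequential([
        layers.Input(shape=(N_MFCC, 1)),
        layers.LSTM(128),
        layers.Dense(64, activation="relu"),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_bilstm():
    return models.Sequential([
        layers.Input(shape=(N_MFCC, 1)),
        layers.Bidirectional(layers.LSTM(128)),  # reads the sequence in both directions
        layers.Dense(64, activation="relu"),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

def build_cnn():
    return models.Sequential([
        layers.Input(shape=(N_MFCC, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(N_CLASSES, activation="softmax"),
    ])

model = build_lstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```

Treating the 40 MFCC coefficients as a length-40 sequence lets all three architectures share the same input shape, which keeps the later comparison on equal footing.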
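Finally, the comparison step could be a small helper that evaluates each trained model on the same held-out test set. Accuracy and macro F1 are used here as example metrics, since the project description only says "various metrics"; compare_models, X_test, and y_test are hypothetical names.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compare_models(trained, X_test, y_test):
    """Print accuracy and macro F1 for each trained Keras classifier."""
    for name, model in trained.items():
        # Convert class probabilities to predicted integer labels.
        y_pred = np.argmax(model.predict(X_test, verbose=0), axis=1)
        acc = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average="macro")
        print(f"{name}: accuracy={acc:.3f}, macro F1={f1:.3f}")

# Usage (assuming models built and trained as in the sketch above):
# compare_models({"LSTM": lstm, "BiLSTM": bilstm, "CNN": cnn}, X_test, y_test)
```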
Input and Output

Input:
A directory containing emotion-labeled audio files.
Output:
Visualizations of audio data as spectrograms and waveforms.
Augmented audio data.
Extracted features for each audio file.
Trained LSTM, Bidirectional LSTM, and CNN models.
A performance comparison of the trained models.