A deep neural network that detects emotions through facial expressions from talking faces. The following datasets are used:
- GRID corpus dataset
- RAVDESS
- AffectNet
- Oulu-CASIA
- NBE Datasets (not publicly available yet)
The DNN consists of two sub-models: a LipNet-based lip-reading model and a still-image baseline AFER (facial expression recognition) model.
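For orientation, here is a minimal two-stream sketch in Keras. The layer sizes and fusion strategy are illustrative assumptions, not this repository's actual architecture; only the input shapes and the seven emotion classes come from the data notes below.

```python
# Illustrative two-stream sketch (NOT the repo's exact architecture):
# a still-image face branch plus a LipNet-style lip-sequence branch,
# fused into the seven emotion classes used in the dataset layout.
import tensorflow as tf
from tensorflow.keras import layers, Model

# Face stream: (224, 224, 3) face crops.
face_in = layers.Input(shape=(224, 224, 3), name="face")
x = layers.Conv2D(32, 3, activation="relu")(face_in)
x = layers.GlobalAveragePooling2D()(x)

# Lip stream: a variable-length sequence of (50, 100, 3) lip crops.
lip_in = layers.Input(shape=(None, 50, 100, 3), name="lips")
y = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(lip_in)
y = layers.TimeDistributed(layers.GlobalAveragePooling2D())(y)
y = layers.GRU(64)(y)

# Late fusion and classification over the seven emotions.
z = layers.Concatenate()([x, y])
out = layers.Dense(7, activation="softmax")(z)
model = Model(inputs=[face_in, lip_in], outputs=out)
```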
All scripts for data preparation, model training, and model benchmarking are provided under the `script` directory.
Store the datasets under the `data` directory. Each dataset folder should have the following structure:
```
dataset
├── npy
└── videos
```
The GRID dataset requires additional alignment files (check the LipNet repo for more details); its directory tree should look like this:
```
GRID
├── align
├── npy
└── videos
```
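For reference, a GRID alignment file lists one token per line as `start end word` (the format used by the LipNet repo); a minimal parser under that assumption could look like the sketch below.

```python
# Minimal sketch of a GRID .align parser; assumes the "start end word"
# line format used by the LipNet repo and skips silence/pause tokens.
def read_align(path):
    words = []
    with open(path) as f:
        for line in f:
            start, end, word = line.split()
            if word not in ("sil", "sp"):
                words.append((int(start), int(end), word))
    return words
```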
Still-image datasets such as AffectNet should follow this layout:

```
dataset
├── train
│   └── faces
│       ├── angry
│       │   └── xxx.jpg
│       ├── disgust
│       ├── fear
│       ├── happy
│       ├── neutral
│       ├── sad
│       └── surprise
└── test
    └── faces
        └── ...
```
- All images should be stored in `.jpg` format.
- All videos should be stored in `.npy` format.
- Each face image should be cropped to shape (224, 224, 3), and each lip image should have shape (50, 100, 3). The shape order is Width x Height x Channel.
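As a rough illustration of producing those shapes, the sketch below uses OpenCV with fixed placeholder crop boxes; a real pipeline would take the face and mouth regions from a detector, and the paths and function name here are hypothetical.

```python
# Preprocessing sketch: save one (224, 224, 3) face crop as .jpg and the
# lip sequence as a .npy array. Crop boxes are placeholders, not detector output.
import cv2
import numpy as np

def preprocess(video_path, face_jpg, lips_npy):
    cap = cv2.VideoCapture(video_path)
    face, lip_frames = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if face is None:
            face = cv2.resize(frame, (224, 224))               # -> (224, 224, 3)
        lip = cv2.resize(frame[150:250, 100:200], (100, 50))   # -> (50, 100, 3)
        lip_frames.append(lip)
    cap.release()
    cv2.imwrite(face_jpg, face)              # images are stored as .jpg
    np.save(lips_npy, np.stack(lip_frames))  # videos are stored as .npy
```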
Use the `script/lipnet.py` script to start training the LipNet model after preprocessing the GRID dataset.
Use the `script/baseline.py` script to start training the baseline AFER model (still-image based) after preprocessing the AffectNet dataset.
Use the `script/dnn.py` script to start training the DNN model after preprocessing the RAVDESS dataset.
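Assuming training is started from the repository root, the three steps might look like this (any required arguments are omitted; check each script for its actual interface):

```bash
python script/lipnet.py     # LipNet sub-model, after preprocessing GRID
python script/baseline.py   # baseline AFER sub-model, after preprocessing AffectNet
python script/dnn.py        # combined DNN, after preprocessing RAVDESS
```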
Use the `predict.py` script to analyze a video or a directory of videos with a trained model:
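For example (the exact arguments are not documented here, so treat these invocations as hypothetical):

```bash
python predict.py path/to/video.mp4   # analyze a single video
python predict.py path/to/videos/    # analyze a directory of videos
```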
Scripts under the `script/evaluating` directory are provided for quick real-time evaluation; results are not guaranteed.
- TensorFlow Dataset pipeline (see the sketch after this list)
- Generate a dummy cropped image representing the neutral emotion
- Documentation: Proper usage and code documentation
- Testing: Develop unit tests
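A minimal sketch of what the planned tf.data pipeline could look like, assuming one `.npy` array per video under `data/<dataset>/npy`; the glob pattern, normalization, and omission of labels are all assumptions.

```python
# Sketch of a tf.data pipeline over stored .npy videos; labels are omitted
# and batch size is 1 because clip lengths vary between videos.
import numpy as np
import tensorflow as tf

def load_npy(path):
    video = np.load(path.numpy().decode())   # (frames, height, width, 3)
    return video.astype(np.float32) / 255.0

files = tf.data.Dataset.list_files("data/RAVDESS/npy/*.npy")
ds = (
    files.map(
        lambda p: tf.py_function(load_npy, [p], tf.float32),
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    .batch(1)
    .prefetch(tf.data.AUTOTUNE)
)
```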
- Kai Yao - fecodoo @ Aalto University - Department of Computer Science
This project is licensed under the Apache 2.0 License - see the LICENSE file for details