MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild

This repository provides the official implementation of the paper MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild.

Installation

Please create an environment with Python 3.10 and use the requirements file to install the rest of the libraries:

pip install -r requirements.txt

Data preparation

We provide code for the DFEW and MAFW datasets, which you need to download yourself. The annotations are provided in the annotation/ directory. You need to update the paths in them to your own; the same directory contains preprocessing scripts that preprocess the data and rewrite the paths.
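The preprocessing scripts shipped with the annotations are the intended way to update paths. Purely as an illustration of the idea, here is a minimal Python sketch that swaps a path prefix on every line of a text file. It assumes each annotation line starts with a path; the function names, the example prefixes and the line layout are hypothetical and not taken from the repository.

```python
from pathlib import Path


def replace_prefix(line, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix if the line starts with it."""
    if line.startswith(old_prefix):
        return new_prefix + line[len(old_prefix):]
    return line


def rewrite_annotation_file(src, dst, old_prefix, new_prefix):
    """Copy an annotation file, rewriting the path prefix on every line.

    Assumes one sample per line with the path at the start of the line
    (an assumption about the file layout, not a documented format).
    """
    lines = Path(src).read_text().splitlines()
    updated = [replace_prefix(line, old_prefix, new_prefix) for line in lines]
    Path(dst).write_text("\n".join(updated) + "\n")
```

Lines that do not start with the old prefix are copied unchanged, so running the function twice does no harm.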

For the MAFW dataset, you need to extract faces from the videos. Please refer to data_utils, which contains an example of a face detection pipeline. To extract audio from the video files (in both datasets), use the mp42wav.sh script (after modifying the paths to your own).
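If you would rather drive the audio extraction from Python, the sketch below shows one way to do it. It is not the repository's own script: it assumes ffmpeg is installed and on your PATH, that the videos are .mp4 files, and that 16 kHz mono WAV output is what you want. The function names and these settings are assumptions, so check them against the script in the repository before relying on the output.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts a mono WAV track from a video.

    The 16 kHz mono settings are an assumption, not taken from the repository.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite an existing output file
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # one audio channel (mono)
        "-ar", str(sample_rate),   # output sample rate in Hz
        str(wav_path),
    ]


def extract_all(video_dir, audio_dir, pattern="*.mp4"):
    """Convert every matching video in video_dir to a WAV file in audio_dir."""
    audio_dir = Path(audio_dir)
    audio_dir.mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(video_dir).glob(pattern)):
        wav = audio_dir / (video.stem + ".wav")
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(build_ffmpeg_cmd(video, wav), check=True)
```

Calling extract_all("path/to/videos", "path/to/audio") would write one .wav file per video; both directory names are placeholders.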

You will also need to download the pre-trained checkpoints for the vision encoder from https://github.com/FuxiVirtualHuman/MAE-Face and for the audio encoder from https://github.com/facebookresearch/AudioMAE. Please extract them and rename the audio checkpoint to 'audiomae_pretrained.pth'. Both checkpoints are expected to be in the root folder of the repository.
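A quick check before training can catch a missing or misnamed checkpoint early. The helper below is a hypothetical sketch, not part of the repository; it simply reports which of the expected files are absent from a folder. Only the audio checkpoint name is given above, so the vision checkpoint filename is left for you to supply.

```python
from pathlib import Path


def missing_checkpoints(repo_root, filenames):
    """Return the checkpoint filenames that are not present in repo_root."""
    root = Path(repo_root)
    return [name for name in filenames if not (root / name).is_file()]
```

For example, missing_checkpoints(".", ["audiomae_pretrained.pth"]) returns an empty list when the audio checkpoint is in the current directory.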

Running the code

The main script is main.py. You can invoke it by running one of the provided scripts:

./train_DFEW.sh

./train_MAFW.sh

Evaluation

You can download pre-trained models on DFEW from here and on MAFW from here. Please respect the dataset license when downloading the models! Evaluation can be done as follows:

python evaluate.py --fold $FOLD --checkpoint $CHECKPOINT_PATH --img-size $IMG_SIZE --dataset [MAFW|DFEW]
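To evaluate several folds in a row, the command above can be wrapped in a small loop. The sketch below builds the same command line with the four flags shown and runs it once per fold. The helper names, the checkpoint filename template, the fold numbers and the image size in the usage note are hypothetical, so substitute the values that match your downloaded models.

```python
import subprocess
import sys


def build_eval_cmd(fold, checkpoint, img_size, dataset):
    """Build the evaluate.py command line for one fold."""
    if dataset not in ("DFEW", "MAFW"):
        raise ValueError(f"unsupported dataset: {dataset}")
    return [
        sys.executable, "evaluate.py",   # run with the current interpreter
        "--fold", str(fold),
        "--checkpoint", str(checkpoint),
        "--img-size", str(img_size),
        "--dataset", dataset,
    ]


def evaluate_folds(folds, checkpoint_template, img_size, dataset):
    """Run evaluate.py once per fold.

    checkpoint_template is a hypothetical naming pattern containing
    "{fold}", filled in with each fold number.
    """
    for fold in folds:
        checkpoint = checkpoint_template.format(fold=fold)
        subprocess.run(build_eval_cmd(fold, checkpoint, img_size, dataset), check=True)
```

A call such as evaluate_folds([1, 2, 3], "checkpoints/DFEW_fold{fold}.pth", 224, "DFEW") would run three evaluations from the repository root; every argument in that call is a placeholder.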

References

This repository is based on DFER-CLIP https://github.com/zengqunzhao/DFER-CLIP. We also thank the authors of MAE-Face https://github.com/FuxiVirtualHuman/MAE-Face and AudioMAE https://github.com/facebookresearch/AudioMAE.

Citation

If you use our work, please cite as:

@InProceedings{Chumachenko_2024_CVPR,
    author    = {Chumachenko, Kateryna and Iosifidis, Alexandros and Gabbouj, Moncef},
    title     = {MMA-DFER: MultiModal Adaptation of Unimodal Models for Dynamic Facial Expression Recognition In-the-wild},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {4673-4682}
}
