This project aims to improve lipreading performance by designing an attention module and integrating it into a lipreading network.
It is built on LipNet.
Fig. 1: Lip-Attention Network (LAN). Fig. 2: RCAB. Fig. 3: Channel Attention (CA).
You can check the architecture of 'Lip-Attention Network' in 'RG1_RCAB10.txt'.
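The figures and 'RG1_RCAB10.txt' are the authoritative description of the network. As a rough illustration only, the sketch below assumes that RCAB stands for a residual channel attention block in the RCAN style. Under that assumption, channel attention squeezes each channel by global average pooling, passes the result through a C -> C/r bottleneck with a sigmoid gate, and rescales the channels. The block wraps this in a residual connection, applied here to spatiotemporal features of shape (T, H, W, C). Several details are assumptions and may differ from the actual LAN layers: the reduction ratio, the random weights, and replacing the block's convolutions with pointwise (1x1x1) projections.

```python
import numpy as np

def channel_attention(x, w_down, w_up):
    """Channel attention on features x of shape (T, H, W, C).

    Assumed RCAN-style design: global average pooling, a C -> C/r
    bottleneck with ReLU, a C/r -> C expansion, and a sigmoid gate.
    """
    # Squeeze: one descriptor per channel, averaged over time and space.
    z = x.mean(axis=(0, 1, 2))                 # shape (C,)
    # Excitation: bottleneck (equivalent to 1x1x1 convs on a pooled map).
    s = np.maximum(z @ w_down, 0.0)            # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(s @ w_up)))   # shape (C,), values in (0, 1)
    # Rescale every channel of x by its gate (broadcast over T, H, W).
    return x * gate

def rcab(x, w_body1, w_body2, w_down, w_up):
    """Residual channel attention block (simplified sketch).

    The body is reduced to pointwise projections with a ReLU in between;
    a real block would use spatiotemporal convolutions here.
    """
    body = np.maximum(x @ w_body1, 0.0) @ w_body2
    return x + channel_attention(body, w_down, w_up)  # residual connection

rng = np.random.default_rng(0)
T, H, W, C, r = 4, 6, 12, 8, 4  # hypothetical sizes and reduction ratio
x = rng.standard_normal((T, H, W, C))
w_body1 = rng.standard_normal((C, C)) * 0.1
w_body2 = rng.standard_normal((C, C)) * 0.1
w_down = rng.standard_normal((C, C // r)) * 0.1
w_up = rng.standard_normal((C // r, C)) * 0.1

y = rcab(x, w_body1, w_body2, w_down, w_up)
print(y.shape)  # (4, 6, 12, 8)
```

Because the gate lies strictly between 0 and 1, channel attention can only attenuate channels, and the residual path lets a block fall back to the identity when its body contributes nothing.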
Scenario | Samples | Epoch | CER | WER | BLEU |
---|---|---|---|---|---|
Original LipNet | 3964 (whole) | 149 | 12.21% | 19.10% | 81.56% |
Lip-Attention Network (LAN) | 3964 (whole) | 149 | 8.02% | 14.23% | 84.02% |
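CER and WER are usually defined as the Levenshtein (edit) distance between the prediction and the reference, divided by the reference length, at character and word level respectively (lower is better; for BLEU, higher is better). The sketch below follows that standard definition. It is an assumption that the table was computed exactly this way, since details such as per-sentence versus corpus-level averaging are not stated here. The sentence pair is a hypothetical GRID-style example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h))  # substitution (free if equal)
        prev = cur
    return prev[-1]

def cer(reference, prediction):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, prediction) / len(reference)

def wer(reference, prediction):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, prediction.split()) / len(ref_words)

ref = "bin blue at f two now"  # hypothetical reference (21 characters, 6 words)
hyp = "bin blue at s two now"  # hypothetical prediction with one wrong letter
print(f"CER = {cer(ref, hyp):.2%}, WER = {wer(ref, hyp):.2%}")  # CER = 4.76%, WER = 16.67%
```

A single wrong letter costs one character out of 21 but a whole word out of 6, which is why WER is typically higher than CER on the same predictions.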
To use the model, first you need to clone the repository:
git clone https://github.com/smu-ivpl/LAN
Then install the dependencies:
cd LAN/
pip install -r requirements.txt
Note: if you don't want to use CUDA, edit 'requirements.txt' and change 'tensorflow-gpu' to 'tensorflow'.
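If you prefer to script that change, the sketch below rewrites a 'tensorflow-gpu' requirement to 'tensorflow' while keeping any version specifier. The helper names are hypothetical, and the exact versions pinned in this repository's 'requirements.txt' are not shown here. Note that the same version may not exist under both package names on PyPI.

```python
import re
from pathlib import Path

# Match 'tensorflow-gpu' only as a complete package name at the start of a line.
_TF_GPU = re.compile(r"^(\s*)tensorflow-gpu(?=[\s=<>!~;\[]|$)")

def use_cpu_tensorflow(lines):
    """Return requirement lines with 'tensorflow-gpu' swapped for 'tensorflow'."""
    return [_TF_GPU.sub(r"\1tensorflow", line) for line in lines]

def patch_requirements(path="requirements.txt"):
    """Rewrite a requirements file in place (hypothetical helper)."""
    p = Path(path)
    lines = p.read_text().splitlines()
    p.write_text("\n".join(use_cpu_tensorflow(lines)) + "\n")
```

Run 'patch_requirements()' from the 'LAN/' folder before 'pip install -r requirements.txt'. The lookahead keeps unrelated packages whose names merely start with 'tensorflow-gpu' untouched.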
This model uses the GRID corpus (http://spandh.dcs.shef.ac.uk/gridcorpus/).
You can download and use the weights provided here: https://github.com/Han-lim/Lip-Attention-Network/tree/master/evaluation/models.
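Each GRID video comes with an alignment ('align') file that gives word timings. The sketch below rests on several assumptions. It assumes the format read by LipNet's loader: one 'start end word' triple per line, with 1000 time units per video frame. It also assumes that 'sil' and 'sp' mark silence and short pauses. Under those assumptions, the transcription and per-word frame ranges can be recovered as shown. The function name and the sample text are illustrative, not taken from this repository.

```python
def parse_align(text, skip=("sil", "sp")):
    """Parse GRID-style align text into (sentence, [(start_frame, end_frame, word), ...]).

    Assumes each line is 'start end word' and that 1000 time units = 1 frame;
    frame indices are obtained by integer division.
    """
    words = []
    for line in text.strip().splitlines():
        start, end, word = line.split()
        if word in skip:  # drop silence / short-pause tokens
            continue
        words.append((int(start) // 1000, int(end) // 1000, word))
    sentence = " ".join(w for _, _, w in words)
    return sentence, words

# Hypothetical align contents, for illustration only.
sample = """0 23750 sil
23750 29500 bin
29500 34000 blue
34000 35500 at
35500 41000 f
41000 47250 two
47250 53000 now
53000 74500 sil"""

sentence, words = parse_align(sample)
print(sentence)  # bin blue at f two now
print(words[0])  # (23, 29, 'bin')
```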
- Download all datasets from the GRID Corpus website.
- Extract all the videos and align files.
- Create a 'datasets' folder inside the 'training/unseen_speakers' folder.
- All current 'train_XXX.py' scripts expect the videos to be in the form of 100x50px mouth-crop image frames.
- One way to produce these frames is to extract the mouth-crop images using 'scripts/extract_mouth_batch.py' (usage can be found inside the script).
- Create a symlink from each 'training/*/datasets/align' to your align folder.
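The actual extraction is done by 'scripts/extract_mouth_batch.py', and this sketch does not reproduce that script's method. It only illustrates the fixed-size cropping step. It assumes a mouth centre is already known for each frame (for example, from a facial landmark detector) and reads "100x50px" as 100 pixels wide by 50 pixels high. 'crop_mouth' is a hypothetical helper.

```python
import numpy as np

def crop_mouth(frame, center, width=100, height=50):
    """Crop a width x height window centred on center = (x, y).

    The frame has shape (H, W) or (H, W, C). The window is shifted inward
    when it would cross the frame border, so the output always has shape
    (height, width, ...).
    """
    frame_h, frame_w = frame.shape[:2]
    if frame_h < height or frame_w < width:
        raise ValueError("frame is smaller than the crop window")
    cx, cy = center
    # Top-left corner of the window, clamped to stay inside the frame.
    x0 = min(max(int(round(cx - width / 2)), 0), frame_w - width)
    y0 = min(max(int(round(cy - height / 2)), 0), frame_h - height)
    return frame[y0:y0 + height, x0:x0 + width]

frame = np.zeros((288, 360, 3), dtype=np.uint8)  # hypothetical frame size
crop = crop_mouth(frame, center=(180, 200))
print(crop.shape)  # (50, 100, 3)
```

Clamping instead of padding keeps every crop the same size without introducing black borders, at the cost of a slightly off-centre mouth near the frame edges.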
First, save the validation speakers ('s1', 's2', 's20', and 's22') in the 'val' folder and the remaining speakers in the 'train' folder.
Then create a symlink from 'training/unseen_speakers/datasets/[train|val]' to the corresponding 'train' or 'val' folder inside your video dataset folder.
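The split and the symlinks described above could be scripted roughly as follows. The sketch assumes one folder per speaker (e.g. 's1', 's2', ...) and a video dataset folder that already contains 'train' and 'val' subfolders. The folder layout and helper names are hypothetical; adjust the paths to your setup.

```python
import os
from pathlib import Path

VAL_SPEAKERS = {"s1", "s2", "s20", "s22"}  # validation speakers listed above

def split_speakers(speakers, val_speakers=VAL_SPEAKERS):
    """Split speaker folder names into (train, val) lists."""
    train = sorted(s for s in speakers if s not in val_speakers)
    val = sorted(s for s in speakers if s in val_speakers)
    return train, val

def link_split(video_root, datasets_dir):
    """Symlink datasets_dir/{train,val} to video_root/{train,val}."""
    datasets_dir = Path(datasets_dir)
    datasets_dir.mkdir(parents=True, exist_ok=True)
    for name in ("train", "val"):
        link = datasets_dir / name
        if link.is_symlink() or link.exists():
            continue  # leave existing links or folders untouched
        os.symlink(Path(video_root).resolve() / name, link, target_is_directory=True)
```

For example, 'split_speakers(os.listdir("videos"))' tells you which speaker folders to place under 'val'. 'link_split("videos", "training/unseen_speakers/datasets")' then creates the two links, assuming 'videos' is your video dataset folder.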
Train the model using the following command:
./train unseen_speakers [GPUs (optional)]
To evaluate and visualize the trained model on a single video (or its image frames), run the following command:
./predict [path to weight] [path to video]
Example:
./predict evaluation/models/unseen-weights126.h5 evaluation/samples/id2_vcd_swwp2s.mpg
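LipNet, on which this model is built, is trained with a CTC loss. The network therefore outputs, for each frame, a probability distribution over characters plus a "blank" symbol, and this output must be decoded into text. The sketch below shows greedy (best-path) CTC decoding. Two details are assumptions: the label order ('a' to 'z', then space) and placing the blank at the last index, as in the Keras CTC convention. './predict' may use a different decoder, such as beam search.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed label order: 26 letters, then space
BLANK = len(ALPHABET)                      # assumed blank index: the last class

def ctc_greedy_decode(probs):
    """Best-path CTC decoding of a (frames, classes) probability matrix."""
    best = np.argmax(probs, axis=1)  # most likely class per frame
    chars = []
    prev = None
    for k in best:
        # Collapse consecutive repeats, then drop blanks.
        if k != prev and k != BLANK:
            chars.append(ALPHABET[k])
        prev = k
    return "".join(chars).strip()  # trim leading/trailing spaces

def one_hot(indices, num_classes=BLANK + 1):
    """Toy per-frame 'probabilities' with all mass on one class."""
    return np.eye(num_classes)[indices]

# Frames 'b b <blank> i n n' decode to "bin".
print(ctc_greedy_decode(one_hot([1, 1, BLANK, 8, 13, 13])))  # bin
```

The blank is what allows genuine double letters. For example, 'a <blank> a' decodes to "aa", while 'a a' collapses to "a".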
Many thanks to the excellent open source projects: