Lip-Attention Network (LAN)

This project aims to improve lipreading accuracy by designing an attention module and applying it to a lipreading network.

It is built on LipNet.

Model

Fig1. Lip-Attention Network (LAN)

Fig2. RCAB

Fig3. Channel-Attention (CA)

The full architecture of the Lip-Attention Network is listed in RG1_RCAB10.txt.
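As a rough illustration of what the RCAB and Channel-Attention blocks in the figures compute, here is a minimal NumPy sketch. It assumes the common squeeze-and-excitation style of channel attention (global average pooling, a bottleneck, and a sigmoid gate) and a plain residual connection. The function names, weight shapes, and the pooling over frames are assumptions for illustration only; the actual layers are the ones listed in RG1_RCAB10.txt.

```python
import numpy as np

def channel_attention(x, w_down, w_up):
    """Rescale each channel of x by a gate in (0, 1).

    x:      feature map of shape (frames, height, width, channels)
    w_down: (channels, channels // r) bottleneck weights (hypothetical)
    w_up:   (channels // r, channels) expansion weights (hypothetical)
    """
    # Global average pooling: one descriptor per channel.
    squeezed = x.mean(axis=(0, 1, 2))              # (channels,)
    hidden = np.maximum(squeezed @ w_down, 0.0)    # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w_up)))  # sigmoid, (channels,)
    return x * gate                                # broadcast over the channel axis

def rcab(x, body, w_down, w_up):
    """Residual channel attention block: x + CA(body(x)).

    body is any callable that preserves the shape of x
    (a stack of convolutions in a real network).
    """
    return x + channel_attention(body(x), w_down, w_up)
```

With all-zero weights the gate is 0.5 for every channel, which gives a quick way to sanity-check shapes before plugging in real layers.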

Results

Scenario	Samples	Epochs	CER	WER	BLEU
Original LipNet	3964 (whole)	149	12.21%	19.10%	81.56%
Lip-Attention Network (LAN)	3964 (whole)	149	8.02%	14.23%	84.02%
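For readers unfamiliar with the metrics, CER and WER are usually defined as the edit distance between the reference and the prediction, divided by the reference length in characters or words. The sketch below shows that standard definition in plain Python. It is not the repository's evaluation code, which may normalise or average differently, and the function names are chosen here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate of one sentence."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate of one sentence."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)
```

For example, predicting "set white with b two soon" for the reference "set white with p two soon" is one wrong word out of six and one wrong character out of 25.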

Installation

To use the model, first you need to clone the repository:

git clone https://github.com/smu-ivpl/LAN

Then you can install the package:

cd LAN/

pip install -r requirements.txt

Note: if you don't want to use CUDA, edit requirements.txt and change tensorflow-gpu to tensorflow before installing.
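If you prefer to script the CPU-only change, a small helper along these lines would do it. This is a sketch: the function name is made up, and it assumes the requirement appears at the start of its own line, optionally followed by a version pin.

```python
import re

def use_cpu_tensorflow(requirements_text):
    """Swap the tensorflow-gpu requirement for tensorflow, keeping any version pin."""
    return re.sub(r"(?m)^tensorflow-gpu\b", "tensorflow", requirements_text)
```

Read requirements.txt, pass its contents through the function, and write the result back before running pip.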

Dataset

This model uses the GRID corpus (http://spandh.dcs.shef.ac.uk/gridcorpus/).

Pre-trained weights

You can download and use the weights provided here: https://github.com/Han-lim/Lip-Attention-Network/tree/master/evaluation/models.

Get started

Download all datasets from the GRID Corpus website.
Extract all the videos and alignments.
Create a datasets folder inside the training/unseen_speakers folder.
All current train_XXX.py scripts expect the videos as 100x50 px mouth-crop image frames.
If your videos are not cropped yet, extract the mouth crops with scripts/extract_mouth_batch.py (usage can be found inside the script).
Create a symlink from each training/*/datasets/align to your align folder.
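The folder and symlink steps above can also be scripted. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes you run it from a checkout where the training folder already exists and that your platform allows creating symlinks.

```python
from pathlib import Path

def link_align(repo_root, align_dir):
    """Create training/*/datasets/align symlinks that point at align_dir.

    Returns the list of links that were newly created.
    """
    created = []
    for scenario in sorted(Path(repo_root, "training").iterdir()):
        if not scenario.is_dir():
            continue
        datasets = scenario / "datasets"
        datasets.mkdir(exist_ok=True)  # the datasets folder from the steps above
        link = datasets / "align"
        # Leave existing files, folders, and (even broken) symlinks untouched.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(Path(align_dir).resolve(), target_is_directory=True)
            created.append(link)
    return created
```

Calling the function a second time is a no-op, so it is safe to re-run after adding a new training scenario.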

Train

First, place the validation speakers (s1, s2, s20, and s22) in the val folder and the remaining speakers in the train folder.

Then create a symlink from training/unseen_speakers/datasets/[train|val] to the corresponding [train|val] folder inside your video dataset folder.
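The speaker split can be sketched as a short script. It assumes the video dataset folder contains one sub-folder per speaker named s1, s2, and so on; that layout and the function name are assumptions, so adjust them to match how you extracted the corpus.

```python
import shutil
from pathlib import Path

VAL_SPEAKERS = {"s1", "s2", "s20", "s22"}  # validation speakers listed above

def split_by_speaker(video_root):
    """Move each speaker folder under video_root into train/ or val/."""
    root = Path(video_root)
    train_dir, val_dir = root / "train", root / "val"
    train_dir.mkdir(exist_ok=True)
    val_dir.mkdir(exist_ok=True)
    # sorted() materialises the listing before any folder is moved.
    for speaker in sorted(root.iterdir()):
        if speaker in (train_dir, val_dir) or not speaker.is_dir():
            continue
        target = val_dir if speaker.name in VAL_SPEAKERS else train_dir
        shutil.move(str(speaker), str(target / speaker.name))
    train = sorted(p.name for p in train_dir.iterdir())
    val = sorted(p.name for p in val_dir.iterdir())
    return train, val
```

The returned lists make it easy to confirm that exactly the four validation speakers ended up in val before you create the symlinks.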

Train the model using the following command:

./train unseen_speakers [GPUs (optional)]

Evaluate

To evaluate and visualize the trained model on a single video or a folder of image frames, run:

./predict [path to weight] [path to video]

Example:

./predict evaluation/models/unseen-weights126.h5 evaluation/samples/id2_vcd_swwp2s.mpg

Acknowledgement

Many thanks to the excellent open-source project this work builds on:

LipNet

