Hierarchical Windowed Graph Attention Network (HWGAT) is a deep learning model specifically designed for sign language recognition. This model leverages hierarchical and windowed attention mechanisms to effectively capture the temporal and spatial dependencies in sign language skeleton data. This repository includes a comprehensive implementation of HWGAT, covering data preprocessing and the full training pipeline.
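For intuition, windowed attention restricts self-attention to short temporal windows instead of the whole frame sequence, which keeps attention local and cheap. The PyTorch snippet below is a minimal sketch of that idea only; it is not the repository's HWGAT implementation, and the embedding size, head count, and window size are illustrative.

```python
# Minimal sketch of windowed temporal self-attention over per-frame skeleton
# embeddings. Illustrative only -- NOT the repository's HWGAT code.
import torch
import torch.nn as nn

class WindowedTemporalAttention(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4, window_size=8):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, embed_dim); frames assumed divisible by window_size
        b, t, c = x.shape
        w = self.window_size
        # Split the sequence into non-overlapping temporal windows and
        # attend only within each window.
        x = x.reshape(b * t // w, w, c)
        out, _ = self.attn(x, x, x)
        return out.reshape(b, t, c)

x = torch.randn(2, 32, 64)                   # 2 clips, 32 frames, 64-d features
print(WindowedTemporalAttention()(x).shape)  # torch.Size([2, 32, 64])
```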
To get started with HWGAT for sign language recognition, follow these steps:
- Clone the repository:

  ```bash
  git clone https://github.com/suvajit-patra/sl-hwgat.git
  cd sl-hwgat/hwgat
  ```

- Create a Docker instance with the `Dockerfile` and run the container (example commands follow this list).
- Install the required dependencies with:

  ```bash
  pip install -r requirements.txt
  ```
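For the Docker step, the commands below are a minimal sketch of a typical build-and-run sequence; the image tag `hwgat` and the mounted dataset path are placeholders, not names fixed by the repository.

```bash
# Illustrative only: the "hwgat" tag and the mount path are placeholders.
docker build -t hwgat .
docker run --gpus all -it -v /data/datasets:/data/datasets hwgat
```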
The data preprocessing pipeline prepares the raw sign language data for training.
- Generate metadata: Ensure your dataset is structured properly, then run the metadata generator script for the corresponding dataset. A metadata file must be generated before the deep learning pipeline can run, e.g.:

  ```bash
  python meta_generators/FDMSE-ISL_meta_gen.py
  ```

  **Note:** Remember to update the paths inside every meta generator script.

  This should generate a file at `/data/datasets/FDMSE-ISL/FDMSE-ISL_meta/metadata.csv`. If you are using a different dataset, write your own meta generator, as sketched below.
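  The metadata schema is defined by the scripts in `meta_generators/`, so mirror an existing one such as `FDMSE-ISL_meta_gen.py`. The sketch below is only a rough illustration of a custom generator; the column names (`video_path`, `label`, `split`) and the directory layout are assumptions, not the pipeline's required schema.

  ```python
  # Hypothetical custom meta generator. The columns and layout here are
  # ASSUMPTIONS -- copy the exact schema from a script in meta_generators/.
  import csv
  import os

  root = '/data/datasets/MyDataset'            # placeholder dataset root
  out_dir = os.path.join(root, 'MyDataset_meta')
  os.makedirs(out_dir, exist_ok=True)

  with open(os.path.join(out_dir, 'metadata.csv'), 'w', newline='') as f:
      writer = csv.writer(f)
      writer.writerow(['video_path', 'label', 'split'])    # assumed columns
      for split in ('train', 'val', 'test'):
          split_dir = os.path.join(root, split)
          for label in sorted(os.listdir(split_dir)):
              for video in sorted(os.listdir(os.path.join(split_dir, label))):
                  writer.writerow([os.path.join(split, label, video), label, split])
  ```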
- Generate keypoints: Extract keypoints and save them using the `pose_feature_extract.py` file by running the following command, where `--root` is the root directory of the dataset, `--meta` is the dataset's `metadata.csv`, and `--out_path` is the saving path of the outputs (keypoints); the folder will be created under the root directory.

  ```bash
  python pose_feature_extract.py --root '/data/datasets/FDMSE-ISL' --meta '/data/datasets/FDMSE-ISL/FDMSE-ISL_meta/metadata.csv' -m mediapipe --out_path 'mediapipe_out/'
  ```
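  For context, the snippet below is a simplified illustration of per-frame landmark extraction with the public MediaPipe Holistic API; the actual `pose_feature_extract.py` handles the full dataset, feature selection, and saving. The video path is a placeholder.

  ```python
  # Simplified, illustrative keypoint extraction with MediaPipe Holistic;
  # not the repository's pose_feature_extract.py.
  import cv2
  import mediapipe as mp

  holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
  cap = cv2.VideoCapture('/data/datasets/FDMSE-ISL/sample.mp4')  # placeholder
  frames_kpts = []
  while cap.isOpened():
      ok, frame = cap.read()
      if not ok:
          break
      results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
      if results.pose_landmarks:
          # 33 body landmarks per frame; hand and face landmarks also exist
          frames_kpts.append([(lm.x, lm.y, lm.z)
                              for lm in results.pose_landmarks.landmark])
  cap.release()
  holistic.close()
  ```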
- Process keypoints data: Next, preprocess the generated keypoints so that they can be used to train the transformer-based model, using the following command, where `-ds` is the dataset name, `--root` is the root directory of the dataset, `--meta` is the dataset's `metadata.csv`, `-dr` is the keypoints output path relative to the root, `-kpm` is the keypoint extraction model, and `-ft` is the feature type that is extracted.

  ```bash
  python data_preprocess.py --root /data/datasets/FDMSE-ISL/ --ds FDMSE-ISL --meta /data/datasets/FDMSE-ISL/FDMSE-ISL_meta/metadata.csv -dr mediapipe_out/ -kpm mediapipe -ft keypoints
  ```
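  The exact transforms live in `data_preprocess.py`; as a rough, assumption-laden sketch, skeleton pipelines of this kind typically center the keypoints on a reference joint, scale them, and resample clips to a fixed number of frames, along these lines:

  ```python
  # Illustrative-only preprocessing sketch (centering, scaling, fixed-length
  # temporal resampling); the repository's data_preprocess.py is authoritative.
  import numpy as np

  def preprocess(kpts, target_len=64, center_joint=0):
      # kpts: (frames, joints, 3) raw keypoints
      kpts = kpts - kpts[:, center_joint:center_joint + 1, :]  # center on a joint
      scale = np.abs(kpts).max() or 1.0
      kpts = kpts / scale                                      # scale into [-1, 1]
      idx = np.linspace(0, len(kpts) - 1, target_len).round().astype(int)
      return kpts[idx]                                         # uniform resampling

  clip = np.random.rand(120, 33, 3)     # e.g. 120 frames of 33 MediaPipe joints
  print(preprocess(clip).shape)         # (64, 33, 3)
  ```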
Once the data is preprocessed, you can train the HWGAT model using the training pipeline provided.
- Configure the training parameters: Edit the `configs.py` file to set your training parameters, such as learning rate, batch size, number of epochs, etc. (see the illustrative snippet after this list).
- Training the model: Start the training process of the model by running

  ```bash
  python main.py -m train -d FDMSE-ISL --model HWGAT
  ```
- Testing the model: Test the model using

  ```bash
  python main.py -m test -d FDMSE-ISL --model HWGAT -t 240227_1807 -px best_loss
  ```
- Load and train the model: Load and train the model, or finetune it on a different dataset, using
  - Load and train on the same dataset:

    ```bash
    python main.py -m load -d FDMSE-ISL --model HWGAT -t 240227_1807 -px best_loss
    ```

  - Finetune on another dataset:

    ```bash
    python main.py -m load -d INCLUDE --model HWGAT -mw output/FDMSE-ISL/HWGAT_240227_1807/model_best_loss.pt
    ```
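The parameter names in `configs.py` are defined by the repository; the snippet below only illustrates the kind of settings you would edit there, with hypothetical names and values.

```python
# Hypothetical excerpt -- variable names and values are illustrative,
# not the repository's actual configs.py contents.
batch_size = 32
learning_rate = 1e-4
num_epochs = 200
num_workers = 4
```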
Go to this repository to get the demo application of the HWGAT model for sign language recognition tasks.
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this project useful in your research, please consider citing:
```bibtex
@misc{patra2024hierarchicalwindowedgraphattention,
  title={Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition},
  author={Suvajit Patra and Arkadip Maitra and Megha Tiwari and K. Kumaran and Swathy Prabhu and Swami Punyeshwarananda and Soumitra Samanta},
  year={2024},
  eprint={2407.14224},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.14224},
}
```
Thank you for using this repository. For any questions or support, please open an issue in this repository.