#### A pre-trained model, trained on 3D landmark sequences, using the BERT model framework.
Collect as many face datasets as possible (each must contain sequential frames).
Pipeline:
1. Run dlib face detection and face alignment (facealign) to obtain 3D landmarks.
2. Randomly mask some frames.
3. Train a ViT in the same way as BERT.

#### Datasets
FADID, AFEW, KMU-FED, CK+
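
The random frame-masking step of the pipeline could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `mask_frames` and `masked_mse`, the 25% mask ratio, the zero placeholder for masked frames, and the 68-landmark layout are not from these notes, and a real model would more likely use a learned mask token and a tensor library.

```python
import random

MASK_VALUE = 0.0  # assumption: zeros stand in for a learned [MASK] embedding


def mask_frames(seq, mask_ratio=0.15, seed=None):
    """Randomly mask whole frames of a landmark sequence.

    seq: list of T frames, each a list of K landmarks, each [x, y, z].
    Returns (masked_seq, mask); mask[t] is True if frame t was masked.
    """
    rng = random.Random(seed)
    num_frames = len(seq)
    # Mask at least one frame (when any exist) so each sample has a target.
    num_masked = min(num_frames, max(1, round(mask_ratio * num_frames)))
    masked_idx = set(rng.sample(range(num_frames), num_masked))
    mask = [t in masked_idx for t in range(num_frames)]
    masked_seq = [
        [[MASK_VALUE] * 3 for _ in frame] if mask[t] else [list(p) for p in frame]
        for t, frame in enumerate(seq)
    ]
    return masked_seq, mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked frames only."""
    total, count = 0.0, 0
    for p_frame, t_frame, is_masked in zip(pred, target, mask):
        if not is_masked:
            continue
        for p, t in zip(p_frame, t_frame):
            for a, b in zip(p, t):
                total += (a - b) ** 2
                count += 1
    return total / count if count else 0.0


# Toy sequence: 16 frames, 68 landmarks per frame (assumed layout), xyz coords.
seq = [[[float(t), float(k), 1.0] for k in range(68)] for t in range(16)]
masked_seq, mask = mask_frames(seq, mask_ratio=0.25, seed=0)
```

During pre-training, the model would receive `masked_seq` and be scored with something like `masked_mse(model_output, seq, mask)`, so that only the hidden frames contribute to the loss, analogous to BERT's masked-token objective.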