# MMSA

PyTorch implementation of the models for multimodal sentiment analysis (Python 3.6).

Note: We strongly recommend that you browse the overall structure of our code first. If you have any questions, feel free to contact us.

## Supported Models

In this framework, we support the following methods:

| Type | Model Name | From |
| --- | --- | --- |
| Single-Task | EF_LSTM | MultimodalDNN |
| Single-Task | LF_DNN | - |
| Single-Task | TFN | Tensor-Fusion-Network |
| Single-Task | LMF | Low-rank-Multimodal-Fusion |
| Single-Task | MFN | Memory-Fusion-Network |
| Single-Task | Graph-MFN | Graph-Memory-Fusion-Network |
| Single-Task | MulT (without CTC) | Multimodal-Transformer |
| Single-Task | MISA | MISA |
| Multi-Task | MLF_DNN | MMSA |
| Multi-Task | MTFN | MMSA |
| Multi-Task | MLMF | MMSA |
| Multi-Task | SELF_MM | Self-MM |

## Results

Detailed results are shown in `results/result-stat.md`.

## Usage

### Clone the code

Clone this repo and install the requirements:

```bash
git clone https://github.com/thuiar/MMSA
cd MMSA
pip install -r requirements.txt
```

### Datasets and pre-trained BERTs

Download the dataset features and pre-trained BERT models from the provided links.

For all feature files, you can use the SHA-1 hash values below to verify integrity (a small verification sketch follows the list).

- `MOSI/unaligned_50.pkl`: `5da0b8440fc5a7c3a457859af27458beb993e088`
- `MOSI/aligned_50.pkl`: `5c62b896619a334a7104c8bef05d82b05272c71c`
- `MOSEI/unaligned_50.pkl`: `db3e2cff4d706a88ee156981c2100975513d4610`
- `MOSEI/aligned_50.pkl`: `ef49589349bc1c2bc252ccc0d4657a755c92a056`
- `SIMS/unaligned_39.pkl`: `a00c73e92f66896403c09dbad63e242d5af756f8`
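A minimal verification sketch in Python (the path is illustrative; point it at whichever feature file you downloaded):

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the hash listed above for the file you downloaded.
print(sha1_of_file("MOSI/unaligned_50.pkl") == "5da0b8440fc5a7c3a457859af27458beb993e088")
```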

Due to size limitations, the MOSEI features and the SIMS raw videos are only available from Baidu Cloud Drive. All dataset features are organized as follows:

```python
{
    "train": {
        "raw_text": [],
        "audio": [],
        "vision": [],
        "id": [], # [video_id$_$clip_id, ..., ...]
        "text": [],
        "text_bert": [],
        "audio_lengths": [],
        "vision_lengths": [],
        "annotations": [],
        "classification_labels": [], # Negative (< 0), Neutral (0), Positive (> 0)
        "regression_labels": []
    },
    "valid": {***}, # same structure as "train"
    "test": {***},  # same structure as "train"
}
```
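A minimal sketch of loading a feature file and inspecting this structure, assuming the `.pkl` files are standard Python pickles of the dictionary shown above (the path is illustrative):

```python
import pickle

# Point this at any of the downloaded feature files.
with open("MOSI/unaligned_50.pkl", "rb") as f:
    data = pickle.load(f)

print(data.keys())                      # expect 'train', 'valid', 'test'
train = data["train"]
print(train.keys())                     # the fields listed above
print(len(train["regression_labels"]))  # number of training clips (if stored as a list/array)
```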

For MOSI and MOSEI, the pre-extracted text features come from BERT, unlike the original GloVe features in the CMU-Multimodal-SDK.

For SIMS, if you want to extract features from the raw videos, you need to install the OpenFace toolkit first and then refer to our code in `data/DataPre.py`:

```bash
python data/DataPre.py --data_dir [path_to_Dataset] --language ** --openface2Path [path_to_FeatureExtraction]
```

For the BERT models, you can also download BERT-Base, Chinese from Google-Bert and then convert the TensorFlow checkpoint to PyTorch using `transformers-cli`.
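For reference, recent transformers releases document this conversion as `transformers-cli convert --model_type bert --tf_checkpoint <bert_model.ckpt> --config <bert_config.json> --pytorch_dump_output <pytorch_model.bin>`; the paths here are placeholders, and the exact flags may differ across versions, so check `transformers-cli convert --help` for your installation.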

Then, modify `config/config_*.py` to update the dataset paths.

### Run

```bash
python run.py --modelName *** --datasetName ***
```
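For example, with a model from the table above and one of the supported datasets, an invocation might look like `python run.py --modelName self_mm --datasetName sims`; these values are illustrative, and the exact model and dataset names accepted are defined in `run.py`.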

## Paper

Please cite our papers if you find our work useful for your research:

```bibtex
@inproceedings{yu2020ch,
  title={CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality},
  author={Yu, Wenmeng and Xu, Hua and Meng, Fanyang and Zhu, Yilin and Ma, Yixiao and Wu, Jiele and Zou, Jiyun and Yang, Kaicheng},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={3718--3727},
  year={2020}
}

@article{yu2021learning,
  title={Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis},
  author={Yu, Wenmeng and Xu, Hua and Yuan, Ziqi and Wu, Jiele},
  journal={arXiv preprint arXiv:2102.04830},
  year={2021}
}
```