FlameSky edited this page Jan 14, 2022 · 32 revisions

MMSA-Feature Extraction Toolkit

MMSA-Feature Extraction Toolkit extracts multimodal features for Multimodal Sentiment Analysis datasets. It integrates several commonly used tools for the visual, acoustic, and text modalities. The extracted features are compatible with the MMSA Framework and can be used with it directly. The tool can also extract features from a single video.

1. Installation

MMSA-Feature Extraction Toolkit is available from PyPI:

$ pip install MMSA-FET

For the OpenFaceExtractor to work, a few system-wide dependencies are needed. See Dependency Installation for more information.

2. Quick Start

MMSA-FET is fairly easy to use. Below is a basic example of how to extract features from a single video file and from a dataset folder.

Note: The dataset folder should be arranged the way the MMSA Framework expects; see Dataset Folder Structure for details. Pre-arranged datasets can be downloaded here with code ctgs.

from MSA_FET import FeatureExtractionTool

# initialize with config file
fet = FeatureExtractionTool("config.json")

# extract features for single video
feature = fet.run_single("input.mp4")
print(feature)

# extract for dataset & save features to file
feature = fet.run_dataset(dataset_dir="~/MOSI", out_file="output/feature.pkl")

In this example, "config.json" is the path to a custom config file, whose format is introduced here.

For more details, please read APIs.
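
Once run_dataset has written its output file, it can help to look at what the file contains before feeding it to a model. The following is a minimal sketch that makes no assumption about the internal layout of the saved features: it only unpickles the file and reports the type of each top-level entry. The helper name summarize_features is hypothetical and not part of MMSA-FET, and the path is the out_file value used in the example above.

```python
import pickle
from pathlib import Path


def summarize_features(path):
    """Load a pickled feature file and describe its top-level contents.

    Hypothetical helper, not part of MMSA-FET. It does not assume any
    particular layout for the saved features.
    """
    with Path(path).expanduser().open("rb") as f:
        data = pickle.load(f)  # only unpickle files you produced yourself
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-dict payloads are described by the type of the whole object.
    return {"<root>": type(data).__name__}


if __name__ == "__main__":
    # Same path as the out_file argument in the Quick Start example.
    out_file = Path("output/feature.pkl")
    if out_file.exists():
        print(summarize_features(out_file))
```

Because pickle can execute arbitrary code while loading, this should only be used on feature files you generated yourself.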

3. Config File

MMSA-FET comes with a few example configs, which can be used as shown below.

# Each supported tool has an example config
fet = FeatureExtractionTool(config="librosa")
fet = FeatureExtractionTool(config="opensmile")
fet = FeatureExtractionTool(config="wav2vec")
fet = FeatureExtractionTool(config="openface")
fet = FeatureExtractionTool(config="mediapipe")
fet = FeatureExtractionTool(config="bert")
fet = FeatureExtractionTool(config="xlnet")

For customized features, provide a config file in the following format.

{
  "audio": {
    "tool": "librosa",
    "sample_rate": null,
    "args": {
      "mfcc": {
        "n_mfcc": 20,
        "htk": true
      },
      "rms": {},
      "zero_crossing_rate": {},
      "spectral_rolloff": {},
      "spectral_centroid": {}
    }
  },
  "video": {
    "tool": "openface",
    "fps": 25,
    "average_over": 3,
    "args": {
      "hogalign": false,
      "simalign": false,
      "nobadaligned": false,
      "landmark_2D": true,
      "landmark_3D": false,
      "pdmparams": false,
      "head_pose": true,
      "action_units": true,
      "gaze": true,
      "tracked": false
    }
  },
  "text": {
    "model": "bert",
    "device": "cpu",
    "pretrained": "models/bert_base_uncased",
    "args": {}
  }
}
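
A custom config can also be generated from Python instead of being written by hand. The sketch below builds a small config as a dict, reusing only keys that appear in the format above, and serializes it with the standard json module. The helper name write_config and the file name my_config.json are hypothetical, and whether a config containing only the "audio" section is accepted is an assumption here, not something this page states.

```python
import json
from pathlib import Path

# Audio-only config reusing keys from the format shown above.
# Assumption: a config with a single modality section is accepted.
audio_config = {
    "audio": {
        "tool": "librosa",
        "sample_rate": None,  # serialized as JSON null
        "args": {
            "mfcc": {"n_mfcc": 20, "htk": True},
            "rms": {},
        },
    }
}


def write_config(config, path):
    """Serialize a config dict to a JSON file and return the path written.

    Hypothetical helper, not part of MMSA-FET.
    """
    path = Path(path)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    config_path = write_config(audio_config, "my_config.json")
    print(config_path.read_text(encoding="utf-8"))
    # The resulting path can then be passed to the tool, as in the Quick Start:
    # fet = FeatureExtractionTool("my_config.json")
```

Note that Python's None and True become null and true in the written file, matching the JSON format above.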

4. Supported Tools & Features

4.1 Audio Tools

4.2 Video Tools

  • OpenFace (link)

    Supports all features in OpenFace's FeatureExtraction binary, including 2D and 3D facial landmarks, head pose, gaze-related features, facial action units, and HOG binary files. Details of these features can be found in the OpenFace Wiki here and here. Detailed configurations can be found here.

  • MediaPipe (link)

    Supports the face mesh and holistic (face, hand, pose) solutions. Detailed configurations can be found here.

4.3 Text Tools

  • BERT (link)

    Integrated from Hugging Face Transformers. Detailed configurations can be found here.

  • XLNet (link)

    Integrated from Hugging Face Transformers. Detailed configurations can be found here.
