- Authors: Archiki Prasad, Trung Bui, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Mohit Bansal
- Paper // Project Page
- Note: This repository contains code and data for our ACL 2023 paper "MeetingQA: Extractive Question-Answering on Meeting Transcripts".
This code is written using PyTorch and HuggingFace's Transformers library. Running the MeetingQA baselines requires access to GPUs. Most evaluations are relatively lightweight, so a total of 2-3 GPUs should suffice.
The constituent folders are described below (also refer to the corresponding README.md files in the folders):
- requirements: Contains the various *_requirements.txt files used to set up the multiple conda environments used throughout the project.
- DataCollection: Contains code used for gathering and processing raw meeting transcripts/datasets.
- PostAnnotationProcessing: Contains code used in annotation and post-processing of meeting transcripts for QA.
- qaCode: Contains code for training models and running experiments.
- ProcessedTranscripts: Contains processed meeting transcripts (data).
- SyntheticDataset: Contains dataset files used for model training and evaluation (data).
To set up the project from scratch, follow the steps described in the folders in the order requirements -> DataCollection -> PostAnnotationProcessing -> qaCode. The appropriate data files and folders will be created in the process, starting with ProcessedTranscripts followed by AllData.
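As a rough aid for following this order, the sketch below walks the four setup folders in sequence and reports whether each folder and its README.md are present. The folder names come from this README; the helper name `setup_checklist` and the `repo_root` argument are hypothetical and are not part of the repository's code.

```python
from pathlib import Path

# Setup order as stated in this README.
SETUP_ORDER = ["requirements", "DataCollection", "PostAnnotationProcessing", "qaCode"]


def setup_checklist(repo_root="."):
    """Return (folder, folder_exists, readme_exists) for each setup stage, in order.

    Hypothetical helper for illustration; it only inspects the file system.
    """
    root = Path(repo_root)
    checklist = []
    for name in SETUP_ORDER:
        folder = root / name
        readme = folder / "README.md"
        checklist.append((name, folder.is_dir(), readme.is_file()))
    return checklist


if __name__ == "__main__":
    for step, (name, has_folder, has_readme) in enumerate(setup_checklist(), start=1):
        status = "ok" if has_folder and has_readme else "missing folder or README.md"
        print(f"{step}. {name}: {status}")
```

Running the script from the repository root prints one numbered line per stage, which makes it easy to see which README.md to read next.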
Please cite our paper if you use our repository and/or dataset in your works:
@article{prasad2023meeting,
  author = {Archiki Prasad and Trung Bui and Seunghyun Yoon and Hanieh Deilamsalehy and Franck Dernoncourt and Mohit Bansal},
  title = {MeetingQA: Extractive Question-Answering on Meeting Transcripts},
  journal = {61st Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2023}
}