This repository contains the dataset and models described in our paper Generating Media Background Checks for Automated Source Critical Reasoning. We propose generating Media Background Checks (MBCs) that summarise indicators of trustworthiness and tendency for media sources. MBCs can be used, either by humans or by retrieval-augmented models, to determine which documents can be relied on for further reasoning, and to craft reliable narratives based on untrustworthy evidence.
The dataset is structured as follows. The media background checks themselves are in data/mbcs, and the division into splits is in data/splits. Each split file (train, dev, test) lists, one per line, the filenames of the background checks in that split. The GPT-3.5-generated atomic facts we used for FactScore-style evaluation in the paper are in data/splits/dev_facts and data/splits/test_facts, for the dev and test splits respectively. Finally, data/ also contains the controversial and misinformative question-answer pairs we constructed for human evaluation.
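For illustration, here is a minimal Python sketch for loading the reference MBCs of one split. It assumes each line's first tab-separated field in the split file is the filename of a background check under data/mbcs; adjust if your checkout differs.

from pathlib import Path

def load_split(split_file, mbc_dir="data/mbcs"):
    # Return {filename: MBC text} for one split.
    # Assumption: the first tab-separated field of each line in the
    # split file names a file under data/mbcs.
    mbcs = {}
    for line in Path(split_file).read_text(encoding="utf-8").splitlines():
        filename = line.strip().split("\t")[0]
        if filename:
            mbcs[filename] = (Path(mbc_dir) / filename).read_text(encoding="utf-8")
    return mbcs

dev_mbcs = load_split("data/splits/dev.tsv")
print(f"Loaded {len(dev_mbcs)} dev background checks")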
The evaluation scripts for atomic facts (i.e., FactScore) and for traditional metrics are eval_with_atomics.py and eval_with_metrics.py, respectively. To run them, use:
python eval_with_atomics.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv --fact_folder data/splits/dev_facts
python eval_with_metrics.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv --reference_folder data/mbcs
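The scripts above expect your_output_folder to hold one generated MBC per split entry, named like the corresponding reference file. A minimal sketch for producing such a folder, assuming that layout (generate_fn is a hypothetical stand-in for your own model):

from pathlib import Path

def write_predictions(split_file, out_dir, generate_fn):
    # generate_fn is a placeholder for your own system: it maps a
    # split-file entry (a filename) to a generated MBC string.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for line in Path(split_file).read_text(encoding="utf-8").splitlines():
        filename = line.strip().split("\t")[0]
        if filename:
            (out / filename).write_text(generate_fn(filename), encoding="utf-8")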
Our baseline can be run as follows:
python generate_media_background_checks.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv
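Putting the pieces together, a typical dev-split workflow generates baseline MBCs into a folder and then scores that folder (the folder name is arbitrary):

python generate_media_background_checks.py --predictions_folder baseline_output --dataset_file data/splits/dev.tsv
python eval_with_atomics.py --predictions_folder baseline_output --dataset_file data/splits/dev.tsv --fact_folder data/splits/dev_facts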
If you use our dataset or code, please cite our paper as:
@misc{schlichtkrull2024mediabackgroundchecks,
      title={Generating Media Background Checks for Automated Source Critical Reasoning},
      author={Michael Schlichtkrull},
      year={2024},
      eprint={2409.00781},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.00781},
}
All credit for the creation of the background checks included in this dataset goes to the Media Bias / Fact Check team. Assessments of credibility and bias can change over time; if you are trying to evaluate the credibility of a media outlet, it is important to use the most recent information. For up-to-date versions of each background check, please visit https://mediabiasfactcheck.com/.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.