Media Background Checks

This repository maintains the dataset and models described in our paper "Generating Media Background Checks for Automated Source Critical Reasoning". We propose generating Media Background Checks (MBCs), which summarise indicators of trustworthiness and tendency for media sources. MBCs can be used, by humans or by retrieval-augmented models, to determine which documents can be relied on for further reasoning and to craft reliable narratives based on untrustworthy evidence.

Dataset Structure

The dataset is structured as follows. The media background checks themselves are in data/mbcs. The division into splits is in data/splits: each split file (train, dev, test; e.g., data/splits/dev.tsv) lists, one per line, the filenames of the background checks belonging to that split. The GPT-3.5-generated atomic facts we used for FactScore-style evaluation in the paper are in data/splits/dev_facts and data/splits/test_facts for the dev and test splits, respectively. Finally, data/ also contains the controversial and misinformative question-answer pairs we constructed for human evaluation.
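As an illustration of the layout described above, the following is a minimal sketch of loading one split. The helper names `read_split` and `load_mbcs` are hypothetical and not part of this repository. The sketch assumes each line of a split file starts with a filename relative to data/mbcs; because the split files use a .tsv extension, it defensively keeps only the first tab-separated field, which is an assumption rather than a documented format.

```python
from pathlib import Path


def read_split(split_file):
    """Return the filenames listed in a split file, one per non-empty line."""
    text = Path(split_file).read_text(encoding="utf-8")
    names = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Assumption: if a line has several tab-separated columns,
            # the filename is the first one.
            names.append(line.split("\t")[0])
    return names


def load_mbcs(split_file, mbc_dir):
    """Map each filename in the split to the text of its background check."""
    mbc_dir = Path(mbc_dir)
    checks = {}
    for name in read_split(split_file):
        path = mbc_dir / name
        if path.is_file():  # skip entries with no matching file instead of failing
            checks[name] = path.read_text(encoding="utf-8")
    return checks
```

Called from the repository root, `load_mbcs("data/splits/dev.tsv", "data/mbcs")` would, under these assumptions, return a dictionary keyed by background-check filename.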

Evaluation

Evaluation with atomic facts (FactScore-style) and with traditional metrics is implemented in eval_with_atomics.py and eval_with_metrics.py, respectively. To run the scripts, use:

python eval_with_atomics.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv --fact_folder data/splits/dev_facts
python eval_with_metrics.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv --reference_folder data/mbcs
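To give a rough idea of what a FactScore-style number measures, here is a minimal sketch. It is not the implementation in eval_with_atomics.py. It assumes the score for one document is the fraction of its stored atomic facts that a judge considers supported by the generated background check, averaged over documents. The functions `fact_score`, `mean_fact_score`, and `naive_judge` are hypothetical names, and the substring judge is only a placeholder for a model-based judge.

```python
from typing import Callable, Dict, List


def fact_score(prediction: str, facts: List[str],
               is_supported: Callable[[str, str], bool]) -> float:
    """Fraction of atomic facts that the judge finds supported by the prediction."""
    if not facts:
        return 0.0
    supported = sum(1 for fact in facts if is_supported(prediction, fact))
    return supported / len(facts)


def mean_fact_score(predictions: Dict[str, str],
                    facts_by_name: Dict[str, List[str]],
                    is_supported: Callable[[str, str], bool]) -> float:
    """Average the per-document score over documents with both a prediction and facts."""
    names = [name for name in facts_by_name if name in predictions]
    if not names:
        return 0.0
    total = sum(fact_score(predictions[n], facts_by_name[n], is_supported) for n in names)
    return total / len(names)


def naive_judge(prediction: str, fact: str) -> bool:
    """Placeholder judge: case-insensitive substring match, standing in for a model."""
    return fact.lower() in prediction.lower()
```

Passing the judge in as a callable keeps the aggregation separate from the choice of judge, so the placeholder can be swapped for a stronger one without touching the averaging code.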

Baseline

Our baseline can be run as follows:

python generate_media_background_checks.py --predictions_folder your_output_folder --dataset_file data/splits/dev.tsv
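The baseline and evaluation commands can be chained from Python. The sketch below only assembles the command lines shown in this README; `build_pipeline` and `run_pipeline` are hypothetical helpers, not part of the repository. It assumes that each split file is named data/splits/<split>.tsv and that the matching fact folder is data/splits/<split>_facts, as is the case for the dev examples above.

```python
import subprocess
import sys


def build_pipeline(output_folder, split="dev"):
    """Return the baseline and evaluation commands for one split as argument lists."""
    # Assumption: split files follow the data/splits/<split>.tsv naming pattern.
    dataset_file = f"data/splits/{split}.tsv"
    common = ["--predictions_folder", output_folder, "--dataset_file", dataset_file]
    return [
        [sys.executable, "generate_media_background_checks.py", *common],
        [sys.executable, "eval_with_atomics.py", *common,
         "--fact_folder", f"data/splits/{split}_facts"],
        [sys.executable, "eval_with_metrics.py", *common,
         "--reference_folder", "data/mbcs"],
    ]


def run_pipeline(output_folder, split="dev"):
    """Run generation and both evaluations in order, stopping at the first failure."""
    for command in build_pipeline(output_folder, split):
        subprocess.run(command, check=True)
```

Running `run_pipeline("your_output_folder")` from the repository root would generate predictions for the dev split and then evaluate them with both scripts.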

Citation

If you use our dataset or code, please cite our paper as:

@misc{schlichtkrull2024mediabackgroundchecks,
      title={Generating Media Background Checks for Automated Source Critical Reasoning}, 
      author={Michael Schlichtkrull},
      year={2024},
      eprint={2409.00781},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.00781}, 
}

License

All credit for the creation of the background checks included in this dataset goes to the Media Bias / Fact Check team. Assessments of credibility and bias can change over time; if you are trying to evaluate the credibility of a media outlet, it is important to use the most recent information. For up-to-date versions of each background check, please visit https://mediabiasfactcheck.com/.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
