GitHub - joaoaleite/euvsdisinfo: This repository allows collecting the EUvsDisinfo dataset and reproducing the research experiments presented in the paper.

About

This repository contains the materials needed to reproduce the work introduced in the paper "EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles". The EUvsDisinfo dataset contains around 18K articles, each labelled as either containing misinformation or not. The misinformation articles are sourced from pro-Kremlin outlets, while non-misinformation articles are sourced from credible / less biased outlets. The dataset is collected from the URLs cited within the debunks published by the EUvsDisinfo organisation on their website.

Collect EUvsDisinfo

Use this repository to collect the EUvsDisinfo dataset described in our paper "EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles".

Set up the Python environment:

conda create -n euvsdisinfo python=3.11.5
conda activate euvsdisinfo
pip install -r requirements.txt

To collect the data:

1. Get an API key for the DiffBot API (free for academic purposes).
2. Set the environment variable DIFFBOT_API_KEY to your key: export DIFFBOT_API_KEY="my_keys"
3. Download the base data file from Zenodo.
4. Create a folder named data in the root directory.
5. Place the base data file inside the data folder.
6. Run python3 scripts/collect/collect.py.
7. When finished, the script should save a file named euvsdisinfo.csv inside the data folder.
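
Before starting a long collection run, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch and not part of the repository: the function name preflight is hypothetical, and because the name of the base data file is not stated here, the check only verifies that the data folder exists and is not empty.

```python
import os
from pathlib import Path


def preflight(repo_root=".", env=None):
    """Return a list of problems that would stop the collection step.

    Hypothetical helper, not part of the repository. It only checks the
    prerequisites named in the steps above.
    """
    env = os.environ if env is None else env
    root = Path(repo_root)
    problems = []

    # Step 2: the API key must be exported as DIFFBOT_API_KEY.
    if not env.get("DIFFBOT_API_KEY"):
        problems.append("DIFFBOT_API_KEY is not set")

    # Steps 4-5: a data folder holding the base data file.
    # The base file name is not assumed, so only emptiness is checked.
    data_dir = root / "data"
    if not data_dir.is_dir():
        problems.append("data folder is missing")
    elif not any(data_dir.iterdir()):
        problems.append("data folder is empty (base data file not found)")

    # Step 6: the collection script itself.
    if not (root / "scripts" / "collect" / "collect.py").is_file():
        problems.append("scripts/collect/collect.py not found")

    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("ready to run: python3 scripts/collect/collect.py")
```

Run it from the repository root. An empty list means the checks passed; it does not validate the API key itself.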

To reproduce the experiments:

Data analysis: open and run the eda.ipynb Jupyter notebook.
Classification:
1. Run the Python script for the desired scenario inside baselines/.
2. When finished, the script will save the results in a file named results_{scenario}.csv in the root folder.
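
After running several scenarios, the root folder will hold one results file per scenario. As a minimal sketch, assuming only the results_{scenario}.csv naming pattern described above, the files can be located and mapped back to their scenario names as follows. Both function names are hypothetical, and the scenario name in the usage note is made up for illustration.

```python
from pathlib import Path


def scenario_from_filename(name):
    """Extract the scenario from a name of the form results_{scenario}.csv.

    Returns None when the name does not follow that pattern.
    """
    prefix, suffix = "results_", ".csv"
    has_pattern = name.startswith(prefix) and name.endswith(suffix)
    if has_pattern and len(name) > len(prefix) + len(suffix):
        return name[len(prefix):-len(suffix)]
    return None


def find_results(root="."):
    """Map each scenario name to its results file found directly in root."""
    found = {}
    for path in sorted(Path(root).glob("results_*.csv")):
        scenario = scenario_from_filename(path.name)
        if scenario is not None:
            found[scenario] = path
    return found
```

For example, scenario_from_filename("results_example.csv") returns "example", where "example" is a placeholder rather than a scenario shipped with the repository. The columns inside each file are not described here, so the sketch does not parse them.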

Supplementary material:

Please refer to the file supplementary_material.md.

License

The EUvsDisinfo dataset is licensed under a Creative Commons BY-SA 4.0 license. The code available for reproducing experiments is licensed under an Apache-2.0 license that can be found in the file LICENSE.txt.

Citing:

Dataset: https://zenodo.org/records/10514307

Software: https://zenodo.org/records/10492913

Paper: https://dl.acm.org/doi/abs/10.1145/3627673.3679167

BibTeX:

@inproceedings{10.1145/3627673.3679167,
author = {Leite, Jo\~{a}o A. and Razuvayevskaya, Olesya and Bontcheva, Kalina and Scarton, Carolina},
title = {EUvsDisinfo: A Dataset for Multilingual Detection of Pro-Kremlin Disinformation in News Articles},
year = {2024},
isbn = {9798400704369},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3627673.3679167},
doi = {10.1145/3627673.3679167},
abstract = {This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible / less biased sources. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest to-date resource in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting certain disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.},
booktitle = {Proceedings of the 33rd ACM International Conference on Information and Knowledge Management},
pages = {5380--5384},
numpages = {5},
keywords = {classification, dataset, disinformation, news articles, pro-kremlin},
location = {Boise, ID, USA},
series = {CIKM '24}
}
