Media outlets and social media platforms are rife with "fake news," or information that has not been fact-checked, especially as they become more opinionated and stray from centrist, fact-based reporting. This is a growing problem in reporting, because the public receives most of its information this way and depends on these outlets to stay informed. According to the BBC, false information can take many forms (satire, clickbait, propaganda, and mistakes), and it can be classified as disinformation or misinformation. It is very difficult for the public to place a media report or social media post in any of these categories without reading competing claims or doing their own research. Therefore, the purpose of this project is to show how a potential Fake News Detection tool can be built and used by various platforms to warn users about content before they read it.
This ethical tool will search through Tweets and attempt to label them as "right", "left", or "centrist", and also add a level of fake news detection to them. While Twitter is currently working on this feature, it has not been fully deployed. The goal of this feature is to minimize the amount of fake news that the public receives from social media outlets. The tool aims to eventually use machine learning algorithms to aid in its fake news detection. Detection is done by searching the selected user's tweets for words that may indicate fake news, such as "most", "least", etc.
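As a rough illustration of the word-search idea, here is a minimal sketch in standard-library Python. It is not the project's code: the function names `find_indicator_words` and `flag_tweets` are hypothetical, the sample tweets are invented, and the indicator set contains only the two words named above.

```python
import re

# Assumption: the README names only "most" and "least"; extend this set as needed.
INDICATOR_WORDS = {"most", "least"}


def find_indicator_words(tweet, indicators=INDICATOR_WORDS):
    """Return the indicator words that occur as whole words in a tweet, sorted."""
    # Lowercase and split into word tokens so "MOST" matches but "almost" does not.
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    return sorted(tokens & set(indicators))


def flag_tweets(tweets, indicators=INDICATOR_WORDS):
    """Pair each tweet with the indicator words found in it (hypothetical helper)."""
    return [(tweet, find_indicator_words(tweet, indicators)) for tweet in tweets]


if __name__ == "__main__":
    # Invented sample tweets, for illustration only.
    sample = [
        "This is the MOST important bill of the decade.",
        "The committee meets on Tuesday.",
    ]
    for tweet, hits in flag_tweets(sample):
        status = "flagged" if hits else "not flagged"
        print(f"{status}: {tweet} {hits}")
```

A match only means the tweet contains wording worth a closer look. It does not establish that the tweet is false.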
The project is funded by the Mozilla Foundation and will be used in the Data Analytics course at Allegheny College. Please visit Allegheny Ethical CS for more information.
- Twitter API to search for a user, their screen name, hashtag, or keyword.
  - API from Bluebird
- Tweet classification (binary)
  - Naive Bayes
  - Linear SVM
  - Credit to Zach Leonardo on Polarized
- Tweet classification
  - fake
  - true
  - Credit to Favio Vazquez on fake-news
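
To give a feel for how a Naive Bayes text classifier assigns a binary label, the sketch below hand-rolls a tiny multinomial Naive Bayes with add-one (Laplace) smoothing using only the Python standard library. It is an illustration under stated assumptions, not the implementation used in this project or in the credited repositories: the class name `TinyNaiveBayes`, the `tokenize` helper, and the four training tweets are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class TinyNaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per label
        self.word_counts = defaultdict(Counter)    # word frequencies per label
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add one smoothed log likelihood per token.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented training tweets and labels, for illustration only.
    texts = [
        "cut taxes and secure the border",
        "lower taxes help small business",
        "expand healthcare for all families",
        "raise the minimum wage for workers",
    ]
    labels = ["right", "right", "left", "left"]
    model = TinyNaiveBayes().fit(texts, labels)
    print(model.predict("we must cut taxes"))
    print(model.predict("healthcare for workers"))
```

The same `fit`/`predict` shape would apply to the fake/true labels by swapping the label values. A real classifier needs far more training data than this toy set.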
Clone the source code onto your machine.
With HTTPS:
https://github.com/Allegheny-Mozilla-Fellows/FakeNewsDetection.git
or with SSH:
git@github.com:Allegheny-Mozilla-Fellows/FakeNewsDetection.git
After cloning the repo, install textblob and its data, then install the virtual environment requirements:
pip install textblob
and then,
python3 -m textblob.download_corpora
and then,
pip install -U pandas-profiling
and then,
pipenv install --dev
After installing these packages, run the program with the command:
pipenv run streamlit run streamlit_web.py
After running this command, you will be prompted to enter the name of a given senator, which the API will cross-reference with current Twitter users. You will then confirm the name of the senator and choose your preferred diagram for output.
Currently, this project examines tweets using a Twitter API provided by Bluebird. The project can be extended by adding more classifications for the tweets or by adding features that visualize how many tweets contain false information and their effect on society, the media, and democracy. Another great addition would be other methods of detecting fake news, such as coding different algorithms, developing a bot, or using AI.
Here is the list of articles that may give the user more insights into fake news detection.
- What happens if one news outlet or platform produces more fake news than another? Will that alter the way we perceive news and/or classify facts?
- Why might algorithms be particularly harmful for detecting fake news?
- Should we enforce using fake news detecting algorithms? Do media outlets and social media platforms have an obligation to detect fake news?
- What are some of the ways we can prevent biases in fake news detection algorithms as developers and as users?
The data used in this project is retrieved from the Twitter website.
If you have any questions or concerns about this project, please contact:
- Dr. Jumadinova ([email protected])
- Rachael Harris ([email protected])