Author: Vinh Nguyen
Deploying Hugging Face's BERT to production with pytorch/serve
Deploy it: Using Heroku to continuously build and deploy a deep-learning powered web application
The data was acquired from Kaggle: the dataset's author aggregated several Kaggle datasets of fake and real news articles.
The full dataset contains 5,384 articles, each labelled as either fake or real. The training set holds 4,434 articles and the test set holds 950.
The model I used is BERT (Bidirectional Encoder Representations from Transformers), a neural network architecture designed by Google researchers that has transformed the state of the art across NLP tasks such as text classification, translation, summarization, and question answering.
My preprocessing steps:
- Lowercase our text (if we're using a BERT lowercase model)
- Tokenize it (i.e. "sally says hi" -> ["sally", "says", "hi"])
- Break words into WordPieces (i.e. "calling" -> ["call", "##ing"])
- Map our words to indexes using a vocab file that BERT provides
- Add the special "[CLS]" and "[SEP]" tokens (see the readme)
- Append "index" (position) and "segment" ids to each input (see the BERT paper)
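The steps above can be sketched in plain Python. This is a hypothetical, minimal illustration, not the real tokenizer: the toy vocabulary and the greedy longest-match WordPiece loop stand in for the vocab file and tokenizer that BERT actually ships with.

```python
# Toy stand-in for BERT's vocab file (illustrative only).
TOY_VOCAB = {
    "[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
    "sally": 3, "says": 4, "hi": 5, "call": 6, "##ing": 7,
    "[UNK]": 8,
}

def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into WordPieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece   # continuation pieces are prefixed
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        if end == start:               # no piece matched at all
            return ["[UNK]"]
        start = end
    return pieces

def preprocess(text, vocab):
    tokens = ["[CLS]"]                      # special start token
    for word in text.lower().split():       # lowercase + whitespace tokenize
        tokens += wordpiece(word, vocab)    # break words into WordPieces
    tokens.append("[SEP]")                  # special end token
    input_ids = [vocab[t] for t in tokens]  # map tokens to vocab indexes
    segment_ids = [0] * len(tokens)         # single-sentence input: all 0s
    return tokens, input_ids, segment_ids

tokens, ids, segs = preprocess("Sally says calling", TOY_VOCAB)
# tokens -> ['[CLS]', 'sally', 'says', 'call', '##ing', '[SEP]']
```

In the real pipeline the vocab comes from BERT's released vocab file and tokenization handles punctuation and unknown characters, but the flow is the same.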
For training, we had the following parameters:
- Max Sequence Length = 128
- Batch Size = 32
- Learning Rate = 2e-5
- Epochs = 12
- Warmup Proportion = 0.1
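For concreteness, here is a small sketch of how these hyperparameters combine into a learning-rate warmup schedule. It assumes, as in the original BERT fine-tuning scripts, that the warmup proportion is a fraction of the total number of optimizer steps; the training-set size of 4,434 comes from the section above.

```python
import math

# Training hyperparameters from the list above.
MAX_SEQ_LENGTH = 128
BATCH_SIZE = 32
LEARNING_RATE = 2e-5
EPOCHS = 12
WARMUP_PROPORTION = 0.1

TRAIN_EXAMPLES = 4434  # size of the training set

# One optimizer step per batch; the last partial batch still counts.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)   # 139
total_steps = steps_per_epoch * EPOCHS                     # 1668
warmup_steps = int(total_steps * WARMUP_PROPORTION)        # 166

# During warmup the learning rate ramps linearly from 0 to LEARNING_RATE.
def lr_at_step(step):
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    return LEARNING_RATE

print(total_steps, warmup_steps)
```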
Training took only 1.429 minutes, thanks to the Transformer's highly parallel architecture.
Evaluation metrics:
- auc = 0.9884099
- f1_score = 0.9885057
- loss = 0.0650449
- precision = 0.983368
- recall = 0.99369746
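As a quick sanity check, the reported F1 score is consistent with the reported precision and recall, since F1 is their harmonic mean:

```python
precision = 0.983368
recall = 0.99369746

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# f1 is approximately 0.98851, matching the reported f1_score
```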
To run locally:
Install requirements:
pip install -r requirements.txt
Run Streamlit app:
streamlit run app.py
TODO:
Deploy to a Heroku server as a web app.