Sentiment Analysis of COVID-19 News Articles

This repository provides code and supplementary materials for the paper titled 'Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian'.

Seminar

Publication

  • Chandra, R., Zhu, B., Fang, Q., & Shinjikashvili, E. (2024). Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian. arXiv preprint arXiv:2405.13056: arXiv paper

Our framework diagram was drawn in Visio and is available here: Visio Framework

Preparing Dataset

We used a dataset of 10,000 manually labelled English tweets covering 10 different sentiments for training and testing. Additionally, the SenWave dataset from GitHub was utilised: SenWave Dataset

After fine-tuning the model, we used it to label sentiments in articles from The Guardian, taken from the Guardian News Articles dataset on Kaggle: Guardian News Articles Dataset. The Australia News, UK News, World News, and Opinion sections were selected for detailed analysis.

Load Dataset using Google Drive

Note that the following code demonstration is mainly applied to the RoBERTa model. To load the dataset in Google Colab, follow these steps:

  1. Mount Google Drive: Use the command drive.mount('/content/drive') in your notebook to mount Google Drive.
  2. Load Dataset: Utilize the pd.read_csv() function to read the CSV file. Replace the file_path variable with your CSV file path.
from google.colab import drive
import pandas as pd

# Mount Google Drive
drive.mount('/content/drive')
# Read the CSV file into a DataFrame
file_path = "/content/drive/MyDrive/Colab Notebooks/labeledEn.csv"
df = pd.read_csv(file_path)

Saving DataFrames as CSV Files

Note that the following code demonstration is mainly applied to the RoBERTa model. To save a Pandas DataFrame as a CSV file, you can use the to_csv() function. Here's how you can do it:

# Assuming `sen_train` and `sen_test` are your Pandas DataFrames for the training and testing sets
sen_train.to_csv("train.csv", index=False)
sen_test.to_csv("test.csv", index=False)
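The snippet above assumes that `sen_train` and `sen_test` already exist. As a minimal sketch of one way they might be produced, the following splits a DataFrame 80/20 with pandas and writes both parts to CSV. The toy data, the column names `Tweet` and `Label`, the 80/20 ratio, and the random seed are illustrative assumptions, not the settings used in the paper.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the labelled tweet data (hypothetical columns and values)
df = pd.DataFrame({
    "Tweet": [f"example tweet {i}" for i in range(10)],
    "Label": ["Optimistic", "Anxious"] * 5,
})

# Randomly sample 80% of the rows for training; the remainder is the test set
sen_train = df.sample(frac=0.8, random_state=42)
sen_test = df.drop(sen_train.index)

# Write both splits without the index column, as in the snippet above
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
test_path = os.path.join(out_dir, "test.csv")
sen_train.to_csv(train_path, index=False)
sen_test.to_csv(test_path, index=False)

# Reading a file back confirms that only the data columns were saved
reloaded = pd.read_csv(train_path)
print(reloaded.shape, list(reloaded.columns))
```

Passing `index=False` keeps the row index out of the file, so the reloaded CSV contains only the original data columns.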

Notebooks for project: main run code.

The repository includes separate Jupyter notebooks for the BERT model, the RoBERTa model, visualisation, and results:

Model part:

BERT_model/BERT_model.ipynb

RoBERTa_model/Roberta_finetune1.0.ipynb

Saving a PyTorch Model

Note that the following code demonstration is mainly applied to the RoBERTa model. To save a PyTorch model, you can use the torch.save() function:

import torch

# Assuming `model` is your PyTorch model
model = ...

# File path to save the model
file_path = '/content/drive/MyDrive/RoBERTa_ft.pth'

# Save the model
torch.save(model, f=file_path)
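Calling `torch.save(model, ...)` pickles the entire model object, so loading it later requires the same class definitions to be importable. A commonly recommended alternative is to save only the model's `state_dict`. The following is a minimal sketch of that alternative, not the approach used in the notebooks; the small `nn.Linear` layer standing in for the fine-tuned model and the temporary file path are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Small stand-in for the fine-tuned model (hypothetical; 10 outputs mirror the 10 sentiments)
model = nn.Linear(in_features=4, out_features=10)

# Temporary path used here in place of a Google Drive location
file_path = os.path.join(tempfile.mkdtemp(), "model_state.pth")

# Save only the learned parameters rather than the whole pickled object
torch.save(model.state_dict(), file_path)

# To restore, rebuild the same architecture and load the parameters into it
restored = nn.Linear(in_features=4, out_features=10)
restored.load_state_dict(torch.load(file_path, map_location="cpu"))
restored.eval()  # switch to inference mode before labelling new text

# Both models should now produce identical outputs for the same input
x = torch.randn(2, 4)
same = torch.allclose(model(x), restored(x))
print(same)
```

With this approach the saved file holds only tensors, so it can be loaded into any freshly constructed model that has the same architecture.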

Visualisation part:

Note that the Jupyter Notebook files in the visualisation section contain images of our results.

Visualisation/visualization2.ipynb

Visualisation/target_ngrams.ipynb

Visualisation/polarity_scores.py

Results part

The article files were labelled using two models, which we refer to as BERT and RoBERTa.