This repository provides code and supplementary materials for the paper titled 'Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian'.
- t~ai Seminar Series, Large language models for sentiment analysis of newspaper articles during COVID-19
- Chandra, R., Zhu, B., Fang, Q., & Shinjikashvili, E. (2024). Large language models for sentiment analysis of newspaper articles during COVID-19: The Guardian. arXiv preprint arXiv:2405.13056: arXiv paper
The framework diagram was created in Visio: Visio Framework
For training and testing, we used a dataset of 10,000 manually labelled English tweets covering 10 different sentiments. We also used the SenWave dataset from GitHub: SenWave Dataset
After fine-tuning the model, we used it to label sentiments in articles from The Guardian. We selected the Australia News, UK News, World News, and Opinion sections for detailed analysis. The articles come from the Guardian News Articles dataset on Kaggle: Guardian News Articles Dataset
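The section selection described above could be sketched roughly as follows. This is only an illustration: the column names `section` and `text`, the helper `select_sections`, and the toy rows are assumptions made for the example, and the exact section strings in the Kaggle file may differ from the names used in this README.

```python
import pandas as pd

# Section names as written in this README; the strings stored in the
# Kaggle file may be spelled differently (assumption).
SELECTED_SECTIONS = ["Australia News", "UK News", "World News", "Opinion"]


def select_sections(articles, section_col="section"):
    """Keep only the rows whose section is one of SELECTED_SECTIONS."""
    mask = articles[section_col].isin(SELECTED_SECTIONS)
    return articles.loc[mask].reset_index(drop=True)


# Tiny made-up frame standing in for the real Guardian dataset.
toy_articles = pd.DataFrame({
    "section": ["World News", "Sport", "Opinion", "UK News"],
    "text": ["a", "b", "c", "d"],
})

selected = select_sections(toy_articles)
print(selected["section"].value_counts())
```

With the real dataset, `toy_articles` would be replaced by the DataFrame read from the Kaggle CSV, and `section_col` by whichever column holds the section name.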
Note that the following code demonstration mainly applies to the RoBERTa model. To load the dataset in Google Colab, follow these steps:
- Mount Google Drive: Run drive.mount('/content/drive') in your notebook.
- Load the dataset: Use the pd.read_csv() function to read the CSV file, replacing the file_path variable with the path to your own CSV file.
from google.colab import drive
import pandas as pd

# Mount Google Drive
drive.mount('/content/drive')
# Read the CSV file into a DataFrame
file_path = "/content/drive/MyDrive/Colab Notebooks/labeledEn.csv"
df = pd.read_csv(file_path)
To save a Pandas DataFrame as a CSV file, you can use the to_csv() function. Here's how you can do it:
# Assuming `sen_train` and `sen_test` are your Pandas DataFrames for the training and testing sets
sen_train.to_csv("train.csv", index=False)
sen_test.to_csv("test.csv", index=False)
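For context, one way `sen_train` and `sen_test` might be produced and then checked after saving is sketched below. The 80/20 split ratio, the random seed, the column names `tweet` and `label`, and the toy data are all assumptions for illustration; they are not taken from the notebooks.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the labelled tweets; column names are hypothetical.
toy = pd.DataFrame({
    "tweet": [f"tweet {i}" for i in range(10)],
    "label": [i % 2 for i in range(10)],
})

# Shuffle reproducibly, then hold out 20% of the rows for testing (assumed ratio).
shuffled = toy.sample(frac=1.0, random_state=42).reset_index(drop=True)
n_test = int(len(shuffled) * 0.2)
sen_test = shuffled.iloc[:n_test]
sen_train = shuffled.iloc[n_test:]

# Write both splits to a temporary directory without the index column.
out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, "train.csv")
test_path = os.path.join(out_dir, "test.csv")
sen_train.to_csv(train_path, index=False)
sen_test.to_csv(test_path, index=False)

# Reading the file back should give the same columns and row count.
train_back = pd.read_csv(train_path)
print(train_back.shape, list(train_back.columns))
```

Passing `index=False` keeps the DataFrame index out of the file, so reading it back does not introduce an extra unnamed column.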
The repository includes individual Jupyter Notebook files for the BERT model, the RoBERTa model, and the visualisation and results parts, namely:
Model part:
BERT_model/BERT_model.ipynb
RoBERTa_model/Roberta_finetune1.0.ipynb
To save a PyTorch model, you can use the torch.save() function. Here's how you can do it:
import torch
# Assuming `model` is your PyTorch model
model = ...
# File path to save the model
file_path = '/content/drive/MyDrive/RoBERTa_ft.pth'
# Save the model
torch.save(model, f=file_path)
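As an alternative to pickling the whole model object, PyTorch also allows saving only the parameters via `state_dict()` and loading them into a freshly built model of the same architecture. The sketch below uses a small `nn.Linear` layer as a placeholder, since the fine-tuned RoBERTa model is not reproduced here; the file name `toy_model_state.pth` is likewise made up for the example.

```python
import os
import tempfile

import torch
from torch import nn

# Small placeholder network; in practice this would be the fine-tuned model.
model = nn.Linear(4, 2)

# Hypothetical checkpoint location in a temporary directory.
ckpt_path = os.path.join(tempfile.mkdtemp(), "toy_model_state.pth")

# Save only the parameters rather than the pickled model object.
torch.save(model.state_dict(), ckpt_path)

# To restore, rebuild the same architecture and load the parameters into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(ckpt_path, map_location="cpu"))
restored.eval()

# The restored model should produce the same output as the original.
x = torch.ones(1, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```

A checkpoint saved this way does not depend on the original class definition being importable under the same module path, which can make it easier to move between notebooks.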
Note that the Jupyter Notebook files in the visualisation section contain images of our results.
Visualisation/visualization2.ipynb
Visualisation/target_ngrams.ipynb
Visualisation/polarity_scores.py
The repository also includes article files labelled by the two models, BERT and RoBERTa.