Sentiment Analysis Project

Overview

This repository contains the code and findings from an exploration of Twitter data preprocessing and sentiment analysis. It compares three sentiment analysis models, VADER, SpacyTextBlob, and a RoBERTa model from Hugging Face, focusing on their performance characteristics and the insights drawn from the results.

Project Structure

├── LICENSE
├── README.md
├── bonus (screenshots for bonus tasks)
├── codebase_daniil
│   └── preprocessor.py
├── codebase_katharina
├── combined_codebase
│   ├── combine_versions.sh
│   ├── main.py
│   ├── preprocessor_a.py
│   └── sentiment_analyser_a.py
├── data
├── project_requirements.txt (lists the project requirements we attempted to fulfill)
├── report.pdf
└── requirements.txt

Usage

To run the preprocessing and sentiment analysis, execute the provided bash script:

bash combined_codebase/combine_versions.sh

Inside the bash script, pass the path to the input data with the --file_path flag, and add the --sentiment_analysis flag to run sentiment analysis after preprocessing (leave it out to run preprocessing only). For example:

--file_path data/data.csv --sentiment_analysis
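A minimal sketch of how an entry point such as main.py could parse these two flags with Python's argparse. This is an assumption for illustration, not the project's actual code; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags passed by combine_versions.sh."""
    parser = argparse.ArgumentParser(
        description="Preprocess tweets and optionally run sentiment analysis."
    )
    # Path to the input CSV file, e.g. data/data.csv
    parser.add_argument("--file_path", required=True, help="path to the input CSV file")
    # Presence flag: True when given, False when omitted
    parser.add_argument(
        "--sentiment_analysis",
        action="store_true",
        help="also run sentiment analysis after preprocessing",
    )
    return parser
```

With this parser, build_parser().parse_args(["--file_path", "data/data.csv"]) returns a namespace with sentiment_analysis=False; appending "--sentiment_analysis" sets it to True.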

Data Preprocessing

The data preprocessing phase covers various steps, including:

Handling missing values
Converting data types
Lowercasing text
Removing non-ASCII characters, emojis, stopwords
Stemming words
Removing numbers, punctuation, non-English words
Fixing labels and removing empty tweets
These steps collectively create a clean and standardized dataset for effective sentiment analysis.
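The text-cleaning steps above can be sketched with the standard library alone. This is a simplified illustration, not the code in preprocessor.py: the small STOPWORDS set and the optional vocabulary argument (standing in for an English word list used to drop non-English words) are assumptions. Stemming and label fixing are left out; a library stemmer such as NLTK's PorterStemmer would normally be applied to each token.

```python
import re
import string

# Small illustrative stopword list (assumption); a real pipeline would use a
# fuller list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "i", "it", "this"}

# Maps every ASCII punctuation character to a space
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text, vocabulary=None):
    """Apply simplified versions of the README's cleaning steps to one tweet."""
    if text is None:  # missing value: treat as an empty tweet
        return ""
    text = str(text).lower()                                # lowercasing
    text = text.encode("ascii", "ignore").decode("ascii")   # drops all non-ASCII chars, incl. emojis and accented letters
    text = re.sub(r"\d+", " ", text)                        # remove numbers
    text = text.translate(PUNCT_TO_SPACE)                   # replace punctuation with spaces
    tokens = [t for t in text.split() if t not in STOPWORDS]  # remove stopwords
    if vocabulary is not None:                              # optional: keep only words in an English vocabulary
        tokens = [t for t in tokens if t in vocabulary]
    return " ".join(tokens)


def preprocess(rows):
    """Clean (tweet, label) pairs, skipping rows without a label and tweets left empty."""
    cleaned = []
    for tweet, label in rows:
        if label is None:  # missing label: skip the row
            continue
        text = clean_tweet(tweet)
        if text:           # remove tweets that are empty after cleaning
            cleaned.append((text, label))
    return cleaned
```

For instance, clean_tweet("I LOVE this phone!!! 😍 10/10") reduces the tweet to "love phone".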

Sentiment Analysis

The sentiment analysis compares the VADER, SpacyTextBlob, and RoBERTa models. The findings indicate that VADER and TextBlob, being lexicon- and rule-based, struggle with nuanced sentiment expressions. RoBERTa, a deep learning model, outperforms both, achieving higher precision, recall, and F1-score across all sentiment classes.
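VADER and TextBlob return continuous polarity scores, while a fine-tuned RoBERTa classifier returns class labels, so the scores have to be mapped to classes before per-class precision, recall, and F1 can be compared. The sketch below assumes the ±0.05 cut-off that VADER's documentation suggests for its compound score; whether the project used this threshold (or what it used for TextBlob polarity) is an assumption. The metric function is a plain one-vs-rest computation; scikit-learn's classification_report reports the same per-class numbers.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")


def compound_to_label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a class using a symmetric threshold."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)}, computed one-vs-rest per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # a wrong prediction of class `pred`
            fn[true] += 1  # a missed instance of class `true`
    results = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = (precision, recall, f1)
    return results
```

Running the same per_class_metrics call on each model's predictions gives directly comparable per-class scores.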

Discussion

The discussion examines the factors behind RoBERTa's superior performance, emphasizing its deep learning architecture, its pre-training on a large dataset, and its fine-tuning for sentiment analysis.
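A fine-tuned RoBERTa classifier ends in a classification head that outputs one logit per sentiment class; a softmax turns these logits into probabilities, and the most probable class becomes the prediction. The sketch below shows only this final step. The label order and the example logits are made-up values for illustration, not outputs of the project's model.

```python
import math

# Hypothetical label order; the real order depends on the checkpoint's configuration.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most probable label together with the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

Because the probabilities come from learned contextual representations rather than a fixed word lexicon, the same word can push the prediction in different directions depending on its context.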

Conclusion

The accompanying report (report.pdf) highlights the importance of robust data preprocessing in preparing Twitter data for sentiment analysis. It emphasizes the limitations of traditional lexicon-based and rule-based models and showcases the advances achieved with state-of-the-art deep learning models such as RoBERTa.

Dependencies

Python 3.x
Required Python packages, listed in requirements.txt (install with pip install -r requirements.txt; the bash script also runs this installation step)

Repository: d-gurgurov/Sentiment-Analysis-Roberta