Text Sentiment Analysis

We aim to predict the sentiment of each review in the IMDB Dataset of 50K Movie Reviews (Kaggle) as either 'positive' or 'negative'.

We try three approaches on the 20% of the data (10k reviews) that was set aside as the test set; a sketch of the data split follows the list:

  • Two-shot LLM evaluation with Qwen2.5: we prompt the LLM with two examples of positive- and negative-sentiment text and then ask it to return the sentiment of the given review. This gives our second-best F1-score = 0.86.
  • TF-IDF: we compute term-frequency-inverse-document-frequency features from the training set, train a classifier on them, and then apply it to the test data. This gives our highest F1-score = 0.89.
  • NLTK sentiment analysis: finally, we use the off-the-shelf sentiment analyzer from the nltk package in Python to predict the sentiment of each review. This approach has the lowest performance, with an F1-score of 0.67.
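
All three approaches share the same held-out test set. A minimal data-preparation sketch, assuming the Kaggle CSV with its 'review' and 'sentiment' columns; the file name and random seed below are illustrative, not taken from this repo:

```python
# Hypothetical data-loading and 80/20 split sketch; not the repo's exact code.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("IMDB Dataset.csv")                        # 50k Kaggle reviews (assumed file name)
df["label"] = (df["sentiment"] == "positive").astype(int)   # positive -> 1, negative -> 0

# 80/20 split: the 10k held-out reviews serve as the test set for all three approaches.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
print(len(train_df), len(test_df))                          # 40000 10000
```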

Details of each model's performance are provided below:

Two-Shot LLM evaluation with Qwen2.5

Classification Report:

              precision    recall  f1-score   support

           0       0.83      0.93      0.88      4961
           1       0.92      0.82      0.86      5039

    accuracy                           0.87     10000
   macro avg       0.87      0.87      0.87     10000
weighted avg       0.88      0.87      0.87     10000

Confusion Matrix:

[[4592  369]
 [ 927 4112]]

F1-Score = 0.86
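
For reference, here is a minimal sketch of two-shot prompting with Qwen2.5 through Hugging Face transformers; the model variant, exemplars, and decoding settings below are assumptions rather than the repo's exact setup, and the repo may use a different runtime altogether:

```python
# Illustrative two-shot classification sketch with Qwen2.5; details are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"                     # assumed model variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def classify(review: str) -> str:
    # Two exemplars (one positive, one negative), then the review to label.
    messages = [
        {"role": "system", "content": "Classify the movie review as 'positive' or 'negative'. Answer with one word."},
        {"role": "user", "content": "Review: An absolute joy from start to finish."},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Review: Dull plot and wooden acting; a waste of time."},
        {"role": "assistant", "content": "negative"},
        {"role": "user", "content": f"Review: {review}"},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=4, do_sample=False)
    answer = tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return "positive" if "positive" in answer.strip().lower() else "negative"
```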

TF-IDF

Vectorizing text... Training model...

Classification Report:

              precision    recall  f1-score   support

           0       0.90      0.87      0.89      4961
           1       0.88      0.90      0.89      5039

    accuracy                           0.89     10000
   macro avg       0.89      0.89      0.89     10000
weighted avg       0.89      0.89      0.89     10000

Confusion Matrix:

[[4332  629]
 [ 487 4552]]

F1-Score = 0.89
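
A minimal sketch of the TF-IDF pipeline, reusing train_df/test_df from the split sketch above; the repo does not state which classifier it trains, so logistic regression and the vectorizer settings here are assumptions:

```python
# Illustrative TF-IDF + classifier sketch; the classifier choice is assumed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score

print("Vectorizing text...")
vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2), stop_words="english")
X_train = vectorizer.fit_transform(train_df["review"])      # fit on the training set only
X_test = vectorizer.transform(test_df["review"])

print("Training model...")
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_df["label"])

pred = clf.predict(X_test)
print(classification_report(test_df["label"], pred))
print("Confusion Matrix:\n", confusion_matrix(test_df["label"], pred))
print("F1-Score =", round(f1_score(test_df["label"], pred), 2))
```

Fitting the vectorizer on the training split only avoids leaking test-set vocabulary statistics into the features.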

NLTK Sentiment Analysis

Classification Report:

              precision    recall  f1-score   support

           0       0.78      0.01      0.01      4961
           1       0.50      1.00      0.67      5039

    accuracy                           0.51     10000
   macro avg       0.64      0.50      0.34     10000
weighted avg       0.64      0.51      0.34     10000

Confusion Matrix:

[[  29 4932]
 [   8 5031]]

F1-Score = 0.67
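
A minimal sketch of the NLTK baseline; NLTK's bundled VADER analyzer and the zero compound-score threshold are assumptions here, not necessarily the repo's exact choices:

```python
# Illustrative NLTK (VADER) sentiment sketch; the decision threshold is assumed.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()

def nltk_sentiment(review: str) -> int:
    # Compound score lies in [-1, 1]; non-negative is mapped to 'positive' (1).
    return 1 if sia.polarity_scores(review)["compound"] >= 0 else 0

preds = [nltk_sentiment(r) for r in test_df["review"]]      # test_df from the split sketch
```

With a rule-based scorer like this, long mixed-tone reviews tend to land on the positive side, which would be consistent with the near-zero recall on the negative class reported above.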
