Performed sentiment analysis for XYZ company on playstore reviews to categorize customer reviews as 'POSITIVE' or 'NEGATIVE'

NishthaSharma-22/Sentiment-Analysis-with-Playstore-reviews

Sentiment-Analysis-with-Playstore-reviews

INTRODUCTION:

AIM: To perform sentiment analysis on Google Play Store app reviews, classifying each review as either "Positive" or "Negative".
DATASET USED: A sample dataset sourced from Kaggle.
TOOLS AND LIBRARIES: This project is made with Python and uses:

  • NLTK for text preprocessing
  • scikit-learn for the machine learning models (Logistic Regression, Naive Bayes)
  • Pandas for data manipulation
  • Seaborn and Matplotlib for data visualization (making confusion matrices)

DATA UNDERSTANDING:

The dataset had two useful columns: one with the user reviews and another with a score for each review on a scale of 1 to 5, where:

  • 1 = Very Negative
  • 2 = Negative
  • 3 = Neutral
  • 4 = Positive
  • 5 = Very Positive

DATA PREPROCESSING:

Text Cleaning: Used NLTK for:

  • Tokenization
  • Stop word removal
  • Lemmatization

Label Assignment:

  • Scores of 1 and 2 are labeled as negative
  • Scores of 4 and 5 are labeled as positive
  • Neutral scores (3) are removed from the dataset
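A minimal pandas sketch of this labeling rule is shown below. The column names `content` and `score`, the helper `assign_labels`, and the sample rows are all made up for illustration; the actual dataset's column names are not given in this README.

```python
import pandas as pd


def assign_labels(df: pd.DataFrame, score_col: str = "score") -> pd.DataFrame:
    """Hypothetical helper: drop neutral reviews, map scores to labels."""
    # Remove neutral reviews (score of 3).
    labeled = df[df[score_col] != 3].copy()
    # Scores 4 and 5 become positive; the remaining scores (1 and 2) become negative.
    labeled["label"] = labeled[score_col].apply(
        lambda s: "positive" if s >= 4 else "negative"
    )
    return labeled.reset_index(drop=True)


# Made-up sample rows, one per score value.
reviews = pd.DataFrame(
    {
        "content": ["awful", "not good", "it is okay", "nice app", "love it"],
        "score": [1, 2, 3, 4, 5],
    }
)
labeled = assign_labels(reviews)
print(labeled[["score", "label"]])
```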

FEATURE EXTRACTION:

TF-IDF (Term Frequency-Inverse Document Frequency):
Used TfidfVectorizer to convert the cleaned text into numerical features suitable for machine learning models.
Limited the vocabulary to 6000 terms to keep computation efficient and reduce the risk of overfitting.
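The feature extraction step can be sketched as follows, using scikit-learn's `TfidfVectorizer` with `max_features=6000`. The three input strings are made-up stand-ins for cleaned reviews, and all other vectorizer settings are left at their defaults, which may differ from what the project used.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up examples standing in for the cleaned review text.
cleaned_reviews = [
    "great app love",
    "app keep crashing",
    "love new update",
]

# Keep at most the 6000 most frequent terms across the corpus.
vectorizer = TfidfVectorizer(max_features=6000)
X = vectorizer.fit_transform(cleaned_reviews)

print(X.shape)                         # (number of reviews, vocabulary size)
print(sorted(vectorizer.vocabulary_))  # the terms that were kept
```

With a corpus this small the cap is never reached; on a full review dataset it bounds the width of the feature matrix.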

MODEL IMPLEMENTATION:

Logistic Regression:

  • Initially implemented logistic regression
  • Accuracy achieved: 87%
  • Pros: Simple and easy to interpret, excellent for binary classification
  • Cons: Assumes a linear relationship between the features and the log-odds of the outcome, and works best on small to medium-sized datasets
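A rough sketch of training and scoring a logistic regression classifier on TF-IDF features is given below. The toy reviews, the 75/25 split, `random_state=42`, and `max_iter=1000` are assumptions for illustration; the accuracy this prints describes the toy data only and is unrelated to the 87% reported above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Made-up cleaned reviews and labels, for illustration only.
texts = [
    "great app love", "love new update", "excellent easy use", "work great fast",
    "terrible app crash", "worst update ever", "slow keep crashing", "hate new design",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Hold out a stratified test set (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF features followed by logistic regression.
lr_model = make_pipeline(
    TfidfVectorizer(max_features=6000),
    LogisticRegression(max_iter=1000),
)
lr_model.fit(X_train, y_train)

lr_preds = lr_model.predict(X_test)
lr_accuracy = accuracy_score(y_test, lr_preds)
print(f"Logistic Regression accuracy on toy data: {lr_accuracy:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline means the vocabulary is learned from the training split only, which avoids leaking test data into the features.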

Naive Bayes:

  • Decided to implement naive bayes to compare accuracy
  • Achieved accuracy of 85%
  • Pros: Simple and effective for text processing
  • Cons: Assumes that words are conditionally independent of one another given the class, hence ‘naive’
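For comparison, a Naive Bayes classifier can be swapped into the same kind of pipeline. The README does not say which Naive Bayes variant was used, so `MultinomialNB` is an assumption, as are the toy reviews and the split settings. The sketch also computes the confusion matrix as a plain array.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up cleaned reviews and labels, for illustration only.
texts = [
    "great app love", "love new update", "excellent easy use", "work great fast",
    "terrible app crash", "worst update ever", "slow keep crashing", "hate new design",
]
labels = ["positive"] * 4 + ["negative"] * 4

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Same TF-IDF features, with Multinomial Naive Bayes as the classifier (assumed variant).
nb_model = make_pipeline(TfidfVectorizer(max_features=6000), MultinomialNB())
nb_model.fit(X_train, y_train)

nb_preds = nb_model.predict(X_test)
nb_accuracy = accuracy_score(y_test, nb_preds)

# Rows are true labels and columns are predicted labels, in the order given.
cm = confusion_matrix(y_test, nb_preds, labels=["negative", "positive"])
print(f"Naive Bayes accuracy on toy data: {nb_accuracy:.2f}")
print(cm)
```

The `cm` array can be passed to `seaborn.heatmap(cm, annot=True, fmt="d")` to draw the confusion matrix as a plot.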

CONCLUSION:

Logistic Regression was the best-performing model with an accuracy of 87%. Naive Bayes came close with an accuracy of 85%.

Future Work:

Use advanced models and explore word embeddings.
