This project predicts whether a movie review is positive or negative

RohitAudit/sentiment-analysis

Sentiment Analysis

This project classifies a movie review as positive or negative, using a classifier trained on a set of reviews that were labelled in advance.

Prerequisites:

  • Prior knowledge of supervised learning methods.
  • Basics of Python and how to use external libraries.
  • Basics of NLP.

Installation required

  • NumPy
  • pandas
  • Matplotlib
  • scikit-learn

For Windows PowerShell:

python -m pip install numpy

or, if you have Anaconda installed:

conda install numpy

On Unix, you can use pip install directly. Install the other libraries in the same way.

Theory

Processing the text
Raw text can't be processed directly, because a computer doesn't understand words. The text has to be parsed and converted into a numeric format that a classifier can work with.
To do this, follow the steps below:
Step 1: Tokenize the text into smaller segments (mostly words).
Step 2: Remove stopwords (words that occur very often, such as a, the, of, and).
Step 3: Build a bag-of-words frequency matrix, which keeps a count of every distinct word occurring in each text.
Step 4: (Optional) Apply tf-idf weighting to the resulting matrix.
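
Steps 1 to 4 can be sketched with scikit-learn as follows. This is a minimal illustration and not necessarily the code used in this project: the sample reviews are made up, and the choice of CountVectorizer with its built-in English stopword list is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy reviews, not taken from the project's dataset.
reviews = [
    "The movie was a delight and the cast was wonderful",
    "A dull plot and a boring cast",
    "A fun movie, the funniest movie of the summer",
]

# Steps 1-3: tokenize, drop English stopwords, count each remaining word.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(reviews)  # sparse matrix: reviews x vocabulary

# Step 4 (optional): reweight the raw counts with tf-idf.
tfidf = TfidfTransformer().fit_transform(counts)

# Columns are ordered alphabetically by word.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(counts.toarray())
print(tfidf.shape)
```

Each row of the matrix is one review and each column is one word, so the classifier only ever sees numbers.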

Classifying the text
In this problem, the movie reviews were tagged with their sentiment beforehand. Thus, we can use them to train a supervised classifier.
Step 5: Split the dataset into training and testing datasets.
Step 6: Select a classifier and train the model on the training dataset.
Step 7: Predict the labels of the test dataset and measure accuracy.
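
A minimal sketch of steps 5 to 7, assuming scikit-learn and a Multinomial Naive Bayes classifier. The reviews, labels, split ratio, and random seed below are hypothetical and chosen only to keep the example small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled reviews (1 = positive, 0 = negative).
reviews = [
    "wonderful film with a moving story",
    "brilliant acting and a clever script",
    "enjoyable from start to finish",
    "a charming and funny movie",
    "boring plot and wooden acting",
    "a tedious, predictable mess",
    "terrible script and awful pacing",
    "dull characters and a weak ending",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Step 5: split into training and testing sets, keeping the class balance.
train_text, test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels
)

# Learn the vocabulary from the training reviews only, then reuse it on the test set.
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

# Step 6: select a classifier and train it.
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Step 7: predict the test set and measure accuracy.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```

Fitting the vectorizer on the training split alone keeps words that appear only in the test reviews from leaking into the model.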

Analysing the text
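AUC score (area under the ROC curve) measures how well the algorithm separates positive reviews from negative ones: 1.0 is a perfect separation and 0.5 is no better than chance.
Confusion matrix counts how many reviews of each actual class were predicted as negative and as positive.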
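
Both measures are available in scikit-learn. The sketch below uses four hypothetical labels and scores, plus an assumed decision threshold of 0.5, to show how they are computed.

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# AUC is computed from the scores: the fraction of (positive, negative)
# pairs in which the positive review gets the higher score.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75

# The confusion matrix needs hard predictions, so apply a threshold.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 1]]
```

Rows are the actual classes and columns are the predicted classes, in the order negative, positive.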

Results Explained

We used Naive Bayes classifiers, which are based on Bayes' theorem, and applied both the Gaussian and the Multinomial variants to our text data.
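
For reference, the two variants can be trained side by side in scikit-learn. This is a sketch under assumed inputs (the reviews and labels are hypothetical), not the code that produced the results below.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical labelled reviews (1 = positive, 0 = negative).
reviews = [
    "wonderful film with brilliant acting",
    "a charming and funny movie",
    "boring plot and wooden acting",
    "a tedious, predictable mess",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)  # sparse matrix of word counts

# MultinomialNB models word counts directly and accepts the sparse matrix.
multinomial = MultinomialNB().fit(X, labels)

# GaussianNB models each feature as a normally distributed value
# and expects a dense array, so the matrix is converted first.
gaussian = GaussianNB().fit(X.toarray(), labels)

new_review = vectorizer.transform(["a funny and charming film"])
pred_multinomial = multinomial.predict(new_review)
pred_gaussian = gaussian.predict(new_review.toarray())
print(pred_multinomial, pred_gaussian)
```

Word counts are non-negative integers, which matches the assumptions of the Multinomial model more closely than those of the Gaussian model.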

Gaussian Naive Bayes

ROC score = 0.6418341932040562

Confusion Matrix    Predicted negative    Predicted positive
Actual negative     421                   216
Actual positive     296                   476

Multinomial Naive Bayes

ROC score = 0.734385715685314 (area under curve)

Confusion Matrix    Predicted negative    Predicted positive
Actual negative     381                   48
Actual positive     327                   644

***Thus, it can be seen that Multinomial Naive Bayes is more accurate than Gaussian Naive Bayes on this dataset.***
