Twitter_US_Airline_Sentiment_Analysis

Dataset

This data originally came from CrowdFlower's Data for Everyone library (http://www.crowdflower.com/data-for-everyone), which describes it as follows:
A sentiment analysis job about the problems of each major U.S. airline. Twitter data was scraped from February of 2015 and contributors were asked to first classify positive, negative, and neutral tweets, followed by categorizing negative reasons (such as "late flight" or "rude service").

Each tweet in the set is labeled with a positive, neutral, or negative sentiment toward one of six US airlines:
[Image: Screen Shot 2019-03-31 at 5 50 43 PM]

Features

The CSV file has been added to the repo as Tweets_data.csv. It contains the following features (columns):
tweet_id
airline_sentiment
airline_sentiment_confidence
negativereason
negativereason_confidence
airline
airline_sentiment_gold
name
negativereason_gold
retweet_count
text
tweet_coord
tweet_created
tweet_location
user_timezone
[Image: Tweets]
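
To get a feel for the columns listed above, the following is a minimal sketch that counts tweets per (airline, airline_sentiment) pair using only the standard library. The helper name `count_sentiment_by_airline` and the inline sample rows are made up for illustration; only the column names and the file name Tweets_data.csv come from this README.

```python
import csv
import io
from collections import Counter


def count_sentiment_by_airline(csv_file):
    """Count (airline, airline_sentiment) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        counts[(row["airline"], row["airline_sentiment"])] += 1
    return counts


# Tiny inline sample with made-up rows, standing in for Tweets_data.csv.
sample = io.StringIO(
    "tweet_id,airline_sentiment,airline,text\n"
    "1,negative,United,@united my flight was delayed again\n"
    "2,positive,Delta,@Delta thanks for the smooth trip\n"
    "3,negative,United,@united lost my bag\n"
)
sample_counts = count_sentiment_by_airline(sample)
print(sample_counts)

# With the real file (path assumed to be relative to the repo root):
# with open("Tweets_data.csv", newline="", encoding="utf-8") as f:
#     print(count_sentiment_by_airline(f))
```
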

Implementation

The data was cleaned using the Natural Language Toolkit (NLTK).
For the analysis, Multinomial Naive Bayes and Support Vector Machine classifiers were used.
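
This README does not say which NLTK steps were applied, so the following is only a sketch of one plausible cleaning pass: it assumes URL removal, tokenization with NLTK's `TweetTokenizer` (lowercasing and stripping @handles), dropping non-alphabetic tokens, and Porter stemming. The function name `clean_tweet` and the example tweet are hypothetical.

```python
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# Both classes are rule-based and need no extra NLTK corpus downloads.
_tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
_stemmer = PorterStemmer()


def clean_tweet(text):
    """Return a lowercased, stemmed, space-joined version of a tweet (assumed steps)."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    tokens = _tokenizer.tokenize(text)         # lowercases and removes @handles
    words = [tok for tok in tokens if tok.isalpha()]  # drop punctuation and numbers
    return " ".join(_stemmer.stem(word) for word in words)


# Made-up example tweet, not taken from the dataset.
cleaned = clean_tweet("@united Flight DELAYED again!!! http://t.co/abc")
print(cleaned)
```

Stop-word removal is left out here because it would require downloading an NLTK corpus; whether the original analysis used it is not stated.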

Multinomial Naive Bayes Results

MultinomialNB classifier from sklearn was used.
Training Accuracy: 80.87%
Testing Accuracy: 77.18%
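
As a rough illustration of how such a classifier can be set up, the sketch below chains a bag-of-words `CountVectorizer` with `MultinomialNB`. The vectorizer choice, the default hyperparameters, and the toy tweets are assumptions, not details taken from this repo, so the accuracy it prints is unrelated to the figures above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy data; the real project would use the cleaned `text` column
# as input and `airline_sentiment` as the label.
train_texts = [
    "thanks for the great flight",
    "love the friendly crew",
    "flight delayed again terrible service",
    "lost my bag worst airline",
    "what time does the flight board",
    "is there wifi on this plane",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Word counts feed directly into the multinomial likelihood model.
nb_model = make_pipeline(CountVectorizer(), MultinomialNB())
nb_model.fit(train_texts, train_labels)

train_acc = accuracy_score(train_labels, nb_model.predict(train_texts))
nb_prediction = nb_model.predict(["the crew was great thanks"])[0]
print(f"Training accuracy on toy data: {train_acc:.2%}")
print(f"Predicted label: {nb_prediction}")
```
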

Support Vector Machine Results

SVC classifier from sklearn was used.
Training Accuracy: 87.94%
Training Weighted Average F1-score: 0.88
Testing Accuracy: 78.79%
Testing Weighted Average F1-score: 0.79
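
The sketch below shows one way to train an `SVC` on text and compute the two reported metrics, accuracy and weighted-average F1 (per-class F1 averaged with weights proportional to each class's number of true samples). The TF-IDF features, the linear kernel, and the toy tweets are assumptions for illustration; the README does not state the features or kernel actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up toy data standing in for a train/test split of the real tweets.
train_texts = [
    "thanks for the great flight",
    "love the friendly crew",
    "flight delayed again terrible service",
    "lost my bag worst airline",
    "what time does the flight board",
    "is there wifi on this plane",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
test_texts = ["great crew thanks", "terrible delayed flight", "what time is boarding"]
test_labels = ["positive", "negative", "neutral"]

# Assumed setup: TF-IDF features with a linear-kernel SVC.
svm_model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
svm_model.fit(train_texts, train_labels)

test_pred = svm_model.predict(test_texts)
test_acc = accuracy_score(test_labels, test_pred)
# zero_division=0 avoids warnings when a class is never predicted.
test_f1 = f1_score(test_labels, test_pred, average="weighted", zero_division=0)
print(f"Testing accuracy on toy data: {test_acc:.2%}")
print(f"Testing weighted average F1 on toy data: {test_f1:.2f}")
```
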