The 3rd core project in Udacity's Data Analyst Nanodegree.
This project puts into practice what was learned in the Data Wrangling chapter. It is based on wrangling Twitter data from an account named "WeRateDogs". The goal is to produce interesting and trustworthy analyses and visualizations.
Udacity's Introduction:
The dataset that you will be wrangling (and analyzing and visualizing) is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10. 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.
The project should follow these 6 steps:
- Gathering data
- Assessing data
- Cleaning data
- Storing data
- Analyzing and visualizing data
- Reporting the wrangling efforts and the analyses (and visualizations)
I was required to collect data from 3 different sources, resulting in 3 different file types. Each of these must first be imported into a separate pandas DataFrame.
Sources collected:
- WeRateDogs Twitter archive: a file provided by Udacity.
- Tweet image predictions: downloaded from a given link using the Requests library.
- Twitter API: using Tweepy and the tweet IDs from the Twitter archive, it was possible to gather every tweet's retweet and favorite count.
File types (in the same order as the sources):
- csv
- tsv
- txt
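As a rough illustration of importing each source into its own pandas DataFrame, here is a minimal sketch. The function names are hypothetical, and the layout of the txt file (one JSON object per line, with `id`, `retweet_count` and `favorite_count` keys) is an assumption, not something this README confirms.

```python
import json

import pandas as pd


def load_archive(csv_path):
    """Read the Twitter archive file (comma-separated values)."""
    return pd.read_csv(csv_path)


def load_image_predictions(tsv_path):
    """Read the image predictions file (tab-separated values)."""
    return pd.read_csv(tsv_path, sep="\t")


def load_tweet_counts(txt_path):
    """Read a text file assumed to hold one JSON tweet object per line.

    Only the tweet ID, retweet count and favorite count are kept.
    The key names used here are an assumption about the stored data.
    """
    rows = []
    with open(txt_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append(
                {
                    "tweet_id": tweet["id"],
                    "retweet_count": tweet["retweet_count"],
                    "favorite_count": tweet["favorite_count"],
                }
            )
    columns = ["tweet_id", "retweet_count", "favorite_count"]
    return pd.DataFrame(rows, columns=columns)
```

Each function takes the path to the corresponding file and returns a separate DataFrame, so the three tables can be assessed and cleaned independently before being combined.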
WIP