NYC-taxi-data-analysis

Scripts in support of the post "Where ya headed? Analyzing Over 450 Million Taxi Trips Using Hadoop and PySpark".

This repo provides code to download, process, and analyze NYC taxi data. The data is stored in the Hadoop Distributed File System (HDFS) on Amazon EC2 instances. A great guide on how to set this up yourself can be found here. I plan to make a post soon on how to run PySpark in tandem with HDFS; however, that post will be in video format. All of the code is written in PySpark, so if you're unfamiliar with the package, a tutorial is here.

Coordinate-neighborhood mapping

Code and GeoJSON file needed to map taxi trip coordinates to neighborhoods.
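A minimal sketch of a coordinate-to-neighborhood lookup, assuming the GeoJSON file is a FeatureCollection of Polygon/MultiPolygon features whose name is stored under a property called `neighborhood` (a hypothetical key; check the actual file). It uses the classic ray-casting point-in-polygon test in plain Python and is not necessarily how the repo's mapping code works. The toy square and its coordinates are made up for illustration.

```python
from typing import Any, Dict, Optional, Sequence


def point_in_ring(lon: float, lat: float, ring: Sequence[Sequence[float]]) -> bool:
    """Ray casting: toggle 'inside' each time an eastward ray from the point crosses an edge."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i][0], ring[i][1]
        xj, yj = ring[j][0], ring[j][1]
        # The edge straddles the point's latitude (so yi != yj and division is safe).
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon: float, lat: float, rings: Sequence[Any]) -> bool:
    # GeoJSON Polygon: the first ring is the outer boundary, the rest are holes.
    if not point_in_ring(lon, lat, rings[0]):
        return False
    return not any(point_in_ring(lon, lat, hole) for hole in rings[1:])


def find_neighborhood(lon: float, lat: float, collection: Dict[str, Any],
                      name_key: str = "neighborhood") -> Optional[str]:
    """Return the name of the first feature containing (lon, lat), or None."""
    for feature in collection["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            continue  # points/lines cannot contain a trip coordinate
        if any(point_in_polygon(lon, lat, poly) for poly in polygons):
            return feature["properties"].get(name_key)
    return None


# Toy FeatureCollection (made-up coordinates); GeoJSON positions are [longitude, latitude].
toy_neighborhoods = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"neighborhood": "Toy Square"},
        "geometry": {"type": "Polygon", "coordinates": [[
            [-74.0, 40.7], [-73.9, 40.7], [-73.9, 40.8], [-74.0, 40.8], [-74.0, 40.7],
        ]]},
    }],
}

print(find_neighborhood(-73.95, 40.75, toy_neighborhoods))  # Toy Square
print(find_neighborhood(-73.85, 40.75, toy_neighborhoods))  # None
```

This scans every feature for every trip; at the scale of hundreds of millions of trips, a bounding-box prefilter or spatial index would likely be worth adding.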

Taxi data

Folder containing links to all of the taxi data sets used for this project.

'Section' Jupyter notebooks

Each Jupyter notebook with 'Section' in its title contains the code for the corresponding section of the post "Where ya headed? Analyzing Over 450 Million Taxi Trips Using Hadoop and PySpark".

Sections 4 & 5.ipynb: Taxi ridership trends. How much will taxi ridership change in the future?
Section 6.ipynb: Which neighborhoods give taxis the most business?
Section 7.ipynb: How do rides change on weekdays vs. weekends?
Section 8.ipynb: What factors determine how much a customer is going to tip?
Section 9.ipynb: Do customers tip more on holidays?
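The questions above come down to a few per-trip quantities: whether a pickup falls on a weekend, the tip as a percentage of the fare, whether the pickup date is a holiday, and how often each neighborhood appears as a destination. The following is a plain-Python sketch of those rules under stated assumptions, not the notebooks' PySpark code: the `tip_amount`/`fare_amount` arguments, the holiday set, and the function names are hypothetical, and the same rules could be expressed over a Spark DataFrame instead.

```python
from collections import Counter
from datetime import date, datetime
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical, deliberately incomplete holiday list for illustration only.
EXAMPLE_HOLIDAYS: FrozenSet[date] = frozenset(
    {date(2013, 1, 1), date(2013, 7, 4), date(2013, 12, 25)}
)


def is_weekend(pickup: datetime) -> bool:
    # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6.
    return pickup.weekday() >= 5


def tip_percentage(tip_amount: float, fare_amount: float) -> Optional[float]:
    # Tip relative to the fare; undefined when the fare is zero or negative.
    if fare_amount <= 0:
        return None
    return 100.0 * tip_amount / fare_amount


def is_holiday(pickup: datetime, holidays: FrozenSet[date] = EXAMPLE_HOLIDAYS) -> bool:
    return pickup.date() in holidays


def busiest_neighborhoods(dropoffs: Iterable[Optional[str]], k: int = 3) -> List[Tuple[str, int]]:
    # Count destinations, skipping trips that could not be mapped to a neighborhood.
    return Counter(n for n in dropoffs if n is not None).most_common(k)


print(is_weekend(datetime(2013, 7, 6, 14, 30)))  # True (a Saturday)
print(tip_percentage(2.0, 10.0))  # 20.0
print(is_holiday(datetime(2013, 7, 4, 9, 0)))  # True
print(busiest_neighborhoods(["Midtown", "SoHo", None, "Midtown"], k=1))  # [('Midtown', 2)]
```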

Questions/issues/contact

[email protected]

am2786/NYC-taxi-data-analysis