If you need help at any time, put your red sticky note on the back of your laptop. If you finish, you can put your green sticky note on the back of your laptop, then try to ask and answer another question.
At this stage, you are ready to undertake an independent project using everything you have learned this week and last week. Your task is to:
- Define an interesting question that one might ask and answer based on the Citi Bike data set (see some examples on the back of this page).
- In the README file that's in the GitHub repository you created at the beginning of today's session, write a short description of the question you intend to answer.
- Write a Python script or notebook to find the answer to the question that you posed.
- Commit your Python code to the GitHub repository you created at the beginning of today's session.
- In your README, explain what the answer to your question was, and how you found it.
- Include visualizations that help the reader understand the question and answer. (Use `matplotlib`.)
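
As a starting point for the visualization step, here is a minimal sketch of a bar chart saved to an image file that a README can reference. The trip counts are made up for illustration, and the output filename `trips_by_hour.png` is just an assumption; substitute values computed from the real data.

```python
import matplotlib.pyplot as plt

# Made-up trip counts per starting hour (0-23), for illustration only;
# replace these with counts computed from the real trip data.
hours = list(range(24))
trip_counts = [5, 3, 2, 1, 2, 8, 20, 45, 60, 40, 30, 32,
               35, 33, 31, 36, 48, 65, 55, 38, 25, 18, 12, 8]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(hours, trip_counts)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of trips")
ax.set_title("Trips started per hour (illustrative data)")
ax.set_xticks(hours)
fig.tight_layout()

# Saving to a file (hypothetical name) lets the README embed the image.
fig.savefig("trips_by_hour.png")
```

Labeling both axes and giving the figure a title makes the chart readable on its own when someone views it in the README.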
Of course, the instructors are here to help you as you work on this open-ended project :) But your main resource will be the Internet: Python is a very popular language and has great documentation online.
Some potential questions you might ask are listed on the back of this page, but you're not in any way limited to those!
\pagebreak
- Which stations are trending up in popularity? Which are trending down? (You might want to look at more than just 2016 for this)
- Which stations are the least balanced (more bikes leaving than entering, or vice versa)?
- Which stations support the youngest bikers? The oldest?
- Which are the starting stations that start bikers off on the longest trips (in terms of trip duration)?
- What are the most popular start-end station pairs?
- What is the most popular hour of the day to start a trip on weekdays? Weekends?
- Which stations serve mostly annual subscribers, and which are more likely to serve tourists?
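
To give a feel for how questions like these translate into code, here is a rough sketch using `pandas` on a tiny made-up table. The column names (`starttime`, `start station name`, `end station name`) are assumptions about the trip file's header, so check them against your own CSV before reusing any of this.

```python
import pandas as pd

# Tiny made-up sample standing in for the real trip file. The column
# names are assumptions; check them against your own CSV header.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2016-06-01 08:15:00",  # Wednesday
        "2016-06-01 08:40:00",  # Wednesday
        "2016-06-02 17:05:00",  # Thursday
        "2016-06-04 13:30:00",  # Saturday
        "2016-06-05 13:45:00",  # Sunday
        "2016-06-05 10:00:00",  # Sunday
    ]),
    "start station name": ["A", "A", "B", "C", "C", "A"],
    "end station name":   ["B", "B", "A", "A", "B", "C"],
})


def most_popular_start_hour(df):
    """Return the hour of day (0-23) in which the most trips start."""
    return df["starttime"].dt.hour.value_counts().idxmax()


# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
is_weekend = trips["starttime"].dt.dayofweek >= 5
weekday_hour = most_popular_start_hour(trips[~is_weekend])
weekend_hour = most_popular_start_hour(trips[is_weekend])

# Count trips for every (start, end) station pair and pick the largest.
pair_counts = trips.groupby(["start station name", "end station name"]).size()
top_pair = pair_counts.idxmax()

print("Most popular weekday start hour:", weekday_hour)
print("Most popular weekend start hour:", weekend_hour)
print("Most popular start-end pair:", top_pair)
```

With the real data you would load the table with `pd.read_csv` instead of building it by hand; the grouping and counting steps stay the same.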
Some more advanced questions you may want to answer if you've already finished one of the questions above:
- Using the `networkx` library, can you visualize the Citi Bike network as a directed graph, with the edge weight of each link giving the number of trips on that path?
- Can you find and download Central Park precipitation data, and find the relationship between Citi Bike usage and temperature/precipitation?
- Which stations are more popular during rush hour, and which stations have more even demand throughout the day?
- Using the `networkx` library: find the most used bike in the system (by duration), and show its trips as a directed graph. How often is the bike moved manually?
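
For the `networkx` questions, one possible way to build a weighted directed graph from trips is sketched below. The station names and trips are made up, and storing the trip count under an edge attribute called `weight` is a choice made for this example, not a requirement.

```python
import networkx as nx

# Made-up (start station, end station) pairs, for illustration only.
trips = [("A", "B"), ("A", "B"), ("B", "C"),
         ("C", "A"), ("A", "B"), ("B", "C")]

G = nx.DiGraph()
for start, end in trips:
    if G.has_edge(start, end):
        # Seen this directed link before: one more trip on it.
        G[start][end]["weight"] += 1
    else:
        G.add_edge(start, end, weight=1)

# Each item is a (start, end, weight) tuple; pick the heaviest edge.
busiest = max(G.edges(data="weight"), key=lambda edge: edge[2])
print("Busiest link:", busiest)
```

To draw the graph, `nx.draw_networkx` can render the nodes and edges with `matplotlib`, and `nx.draw_networkx_edge_labels` can annotate each link with its trip count.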
Some of the questions on this page may have been derived from or inspired by the following sources:
- "A Tale of Twenty-Two Million Citi Bike Rides: Analyzing the NYC Bike Share System", Todd W. Schneider,
- "Mapping Citi Bike’s Riders, Not Just Rides", Ben Wellington,
http://iquantny.tumblr.com/post/81465368612/mapping-citi-bikes-riders-not-just-rides
- "Transforming Citi Bike Data Into New Insights", Carto Blog,