This project (Write a Data Science Blog Post) is part of the Udacity Data Scientist Nanodegree Program. I used the Toronto Airbnb dataset because Toronto is the city I live in, and I am interested in using data science techniques to find ways to improve future listings. The questions analyzed are similar to those one might encounter in a business setting, and many of the approaches and skills used here can be applied to future work projects.
Using the data, I answered the following questions:
- What are the most common amenities in the dataset?
- Which neighborhoods have the highest number of listings and rating review scores?
- What is the relationship between the type of room and price listing?
- What are the most influential features of the dataset to predict the price of a listing?
The dataset describes Airbnb listing activity in Toronto. The original dataset can be found here: https://www.kaggle.com/robinkongninglo/toronto-airbnb-dataset
The most common amenities in Toronto listings are:
- Wifi
- Heating
- Smoke Alarm
- Essentials
- Kitchen
- Waterfront Communities - The Island has the most listings, followed by Niagara and then Annex.
- Forest Hill South, Ionview, and High Park-Swansea have the highest review score ratings.
- Entire home/apt has the highest median price of all room types. Shared room has the lowest.
- The features with the most influence on the listing price are bedrooms, followed by Entire home/apt, then accommodates.
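As a rough illustration of how the amenity counts and the per-room-type median prices above could be computed, here is a minimal sketch using only the standard library. It is not the notebook's actual code: the sample rows are made up, the helper names `count_amenities` and `median_price_by_room_type` are hypothetical, and it assumes the `amenities` column stores each listing's amenities as a JSON-array string.

```python
import json
from collections import Counter, defaultdict
from statistics import median

# Made-up sample rows for illustration only; the real analysis reads
# listings_sep_09_2020.csv. Assumption: `amenities` is a JSON-array string.
listings = [
    {"room_type": "Entire home/apt", "price": 150.0,
     "amenities": '["Wifi", "Heating", "Kitchen"]'},
    {"room_type": "Entire home/apt", "price": 110.0,
     "amenities": '["Wifi", "Kitchen"]'},
    {"room_type": "Private room", "price": 60.0,
     "amenities": '["Wifi", "Heating"]'},
    {"room_type": "Shared room", "price": 30.0,
     "amenities": '["Wifi"]'},
]


def count_amenities(rows):
    """Count how many listings offer each amenity (hypothetical helper)."""
    counts = Counter()
    for row in rows:
        # set() ensures an amenity is counted at most once per listing.
        counts.update(set(json.loads(row["amenities"])))
    return counts


def median_price_by_room_type(rows):
    """Group prices by room type and take the median of each group."""
    prices = defaultdict(list)
    for row in rows:
        prices[row["room_type"]].append(row["price"])
    return {room_type: median(values) for room_type, values in prices.items()}


amenity_counts = count_amenities(listings)
room_type_medians = median_price_by_room_type(listings)

# Show the amenities sorted from most to least common, then the medians.
print(amenity_counts.most_common())
print(room_type_medians)
```

The median is used rather than the mean because a few very expensive listings would otherwise dominate the comparison between room types.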
Here is the Medium blog post I have written: https://le-peter1993.medium.com/data-exploration-for-toronto-airbnb-56b5387d7007
The analysis uses Python 3 in a Jupyter Notebook with the following libraries:
- NumPy
- pandas
- scikit-learn
- Matplotlib
- Seaborn
- Folium
- Collections
- Math
The project includes the following files:
- Toronto Airbnb Dataset.ipynb - Jupyter notebook with the complete analysis, answers to the questions, explanations, and visualisations
- listings_sep_09_2020.csv - Original Toronto Airbnb dataset from September 2020 in CSV format