1st_Project - King County House Data Set

This repository includes the results of my first data science project during my data science bootcamp at neuefische in Hamburg.

This project covers the complete data science lifecycle on a single dataset. The dataset is the King County House dataset, which contains house sales in the King County area from the beginning of May 2014 to the end of May 2015. My task in this project was to derive at least three recommendations for buyers from the given dataset and the features developed from it. In addition, a multivariate linear regression model was developed to predict house prices. A map of King County is included to give an impression of the area.

This GitHub repository contains the following:

Jupyter notebook with all code and descriptive text for every step of the data science lifecycle (First_Project_King_County_Housing_Prices.ipynb)
Slides of the presentation as a PDF file (1st_Project_King_County_Houses_Slides.pdf)
figures: the output figures
rawdata: the original dataset and its column description

The presentation is also available online on Google Slides.

The main focus of this project is EDA (Exploratory Data Analysis), but all steps of the data science lifecycle were carried out. They are summarized below:

Business Understanding

What is the objective of this project?
What problems need to be tackled?

Data Mining

Get data or scrape data.

Data Cleaning

Fix missing data.
Fix inconsistencies based on assumptions.
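
The two cleaning steps above could look roughly like the following sketch. It is not taken from the notebook: the column names (`waterfront`, `yr_renovated`, `sqft_basement`) follow the commonly distributed version of the King County dataset, the toy rows are made up, and the fill rules are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the raw data; all values are made up for illustration.
raw = pd.DataFrame({
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "waterfront": [np.nan, 0.0, np.nan, 1.0],
    "yr_renovated": [0.0, 1991.0, np.nan, 0.0],
    "sqft_basement": ["0.0", "400.0", "?", "910.0"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the fill rules are assumptions."""
    out = df.copy()
    # Assumption: a missing waterfront flag means "not on the waterfront".
    out["waterfront"] = out["waterfront"].fillna(0).astype(int)
    # Assumption: a missing renovation year means "never renovated" (coded as 0).
    out["yr_renovated"] = out["yr_renovated"].fillna(0).astype(int)
    # Non-numeric placeholders such as "?" become NaN, then fall back to the median.
    basement = pd.to_numeric(out["sqft_basement"], errors="coerce")
    out["sqft_basement"] = basement.fillna(basement.median())
    return out


cleaned = clean(raw)
print(cleaned)
```

Writing each assumption down next to the line that applies it makes it easier to revisit the decision later, for example if a missing flag turns out not to mean "no".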

Data Exploration

Visually analyze your data by using:

Correlation analysis
Heatmap
Histograms
Scatter Plots
Box plots
Surface Plots
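
As a minimal sketch of the correlation analysis that usually precedes a heatmap, the snippet below computes a Pearson correlation matrix and ranks the features by their correlation with the price. The data frame is a made-up toy example, and the column names are assumed from the commonly distributed version of the dataset.

```python
import pandas as pd

# Toy data for illustration only; these are not values from the dataset.
df = pd.DataFrame({
    "price": [200000, 350000, 500000, 650000, 800000],
    "sqft_living": [900, 1500, 2100, 2600, 3400],
    "bedrooms": [2, 3, 3, 4, 5],
    "condition": [3, 4, 3, 5, 3],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Rank the remaining columns by their correlation with the price.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr.round(2))
```

The `corr` matrix is the typical input for a heatmap, for example with `seaborn.heatmap(corr, annot=True)` if seaborn is installed.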

Feature Engineering

Select important features and derive new, more meaningful ones from the existing data. In this project, new features for the age of a house, its distance to Seattle, and its renovation status were developed.
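
One possible way to derive such features is sketched below. The exact definitions used in the notebook are not reproduced here: the column names (`yr_built`, `yr_renovated`, `lat`, `long`), the reference point for Seattle, and the use of the haversine great-circle distance are all assumptions made for illustration.

```python
import math

# Assumed reference point for "Seattle" (approximate downtown coordinates).
SEATTLE_LAT, SEATTLE_LON = 47.6062, -122.3321
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def engineer(house, sale_year):
    """Derive age, renovation flag, and distance to Seattle for one house."""
    return {
        # Age of the house at the time of sale.
        "age": sale_year - house["yr_built"],
        # Assumption: yr_renovated == 0 encodes "never renovated".
        "renovated": int(house["yr_renovated"] > 0),
        "dist_seattle_km": haversine_km(
            house["lat"], house["long"], SEATTLE_LAT, SEATTLE_LON
        ),
    }


# Hypothetical house record used only to exercise the functions.
house = {"yr_built": 1955, "yr_renovated": 0, "lat": 47.5112, "long": -122.257}
features = engineer(house, sale_year=2014)
print(features)
```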

Predictive Modelling

Use machine learning algorithms to make predictions. In this case, a multivariate linear regression model that includes the new features was developed.
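
To illustrate what fitting a multivariate linear regression involves, here is a minimal sketch using ordinary least squares in NumPy. It does not reproduce the model from the notebook: the data are synthetic, generated from made-up coefficients so that the fit can be checked, and the three predictors (living area, age, distance to Seattle) are assumed stand-ins for the real feature set.

```python
import numpy as np

# Synthetic data generated from known, made-up coefficients.
rng = np.random.default_rng(0)
n = 200
sqft = rng.uniform(500, 4000, n)   # living area
age = rng.uniform(0, 100, n)       # age of the house in years
dist = rng.uniform(0, 50, n)       # distance to Seattle in km

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), sqft, age, dist])
true_coef = np.array([50000.0, 200.0, -500.0, -3000.0])
y = X @ true_coef + rng.normal(0, 10000, n)  # price with Gaussian noise

# Ordinary least squares: minimise ||X @ coef - y||^2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination (R^2) on the training data.
pred = X @ coef
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("coefficients:", coef.round(2))
print("R^2:", round(r2, 4))
```

On real data the score should be computed on a held-out test set rather than on the training data, since the training R^2 tends to overstate how well the model generalizes.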

Data Visualization

Communicate the key findings using plots and visualizations. In this project this is a presentation to non-technical stakeholders.
