💡 The goal of the project is to predict house prices in different Italian cities using the following datasets:
- In `train.csv` you will find a set of rows, each containing data for one apartment for sale. The target variable that you have to predict is the sale price.
- `poi.csv` contains the coordinates of points of interest that you can use to further enrich the features available in your training dataset.
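
As a rough illustration of how point-of-interest coordinates could enrich the apartment features, the sketch below computes, for each apartment, the distance to the nearest point of interest and the number of points of interest within a given radius. It takes plain latitude/longitude arrays because the actual column names in `train.csv` and `poi.csv` are not shown here. The function names, the 2 km radius, and the sample coordinates are all assumptions made for the example.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def poi_features(apt_lat, apt_lon, poi_lat, poi_lon, radius_km=2.0):
    """Return (distance to nearest POI, number of POIs within radius_km) per apartment.

    Hypothetical helper: builds the full apartments x POIs distance matrix,
    which is fine for small inputs but memory-hungry for large ones.
    """
    apt_lat = np.asarray(apt_lat, dtype=float)[:, None]
    apt_lon = np.asarray(apt_lon, dtype=float)[:, None]
    poi_lat = np.asarray(poi_lat, dtype=float)[None, :]
    poi_lon = np.asarray(poi_lon, dtype=float)[None, :]
    dist = haversine_km(apt_lat, apt_lon, poi_lat, poi_lon)  # shape (n_apts, n_pois)
    return dist.min(axis=1), (dist <= radius_km).sum(axis=1)


# Illustrative coordinates only: two apartments and three points of interest.
apt_lat, apt_lon = [41.9028, 45.4642], [12.4964, 9.1900]
poi_lat, poi_lon = [41.8902, 45.4641, 41.9009], [12.4922, 9.1919, 12.4833]

nearest_km, poi_count = poi_features(apt_lat, apt_lon, poi_lat, poi_lon)
print(nearest_km, poi_count)
```

The two returned arrays can be attached to the training data as extra columns. For a large dataset, a spatial index such as a ball tree would probably be a better fit than the dense distance matrix used here.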
Metric: submissions were evaluated on the mean squared error (MSE) between the predicted and the observed sale price.
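
To make the metric concrete, here is a minimal sketch of the MSE computation on made-up prices. The helper name and the sample values are assumptions for illustration, not the evaluation code used to score submissions.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up sale prices in euros.
observed = [200_000, 150_000, 320_000]
predicted = [210_000, 140_000, 300_000]

mse = mean_squared_error(observed, predicted)
print(mse)
```

Because the errors are squared, a single large miss on an expensive apartment weighs far more than several small misses, which is worth keeping in mind when handling outliers.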
The file is currently structured as follows:
Introduction
Part 0: setting up environment
Part 1: Data Exploration
└── 1.1 Data Visualization
└── 1.2 Data Cleaning
│   └── 1.2.1 missing data
│   └── 1.2.2 outliers
└── 1.3 Feature Engineering
└── 1.4 Mixing the ingredients
Part 2: Modelling
└── 2.1 splitting the data
└── 2.2 modelling
└── 2.3 comparing models
Part 3: A-B testing / CV
Conclusions