Model card

Remark

The random_forest.pkl file is not in the repository because it is too large. Run train_random_forest.py to generate the model.

Project context

Predicting the price of houses, apartments and villas in Belgium.

Data

Input dataset: Scraped from leading real estate websites in Belgium.

Target variable: Price

Features: 'Bathroom Count', 'Bedroom Count', 'Habitable Surface', 'Land Surface', 'Consumption', 'Postal Code', 'Facades', 'Subtype', 'Toilet Count', 'Kitchen Type', 'State of Building', # 'Sea view', 'Swimming Pool', 'Price', 'Longitude', 'Latitude', 'EPC', 'cd_munty_refnis', 'PopDensity', 'MedianPropertyValue', 'NetIncomePerResident'

Model details

Basic Linear regression model

Basic linear regression model

Advanced Linear regression model

Linear regression model with log scaling applied to features that relate non-linearly to the price, so that the relationship becomes approximately linear.
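As a rough illustration of why log scaling can help a linear model, the sketch below fits a one-feature least-squares line twice: once on the raw feature and once on its log10. The data is synthetic, and the helper names `fit_line` and `r_squared` are hypothetical; which columns train_linearregression_log10.py actually transforms is not stated here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic data (not from the project): the target grows with log10 of the feature.
surface = [20, 50, 100, 200, 500, 1000, 2000, 5000]
price = [100_000 + 150_000 * math.log10(s) for s in surface]

# Fit on the raw feature, then on the log10-scaled feature.
raw_fit = fit_line(surface, price)
log_surface = [math.log10(s) for s in surface]
log_fit = fit_line(log_surface, price)

r2_raw = r_squared(surface, price, *raw_fit)
r2_log = r_squared(log_surface, price, *log_fit)
print(f"R^2 raw: {r2_raw:.3f}, R^2 log10: {r2_log:.3f}")
```

On this synthetic data the log10 fit recovers the generating line, while the raw fit cannot follow the curve. Real data will be noisier.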

Random Forest model:

Random Forest model with max_depth of 20

Performance

Basic Linear regression model

R-squared score on training data: 97.71%

R-squared score on testing data: 87.29%

Advanced Linear regression model

R-squared score on training data: 67.83%

R-squared score on testing data: 68.53%

Random Forest model

R-squared score on training data: 97.71%

R-squared score on testing data: 87.29%
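For reference, assuming these scores are the standard coefficient of determination (the training scripts are not shown here, so this is an assumption), each value is computed as:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

Here $y_i$ is an observed price, $\hat{y}_i$ is the model's prediction for it, and $\bar{y}$ is the mean observed price in the set being evaluated. Under this definition, a testing score of 87.29% means the model accounts for about 87% of the variance in the test prices.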

Limitations

Random Forest model

It scores much higher on the training data (97.71%) than on the testing data (87.29%), so it should be tested further with cross-validation to check whether it is overfitting.
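One way to run such a check is k-fold cross-validation. The sketch below is a minimal example, assuming scikit-learn is installed and using synthetic data in place of the project's dataset. Only `max_depth=20` comes from this model card; the other settings are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Synthetic stand-in data; replace with the project's feature matrix and prices.
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

# max_depth=20 matches the model card; n_estimators and random_state are assumptions.
model = RandomForestRegressor(max_depth=20, n_estimators=50, random_state=0)

# 5-fold cross-validation, scoring each fold with R^2 on both the
# training part and the held-out part.
scores = cross_validate(model, X, y, cv=5, scoring="r2", return_train_score=True)

train_r2 = scores["train_score"].mean()
valid_r2 = scores["test_score"].mean()
print(f"mean train R^2:      {train_r2:.3f}")
print(f"mean validation R^2: {valid_r2:.3f}")
print(f"gap:                 {train_r2 - valid_r2:.3f}")
```

A large, consistent gap between the training and validation scores across folds would suggest overfitting. Lowering `max_depth` is one setting to try in that case.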

Usage

Install requirements

Before charlie can predict the price of a house, we need to install the requirements.

pip install -r requirements.txt

OPTIONAL: Update external data

To update the external data, go to statbel.fgov.be and download the latest GeoJSON (ZIP). Extract the archive, copy the sh_statbel_statistical_sectors_31370_20230101.geojson file to ./data/external_data/REFNIS_2023.geojson, and then run the following commands in the terminal:

cd src # move to the src folder
python join_external_data.py
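The manual extract-and-copy step that precedes these commands could also be scripted. The sketch below is one possible way to do it with the Python standard library. The function name `install_geojson` is hypothetical, and the destination path is assumed to be relative to the repository root.

```python
import shutil
import zipfile
from pathlib import Path

# File name and destination are taken from the instructions in this section.
GEOJSON_NAME = "sh_statbel_statistical_sectors_31370_20230101.geojson"
DESTINATION = Path("data/external_data/REFNIS_2023.geojson")


def install_geojson(zip_path, destination=DESTINATION, member_name=GEOJSON_NAME):
    """Extract the GeoJSON from the downloaded ZIP and write it to destination."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # The file may sit in a subfolder of the archive, so match on the base name.
        matches = [n for n in archive.namelist() if Path(n).name == member_name]
        if not matches:
            raise FileNotFoundError(f"{member_name} not found in {zip_path}")
        with archive.open(matches[0]) as src, open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return destination
```

Calling `install_geojson("path/to/downloaded.zip")` from the repository root, with the placeholder replaced by the actual download location, would put the file in place before running join_external_data.py.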

Train a model

It is best to move to the src folder before running a training script.

# train a model
python ./models/train_basic_linearregression.py
# or
python ./models/train_linearregression_log10.py
# or
python ./models/train_random_forest.py

Maintainers

LinkedIn