The random_forest.pkl file is not in the repository because it is too big. You can run the train_random_forest.py script
to generate the model.
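The .pkl extension suggests the model is stored with Python's standard pickle module. That is an assumption: the training script might use joblib instead. A minimal sketch of saving and loading such a file; the helper names save_model and load_model are hypothetical:

```python
import pickle
from pathlib import Path


def save_model(model, path):
    """Serialize a picklable object (e.g. a fitted model) to `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    with path.open("wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a pickled object from `path`.

    Only unpickle files you trust: loading a pickle can run arbitrary code.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

For example, model = load_model("random_forest.pkl") after running the training script (adjust the path to wherever the script writes the file). Unpickling a scikit-learn model requires scikit-learn to be installed, ideally the same version that trained it.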
Predicting the price of Houses, Apartments and Villas in Belgium.
Input dataset: scraped from leading real estate websites in Belgium.
Target variable: 'Price'
Features: 'Bathroom Count', 'Bedroom Count', 'Habitable Surface', 'Land Surface', 'Consumption', 'Postal Code', 'Facades', 'Subtype', 'Toilet Count', 'Kitchen Type', 'State of Building', 'Sea view', 'Swimming Pool', 'Longitude', 'Latitude', 'EPC', 'cd_munty_refnis', 'PopDensity', 'MedianPropertyValue', 'NetIncomePerResident'
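A minimal sketch of separating the features from the target when reading the scraped data. It assumes a CSV file with a header row that uses the column names listed above; the helper names and the CSV layout are hypothetical. Categorical columns such as 'Subtype' or 'Kitchen Type' are left as raw strings, so encoding them is still up to the training scripts:

```python
import csv

# Column names as listed above; 'Price' is the target, not a feature.
FEATURES = [
    "Bathroom Count", "Bedroom Count", "Habitable Surface", "Land Surface",
    "Consumption", "Postal Code", "Facades", "Subtype", "Toilet Count",
    "Kitchen Type", "State of Building", "Sea view", "Swimming Pool",
    "Longitude", "Latitude", "EPC", "cd_munty_refnis", "PopDensity",
    "MedianPropertyValue", "NetIncomePerResident",
]
TARGET = "Price"


def split_features_target(rows, features=FEATURES, target=TARGET):
    """Split dict rows (e.g. from csv.DictReader) into features X and target y.

    Rows without a price are skipped; missing or empty feature values become
    None. Values stay raw strings, so categoricals still need encoding.
    """
    X, y = [], []
    for row in rows:
        price = row.get(target)
        if price in (None, ""):
            continue  # a row without a price cannot be used for training
        X.append([row.get(name) or None for name in features])
        y.append(float(price))
    return X, y


def load_dataset(path):
    """Read a CSV file with a header row and split it into X and y."""
    with open(path, newline="", encoding="utf-8") as f:
        return split_features_target(csv.DictReader(f))
```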
Basic linear regression model
Advanced linear regression model with log scaling to make non-linear features linear.
Random Forest model with a max_depth of 20
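The log scaling idea can be illustrated with a single feature: if the price grows roughly as a power of the habitable surface, taking log10 of both turns that curve into a straight line that ordinary least squares can fit. A minimal pure-Python sketch; which columns train_linearregression_log10.py actually transforms is not stated here, so the surface/price pairing and the helper names are assumptions:

```python
import math


def fit_simple_ols(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log10_model(surfaces, prices):
    """Fit log10(price) = a + b * log10(surface).

    A power law price = 10**a * surface**b becomes a straight line after
    taking log10 of both sides. All inputs must be strictly positive.
    """
    log_x = [math.log10(s) for s in surfaces]
    log_y = [math.log10(p) for p in prices]
    return fit_simple_ols(log_x, log_y)


def predict_price(model, surface):
    """Predict in log space, then back-transform with 10**."""
    intercept, slope = model
    return 10 ** (intercept + slope * math.log10(surface))
```

Note that predictions must be transformed back with 10** before they are compared with real prices.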
Basic linear regression model: r_squared score on the training data: 97.71%, on the testing data: 87.29%
Advanced linear regression model: r_squared score on the training data: 67.83%, on the testing data: 68.53%
Random Forest model: r_squared score on the training data: 97.71%, on the testing data: 87.29%
The Random Forest model scores very high on the training data itself (97.71% versus 87.29% on the testing data), so it should be further evaluated with cross-validation to check whether it is overfitting.
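The scores above are R² (r_squared) values shown as percentages: R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot the sum of squared deviations from the mean price. The pure-Python sketch below computes R² and runs a k-fold cross-validation with it. The fit/predict callable interface and all helper names are hypothetical. If the project uses scikit-learn (an assumption), cross_val_score(model, X, y, cv=5, scoring="r2") from sklearn.model_selection does the same in one call:

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=42):
    """Shuffle range(n) once and split it into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra index.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_validate(X, y, fit, score, k=5, seed=42):
    """Return one held-out score per fold.

    fit(X_train, y_train) must return a predict(X) callable;
    score(y_true, y_pred) computes the metric, e.g. r_squared.
    """
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict([X[i] for i in test_idx])
        scores.append(score([y[i] for i in test_idx], y_pred))
    return scores
```

If the per-fold scores stay well below the training score and vary a lot between folds, the model is likely overfitting.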
Before Charlie can predict the price of a house, the requirements need to be installed:
pip install -r requirements.txt
To update the external data, download the latest data from statbel.fgov.be:
Download the latest geojson (ZIP) and extract it.
Copy the extracted sh_statbel_statistical_sectors_31370_20230101.geojson file to ./data/external_data/REFNIS_2023.geojson.
Run the following commands in the terminal:
cd src # move to the src folder
python join_external_data.py
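Once each property has a cd_munty_refnis code, municipality-level statistics such as PopDensity, MedianPropertyValue and NetIncomePerResident can be attached with a key-based left join. A minimal sketch, assuming the statistics are already available as a dict keyed by REFNIS code; the helper name left_join is hypothetical, and join_external_data.py may do this differently (for example with pandas or a spatial join on Longitude/Latitude):

```python
def left_join(records, lookup, key, columns):
    """Copy `columns` from lookup[record[key]] into a copy of each record.

    Keys are compared as strings, so 21004 and "21004" match. Records whose
    key is not in `lookup` get None for the joined columns (left join).
    """
    joined = []
    for record in records:
        extra = lookup.get(str(record.get(key)), {})
        merged = dict(record)  # leave the input records untouched
        for col in columns:
            merged[col] = extra.get(col)
        joined.append(merged)
    return joined
```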
It is best to move to the src folder in the repository root before running a training script.
# train a model
python ./models/train_basic_linearregression.py
# or
python ./models/train_linearregression_log10.py
# or
python ./models/train_random_forest.py
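Each script reports one score on the training data and one on the testing data, which implies a held-out test set. A minimal pure-Python sketch of a reproducible split, assuming a fixed seed and a 20% test fraction (both assumptions; the scripts may instead use scikit-learn's train_test_split, whose return order X_train, X_test, y_train, y_test this sketch mirrors):

```python
import random


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out a test fraction."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # same seed -> same split every run
    n_test = max(1, round(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )
```

Fixing the seed keeps the reported testing scores comparable across the three models, since each one is evaluated on the same held-out rows.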