This project focuses on predicting market prices using various regression models. The goal is to evaluate the performance of different models and identify the best one for this task. The dataset includes columns such as `output_own_price` (the target), `output_comp_price`, and `output_own_profits`, among others.
The dataset used in this project is stored in `output_data.csv`. It contains the following key features:
- `output_date`: The date of the market data.
- `mkt_id`: The market identifier.
- `output_own_price`: The target variable representing the market price.
- `output_comp_price`: Competitor's price.
- `output_own_profits`: Own profits.
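A minimal sketch of loading the data and separating the target from the features, assuming `output_data.csv` has a header row with the column names listed above. The helper `load_dataset` and the three sample rows are hypothetical and only illustrate the column layout; they are not taken from the project's data.

```python
import io

import pandas as pd

TARGET = "output_own_price"  # target column named in the dataset description


def load_dataset(source):
    """Hypothetical helper: read the CSV and split it into features X and target y."""
    df = pd.read_csv(source)
    X = df.drop(columns=[TARGET])  # every remaining column is a candidate feature
    y = df[TARGET]
    return X, y


# In the project the source would be "output_data.csv"; an in-memory sample keeps this runnable.
# The values below are illustrative only. The blank competitor price shows up as NaN.
sample = io.StringIO(
    "output_date,mkt_id,output_own_price,output_comp_price,output_own_profits\n"
    "2020-01-01,1,2.5,2.4,10.0\n"
    "2020-01-02,2,2.6,2.6,11.0\n"
    "2020-01-03,1,2.7,,12.5\n"
)
X, y = load_dataset(sample)
```

Blank cells are read as `NaN`, which is what the imputation step described next has to fill.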
- Encoding: Categorical variables (`output_date` and `mkt_id`) are encoded using `LabelEncoder`.
- Imputation: Missing values are handled using `SimpleImputer` with a mean strategy.
- Feature Selection: Features are selected based on their importance using `SelectFromModel`.
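The three preprocessing steps can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's `main.py`: the function name `preprocess`, the choice of `RandomForestRegressor` as the importance estimator inside `SelectFromModel`, and the synthetic demo data are all assumptions. With a tree-based estimator, `SelectFromModel` keeps features whose importance is at or above the mean importance by default. Note that `LabelEncoder` is designed for target labels; `OrdinalEncoder` is scikit-learn's feature-oriented counterpart, but `LabelEncoder` is used here to match the description above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

CATEGORICAL = ["output_date", "mkt_id"]


def preprocess(X, y, random_state=0):
    """Hypothetical helper: encode, impute, then select features by importance."""
    X = X.copy()
    # Encoding: map each categorical column to integer codes.
    for col in CATEGORICAL:
        X[col] = LabelEncoder().fit_transform(X[col].astype(str))
    # Imputation: replace missing values with the column mean.
    # Assumes no column is entirely empty (SimpleImputer would drop it by default).
    imputer = SimpleImputer(strategy="mean")
    X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns, index=X.index)
    # Feature selection: assumed estimator; keeps features with importance >= mean importance.
    selector = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=random_state))
    selector.fit(X_imp, y)
    selected = list(X_imp.columns[selector.get_support()])
    return X_imp[selected], selected


# Synthetic demo data (not the project's dataset): price depends mainly on the competitor price.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "output_date": rng.choice(["2020-01", "2020-02", "2020-03"], n),
    "mkt_id": rng.integers(1, 6, n),
    "output_comp_price": rng.normal(2.5, 0.3, n),
    "output_own_profits": rng.normal(10, 2, n),
})
demo.loc[demo.index % 10 == 0, "output_comp_price"] = np.nan  # simulate missing values
y_demo = 0.8 * demo["output_comp_price"].fillna(2.5) + rng.normal(0, 0.05, n)
X_sel, selected = preprocess(demo, y_demo)
```

In a stricter setup, the encoder, imputer, and selector would be fitted on the training split only, to avoid leaking test information into preprocessing.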
The following regression models are implemented and evaluated:
- Linear Regression:
  - Mean Squared Error (MSE): 0.0663859081668457
  - Best Features: `[output_own_cost, output_comp_price]`
- Lasso Regression:
  - Mean Squared Error (MSE): 0.0663859081668457
  - Best Features: `[output_own_profits]`
  - Note: The model shows signs of underfitting.
- Ridge Regression:
  - Mean Squared Error (MSE): 0.06638594717669895
  - Best Features: `[output_own_cost, output_comp_price]`
- Elastic Net Regression:
  - Mean Squared Error (MSE): 0.12300046352838176
  - Best Features: `[output_own_profits]`
- Random Forest Regressor:
  - Mean Squared Error (MSE): 0.014679371421792334
  - Best Features: `[output_comp_price, output_X]`
- Gradient Boosting Regressor:
  - Mean Squared Error (MSE): 0.05191168906110088
  - Best Features: `[output_comp_price, output_X]`
- XGBoost Regressor:
  - Mean Squared Error (MSE): 0.019841374133488583
  - Best Features: `[output_comp_price]`
- Neural Network Regressor:
  - Mean Squared Error (MSE): 0.22620928336071672
  - Best Features: `[output_comp_price, output_X, output_own_profits]`
The Random Forest Regressor achieved the best performance with the lowest Mean Squared Error (MSE) of 0.014679371421792334. The XGBoost Regressor also performed well with an MSE of 0.019841374133488583.
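The comparison above boils down to fitting each model on a training split and scoring it by MSE, the mean of the squared differences between actual and predicted prices, with the lowest score winning. A minimal sketch of that loop, assuming an 80/20 train/test split, library-default hyperparameters (the project's actual settings are not stated here), and synthetic data from `make_regression` in place of `output_data.csv`. `build_models` and `compare_models` are hypothetical names, and XGBoost is treated as optional so the sketch runs without it.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def build_models(random_state=0):
    """Hypothetical helper: the eight model families listed above, with default settings."""
    models = {
        "Linear Regression": LinearRegression(),
        "Lasso Regression": Lasso(),
        "Ridge Regression": Ridge(),
        "Elastic Net Regression": ElasticNet(),
        "Random Forest Regressor": RandomForestRegressor(random_state=random_state),
        "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=random_state),
        "Neural Network Regressor": MLPRegressor(max_iter=500, random_state=random_state),
    }
    try:
        from xgboost import XGBRegressor
        models["XGBoost Regressor"] = XGBRegressor(random_state=random_state)
    except ImportError:
        pass  # xgboost is optional in this sketch
    return models


def compare_models(X, y, test_size=0.2, random_state=0):
    """Fit every model, score it by test-set MSE, and return (scores, best model name)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    best = min(scores, key=scores.get)  # lowest MSE wins
    return scores, best


# Synthetic stand-in for the preprocessed features and target.
X_demo, y_demo = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
scores, best = compare_models(X_demo, y_demo)
```

Because MSE is in squared price units, the square root (RMSE) is often easier to read alongside it.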
The script includes a function that plots actual vs. predicted values, giving a visual comparison of each model's performance. Below are the plots for each model:
- Linear Regression:
![linear](https://private-user-images.githubusercontent.com/105394210/413629657-0402f260-0c8f-4977-893e-f3c1beb61755.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NjU3LTA0MDJmMjYwLTBjOGYtNDk3Ny04OTNlLWYzYzFiZWI2MTc1NS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lNzUwNmRhNmMyNjJkMmZjNTBmZmRmMDVlZmRmNTRjYjZlOWZiZDYxOTQ1MGM3NjQ2Yzg4ZTUwMGZlM2EwOWU3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.cUvzv-R86Y0BXa8ni8UE2GKe2EVKTcc3EIBRVd7-q4U)
- Lasso Regression:
![lasso](https://private-user-images.githubusercontent.com/105394210/413629679-08fd70da-1dd6-4610-8c10-f3afb63069bd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5Njc5LTA4ZmQ3MGRhLTFkZDYtNDYxMC04YzEwLWYzYWZiNjMwNjliZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03YTc3MjNmZGZkMzUyYmUxM2JjNzJhOTA1YjViNjJkNzA1YzA5ZTU0YzhkMDU2YzAwY2Q1Yzc4ZTBlOGRiMjAzJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.QxG-WOlFy1kjbgw0GRMaKEFkVw8wzLTU4WaYZFrkLxc)
- Ridge Regression:
![ridge](https://private-user-images.githubusercontent.com/105394210/413629693-d5128bf5-835d-4050-8d85-082eb4ee9fcc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NjkzLWQ1MTI4YmY1LTgzNWQtNDA1MC04ZDg1LTA4MmViNGVlOWZjYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1kNDNjOWJiZmNkNzgzNjIyYjExY2M1ODAzODg4ZTI2ZjNjZThhOThjZWY5YTEyMzg3YzAzYmIxYjAyM2JiYWViJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.hwTYLeSARaoLeL7yWmhWBIsvXOMy7h165JFITNEKT1U)
- Elastic Net Regression:
![elastic net](https://private-user-images.githubusercontent.com/105394210/413629709-bac4613c-b090-4dd4-88e1-abda0a28c989.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NzA5LWJhYzQ2MTNjLWIwOTAtNGRkNC04OGUxLWFiZGEwYTI4Yzk4OS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04NDU5ZmYyM2RkNGQ4ZjI0ZWY1ZTQ4YjJlZjg3NzViMDQyZjg5NWJhYzY0YTYyZDFiZmY4YmY3NmUyN2VhYWVhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.1vFW3lzHs9yIp85NVFLAiUubBau5IUorCwVST6uLU2E)
- Random Forest Regressor:
![random forest](https://private-user-images.githubusercontent.com/105394210/413629727-69b20987-e98a-4ccc-90ff-b7a7285e736a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NzI3LTY5YjIwOTg3LWU5OGEtNGNjYy05MGZmLWI3YTcyODVlNzM2YS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yYzk5MmI0NDUxODgxMzlkMjVjODZkYzQ3NWE0OTI0ZGE3NTQ4MGMzYmVkNDVhNmI3NzhiOTJmN2Q2MjU2MzY2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.6fuOFIlHJ7LQMRvrNE4VrUKtGVbsBAjWpPafY9miLYI)
- Gradient Boosting Regressor:
![gradient boosting](https://private-user-images.githubusercontent.com/105394210/413629741-aae79331-8a72-4273-bd8f-0fff7d867826.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NzQxLWFhZTc5MzMxLThhNzItNDI3My1iZDhmLTBmZmY3ZDg2NzgyNi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01ZTY3ZWE5NWYyODk1MzVhZGNlMjNlMTY4MjkwMzRhZGZjZjhkZDdiNzQ5ZjBjMjJkYWFkMDRkNTNjM2RmZTRjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.EaqKAYn2YyOwf8MouEg6CmR_m0WYLg_hwhPMH-poSVI)
- XGBoost Regressor:
![XGB](https://private-user-images.githubusercontent.com/105394210/413629751-06b91c89-3936-4263-a40a-94cc113e9473.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NzUxLTA2YjkxYzg5LTM5MzYtNDI2My1hNDBhLTk0Y2MxMTNlOTQ3My5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jZWZiZDIwNGMyZTg2Yjk2MGUyN2Y3ZWFmNmQzZTlhNTViNWZmMjc1NzVlMzU0ZmNmNDM2ZmNmMDI3ZTEyMjUxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.EpqTVN5TF8_G_uva4-ErCZBOgUCAYx8qlrMkMwYgjdA)
- Neural Network Regressor:
![neural network](https://private-user-images.githubusercontent.com/105394210/413629759-0d3902cb-a849-4d01-bb47-acd85396a76a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3NjExOTEsIm5iZiI6MTczOTc2MDg5MSwicGF0aCI6Ii8xMDUzOTQyMTAvNDEzNjI5NzU5LTBkMzkwMmNiLWE4NDktNGQwMS1iYjQ3LWFjZDg1Mzk2YTc2YS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjE3JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIxN1QwMjU0NTFaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00MjYwNjhhOGMzOTY1Nzk1Y2QwOGZhNTc3N2EzNmE0ZDcyMWE3ZWQ2NWY5MTdhZGNhYzJjNDA4YTM0ZTQ0OWY5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.RBQ_CczY33HjWbizQgA9e68QZKygRBVjvMKxfm3PHSA)
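A plot of this kind can be sketched with matplotlib as below. This is an illustrative sketch, not the script's own function: the name `plot_actual_vs_predicted`, the axis labels, and the dashed y = x reference line are assumptions. Points close to the reference line are accurate predictions, so a tight cloud along it corresponds to a low MSE.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_actual_vs_predicted(y_true, y_pred, title="Actual vs. Predicted"):
    """Hypothetical helper: scatter predictions against actual values with a y = x guide."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(y_true, y_pred, alpha=0.6, label="Predictions")
    # Reference line y = x: points on it are perfect predictions.
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], "r--", label="Perfect prediction")
    ax.set_xlabel("Actual output_own_price")
    ax.set_ylabel("Predicted output_own_price")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# Illustrative values only; in practice pass y_test and model.predict(X_test).
fig, ax = plot_actual_vs_predicted([2.1, 2.5, 3.0], [2.0, 2.6, 2.9], title="Random Forest Regressor")
# fig.savefig("random_forest.png")  # hypothetical output path
```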
- Clone the repository: `git clone https://github.com/MAHMOUD2ABDALLAH/furniture-Sales.git`
- Navigate to the project directory: `cd furniture-Sales`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the script: `python main.py`
The project requires the following Python libraries:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- xgboost

You can install them using: `pip install pandas numpy scikit-learn matplotlib seaborn xgboost`
- Thanks to the contributors of the `scikit-learn` and `xgboost` libraries for providing robust machine learning tools.
- Special thanks to Upwork for the freelancing opportunity.