Salary Prediction Project
Companies like Glassdoor and Paysa give potential employees and recruiters access to all sorts of information about a company's profile. This information ranges from reviews to job postings and the salaries for the jobs a given company has posted.
Not all employees are willing to share how much they really make, and those who do share only because they can stay anonymous, so we cannot fully see what companies pay for the jobs they hire for.
The resulting problem is missing salary data, even though we do have some insight into what companies hire for.
We can tackle this problem by using Machine Learning to estimate salaries for the jobs we don't have any salary data for.
Here are the files that might be of importance to you:
The Jupyter notebook called exploratory_data_analysis.ipynb contains the EDA. It explores the data to see what the distributions look like and how the data is structured.
The notebook called 'Modeling' is a breakdown of the script.py file, executed in chunks to show the progress at each step.
(These two files reside in the notebooks folder.)
The script.py file contains the entire code that the Modeling notebook walks through. This script cleans the data, encodes it, standardizes it, creates 4 different models, and outputs the results to a txt file and the feature importances to a csv file. (This file resides in the src folder, where you can see all the source code.)
All the charts you see in the EDA notebook can be found in the reports section.
The feature importances as ranked by the model can be found in the feature importance csv file in the root folder, and also as an image with the filename feature_importances.png.
The predicted salaries are in a csv file in the root folder with the name predictions_salaries.csv.
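For readers who want a feel for the steps script.py is described as performing (clean, encode, standardize, fit several models, write results and feature importances), here is a minimal sketch using pandas and scikit-learn. It is not the code from script.py. The column names (jobType, yearsExperience, salary), the synthetic data, the choice of four models, the use of mean squared error, and the function names make_toy_data and run_pipeline are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_toy_data(n=200, seed=0):
    """Build a small synthetic dataset (hypothetical columns, not the project's data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "jobType": rng.choice(["JUNIOR", "SENIOR", "MANAGER"], size=n),
        "yearsExperience": rng.integers(0, 25, size=n),
    })
    base = df["jobType"].map({"JUNIOR": 60, "SENIOR": 90, "MANAGER": 110})
    df["salary"] = base + 2 * df["yearsExperience"] + rng.normal(0, 5, size=n)
    return df


def run_pipeline(df, results_path, importances_path):
    """Clean, encode, standardize, fit four models, and write the outputs."""
    # Clean: drop exact duplicates and rows with a non-positive salary.
    df = df.drop_duplicates()
    df = df[df["salary"] > 0]

    # Encode: one-hot encode the categorical columns.
    X = pd.get_dummies(df.drop(columns="salary"), dtype=float)
    y = df["salary"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    # Standardize: fit the scaler on the training split only.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    # Four stand-in models; the project's actual choices may differ.
    models = {
        "linear_regression": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
        "gradient_boosting": GradientBoostingRegressor(random_state=0),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train_s, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test_s))

    # Results go to a plain-text file, one model per line.
    with open(results_path, "w") as fh:
        for name, mse in scores.items():
            fh.write(f"{name}: MSE={mse:.2f}\n")

    # Feature importances (here taken from the random forest) go to a csv file.
    importances = pd.Series(
        models["random_forest"].feature_importances_, index=X.columns
    ).sort_values(ascending=False)
    importances.rename_axis("feature").reset_index(name="importance").to_csv(
        importances_path, index=False
    )
    return scores, importances
```

Calling run_pipeline(make_toy_data(), "results.txt", "feature_importances.csv") would write both files to the current folder. The output paths here are placeholders and not necessarily the names the project uses.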