Data analysis with Python to building and evaluating data models
Tasks | Topic | Content Included |
---|---|---|
Task 1 | User Car Pricing | Data Wrangling: Handling missing values, Data Format, Data Standardization, Data Normalization, Binning and Indicator variable |
Task 2 | Laptops Pricing | EDA, groups and pivot tables, Pearson Correlation |
Task 3 | automobile Price | Model Development; Linear Regression, Polynomial Regression, Pipeline, Visualization, R-square, Mean Square Error, Prediction & Decision Making |
Task 4 | Laptop Pricing | Model Development, Linear Regression, Polynomial Regression, Pipeline, Plot |
Task 5 | automobile pricing | Model Evaluation, overfitting, Ridge Regression, Cross validation, Grid Search |
Task 6 | Model Evaluation and Refinement | train, test, cross validation,overfitting, Ridge, Grid search |
Task 7 (Project) | Black Friday Sales Prediction | EDA, Data cleaning, Visualization, Linear Regression, Ridge Regression, DecisionTreeRegressor, RandomForestRegressor, ExtraTreesRegressor, XGBRegressor |
Task 8 (Project) | Bank Customer Churn Prediction | Feature Scaling, Logistic Regression, SCV, KNeighbor Classifier, Decision Tree Classifier, Random Forest Classifier, Gradient Boosting Classifier, XGBoost |
Objectives:
- Handling missing values
- Correct data formatting
- Standardize and normalize data
We use data wrangling to convert data from an initial format to a format that may be better for analysis.
Objectives:
- Visualize individual feature patterns
- Run descriptive statistical analysis on the dataset
- Use group and pivot tables to find the effect of categorical varaibles on price
- User Pearson Correlation to measure the independence between variables
Objectives: Develop prediction models
In this task, I'll develop several models that will predict the price of the car using the variables or features. This is just an estimate but should give us an objective idea of how much the car should cost.
-
Use Linear Regression in one variable to fit the parameters to a model
-
Use Linear Regression in multiple variables to fit the parameters to a model
-
Use Polynomial Regression in single variable to fit the parameters to a model
-
Create a pipeline for performing linear regression using multiple features in polynomial scaling
-
Evaluate the performance of different forms of regression on basis of MSE and R^2 parameters.
- Evaluate and refine prediction models
- Model Evaluation
- Cross validation
- Over-fitting, Under-fitting and Model Selection
- Ridge Regression
- Grid Search
In this lab, I'll try to refine our model's performance in predicting the price of a labtop.
- Use training, testing and cross validation to improve the performance of the dataset.
- Identify the point of overfitting of a model
- Use Ridge Regression to identify the change in performance of a model based on its hyperparameters
- Use Grid Search to identify the best performing model using different hyperparameters
Project: Black Friday Sales Prediction
Explore the dynamics of Black Friday sales with predictive modeling. From feature engineering to machine learning, explore the dynamics of one of the largest shopping events globally.Join me as we analyze customer behavior, identify key predictors, and predict sales with machine learning techniques.
Project: Customer Churn Prediction
Problem Statement : Customer churn or customer attrition is a tendency of clients or customers to abandon a brand and stop being a paying client of a particular business or organization. The percentage of customers that discontinue using a company’s services or products during a specific period is called a customer churn rate. Several bad experiences (or just one) are enough, and a customer may quit. And if a large chunk of unsatisfied customers churn at a time interval, both material losses and damage to reputation would be enormous.
Working Flow : In order to create a model these are the following procedure
- Split the dataset in 70% of Train set and 30% of Test Set
- Feature engineering
- Check the accuracy score for both Training and Test Set
- Compare the accuracies for both Training and Test set, in order to check for the overfitting issues