I used lending data to create machine learning models that classify the risk level of given loans. Specifically, I compared the performance of the Logistic Regression model and the Random Forest Classifier.

madinalikes/supervised-machine-learning-challenge

Supervised Machine Learning Objectives


  • Explain how machine learning algorithms are used in data analytics.
  • Create training and testing sets from a specified dataset.
  • Implement linear and logistic regressions by using scikit-learn.
  • Create confusion matrices for classification outputs.
  • Calculate and apply fundamental classification algorithms: logistic regression, support vector machine (SVM), and k-nearest neighbors (KNN).
  • Quantify and evaluate classification models by using confusion matrices.
  • Implement one-hot encoding in Pandas, and scaling and normalization with scikit-learn.
  • Calculate and apply bagging and boosting methods to create and use ensemble algorithms.
  • Describe regularization parameters for regressions, and select appropriate parameters for a given problem.
  • Use Random Forests and LASSO regressions to assist in the feature selection process.

Predicting Credit Risk


In this assignment, you will build a machine learning model that attempts to predict whether a loan will be approved.

Background


Lending services companies allow individual investors to partially fund personal loans as well as buy and sell notes backing the loans on a secondary market.

You will use this lending data to create machine learning models that classify the risk level of given loans. Specifically, you will compare the Logistic Regression model and the Random Forest Classifier.

Instructions


Retrieve the data

The data is located in the Resources folder.

  • lending_data.csv

Import the data using Pandas.
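
The retrieval step could be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: the helper name `load_and_split`, the label column name `loan_status`, the 25% test split, and the seed are all assumptions that do not appear in the instructions, so adjust them to match the actual file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_split(csv_path, target_column="loan_status", test_size=0.25, seed=1):
    """Read the lending CSV and return X_train, X_test, y_train, y_test.

    `target_column` is an assumed label name; check the CSV header first.
    """
    df = pd.read_csv(csv_path)
    # Separate the label from the feature columns.
    y = df[target_column]
    X = df.drop(columns=[target_column])
    # Stratify so both splits keep the same share of each class.
    return train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )


# Usage (path assumed from the Resources folder mentioned above):
# X_train, X_test, y_train, y_test = load_and_split("Resources/lending_data.csv")
```

If the file turns out to contain non-numeric feature columns, they would need to be encoded (for example with `pd.get_dummies`) before fitting a model.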

Consider the models


You will be creating and comparing two models on this data: a logistic regression and a random forest classifier. Before you create, fit, and score the models, make a prediction as to which model you think will perform better. You do not need to be correct! Write down (in markdown cells in your Jupyter Notebook or in a separate document) your prediction, and provide justification for your educated guess.

Fit a LogisticRegression model and RandomForestClassifier model


Create a LogisticRegression model, fit it to the data, and print the model's score. Do the same for a RandomForestClassifier. You may choose any starting hyperparameters you like. Which model performed better? How does that compare to your prediction? Write down your results and thoughts.
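
One possible shape for this step is sketched below. It uses synthetic data from `make_classification` as a stand-in so the snippet runs without the CSV; the helper `compare_models`, the hyperparameters (`max_iter=1000`, `n_estimators=100`), and the seed are illustrative assumptions, not values prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def compare_models(X_train, X_test, y_train, y_test, seed=1):
    """Fit both classifiers and return their test-set accuracy by name."""
    models = {
        # A higher max_iter helps the solver converge on unscaled features.
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # For classifiers, score() returns mean accuracy on the given data.
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in for the lending data (an assumption for this demo).
X, y = make_classification(n_samples=500, n_features=7, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, stratify=y
)
for name, score in compare_models(X_train, X_test, y_train, y_test).items():
    print(f"{name}: test accuracy = {score:.3f}")
```

Scoring on a held-out test set rather than the training data keeps the comparison fair, since a random forest can fit its training data almost perfectly.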
