** Open Project Report for Details
Link to dataset: https://www.kaggle.com/wendykan/lending-club-loan-data
Introduction:
In this project, we set out to choose a core algorithm for analyzing lending risk by classifying the available loan data into good loans and bad loans. We considered five statistical machine learning models and compared their accuracy and efficiency on the Lending Club loan dataset. We first worked with four models: Decision Tree classification, KNN classification, Logistic Regression, and Random Forest classification. Based on the overall performance of these four models on this dataset, we selected Random Forest classification as the core algorithm for the system. We then worked on a fifth algorithm, SVM, which was not covered in class, and tried to fine-tune it to get a better outcome. Finally, we compared all five algorithms with each other and kept the Random Forest Classifier as the core algorithm.
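The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code behind the project report. It assumes scikit-learn, uses synthetic data from make_classification as a stand-in for the Lending Club features, and leaves the hyperparameters at arbitrary default-like settings. The helper name compare_models is hypothetical, and the scores it prints say nothing about the results in the report.

```python
# Sketch only: synthetic data stands in for the Lending Club features,
# and all model settings below are assumptions, not the project's values.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit the five classifiers on one split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        # KNN, Logistic Regression and SVM are scale-sensitive, so scale first.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Imbalanced binary labels: 0 = good loan, 1 = bad loan (assumed encoding).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
scores = compare_models(X, y)
best = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
print("Highest test accuracy:", best)
```

Because good loans usually far outnumber bad loans, accuracy alone can be misleading. In practice it would be worth checking precision and recall on the bad-loan class as well before settling on a core algorithm.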
The Goal:
Investing in the loan lending business is financially risky without a proper system for analyzing whether existing loans are likely to be good loans or bad loans. Investors should check both the historical and the current statistics of the borrower and use the result to decide whether to invest more money in improving bad loans or in maintaining good loans. For this herculean task, we propose a model, based on historical and currently available data, that estimates how likely an existing loan is to become a good loan or a bad loan for investors.