This project uses the Lending Club dataset to assess borrower credit scores. The dataset contains detailed information about loans issued by Lending Club, including loan amount, interest rate, borrower characteristics, and loan status.
- Source: Lending Club
- Content: Loan information, borrower details, repayment status
- Format: Parquet
The primary goal is to assess the creditworthiness of borrowers using machine learning techniques in Databricks. This involves:
- Data Preprocessing: Cleaning and preparing the data for analysis.
- Exploratory Data Analysis (EDA): Understanding the data distribution and identifying key features.
- Feature Engineering: Creating new features to improve model performance.
- Modeling: Building and evaluating models to calculate credit scores.
- Platform: Databricks
- Languages: Python, SQL
- Libraries: PySpark
- Data Ingestion: Load the Lending Club dataset into Databricks.
- Data Cleaning: Handle missing values, outliers, and data inconsistencies.
- Feature Engineering: Create new features and transform existing ones to improve model performance.
- Credit Scoring: Calculate a credit score for each borrower.
The project aims to produce a reliable credit score that reflects the likelihood of loan repayment, supporting informed lending decisions.
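
This document does not state how the score itself is computed. One common convention, shown here purely as an assumption, scales the log-odds of repayment so that a fixed number of points doubles the odds. The function name and the defaults (`base_score`, `base_odds`, `pdo`) are illustrative values, not parameters taken from this project.

```python
import math


def probability_to_score(p_bad, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a probability of non-repayment to a score via log-odds scaling.

    base_score is the score assigned at good:bad odds of base_odds, and
    pdo is the number of points that doubles those odds (all assumed values).
    """
    if not 0.0 < p_bad < 1.0:
        raise ValueError("p_bad must be strictly between 0 and 1")
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_bad) / p_bad  # good:bad odds
    return offset + factor * math.log(odds_good)
```

With these defaults, a borrower at 50:1 odds of repaying maps to a score of 600, and each doubling of those odds adds 20 points. The probability would come from whatever model the project trains.
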