This project aims to predict the likelihood of bankruptcy for companies in Poland using historical financial data. The project utilizes machine learning models to assess a company’s financial health based on various financial ratios and features. The goal is to provide insights that can help companies and stakeholders identify financial risks and take preventive measures.
- Objective: Build a predictive model that assesses a company’s bankruptcy risk using historical financial data.
- Data Source: Dataset collected by Polish economists, containing financial information of companies over several years.
- Tools Used: Python, Pandas, scikit-learn, matplotlib, seaborn, and Linux command line.
- Techniques: Logistic regression, resampling techniques, decision trees, classification metrics, and data wrangling.
- Data Collection:
  - Accessed a publicly available dataset from Polish economists studying corporate bankruptcy.
  - Loaded and processed the data using the Linux command line and Python.
- Data Preprocessing:
  - Performed extensive data cleaning using Pandas, addressing missing values and outliers.
  - Handled class imbalance by oversampling the minority (bankruptcy) class, so that the model sees enough bankruptcy cases during training to learn to detect them.
  - Conducted exploratory data analysis (EDA) to identify key financial ratios and features that influence bankruptcy risk.
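The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`ratio_1` ... `ratio_4`, `bankrupt`) are hypothetical, and median imputation and percentile clipping are assumed choices for handling missing values and outliers. Oversampling is applied to the training split only, so duplicated rows never reach the test set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the financial-ratio table (NOT the real dataset).
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame(
    rng.normal(size=(n, 4)),
    columns=["ratio_1", "ratio_2", "ratio_3", "ratio_4"],  # hypothetical names
)
df["bankrupt"] = (rng.random(n) < 0.05).astype(int)  # roughly 5% positives
df.loc[rng.choice(n, size=50, replace=False), "ratio_2"] = np.nan  # fake gaps

X = df.drop(columns="bankrupt")
y = df["bankrupt"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Missing values: impute with medians computed on the training split only.
medians = X_train.median()
X_train = X_train.fillna(medians)
X_test = X_test.fillna(medians)

# Outliers: clip each column to the training split's 1st/99th percentiles.
lower, upper = X_train.quantile(0.01), X_train.quantile(0.99)
X_train = X_train.clip(lower=lower, upper=upper, axis=1)
X_test = X_test.clip(lower=lower, upper=upper, axis=1)

# Imbalance: randomly oversample the minority class in the training split.
train = X_train.assign(bankrupt=y_train)
majority = train[train["bankrupt"] == 0]
minority = train[train["bankrupt"] == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
train_balanced = pd.concat([majority, minority_upsampled]).sample(
    frac=1, random_state=42  # shuffle rows
)
X_train_bal = train_balanced.drop(columns="bankrupt")
y_train_bal = train_balanced["bankrupt"]

print(y_train.value_counts().to_dict(), "->", y_train_bal.value_counts().to_dict())
```

Computing the medians and clipping bounds on the training split alone keeps information from the test set out of the fitted model.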
- Modeling:
  - Built multiple classification models, including logistic regression and decision tree classifiers, to predict bankruptcy risk.
  - Fine-tuned model hyperparameters using grid search and cross-validation to optimize performance.
  - Evaluated models using classification metrics such as accuracy, precision, recall, and F1-score to determine the best-performing model.
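As an illustration of the modeling steps above, the sketch below tunes a logistic regression and a decision tree with `GridSearchCV` and reports precision, recall, and F1 on a held-out split. It is an assumption-laden example rather than the project's script: the data comes from `make_classification`, the parameter grids are arbitrary, and `class_weight="balanced"` is used in place of oversampling so that the example needs no extra dependency and no resampled rows leak between cross-validation folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 5% positives), NOT the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Candidate models with illustrative (assumed) hyperparameter grids.
candidates = {
    "logistic_regression": (
        Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
        ]),
        {"clf__C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "decision_tree": (
        Pipeline([
            ("clf", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
        ]),
        {"clf__max_depth": [3, 5, 10], "clf__min_samples_leaf": [1, 5, 20]},
    ),
}

# Stratified folds keep the bankruptcy rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (pipeline, grid) in candidates.items():
    search = GridSearchCV(pipeline, grid, scoring="f1", cv=cv)
    search.fit(X_train, y_train)
    results[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))

# Pick the model with the best cross-validated F1, then check it on held-out data.
best_name = max(results, key=lambda k: results[k].best_score_)
best_model = results[best_name].best_estimator_
print(classification_report(y_test, best_model.predict(X_test), digits=3))
```

Scoring on F1 rather than accuracy is a deliberate choice here: with so few positive cases, a model that never predicts bankruptcy would still reach high accuracy.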
- Prediction:
  - Deployed the best-performing model to predict the likelihood of bankruptcy for new companies.
  - Visualized prediction results to highlight the most critical factors influencing bankruptcy.
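A minimal sketch of the prediction step, assuming a fitted logistic regression pipeline: `predict_proba` gives a bankruptcy probability for each new company, and the standardized coefficients give a rough ranking of influential features. The data, feature names (`ratio_1` to `ratio_3`), and the two example companies are all synthetic and hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (NOT the real dataset); the label is driven by ratio_1.
rng = np.random.default_rng(1)
feature_names = ["ratio_1", "ratio_2", "ratio_3"]  # hypothetical names
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=feature_names)
y = (X["ratio_1"] + 0.3 * rng.normal(size=500) < -1.0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Two made-up "new companies" described by the same features.
new_companies = pd.DataFrame(
    {"ratio_1": [-2.0, 1.5], "ratio_2": [0.1, -0.4], "ratio_3": [0.0, 0.2]}
)
# Column 1 of predict_proba is the probability of the positive (bankrupt) class.
bankruptcy_probability = model.predict_proba(new_companies)[:, 1]
print(bankruptcy_probability.round(3))

# Coefficients on standardized features, ordered by absolute size.
coefficients = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=feature_names
)
ranked = coefficients.reindex(coefficients.abs().sort_values(ascending=False).index)
print(ranked.round(3))
```

The `ranked` series can be passed to a bar chart (for example with matplotlib or seaborn) to visualize which features push the predicted risk up or down.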
- The logistic regression model performed well in predicting bankruptcy, with high precision and recall for the positive (bankruptcy) class.
- The model was able to capture significant financial trends and patterns that indicate a company’s potential risk of bankruptcy.
- The use of resampling techniques helped in effectively handling the imbalanced nature of the dataset.
- Financial Analysis: Gained deeper insights into how financial ratios and key indicators can affect a company's bankruptcy risk.
- Data Imbalance: Handled the imbalanced dataset with resampling techniques so that the minority class (bankruptcy cases) was adequately represented in the training data.
- Machine Learning: Improved skills in building and optimizing classification models for real-world financial problems, with a focus on logistic regression and decision trees.
- Incorporate Time-Series Data: Add historical data trends and time-series analysis to improve the model’s ability to predict long-term bankruptcy risks.
- Advanced Ensemble Models: Experiment with advanced ensemble models such as Random Forest, Gradient Boosting, or XGBoost to improve prediction accuracy.
- Interactive Dashboard: Develop a user-friendly dashboard that allows businesses to input their financial data and receive real-time bankruptcy risk predictions.
- Clone this repository:
    git clone https://github.com/nisha2k21/bankruptcy-prediction-poland.git
- Navigate to the project directory:
    cd bankruptcy-prediction-poland
- Install the required dependencies:
    pip install -r requirements.txt
- Run the model to make predictions:
    python src/bankruptcy_prediction.py
├── data
│ └── bankruptcy_data.csv # Raw data file
├── notebooks
│ └── EDA_and_Modeling.ipynb # Jupyter notebook for analysis and model building
├── src
│ └── bankruptcy_prediction.py # Python script for running the model
├── requirements.txt # List of required dependencies
└── README.md # Project documentation
- Programming Language: Python
- Libraries: Pandas, NumPy, scikit-learn, matplotlib, seaborn
- Machine Learning: Logistic regression, Decision tree classifiers, Resampling techniques
- Model Evaluation: Precision, Recall, F1-score, Accuracy
- Platform: Linux command line for data handling
For any questions, feedback, or collaboration opportunities, feel free to reach out:
- Email: [email protected]
- LinkedIn: Nisha Kumari