This project contains a machine learning pipeline for predicting the funding success of DonorsChoose projects, with a particular focus on the impact of poverty levels.
DonorsChoose is a platform where teachers can request resources for their classrooms. This project aims to predict whether a project will be successfully funded based on various features, including poverty levels of the school districts.
- run_pipeline.py: Main script to run the entire pipeline
- config.json: Configuration file for dataset paths and feature selection
- data/: Directory containing all raw datasets
- preprocessing/: Directory containing preprocessing scripts
  - feature_selection.py: Script for merging datasets and selecting features
  - data_cleaning.py: Script for cleaning and preprocessing the data
  - data_segment_and_balance.py: Script for segmenting and balancing the data
- features/: Directory for feature engineering scripts
  - feature_engineering.py: Script for feature engineering
- split/: Directory for data splitting scripts
  - train_test_split.py: Script for splitting data into training and testing sets
- model/: Directory for machine learning models, including training, validation, and selection scripts
  - model_training.py: Script for training the model
  - model_evaluation.py: Script for evaluating the model
  - feature_importance.py: Script for determining feature importance
  - recommendation.py: Script for generating recommendations
- outputs/: Directory where processed datasets are saved
- figures/: Directory for saving generated figures and plots
- notebooks/: Directory for Jupyter notebooks used for generating graphs and as a playground for experimentation
- Clone this repository to your local machine.
- Ensure you have Python installed (preferably Python 3.7+).
- Install the required dependencies:
  ```
  pip install pandas numpy
  ```
- Place your raw DonorsChoose datasets in the data/ directory (make sure the file names match those in config.json).
- Review and update the config.json file if necessary.
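As a quick sanity check before running the pipeline, you can verify that the configured dataset files actually exist. This is a hypothetical helper, assuming raw_datasets in config.json maps dataset names to CSV paths (see the configuration description later in this README):

```python
import json
import os

# Load the pipeline configuration.
with open("config.json") as f:
    config = json.load(f)

# Assumption: raw_datasets maps dataset names to CSV file paths.
for name, path in config["raw_datasets"].items():
    status = "OK" if os.path.exists(path) else "MISSING"
    print(f"{status}: {name} -> {path}")
```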
You can run the entire pipeline using the run_pipeline.py script:

```
python run_pipeline.py
```
This will execute the following steps:
- Merge datasets and select features
- Perform feature engineering
- Clean and preprocess the data
- Split data into training and testing sets
- Segment and balance the data
- Train the machine learning model
- Evaluate the model
- Determine feature importance
- Generate recommendations
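A minimal sketch of what this orchestration could look like (an illustration only, not the actual contents of run_pipeline.py):

```python
import subprocess
import sys

# Pipeline stages in execution order; each is a standalone script.
STAGES = [
    "preprocessing/feature_selection.py",
    "features/feature_engineering.py",
    "preprocessing/data_cleaning.py",
    "split/train_test_split.py",
    "preprocessing/data_segment_and_balance.py",
    "model/model_training.py",
    "model/model_evaluation.py",
    "model/feature_importance.py",
    "model/recommendation.py",
]

for stage in STAGES:
    print(f"Running {stage} ...")
    # Abort the run as soon as any stage fails.
    result = subprocess.run([sys.executable, stage])
    if result.returncode != 0:
        sys.exit(f"Stage failed: {stage}")
```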
Alternatively, you can run the scripts individually in the following order:
- python preprocessing/feature_selection.py
- python features/feature_engineering.py
- python preprocessing/data_cleaning.py
- python split/train_test_split.py
- python preprocessing/data_segment_and_balance.py
- python model/model_training.py
- python model/model_evaluation.py
- python model/feature_importance.py
- python model/recommendation.py
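To give a sense of what an individual stage looks like, here is a rough sketch of a train/test split step using only pandas. The file names and the 80/20 ratio are assumptions; the actual train_test_split.py reads its settings (such as test_splits) from config.json and may differ:

```python
import pandas as pd

# Hypothetical file names; the real script may derive paths from config.json.
df = pd.read_csv("outputs/cleaned_data.csv")

# Hold out 20% of rows for testing (the ratio here is an assumption);
# fixing the seed keeps the split reproducible.
test_df = df.sample(frac=0.2, random_state=42)
train_df = df.drop(test_df.index)

train_df.to_csv("outputs/train.csv", index=False)
test_df.to_csv("outputs/test.csv", index=False)
```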
The config.json file contains important settings for the pipeline:
- raw_datasets: Paths to the input CSV files (donations, essays, projects, resources, outcomes)
- dataset: Path for the output cleaned dataset
- features_to_use: List of features to select from the merged dataset
- one_hot_encode_features: List of categorical features to one-hot encode
- models: List of models to use
- projects_imputation: Methods for imputing missing values in the projects dataset
- poverty_columns: Mapping of poverty levels
- split_by_poverty: Whether to split data by poverty level
- test_splits: Configuration for test splits
- poverty_level_replacements: Replacements for poverty levels
- quant_variables: List of quantitative variables
- stem_cols: List of STEM-related columns
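As an example of how a script might consume these settings, the sketch below loads the configuration and applies the feature selection and one-hot encoding keys. The merged-dataset path and the exact key shapes are assumptions based on the descriptions above:

```python
import json
import pandas as pd

with open("config.json") as f:
    config = json.load(f)

# Hypothetical intermediate file produced by the merge step.
df = pd.read_csv("outputs/merged.csv")

# Keep only the configured feature columns.
df = df[config["features_to_use"]]

# One-hot encode the configured categorical features.
df = pd.get_dummies(df, columns=config["one_hot_encode_features"])
```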
The pipeline generates several types of output files in the outputs/ directory, including:
- CSV files: Containing datasets at various stages of processing (e.g., selected features, cleaned data, training and testing sets, model outputs)
- PKL files: Serialized model objects
Additionally, figures and plots generated during the analysis are saved in the figures/ directory.
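To inspect these artifacts, you can load them back with pandas and pickle; the file names below are hypothetical examples:

```python
import pickle
import pandas as pd

# Load one of the processed CSV outputs (hypothetical name).
test_df = pd.read_csv("outputs/test.csv")
print(test_df.head())

# Load a serialized model object (hypothetical name).
with open("outputs/model.pkl", "rb") as f:
    model = pickle.load(f)
print(type(model))
```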