• Developed multiple ML models to predict whether a patient is at risk of heart disease.
• Preprocessed the imbalanced raw CDC dataset (over 300k observations, 18 features) using Python.
• Used data visualization packages such as Seaborn to explore the data and select features for modeling.
• Trained and compared decision tree, random forest, KNN, and logistic regression models.
• Used ROC curves to identify the best-performing model, which achieved 80% accuracy on real-life data.