This repository contains a curated selection of projects completed during my graduate studies, and I will continue to expand it over the next couple of years. Each project listed below showcases specific skills and theoretical applications across various domains of data science and artificial intelligence. Together, these projects demonstrate my ability to apply complex concepts to real-world problems and to develop predictive models that are both robust and scalable.
Please click the links to view the YouTube videos and/or projects.
AAI-500 is an introductory course focused on probability, statistics, and Python programming, aimed at providing students with the foundational skills necessary for advanced AI studies. It covers a range of topics including random variables, probability distributions, hypothesis testing, and logistic regression, alongside practical Python applications. The course integrates case studies and real-world problem solving, culminating in a team project that strengthens students' skills in collaboration, presentation, and academic writing.
Course: Probability and Statistics
Tools Used: Python, Jupyter, Statistical Analysis Methods
This project involved statistical analysis and probability theory to understand the factors influencing heart disease mortality rates. Through rigorous data cleaning, exploratory data analysis, and the application of statistical tests, I developed a comprehensive understanding of the significant predictors of heart disease. This project emphasized the practical application of biostatistics in public health.
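A minimal sketch of the kind of statistical test applied in this analysis — here a two-sample t-test on synthetic data, since the actual dataset and group definitions (e.g., comparing mortality rates between two populations) are illustrative assumptions, not taken from the project:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical mortality rates (per 100k) for two illustrative groups
group_a = rng.normal(loc=260, scale=25, size=50)
group_b = rng.normal(loc=240, scale=25, size=50)

# Two-sample t-test: is the difference in mean mortality significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

The same pattern extends to the other tests used in such analyses (e.g., chi-square tests for categorical predictors via `scipy.stats.chi2_contingency`).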
GitHub project
Course: Introduction to Artificial Intelligence
AAI-501 provides a comprehensive introduction to the field of Artificial Intelligence (AI), focusing on modern advancements in machine learning, deep learning, big data, and computational power. It covers essential AI concepts, techniques, and challenges across multiple domains such as Natural Language Processing (NLP), Computer Vision (CV), and more. Students will learn and apply a variety of AI methodologies, including heuristic search, genetic algorithms, Bayesian networks, and neural network models. Practical applications will be explored in areas like image processing, biomedical systems, and robotics, using Python. The course also emphasizes ethical considerations in AI development, such as fairness, trust, bias, and safety, aiming to foster skills in project management, teamwork, and leadership.
Course: Intro to AI
Tools Used: Python, Scikit-learn, Jupyter, TensorFlow, and Machine Learning Algorithms
In this project, I implemented several machine learning models to classify real estate properties into different pricing tiers based on a variety of features such as location, size, and condition. The project involved data preprocessing, feature engineering, model selection, and extensive evaluation of classifiers including Random Forests, K-Nearest Neighbors, and Gradient Boosting Machines. The final model was deployed to provide real-time predictions that help investors and buyers make informed decisions.
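The model-comparison step can be sketched as follows; synthetic data stands in for the real estate features and pricing-tier labels, and the three-way model lineup mirrors the classifiers named above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for property features and three pricing tiers
X, y = make_classification(n_samples=500, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```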
Tools Used: Python, Jupyter, Scikit-learn, Random Forest, GridSearchCV
This project aimed to predict life expectancy across different US states for males and females using advanced machine learning techniques. It involved an in-depth analysis of life expectancy data from 2010-2015 and 2020, examining the impact of gender and geographic location on life expectancy rates. I performed exploratory data analysis, data preprocessing, and predictive modeling using Linear Regression and Random Forest Regressor, with hyperparameter tuning using GridSearchCV.
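The hyperparameter-tuning step with GridSearchCV can be sketched like this; the data is synthetic and the parameter grid values are illustrative, not the ones used in the project:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data in place of the life-expectancy dataset
X, y = make_regression(n_samples=200, n_features=6, noise=10, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X, y)
print("Best params:", search.best_params_)
print(f"Best CV R^2: {search.best_score_:.3f}")
```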
Tools Used: Python, Jupyter, Scikit-learn, Imbalanced-learn, Decision Tree, Random Forest
The objective of this project was to predict stroke occurrences using the "healthcare-dataset-stroke-data.csv" dataset. We developed predictive models using Decision Tree and Random Forest classifiers to identify individuals at high risk of stroke based on various health and demographic features. The project involved handling data imbalance using SMOTE and evaluating the models' performance using metrics such as accuracy, precision, recall, F1-score, and ROC AUC.
Course: Machine Learning
Tools Used: Python, Jupyter, Scikit-learn, XGBoost, CatBoost
This project aimed to tackle the issue of vehicle insurance fraud, which causes significant financial losses for insurance companies and erodes consumer trust. By leveraging historical vehicle and policy data, our objective was to develop a robust predictive model to accurately detect and prevent fraudulent claims. The implementation of this model helps insurance companies minimize financial losses, enhance the efficiency of claims processing, and maintain fair premium pricing for customers.
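An illustrative sketch of the fraud-classification setup, with ROC AUC as the headline metric for the imbalanced problem; scikit-learn's GradientBoostingClassifier stands in here for the XGBoost/CatBoost models actually used, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic claims data: ~6% fraudulent, mimicking class imbalance
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.94, 0.06], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=1)

clf = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"ROC AUC: {auc:.3f}")
```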
- Statistical Analysis
- Machine Learning
- Data Visualization
- Predictive Modeling
- Data Cleaning and Preparation
- Programming (Python)
- Feature Engineering
- Hyperparameter Tuning
- Model Evaluation
- Exploratory Data Analysis (EDA)
- Handling Missing Data
- Data Encoding
- Model Deployment
- Fraud Detection
- Handling Class Imbalance
- Model Comparison
- Bayesian Optimization
- Data Balancing (SMOTE)
- Anomaly Detection (Isolation Forest)
- Ensemble Learning (Gradient Boosting, Random Forest)
- Hyperparameter Tuning (GridSearchCV, RandomizedSearchCV, Bayesian Optimization)
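As a brief example of the anomaly-detection skill listed above, an Isolation Forest can flag outliers in unlabeled data; the data here is synthetic with injected outliers:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([inliers, outliers])

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)  # 1 = inlier, -1 = flagged anomaly
print("Flagged anomalies:", int((labels == -1).sum()))
```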
Feel free to reach out if you have any questions or if you are interested in collaborating on future projects.
Outhai Xayavongsa (Ms. Thai)