In this project, we carry out a statistical analysis to predict student performance in Portuguese secondary education. Using a dataset of 33 variables and 1,044 observations, we begin with a descriptive analysis of the distribution of key predictors and their relationship with student grades. Because a standard linear model may not suit grade data, we compare a linear regression model with a Poisson model and ultimately select the linear model, whose robustness is supported by well-behaved residuals and consistent bootstrapped confidence intervals.
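To make the idea of a bootstrapped confidence interval concrete, the following is a minimal sketch of a pairs bootstrap for the slope of a simple linear regression. It is not the analysis code of this project: the data are synthetic, and the names `study_time`, `grade`, `ols_slope`, and `bootstrap_slope_ci` are hypothetical and chosen only for illustration.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples with no variation in x
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    low = slopes[int((alpha / 2) * len(slopes))]
    high = slopes[int((1 - alpha / 2) * len(slopes)) - 1]
    return low, high


# Synthetic data (an assumption for illustration, NOT the student dataset).
rng = random.Random(42)
study_time = [rng.uniform(1, 4) for _ in range(200)]
grade = [8.0 + 1.5 * s + rng.gauss(0, 2) for s in study_time]

slope = ols_slope(study_time, grade)
ci_low, ci_high = bootstrap_slope_ci(study_time, grade)
print(f"slope = {slope:.3f}, 95% bootstrap CI = ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that stays stable across different seeds and resample counts is one informal indication of the consistency referred to above.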
We then fit a binary discrete choice model to identify the factors that significantly affect the likelihood of passing a course, and find that the influential predictors differ between Mathematics and Portuguese. To improve prediction, we use binomial models alongside random forests, achieving a predictive accuracy of 87% for Mathematics and 93% for Portuguese. Our methodology emphasizes model selection and iterative validation to ensure the reliability of our findings, and shows that combining classical statistical methods with machine learning techniques is effective for educational prediction.
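As an illustration of how a pass/fail accuracy figure of this kind can be computed, the sketch below fits a small logistic (binomial) model by gradient descent and scores it on a held-out split. It is a simplified stand-in, not the models or data used in this project: the two standardized predictors, the coefficients used to simulate outcomes, and the function names are all hypothetical, and the resulting accuracy is unrelated to the 87% and 93% reported above.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, x):
    """Intercept w[0] plus the weighted sum of the features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean logistic log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Share of cases where the 0.5-threshold prediction matches the label."""
    hits = sum((sigmoid(linear_score(w, xi)) >= 0.5) == (yi == 1)
               for xi, yi in zip(X, y))
    return hits / len(y)


# Synthetic data (an assumption for illustration): two standardized predictors
# and a simulated pass (1) / fail (0) outcome.
rng = random.Random(7)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
y = [1 if rng.random() < sigmoid(0.5 + 2.0 * a - 0.8 * b) else 0 for a, b in X]

# Hold out the last 100 rows so accuracy is measured on unseen cases.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

weights = fit_logistic(X_train, y_train)
test_acc = accuracy(weights, X_test, y_test)
print(f"held-out accuracy = {test_acc:.2f}")
```

Scoring on held-out rows rather than on the training rows is what makes the accuracy a measure of predictive performance rather than of in-sample fit.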