Contributors: Joe Gyorda (G'24), Ben Levesque ('24), Chao Wang (Professor of Engineering, Arizona State University), Scott Pauls (DIFUSE PI, Professor of Mathematics), Laura Ray (DIFUSE PI, Professor of Engineering), and Taylor Hickey ('23, Project Manager)
This module was developed through the DIFUSE project at Dartmouth College and funded by the National Science Foundation award IUSE-1917002.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The primary objective of this module is to reinforce the statistical concepts introduced in class through the process of building a data analysis pipeline. Students first apply statistical concepts to understand the data, then use that understanding to implement machine learning models. The module is split into two assignments: in the first, students work with the dataset to gain insight into its variables and underlying relationships (exploratory data analysis); in the second, students implement supervised machine learning models to generate predictions that may inform airlines' decision making. The machine learning assignment also introduces the underlying concepts and provides function wrappers, so that students can focus on interpreting the results rather than coding the models.
- Apply the methods of exploratory data analysis and draw conclusions about the relationships in the data.
- Differentiate between regression and classification and the conclusions that can be drawn from each.
- Create and interpret data visualizations to evaluate models and answer questions.
- Apply these concepts and techniques to a real-world dataset, then interpret and effectively communicate the results.
The module focuses on airline data and applications to decision making in the airline industry. In the first part of the module, students engage in exploratory data analysis using univariate and bivariate analyses to generate hypotheses about factors that contribute to flight delays. In the second part, students assume the role of a consultant employed by Phoenix Sky Harbor airport to investigate the role of delays in the performance of airlines. At the end of Part 2, students make an informed recommendation for Sky Harbor based on the output of various machine learning models they implement.
The module uses airline data concerning flight delays and their causes.
The module uses MATLAB live scripts.
Use this page to get an idea of the timeline of the module, what components are involved, and what documents are related to each component. This is the schedule intended for module deployment by the DIFUSE team, though instructors are welcome to modify the timeline to fit their course environment.
Date | In/Out of Class | Assignment Description | Assignment Files (Linked to Repository Contents) |
---|---|---|---|
1 | In-class | Introduce Module, have students start Part 1 MATLAB activity, complete for homework | Part 1, Statistical Measures |
1+ | Out of class | Students are expected to collaborate in groups to develop a presentation that will be uploaded to Canvas. | Canvas Quiz administered by professor |
2 | In-class | Introduce Part 2 of Module, have students start Part 2 MATLAB activity in groups of 3-4 | Part 2, ML Techniques |
2+ | Out of class | Students submit their finished presentation slide decks from Part 2 to the instructor for review. | Part 2, Sample Presentations |
This module was designed for ASU's Random Signal Analysis course. It assumes that students have covered basic probability (e.g., normal distributions) and statistical (e.g., hypothesis testing) concepts in their class and have a basic familiarity with MATLAB.
For instructors and interested parties, the history of this repository (with detailed commits) can be found here.