Learning Goals: Machine Learning in Production / AI Engineering (17-445/17-645/17-745/11-695)

Lecture: Introduction and Motivation

Content:

  • The lecture illustrates the traditional view of machine learning and contrasts it with the challenges of building production systems
  • Contrasting software engineer and data scientist roles; outlining the need for collaboration
  • Syllabus and class structure; introductions and survey

Learning goals:

  • Illustrate the engineering challenges for building a production system with ML components, beyond creating the model
  • Summarize the respective goals and challenges of software engineers vs data scientists

Assignment:

  • Case study analysis of an ML product

Lecture: From Models to AI-Enabled Systems (Systems Thinking) Requirements Architecture QA Process

Overview:

  • Machine learning is typically a component of a larger production system: AI-enabled systems consist of ML and non-ML components that are developed with different processes and need to be integrated; AI is more or less dominant in these systems
  • The lack of specifications and its consequences for composition and abstraction: Contrasting ML with non-ML components; inductive vs deductive reasoning
  • System-level strategies to engineering systems from imprecise specifications and unreliable components (e.g., guardrails and other safety mechanisms)
  • Thinking in pipelines, not models (see the sketch after this list)
  • Components of intelligent experiences and corresponding challenges (experience, intelligence, orchestration) within a larger system architecture; overview of design options and automation degrees, e.g., forcefulness of the experience
  • Qualities of interest (beyond model accuracy)
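
To make "pipeline thinking" concrete, here is a minimal sketch (with placeholder stage bodies and invented function names, not any specific framework) of how the learned model is just one stage among data collection, cleaning, feature extraction, evaluation, and deployment:

```python
# A minimal sketch of pipeline thinking: every stage body below is a placeholder.

def collect_data():
    return [{"text": "good product", "label": 1}, {"text": "broke after a day", "label": 0}]

def clean(data):
    return [d for d in data if d["text"].strip()]

def extract_features(data):
    return [([len(d["text"])], d["label"]) for d in data]

def train(examples):
    return lambda features: 1 if features[0] > 10 else 0   # stand-in "model"

def evaluate(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

def run_pipeline():
    examples = extract_features(clean(collect_data()))
    model = train(examples)
    accuracy = evaluate(model, examples)
    if accuracy > 0.9:            # quality gate before any deployment step
        print("would deploy model behind an inference service")
    else:
        print("model rejected, accuracy:", accuracy)

run_pipeline()
```

Each stage can fail or drift independently, which is why later lectures treat pipeline code, data, and models as first-class artifacts for testing and monitoring.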

Learning goals:

  • Explain the consequences of the shift from deductive to inductive reasoning for abstraction and composition
  • Explain how machine learning fits into the larger picture of building and maintaining production systems
  • Explain the modularity implications of having machine-learning components without specifications
  • Describe the typical components relating to AI in an AI-enabled system and typical design decisions to be made

References:

Blog post/lecture notes:

Lecture: Model Quality and Unit Testing (2 lectures) Quality Assurance

Overview:

  • Traditional model accuracy measures, confusion matrix, precision/recall, ROC, …
  • Establishing baselines, comparison against heuristic approaches
  • Measuring generalization, overfitting, train/validation/test split, …
  • Setting expectations for correctness and what counts as a bug
  • Notions of test suites and coverage for models (e.g., testing by population segment), black-box test case design
  • The oracle problem, metamorphic testing, fuzzing, and simulation (see the sketch after this list)
  • Pitfalls of data leakage
  • Automated assessment, regression testing, dashboards, continuous integration, experiment tracking (e.g., MLFlow, ModelDB)
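
As one concrete illustration of testing without a precise oracle, the sketch below (using a hypothetical `predict_sentiment` stand-in for a trained model) encodes a metamorphic invariant: a meaning-preserving transformation of the input should not change the prediction.

```python
# A minimal sketch of a metamorphic test; predict_sentiment is a placeholder for a
# real trained model, but the invariant pattern transfers directly.

def predict_sentiment(text: str) -> str:
    """Stand-in for a trained classifier; returns 'pos' or 'neg'."""
    return "pos" if "good" in text.lower() else "neg"

def add_irrelevant_whitespace(text: str) -> str:
    """A transformation that should not affect the prediction."""
    return "  " + text + "  "

def test_prediction_invariant_under_whitespace():
    examples = ["The food was good", "Terrible service", "Good value overall"]
    for text in examples:
        assert predict_sentiment(text) == predict_sentiment(add_irrelevant_whitespace(text))
```

Run with pytest; replacing the fixed examples with randomly generated inputs turns the same invariant into a simple fuzzer.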

Learning goals:

  • Select a suitable metric to evaluate prediction accuracy of a model and to compare multiple models
  • Select a suitable baseline when evaluating model accuracy
  • Explain how software testing differs from measuring prediction accuracy of a model
  • Curate validation datasets for assessing model quality, covering subpopulations as needed
  • Use invariants to check partial model properties with automated testing
  • Avoid common pitfalls in evaluating model quality
  • Select and deploy automated infrastructure to evaluate and monitor model quality

Assignment (part of project):

  • Assess model quality offline with suitable accuracy measure; establish baseline, avoid common pitfalls; automate accuracy measurement and track results with continuous integration

References:

Blog posts/lecture notes:

Lecture: Goals and Success Measures for AI-Enabled Systems Requirements

Overview:

  • Thinking about the system: System goals vs model goals
  • Business considerations for using machine learning
    • When and how AI can support system goals
    • Overall cost of operating an ML component (e.g., data, learning, updating, inference cost)
  • Brief intro into measurement
  • Defining and measuring a system's goals

Learning goals:

  • Judge when to apply AI for a problem in a system
  • Understand that system goals may not directly relate to model accuracy
  • Define system goals and map them to goals for the AI component
  • Design and implement suitable measures and corresponding telemetry

Assignments:

  • For a case study (Smart Dashcam scenario), describe system and model goals and their relation; define concrete measures

References:

  • 🕮 Hulten, Geoff. "Building Intelligent Systems: A Guide to Machine Learning Engineering." (2018), Chapters 2 (Knowing when to use IS) and 4 (Defining the IS’s Goals)
  • 🕮 Ajay Agrawal, Joshua Gans, Avi Goldfarb. “Prediction Machines: The Simple Economics of Artificial Intelligence” 2018
  • 🗎 Bernardi, Lucas, Themistoklis Mavridis, and Pablo Estevez. "150 successful machine learning models: 6 lessons learned at Booking.com." In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1743-1751. 2019.

Lecture: Quality Assessment in Production Quality Assurance Implementation/Operations

Overview:

  • Linking models to system goals: Model accuracy vs system quality
  • Limitations of unit testing, especially for AI components
  • History of testing software in production, from beta tests to A/B testing and chaos experiments; feature flags and corresponding infrastructure
  • Design of telemetry to assess business goals, model quality, and other indicators; discussion of proxy metrics and engineering challenges
  • Introduction to monitoring infrastructure
  • Online experimentation
    • Testing in production, chaos engineering
    • A/B testing
    • Necessary statistical foundations (see the sketch after this list)
    • Mitigating risks of testing in production
  • Infrastructure for experimentation, planning and tracking experiments; introduction to MLOps
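
As a small illustration of the statistics involved, the sketch below compares per-session outcomes of two variants with a Welch t-test from SciPy; the telemetry values are placeholders, and a two-proportion test would be the more typical choice for binary outcomes.

```python
# A minimal sketch of a statistical check for an A/B experiment; outcome lists are
# placeholders for per-session telemetry logged in production.
from scipy import stats

control   = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # variant A outcomes (e.g., click = 1)
treatment = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]   # variant B outcomes

# Does the observed difference look like more than noise at this sample size?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
lift = sum(treatment) / len(treatment) - sum(control) / len(control)
print(f"observed lift: {lift:.2f}, p-value: {p_value:.3f}")
```

In practice, sample sizes, the minimum detectable effect, and stopping rules are decided before the experiment starts.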

Learning goals:

  • Explain the limitations of unit testing and the rationale for testing in production
  • Design telemetry to assess model and system quality in production
  • Build monitoring infrastructure to collect and show telemetry data
  • Understand the rationale for beta tests and chaos experiments
  • Plan and execute experiments (chaos, A/B, shadow releases, ...) in production
  • Examine experimental results with statistical rigor
  • Support data scientists with platforms providing insights from production data

Assignment:

  • Part of group project: Design an experimentation platform to conduct A/B tests and compare results with statistical rigor

References:

Lecture: Risk and Planning for Mistakes (2 lectures) Requirements

Overview:

  • Inevitability of wrong predictions: lack of specifications, inductive reasoning, common sources of wrong predictions
  • System-level strategies to deal with unreliable components:
    • User interface design, incl. forcefulness, undo, setting expectations, …
    • Humans in the loop, incl. avoiding complacency, deciding where and when to ask for human judgment, …
    • Safeguards outside the model: guardrails, redundancies, voting, fallback, graceful degradation, … (see the sketch after this list)
  • Decomposing requirements to understand problems
    • The world and the machine, explicit environment assumptions from specifications
    • Considering drift, feedback loops, adversaries
  • Introduction to risk analysis and fault trees: anticipate problems
    • Fault tree analysis, failure mode and effects analysis, hazard and operability study (HAZOP)
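
The sketch below illustrates two of the mitigation patterns above (a guardrail outside the model and a fallback heuristic) for a hypothetical smart-thermostat setpoint; the model call, bounds, and heuristic are invented for illustration.

```python
# A minimal sketch of system-level safeguards around an unreliable ML component.
MIN_TEMP, MAX_TEMP = 10.0, 30.0   # hard-coded safe range (guardrail outside the model)

def ml_predict_setpoint(context) -> float:
    """Placeholder for an ML model that may return nonsense or fail entirely."""
    return 22.5

def heuristic_setpoint(context) -> float:
    """Simple non-ML fallback used when the model is unavailable."""
    return 20.0

def choose_setpoint(context) -> float:
    try:
        predicted = ml_predict_setpoint(context)
    except Exception:
        return heuristic_setpoint(context)          # fallback: graceful degradation
    return min(max(predicted, MIN_TEMP), MAX_TEMP)  # guardrail: clamp to the safe range

print(choose_setpoint({"outside_temp": 5.0}))
```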

Learning goals:

  • Describe common reasons for why ML predictions can fail
  • Analyze how a mistake in an AI component can influence the behavior of a system
  • Analyze system requirements at the boundary between the machine and world, consider drift, feedback loops, and adversaries
  • Evaluate risk of a mistake from the AI component using fault trees
  • Design and justify a mitigation strategy for a concrete system

Assignment:

  • Write requirements and plan mechanisms for dealing with mistakes; set system goals and define success measures; perform risk analysis

References:

  • 🕮 Hulten, Geoff. "Building Intelligent Systems: A Guide to Machine Learning Engineering." (2018), Chapters 6–8 and 24.
  • 🗎 Kocielnik, Rafal, Saleema Amershi, and Paul N. Bennett. "Will you accept an imperfect AI? Exploring designs for adjusting end-user expectations of AI systems." In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2019.

Blog post/lecture notes:

Lecture: Tradeoffs among Modeling Techniques Architecture Requirements

Overview:

  • Survey quality attributes of interest in production ML settings (e.g., accuracy, model size, inference time, learning time, incremental learning, robustness)
  • Contrasting internals of two learning techniques: decision trees and deep learning and implications on various qualities
  • Brief survey of other classes of machine learning and brief primer on symbolic AI
  • Constraints and tradeoff analysis for selecting ML techniques in production ML settings (see the sketch after this list)
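
Tradeoff analysis needs measurements rather than intuition. The sketch below, using a synthetic dataset, measures two relevant qualities (inference latency and serialized model size) for a decision tree and a small neural network with scikit-learn; the dataset and hyperparameters are placeholders.

```python
# A minimal sketch of measuring qualities of two candidate techniques on the same data.
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300, random_state=0),
}

for name, model in candidates.items():
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X)
    latency_ms = (time.perf_counter() - start) * 1000 / len(X)
    size_kb = len(pickle.dumps(model)) / 1024
    print(f"{name}: {latency_ms:.4f} ms/prediction, {size_kb:.0f} KB serialized")
```

Accuracy, training cost, robustness, and interpretability would be measured and weighed in the same way for the memo.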

Learning goals:

  • Organize and prioritize the relevant qualities of concern for a given project
  • Explain the key ideas behind decision trees and random forests and analyze consequences for various qualities
  • Explain the key ideas of deep learning, the reasons for its high resource needs during learning and inference, and its ability for incremental learning
  • Plan and execute an evaluation of the qualities of alternative AI components for a given purpose

Assignment:

  • Present a tradeoff analysis of two techniques (prepare a memo for a broad audience); for a given dataset, evaluate which technique is more suitable after measuring various qualities

References:

Lecture: Architectural Design for AI-enabled Systems Architecture

Overview:

  • Introduction to software architecture, data collection, and domain-specific modeling
  • Discussion how quality goals for the system influence system architecture of production ML systems
    • Consider latency and data volume requirements and constraints when deciding on deployment architecture
    • Consider update frequency when deciding on system design and deployment
    • Consider information needs when designing telemetry and relevant parts of the system
    • Consider privacy requirements when deciding where and when to train and deploy the system and how to collect telemetry
    • Consider system requirements when selecting modeling techniques (revisit ML tradeoff lecture)
    • Consider the design and operating costs of different alternative designs
  • Deploying inference services as microservices; model evolution (see the sketch after this list)
  • Composing complex systems with ML and non-ML components: case study Apollo self-driving cars
  • Architectural patterns and design patterns for ML
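
As one concrete deployment option, the sketch below wraps a model as a small Flask inference microservice; the model, feature format, and route name are illustrative assumptions rather than a prescribed interface.

```python
# A minimal sketch of a model inference microservice; the "model" is a placeholder.
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model():
    """Placeholder: in practice, load a serialized model (pickle, ONNX, ...)."""
    return lambda features: sum(features) > 1.0

model = load_model()

@app.route("/predict", methods=["POST"])
def predict():
    features = request.json["features"]
    return jsonify({"prediction": bool(model(features)), "model_version": "1.0.0"})

if __name__ == "__main__":
    app.run(port=8080)
```

A client would POST JSON such as {"features": [0.7, 0.6]} and receive the prediction plus a model version, which later supports telemetry, A/B testing, and debugging.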

Learning goals:

  • Understand important quality considerations when using ML components
  • Follow a design process to explicitly reason about alternative designs and their quality tradeoffs
  • Gather data to make informed decisions about what ML technique to use and where and how to deploy it
  • Critique the decision of where an AI model lives (e.g., cloud vs edge vs hybrid), considering the relevant tradeoffs
  • Deliberate how and when to update models and how to collect telemetry
  • Create an architectural model describing the relevant characteristics to reason about update frequency and costs

Assignment:

  • Design and justify a system architecture for a given scenario, considering computing and network resources

References:

  • 🕮 Hulten, Geoff. "Building Intelligent Systems: A Guide to Machine Learning Engineering." (2018), Chapter 13 (Where Intelligence Lives)

  • 🗎 Yokoyama, Haruki. "Machine learning system architectural pattern for improving operational stability." In 2019 IEEE International Conference on Software Architecture Companion (ICSA-C), pp. 267-274. IEEE, 2019.

  • 📰 Daniel Smith. "Exploring Development Patterns in Data Science." TheoryLane Blog Post. 2017.

  • 🗎 Hazelwood, Kim, Sarah Bird, David Brooks, Soumith Chintala, Utku Diril, Dmytro Dzhulgakov, Mohamed Fawzy et al. "Applied machine learning at facebook: A datacenter infrastructure perspective." In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 620-629. IEEE, 2018.

  • 🗎 Peng, Zi, Jinqiu Yang, Tse-Hsun Chen, and Lei Ma. "A first look at the integration of machine learning models in complex autonomous driving systems: a case study on Apollo." In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1240-1250. 2020.

Lecture: Data Quality Quality Assurance

Overview:

  • Overview of complexities in data acquisition, data cleaning, and feature extraction steps, both in training and in production
  • The tradeoff between more data vs better data in machine learning and the role of random vs systematic data errors
  • Overview of common data quality problems
  • Data schema enforcement, consistency rules, and unit testing for data; tools for defining and checking schemas and constraints (e.g., databases, XML, Avro, Great Expectations, ...); see the sketch after this list
  • Using ML to detect quality problems, inconsistencies, rules; discovery of rules and probabilistic repair (e.g., HoloClean)
  • Separating different notions and sources of drift; comparing data distributions and detecting data drift; overview of solutions of handling drift in ML systems
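
The sketch below expresses schema and range checks in plain pandas; the expected columns are invented for illustration, and declarative tools such as Great Expectations provide a richer version of the same idea.

```python
# A minimal sketch of automated data quality checks before training or inference.
import pandas as pd

EXPECTED_COLUMNS = {"user_id": "int64", "age": "int64", "country": "object"}

def check_schema(df: pd.DataFrame) -> list:
    problems = []
    for column, dtype in EXPECTED_COLUMNS.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"{column}: expected {dtype}, got {df[column].dtype}")
    if "age" in df.columns and not df["age"].between(0, 120).all():
        problems.append("age: values outside [0, 120]")
    return problems

df = pd.DataFrame({"user_id": [1, 2], "age": [34, 150], "country": ["US", "DE"]})
print(check_schema(df))   # ['age: values outside [0, 120]']
```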

Learning goals:

  • Describe common data cleaning steps and their purpose and risks
  • Design and implement automated quality assurance steps that check data schema conformance and distributions
  • Devise comparison strategies and thresholds for detecting drift
  • Understand the better data vs more data tradeoff

Assignments:

  • As part of group project: Perform basic data quality checks, at least schema enforcement

References:

(possible excursion on data debugging here, e.g., techniques to find influential instances or the Training Set Debugging Using Trusted Items paper)

Lecture: Infrastructure Quality, Deployment, and Operations Implementation/Operations QA

Overview:

  • Overview of common problems in ML pipelines, including “silent” problems
  • Testing all parts of the ML pipeline; code reviews (see the sketch after this list)
  • Overview of robustness testing with stubs, fire drills, chaos engineering
  • Test automation with Continuous Integration tools
  • Introduction to DevOps and Continuous Deployment
    • Containers, configuration management, monitoring
    • Canary releases and rolling releases
  • Overview of MLOps
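
The sketch below unit-tests a single hypothetical feature-encoding step in isolation, the kind of test that catches otherwise silent pipeline problems; it would run with pytest as part of continuous integration.

```python
# A minimal sketch of testing one pipeline step; encode_age is a placeholder step.

def encode_age(age):
    """Pipeline step under test: bucket age into coarse categories; -1 marks missing."""
    if age is None or age < 0:
        return -1
    return min(age // 10, 9)

def test_encode_age_handles_missing_and_outliers():
    assert encode_age(None) == -1   # missing data must not crash the pipeline
    assert encode_age(-5) == -1     # invalid data maps to the 'missing' bucket
    assert encode_age(34) == 3
    assert encode_age(130) == 9     # outliers are capped rather than silently dropped
```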

Learning goals:

  • Implement and automate tests for all parts of the ML pipeline
  • Understand testing opportunities beyond functional correctness
  • Test whether the infrastructure is robust to various kinds of problems
  • Automate test execution with continuous integration
  • Deploy a service for models using container infrastructure
  • Automate common configuration management tasks
  • Devise a monitoring strategy and suggest suitable components for implementing it
  • Diagnose common operations problems
  • Understand the typical concerns and concepts of MLOps

Assignment:

  • Part of group project: Design a pipeline to build, evaluate, and serve models that (a) performs automated tests offline, (b) enables experimentation, (c) detects and reports data quality issues and data drift, and (d) provides a monitoring dashboard and sends alerts

Reading:

(could be two lectures if going deeper into DevOps and MLOps)

Lecture: Managing and Processing Large Datasets Architecture Implementation/Operations

Overview:

  • Illustrate the need to operate at massive scale in some systems, for learning, inference, and telemetry; need for distributed data storage and computing
  • Distributed data storage strategies and their tradeoffs
  • Common patterns for distributed data processing: batch processing, stream processing, and the lambda architecture (see the sketch after this list)
  • Event sourcing (immutable data) and related design tradeoffs
  • Brief introduction to challenges of distributed systems
  • Brief overview of performance analysis and planning
  • Excursion: Distributed deep learning
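
To make the batch vs. stream distinction concrete, the sketch below computes the same metric both ways over placeholder events; production systems would delegate this to engines such as Spark or Kafka Streams.

```python
# A minimal sketch contrasting batch and stream-style computation of one metric.

def batch_average(ratings):
    """Batch view: recompute over the full dataset each run (simple, higher latency)."""
    return sum(ratings) / len(ratings)

class StreamingAverage:
    """Stream view: update incrementally per event (low latency, bounded state)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, rating):
        self.count += 1
        self.total += rating
        return self.total / self.count

events = [4.0, 5.0, 3.0, 4.0]
print(batch_average(events))        # 4.0
stream = StreamingAverage()
for rating in events:
    latest = stream.update(rating)
print(latest)                       # 4.0, same result with a different latency/cost profile
```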

Learning goals:

  • Organize different data management solutions and their tradeoffs
  • Understand the scalability challenges involved in large-scale machine learning and specifically deep learning
  • Explain the tradeoffs between batch processing and stream processing and the lambda architecture
  • Recommend and justify a design and corresponding technologies for a given system
  • Outline how machine learning can be parallelized
  • Explain the challenges of distributed systems

References:

  • 🕮 Martin Kleppmann. Designing Data-Intensive Applications. O’Reilly, 2017

  • 🕮 Nathan Marz and James Warren. "Big Data: Principles and Best Practices of Scalable Realtime Data Systems." Manning, 2015.

  • 🗎 Li, Mu, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. "Scaling distributed machine learning with the parameter server." In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pp. 583-598. 2014.

Lecture: Process and Technical Debt Process

Overview:

  • Overview of common data science workflows (e.g., CRISP-DM)
    • Importance of iteration and experimentation
    • Role of computational notebooks in supporting data science workflows
  • Overview of software engineering processes and lifecycles: costs and benefits of process, common process models, role of iteration and experimentation
  • Contrasting data science and software engineering processes, goals and conflicts
  • Integrating data science and software engineering workflows in process model for engineering AI-enabled systems with ML and non-ML components; contrasting different kinds of AI-enabled systems with data science trajectories
  • Overview of technical debt as metaphor for process management; common sources of technical debt in AI-enabled systems

Learning goals:

  • Contrast development processes of software engineers and data scientists
  • Outline process conflicts between different roles and suggest ways to mitigate them
  • Recognize the importance of process
  • Describe common agile practices and their goals
  • Plan the process for developing AI-enabled systems following different data science trajectories
  • Understand and correctly use the metaphor of technical debt
  • Describe how ML can incur reckless and inadvertent technical debt, outline common sources of technical debt

References:

Blog post/lecture notes:

Lecture: Human-AI Interaction Requirements Implementation/Operations

Overview:

  • High-level overview of design space: automation degree, forcefulness, transparency, …
  • Overview of usability
  • Aligning mental models
  • Building trust in AI-enabled systems (transparency, setting expectations, mental models, explanations, …)
  • AI-design guidelines

Learning goals:

tbd.

References:

Lecture: Ethics + Fairness (3 lectures) Requirements Quality Assurance

Overview:

  • Introductions to ethics and responsible AI
    • Moral vs ethical vs legal
    • Safety concerns, broadly
    • Discrimination (harms of allocation and representation)
    • Algorithmic transparency and explainability
    • Security and privacy
    • Reproducibility and accountability
    • Amplification through feedback loops
  • Fairness concepts, legal and practical definitions
  • Common sources of bias in machine learning
  • Fairness at the model level: fairness measures (anti-classification, separation, independence), fairness testing, interventions, and their tradeoffs (see the sketch after this list)
  • Fairness beyond the model: requirements engineering, dataset construction, monitoring and auditing, checklists, process integration and enforcement
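
The sketch below computes two of the group fairness measures named above over placeholder predictions with pandas; which gaps between groups are acceptable is a requirements decision that the code cannot make.

```python
# A minimal sketch of measuring group fairness; the data frame is a tiny placeholder.
import pandas as pd

df = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B"],
    "label":      [1, 0, 1, 1, 0, 0],   # ground truth
    "prediction": [1, 0, 1, 0, 0, 1],   # model output
})

# Independence (demographic parity): positive prediction rate per group
print(df.groupby("group")["prediction"].mean())

# Separation (one slice of equalized odds): true positive rate per group
positives = df[df["label"] == 1]
print(positives.groupby("group")["prediction"].mean())
```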

Learning goals:

  • Review the importance of ethical considerations in designing AI-enabled systems
  • Recall basic strategies to reason about ethical challenges
  • Diagnose potential ethical issues in a given system
  • Understand the types of harm that can be caused by ML
  • Understand the sources of bias in ML
  • Design and execute tests to check for bias/fairness issues
  • Evaluate and apply mitigation strategies
  • Consider achieving fairness in AI-based systems as an activity throughout the entire development cycle
  • Understand the role of requirements engineering in selecting ML fairness criteria
  • Understand the process of constructing datasets for fairness
  • Consider the potential impact of feedback loops on AI-based systems and need for continuous monitoring

Assignment:

  • Analyze a given component for potential bias, design a mitigation, and deploy automated tests

References:

Lecture: Transparency, Interpretability, and Explainability (2 lectures) Requirements Machine Learning

Overview:

  • Introduction to use cases, concepts, and measures for interpretability
  • Inherent interpretability of different ML models vs retrofitting explanations
  • Various approaches to provide explanations for black-box models, including local and global surrogates, feature importance, invariants, counterfactuals, prototypes, and influential instances (see the sketch after this list)
  • Discussion of trustworthiness of post-hoc explanations and involved tradeoffs
  • Algorithmic transparency: arguments, benefits, drawbacks, perceptions
  • Interface design for explanations and influences on human-AI interactions (e.g., building mental models; trust and too much trust)
  • Discussion on regulation and policy around responsible AI
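
The sketch below illustrates one post-hoc technique, permutation feature importance, for an arbitrary black-box model; the model and data are tiny placeholders, and libraries such as scikit-learn provide a more robust implementation.

```python
# A minimal sketch of permutation feature importance: shuffle one feature at a time
# and observe how much accuracy drops; the model is a stand-in black box.
import random

def model(row):
    """Stand-in classifier: approves if income is high enough (ignores age)."""
    income, age = row
    return 1 if income > 50 else 0

data = [(60, 25), (40, 60), (80, 33), (30, 45), (55, 52), (45, 29)]
labels = [1, 0, 1, 0, 1, 0]

def accuracy(rows):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

baseline = accuracy(data)
random.seed(0)
for feature_index, name in enumerate(["income", "age"]):
    shuffled = [row[feature_index] for row in data]
    random.shuffle(shuffled)
    permuted = [
        tuple(shuffled[j] if i == feature_index else v for i, v in enumerate(row))
        for j, row in enumerate(data)
    ]
    print(f"{name}: accuracy drop when shuffled = {baseline - accuracy(permuted):.2f}")
```

Here shuffling age changes nothing, revealing that the black box ignores it; in a real setting the drops are averaged over several shuffles.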

Learning goals:

  • Understand the importance of and use cases for interpretability
  • Explain the tradeoffs between inherently interpretable models and post-hoc explanations
  • Measure interpretability of a model
  • Select and apply techniques to debug/provide explanations for data, models and model predictions
  • Evaluate when to use interpretable models rather than ex-post explanations

References:

(possibly go deeper into debugging here or in the next lecture, independent of explainability)

Lecture: Versioning, Provenance, and Reproducibility Requirements Implementation/Operations Quality Assurance

Overview:

  • Challenge of reproducing data, models, and decisions, especially in complex systems
  • Documenting and tracking data provenance (modeling), "visibility debt", techniques for automated tracking
  • Versioning of code, data, and models (see the sketch after this list)
  • Curbing nondeterminism in ML pipelines
  • Logging and audit traces
  • Overview of corresponding MLOps tools like DVC, ModelDB, and MLFlow
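
The sketch below records minimal provenance metadata for a training run (data hash, code revision, random seed); the fields and values are illustrative assumptions, and tools such as DVC or MLflow automate and extend this bookkeeping.

```python
# A minimal sketch of provenance tracking and curbing nondeterminism.
import hashlib
import json
import random
import time

SEED = 42
random.seed(SEED)   # fix seeds (including library-specific ones) for reproducible training

training_data = b"user_id,age,label\n1,34,1\n2,51,0\n"   # placeholder for the real dataset bytes

provenance = {
    "training_data_sha256": hashlib.sha256(training_data).hexdigest(),  # exact dataset version
    "code_revision": "<output of `git rev-parse HEAD`>",                # placeholder commit hash
    "random_seed": SEED,
    "trained_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
}

with open("model_provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```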

Learning goals:

  • Judge the importance of data provenance, reproducibility and explainability for a given system
  • Create documentation for data dependencies and provenance in a given system
  • Propose versioning strategies for data and models
  • Test systems for reproducibility

References:

  • 🗎 Halevy, Alon, Flip Korn, Natalya F. Noy, Christopher Olston, Neoklis Polyzotis, Sudip Roy, and Steven Euijong Whang. "Goods: Organizing google's datasets." In Proceedings of the 2016 International Conference on Management of Data, pp. 795-806. ACM, 2016.
  • 🗎 Gulzar, Muhammad Ali, Matteo Interlandi, Tyson Condie, and Miryung Kim. "Debugging big data analytics in spark with bigdebug." In Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1627-1630. ACM, 2017.
  • 🕮 Hulten, Geoff. "Building Intelligent Systems: A Guide to Machine Learning Engineering." (2018), Chapter 21 (Organizing Intelligence) – 23 (Orchestration)
  • 🗎 Sugimura, Peter, and Florian Hartl. "Building a Reproducible Machine Learning Pipeline." arXiv preprint arXiv:1810.04570 (2018).

Lecture: Security and Privacy Requirements Quality Assurance Process

Overview:

  • Introduction to security: Confidentiality, integrity, and availability
  • Security and privacy at the model level
    • Attack scenarios against AI components and possible defenses
    • Poisoning attacks
    • Evasion attacks
    • Adversarial examples and model hardening
    • Model inversion attacks
    • Generative adversarial networks
    • Federated learning
  • Security and privacy at the system level
    • Requirements and risk analysis
    • Threat modeling
    • Defense strategies outside the model, including trust mechanisms
    • Designing for security, least privilege, isolation
    • Anomaly detection
  • Basics of adversarial learning techniques
  • Feedback loops and how to detect them
  • Dangers of leaking sensitive data, deanonymization, and differential privacy (see the sketch after this list)
  • Overview of common security patterns/tactics, including anomaly and intrusion detection
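
As one concrete illustration on the privacy side, the sketch below releases a count with noise from the Laplace mechanism of differential privacy; the epsilon value and the count are placeholders, and real deployments need careful privacy-budget accounting.

```python
# A minimal sketch of the Laplace mechanism: noise scaled to sensitivity/epsilon
# masks the contribution of any single record in a released count.
import numpy as np

def private_count(true_count, epsilon, sensitivity=1.0):
    scale = sensitivity / epsilon
    return true_count + np.random.laplace(loc=0.0, scale=scale)

print(private_count(1000, epsilon=0.5))   # close to 1000, but individual records are masked
```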

Learning goals:

  • Explain key concerns in security (in general and with regard to ML models)
  • Analyze a system with regard to attacker goals, attack surface, attacker capabilities
  • Describe common attacks against AI components
  • Conduct threat modeling for a given system and derive security requirements
  • Suggest countermeasures against attacks for specific systems, both at the model and the system level
  • Discuss challenges in anonymizing data
  • Apply key design principles for secure system design

Reading:

Lecture: Safety and Robustness (2 lectures) Requirements Architecture/Design Quality Assurance

Overview:

  • Introduction to safety and ethics; safety vs reliability; safety beyond traditional safety-critical systems
  • Revisiting risk and requirements analysis (fault trees, FMEA, HAZOP)
  • Robustness analysis and limitations of robustness for assessing safety (see the sketch after this list)
  • Architectural safety tactics -- how to build safe systems from unreliable components
  • Layers of safety engineering: safe practices, safety culture, safety regulation
  • Introduction to assurance cases and software certification; evidence collection for safety claims
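
The sketch below is a brute-force robustness probe for a placeholder classifier: it samples small perturbations around an input and reports whether any of them flips the prediction. It can only find counterexamples; formal verification tools aim to prove their absence within the perturbation bound.

```python
# A minimal sketch of a random-search robustness probe; the model is a placeholder.
import random

def model(features):
    """Stand-in classifier over two numeric features."""
    return 1 if 0.6 * features[0] + 0.4 * features[1] > 0.5 else 0

def is_robust(x, epsilon=0.05, samples=1000):
    original = model(x)
    for _ in range(samples):
        perturbed = [v + random.uniform(-epsilon, epsilon) for v in x]
        if model(perturbed) != original:
            return False, perturbed   # counterexample: nearby input with a different prediction
    return True, None

print(is_robust([0.7, 0.4]))   # comfortably inside the decision region
print(is_robust([0.5, 0.5]))   # near the decision boundary, a flip is likely to be found
```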

Learning goals:

  • Understand safety concerns in traditional and AI-enabled systems
  • Summarize state-of-the-art robustness analysis strategies for machine-learned models
  • Perform a hazard analysis for a system to derive safety requirements
  • Diagnose potential safety issues in a given system
  • Collect evidence and sketch an argument for a safety case
  • Design architectural safeguards against safety-relevant mistakes from AI components
  • Describe the typical processes for safety evaluations and their limitations

Assignment: (?)

  • Perform a hazard analysis of a given system, identify suitable mitigations, and sketch an argument for a safety case

References:

Lecture: Fostering Interdisciplinary Teams Process

Overview:

  • Different roles in developing AI-enabled systems and their respective goals and concerns
  • The importance of interdisciplinary teams; unicorns
  • Collaboration points in building AI-enabled systems; revisiting process considerations and data science trajectories
  • Communication costs in teams, team composition, and socio-technical congruence
  • Managing conflicting goals: team organization, agile practices, DevOps, T-shaped people
  • Overcoming groupthink: hype, diversity, culture, agile practices
  • Mitigating social loafing: responsibilities, motivation, agile practices
  • Discussion on the future of software engineering for AI-enabled systems
    • The role of ML to automate software engineering tasks
    • The role of AutoML to automate data science tasks
    • Empowering team members & responsible engineering

Learning goals:

  • Understand different roles in projects for AI-enabled systems
  • Plan development activities in an inclusive fashion for participants in different roles
  • Diagnose and address common teamwork issues
  • Describe agile techniques to address common process and communication issues

References: