Postmortem: Build Breakage on 2016-11-08
Status: final
Owners: chinmay
Description: Travis reported failures on builds
Component: flutter repository
Date/time: 2016-11-07 21:30
Duration: 16h 45m
User impact: Flutter team members were unable to merge new PRs. Users would have been unable to run flutter tests if they upgraded, though we did not receive complaints during the outage.
A change to package:args is committed (591f9c) that introduces a bug whereby run() no longer returns the value returned by the command.
15:11 The change to package:args is merged into the args repository.
16:27 Dart package:args tag 0.13.6+1 is cut and, shortly after, pushed to pub <START OF OUTAGE>
21:36 ianh reports that Travis is upset and all PRs are failing
2016-11-08
07:52 danrubel reports that Travis is still failing
11:03 chinmaygarde reports he’s facing the same breakage in his pending PR
11:07 Issue is reproduced locally. chinmaygarde, jsimmons and danrubel begin looking for the root cause of the breakage.
13:08 Root cause of outage identified as a new version of package:args that Flutter picked up whereby run() no longer returns the value returned by the command (so we couldn’t get accurate exit codes).
13:19 Flutter PR #6765 sent to pin Flutter to a known good version of package:args
13:42 fb3bf7a identified as root cause of the internal breakage.
14:15 Fix lands. <END OF OUTAGE>
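
The failure mode identified at 13:08 can be modeled with a small sketch. This is not the package:args code; it is a Python illustration, and every name in it (`Command`, `GoodRunner`, `BuggyRunner`, `exit_code`) is hypothetical. It also assumes a caller that treats a missing result as success, which is one way a dropped return value can lead to inaccurate exit codes.

```python
class Command:
    """Hypothetical stand-in for a tool command that reports an exit code."""

    def run(self) -> int:
        return 1  # simulate a failing run


class GoodRunner:
    """Runner that propagates the command's result to its caller."""

    def __init__(self, command: Command) -> None:
        self.command = command

    def run(self):
        return self.command.run()


class BuggyRunner(GoodRunner):
    """Runner that executes the command but drops its result."""

    def run(self):
        self.command.run()  # result is discarded, so the caller gets None


def exit_code(runner) -> int:
    # Assumption: a missing result is treated as success (exit code 0).
    result = runner.run()
    return result if isinstance(result, int) else 0
```

With the good runner the failing command surfaces as exit code 1; with the buggy runner the same failure is reported as 0.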
A bug was introduced in package:args and picked up by Flutter. Flutter was vulnerable because our external dependencies have open-ended version constraints, so our stability depends on upstream releases rather than being hermetic. This was an intentional choice: we have experienced this failure mode before, and have been operating on the basis that we are not yet stable enough to bear the costs of being hermetic.
Action Item | Owner | Tracking bug | Notes |
---|---|---|---|
Pin our external Dart dependencies to specific versions to ensure that our public stability is hermetic. | chinmay | #6767 |
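
As a rough illustration of why pinning helps, the sketch below contrasts an open-ended minimum constraint with an exact pin. It is a simplified Python model, not pub's resolver: the `parse` and `resolve` helpers are hypothetical, the version ordering is an assumption that treats the `+N` suffix as a tiebreaker, and `0.13.5` is an invented placeholder for "a known good version".

```python
def parse(version: str) -> tuple:
    """Split 'major.minor.patch+build' into a comparable tuple (simplified)."""
    core, _, build = version.partition("+")
    major, minor, patch = (int(part) for part in core.split("."))
    return (major, minor, patch, int(build) if build else 0)


def resolve(available, minimum=None, pinned=None):
    """Pick the version selected under an exact pin or an open-ended minimum."""
    if pinned is not None:
        # An exact pin ignores anything newer that has been published.
        return pinned if pinned in available else None
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    # An open-ended constraint picks up the newest matching release.
    return max(candidates, key=parse) if candidates else None


# Hypothetical list of published versions; only 0.13.6+1 comes from the timeline.
published = ["0.13.5", "0.13.6+1"]
open_ended = resolve(published, minimum="0.13.0")
pinned = resolve(published, pinned="0.13.5")
```

Under these assumptions the open-ended constraint selects `0.13.6+1` as soon as it is published, while the pin keeps resolving to the same version until someone changes it deliberately.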
Action Item | Owner | Tracking bug | Notes |
---|---|---|---|
We should have a continuous monitoring bot that tries to run all our tests | ianh | #6777 |
None.
None.
Action Item | Owner | Tracking bug | Notes |
---|---|---|---|
Update our package:args dependency to a known good version | danrubel | PR #6575 | Done |
Deploy a forward-rolling bot that goes red if our dependencies release a breaking change, and otherwise updates us to the latest versions of everything. | ianh | #4696 |
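
The forward-rolling bot described above could follow decision logic along these lines. This is only a sketch of the idea, not an existing tool: `roll_forward`, its parameters, and the "green"/"red" status strings are all hypothetical, and the test run is passed in as a callback.

```python
from typing import Callable, Dict, Tuple

Pins = Dict[str, str]


def roll_forward(
    pins: Pins,
    latest: Pins,
    tests_pass: Callable[[Pins], bool],
) -> Tuple[str, Pins]:
    """Try the latest dependency versions; keep the old pins if tests fail."""
    candidate = {**pins, **latest}
    if candidate == pins:
        return "green", pins  # nothing new to roll
    if tests_pass(candidate):
        return "green", candidate  # safe to update the pins
    return "red", pins  # a dependency released a breaking change
```

The key property is that a breaking upstream release turns the bot red without changing the pins that everyone else builds against.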
- Once the Flutter team had a clear set of owners for the issue, it was root-caused and resolved quickly.
- The outage did not break users. It likely would have if we had a larger userbase.
- There were indications of the breakage as early as 2016-11-07 21:30, yet the team didn’t start looking into it in earnest until 2016-11-08 11:00. Once our build is hermetic (so we control our own stability) and we separate production artifacts from development artifacts (e.g., by having a release branch), we should consider providing an SLA, at which point we’d have to create processes for maintaining that SLA.