Updating Incident Response guide
alxhrck committed Jul 2, 2024
1 parent 6f56a86 commit 031984c
Showing 7 changed files with 538 additions and 559 deletions.
4 changes: 0 additions & 4 deletions _articles/definition-of-done.md
@@ -102,7 +102,3 @@ Key items as part of an acceptance thread:
 - Scrum Master
 - Anyone tagged as a reviewer in the ticket
 - A UX team member (if designs were referenced in the ticket)
-
-Once verified, the reviewer can accept the thread:
-- Move the JIRA ticket to Done
-- Link to the acceptance thread in the JIRA ticket (preferably in the comment for the "move to done" step)
335 changes: 214 additions & 121 deletions _articles/incident-response-checklist.md

Large diffs are not rendered by default.

316 changes: 316 additions & 0 deletions _articles/incident-response-guide.md

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions _articles/logs-schema.md
@@ -8,18 +8,18 @@ subcategory: "Data Warehouse"

 This is a guide to provide the schema definitions for the four log tables in our Data Warehouse:

-- `logs.events`
-- `logs.production`
-- `logs.unextracted_events`
-- `logs.unextracted_production`
+- `events.logs`
+- `production.logs`
+- `unextracted_events.logs`
+- `unextracted_production.logs`

 ## logs.production

 The `logs.production` table contains the following fields:

 - `cloudwatch_timestamp`
 - `message`
-- `uuid`
+- `uuid` (primary key)
 - `method`
 - `path`
 - `format`
@@ -42,7 +42,7 @@ The `logs.events` table contains the following fields:

 - `cloudwatch_timestamp`
 - `message`
-- `id`
+- `id` (primary key)
 - `name`
 - `time`
 - `visitor_id`
4 changes: 2 additions & 2 deletions _articles/platform-oncall-guide.md
@@ -46,7 +46,7 @@ emergency contact list and other private information
 * **Appropriately respond to alerts** - Assess an alert's impact to end users and service providers and judge severity, acting as Incident Response reporter/Situation Lead if appropriate
 * **Check production (`prod`) environment** - Review systems and logs for indicators of issues which are not yet monitored, or unexpected behaviors
 * **Alert `@login-appdev-oncall` if production may be impacted** - Make sure they are aware anytime things are going poorly in production
-* **Initiate Incident Response (IR) process** - Act as Situation Lead/Incident Commander following the [Security Incident Response Guide]({% link _articles/secops-incident-response-guide.md %})
+* **Initiate Incident Response (IR) process** - Act as Situation Lead/Incident Commander following the [Security Incident Response Guide]({% link _articles/incident-response-guide.md %})
 * **Monitor Channels** - Keep an eye on [`#login-events`](https://gsa-tts.slack.com/archives/C42TZ3K5H) for problems requiring response or investigation
 * **Review any open PRs that have been sitting over 48 hours in [`identity-devops`](https://github.com/18F/identity-devops/pulls), [`identity-terraform`](https://github.com/18F/identity-terraform/pulls), [`identity-base-image`](https://github.com/18F/identity-base-image/pulls), or [`identity-cookbooks`](https://github.com/18F/identity-cookbooks/pulls)**
 * **Ensure clean handoff of ongoing issues** - Review and update as is appropriate in the [LG Platform - Interrupts board](https://github.com/orgs/18F/projects/34)
@@ -222,7 +222,7 @@ Before joining the Primary/Secondary On-Call rotation schedules for the Platform
 * Comfortable navigating APM and Infrastructure areas in NewRelic
 * Comfortable reviewing logs in AWS CloudWatch and/or with `tail-cw` SSM command
 * Shadowed full set of deploys: `dev`, `int`, `staging`, `dm`, and `prod` application deployments, and other platform code (**Deployment** rotation)
-* Reviewed [Security Incident Response Guide]({% link _articles/secops-incident-response-guide.md %})
+* Reviewed [Security Incident Response Guide]({% link _articles/incident-response-guide.md %})
 * Reviewed [past postmortems](https://drive.google.com/drive/folders/1ZdroGfCbGmeUPuCqiR8BetUhEXRfk4ui)
 * Joined [`#login-situation`](https://gsa-tts.slack.com/archives/C5QUGUANN) channel
 * Participated in at least one bi-weekly Contingency Plan Training Wargames session
