[TECH ED] Write an incident report #336

Open
aishaathmanlali opened this issue Jun 8, 2024 · 0 comments
From Module-Cloud created by SallyMcGrath: CodeYourFuture/Module-Cloud#30

Link to the coursework

https://www.atlassian.com/incident-management/postmortem/templates#incident-summary

Why are we doing this?

Write up a problem or outage you have handled during this module, using the template provided:

Summary

This section should provide a high-level overview of the incident, including a brief description of what happened, the severity level, the affected services or components, and the impact on customers or users.

Timeline

This section should chronologically document the key events and actions taken during the incident, from the initial detection to the final resolution. It should include timestamps for each event and the person or team responsible for each action.

Root cause

This section should analyze and identify the underlying cause(s) of the incident. It should provide a detailed explanation of what went wrong, including any contributing factors or related issues.

Resolution and recovery

This section should describe the steps taken to resolve the incident and restore normal operations. It should include details about any workarounds or temporary solutions implemented, as well as the final fix or permanent resolution.

Corrective and preventive measures

This section should outline the actions that will be taken to prevent similar incidents from occurring in the future. It should include both short-term corrective measures and long-term preventive measures, such as process improvements, system upgrades, or training initiatives.

Maximum time in hours

3

How to get help

Use GenAI to evaluate and improve your incident report. Use this prompt to help you:

Act as a straightforward senior DevOps engineer with lots of experience in technical communication. Evaluate my incident report and tell me how I can improve it. Don't rewrite my report, but give me examples and corrections piece by piece. Please be honest and kind.

How to submit

  1. Write up your incident report as a feature on your portfolio website
  2. Share the link in Slack.

It must be on your portfolio website and not in a message or doc, because the purpose of this exercise is to demonstrate this valuable skill to employers.

How to review

Share your links in Slack and ask a colleague to review. Make sure to review someone else's report too. What can you learn from theirs?

aishaathmanlali moved this to 📋 Backlog in CLOUD on Jun 8, 2024
aishaathmanlali moved this from 📋 Backlog to 🔖 Ready in CLOUD on Jun 15, 2024