Add HIP for obtaining output from failed hook jobs and pods
Signed-off-by: Ian Zink <[email protected]>
Ian Zink committed May 11, 2023
1 parent 0158df7 commit 13c5331
Showing 1 changed file with 240 additions and 0 deletions.
240 changes: 240 additions & 0 deletions hips/hip-9999.md
---
hip: 9999
title: "New annotations for displaying job output"
authors: [ "Ian Zink <[email protected]>" ]
created: "2023-01-26"
type: "feature"
status: "draft"
---

## Abstract

This HIP proposes a new hook annotation, `helm.sh/hook-output-log-policy`, which indicates that the output from a hook job should be displayed to the user.

## Motivation

The primary motivation for this HIP is the ability to display the results of preflight checks before the main chart installs. This gives the user vital feedback as to why the chart install is not proceeding.

Often it is important to verify that the Kubernetes cluster you are deploying a Helm chart into has certain properties. You might need to know that the cluster is at a certain version in order to use various APIs. You might need to know that it has ingress available, or a certain amount of ephemeral storage, memory, or CPU available. You might want to validate that the service key the user provided is correct, or that the database they entered is reachable. Letting a chart deploy and then having to debug why it failed is a poor user experience. All of these checks can be done with preflight checks enabled by the annotation proposed in this HIP.

In general, allowing chart developers to run jobs and present that feedback directly to users could also open up use cases beyond the preflight use case that motivated this HIP. For example, a chart could present CVE warnings or specific upgrade feedback instead of just a `helm install` failure.

## Rationale

There are other ways that this could be implemented. For example, we could add a separate preflight hook type. However, a new hook type wouldn't be handled at all by previous versions of Helm. The design proposed here requires minimal changes to Helm and preserves backwards compatibility.

Another strategy could be for helm to include Troubleshoot.sh as a dependent library, but this could result in too tight of a coupling between the projects and lower overall flexibility and adaptability.

## Specification

Templates could include the following annotations on Batch Jobs:

```yaml
"helm.sh/hook": pre-install, pre-upgrade
"helm.sh/hook-output-log-policy": hook-failed # or hook-succeeded or hook-failed,hook-succeeded
```
`helm.sh/hook-output-log-policy` would indicate that helm should display the output of the job to the user.

Additionally, a new user flag, `--no-log-output`, should be added that suppresses the output of logs.
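
As a rough illustration of how the annotation value and the flag could interact, the Go sketch below parses a comma-separated `helm.sh/hook-output-log-policy` value and decides whether logs should be shown for a given hook outcome. The helper names `parseLogPolicies` and `shouldOutputLogs` are hypothetical and are not taken from the Helm code base. Only the annotation key, its two values, and the `--no-log-output` flag come from this HIP.

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key and values as proposed in this HIP.
const (
	hookOutputLogPolicyAnnotation = "helm.sh/hook-output-log-policy"
	policyHookSucceeded           = "hook-succeeded"
	policyHookFailed              = "hook-failed"
)

// parseLogPolicies is a hypothetical helper. It splits the annotation value
// on commas, trims whitespace, and keeps only the two values named in the HIP.
func parseLogPolicies(annotations map[string]string) map[string]bool {
	policies := map[string]bool{}
	for _, raw := range strings.Split(annotations[hookOutputLogPolicyAnnotation], ",") {
		p := strings.TrimSpace(raw)
		if p == policyHookSucceeded || p == policyHookFailed {
			policies[p] = true
		}
	}
	return policies
}

// shouldOutputLogs is a hypothetical helper. It reports whether a hook's logs
// should be printed; noLogOutput models the proposed --no-log-output flag.
func shouldOutputLogs(annotations map[string]string, hookFailed, noLogOutput bool) bool {
	if noLogOutput {
		return false
	}
	policies := parseLogPolicies(annotations)
	if hookFailed {
		return policies[policyHookFailed]
	}
	return policies[policyHookSucceeded]
}

func main() {
	ann := map[string]string{hookOutputLogPolicyAnnotation: "hook-failed"}
	fmt.Println(shouldOutputLogs(ann, true, false))  // failed hook, policy matches
	fmt.Println(shouldOutputLogs(ann, false, false)) // succeeded hook, policy does not match
	fmt.Println(shouldOutputLogs(ann, true, true))   // flag suppresses all log output
}
```

In this sketch a missing or unrecognized annotation value means no logs are shown, so charts without the annotation would behave as they do today.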

## Backwards compatibility

The only backwards compatibility concern is that scripts parsing `helm install` output would see some additional text when logs are output. The fact that notes already make the output unstructured should mitigate any concern here. Since we already trust chart developers to provide output in the form of notes, this is a logical extension that allows the developer to provide more dynamic output.

## Security implications

Preflight checks could potentially detect security misconfigurations, which would improve the security of the chart deployment.

## How to teach this

In the first instance, documentation plus the help text for `helm install` would explain the feature.

An example template could be provided in documentation showing how to use this feature with a generic command used in a hook.

A more advanced example showing how to use the new feature with Troubleshoot.sh to provide preflight checks could be linked in the documentation, provided directly in the documentation, or provided on the Troubleshoot.sh documentation site independently.

## Reference implementation

The `safe-install` plugin (link in references) demonstrates what running preflights could look like, but not in the fashion proposed in this HIP.

## Rejected ideas
N/A

## Open issues
N/A

## References
[Pull Request for Documentation](https://github.com/helm/helm-www/pull/1242)

[Pull Request for Helm](https://github.com/helm/helm/pull/10309) - third most upvoted open PR

[Troubleshoot.sh](https://troubleshoot.sh/) - the tool that is the motivation for this HIP.

[safe-install plugin](https://github.com/z4ce/helm-safe-install) - Plugin that provides a similar experience to what I hope this HIP will provide natively.

## Reference - Example Usage

### Example using `false`

Template:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-false-job"
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    # Show the job's logs to the user only when the hook fails.
    "helm.sh/hook-output-log-policy": hook-failed
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
spec:
  # Do not retry: a single failure fails the hook.
  backoffLimit: 0
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      containers:
        - name: pre-install-job
          image: "alpine:3.3"
          # The alpine image ships sh, not bash.
          command: ["sh", "-c", "echo foo ; false"]
```

What it should look like when running:

```text
$ helm install my-release ./
Logs for pod: my-release-false-job-bgbz6, container: pre-install-job
foo
Error: INSTALLATION FAILED: failed pre-install: job failed: BackoffLimitExceeded
```

### Example using Troubleshoot Preflight Checks

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-preflight-job"
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    "helm.sh/hook-output-log-policy": hook-failed
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
spec:
  backoffLimit: 0
  template:
    metadata:
      name: "{{ .Release.Name }}"
      labels:
        app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
        app.kubernetes.io/instance: {{ .Release.Name | quote }}
        helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
    spec:
      restartPolicy: Never
      volumes:
        - name: preflights
          configMap:
            name: "{{ .Release.Name }}-preflight-config"
      containers:
        - name: pre-install-job
          image: "replicated/preflight:latest"
          command: ["preflight", "--interactive=false", "/preflights/preflights.yaml"]
          volumeMounts:
            - name: preflights
              mountPath: /preflights
---
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    "helm.sh/hook": pre-install, pre-upgrade
    # Lower weight than the Job so the ConfigMap exists before the Job mounts it.
    "helm.sh/hook-weight": "-6"
    "helm.sh/hook-delete-policy": hook-succeeded, hook-failed
  labels:
    app.kubernetes.io/managed-by: {{ .Release.Service | quote }}
    app.kubernetes.io/instance: {{ .Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion }}
    helm.sh/chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
  name: "{{ .Release.Name }}-preflight-config"
data:
  preflights.yaml: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: Preflight
    metadata:
      name: preflight-tutorial
    spec:
      collectors:
        {{ if eq .Values.mariadb.enabled false }}
        - mysql:
            collectorName: mysql
            uri: '{{ .Values.externalDatabase.user }}:{{ .Values.externalDatabase.password }}@tcp({{ .Values.externalDatabase.host }}:{{ .Values.externalDatabase.port }})/{{ .Values.externalDatabase.database }}?tls=false'
        {{ end }}
      analyzers:
        - clusterVersion:
            outcomes:
              - fail:
                  when: "< 1.16.0"
                  message: The application requires at least Kubernetes 1.16.0, and recommends 1.18.0.
                  uri: https://kubernetes.io
              - warn:
                  when: "< 1.18.0"
                  message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.18.0 or later.
                  uri: https://kubernetes.io
              - pass:
                  message: Your cluster meets the recommended and required versions of Kubernetes.
        {{ if eq .Values.mariadb.enabled false }}
        - mysql:
            checkName: Must be MySQL 8.x or later
            collectorName: mysql
            outcomes:
              - fail:
                  when: connected == false
                  message: Cannot connect to MySQL server
              - fail:
                  when: version < 8.x
                  message: The MySQL server must be at least version 8
              - pass:
                  message: The MySQL server is ready
        {{ end }}
```

What it should look like when running:

```text
$ helm install my-release ./
Logs for pod: my-release-preflight-job-bgbz6, container: pre-install-job
--- FAIL: Required Kubernetes Version
--- The application requires at least Kubernetes 1.16.0, and recommends 1.18.0.
--- FAIL: Must be MySQL 8.x or later
--- Cannot connect to MySQL server
--- FAIL preflight-tutorial
FAILED
name: cluster-resources status: completed completed: 1 total: 3
name: mysql/mysql status: running completed: 1 total: 3
name: mysql/mysql status: completed completed: 2 total: 3
name: cluster-info status: running completed: 2 total: 3
name: cluster-info status: completed completed: 3 total: 3
Error: INSTALLATION FAILED: failed pre-install: job failed: BackoffLimitExceeded
```
