Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: change headless service to gRPC and expose 9094 TCP #494

Merged
merged 25 commits into from
Apr 9, 2024
Merged
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
dcc7660
change alertmanager-svc-headless from http to grpc port
humblebundledore Feb 16, 2023
d8bdc70
edit CHANGELOG.md
humblebundledore Feb 16, 2023
a001e49
expose 9094 TCP UDP for gossip cluster
humblebundledore Feb 22, 2023
b0df50e
Merge branch 'master' into master
humblebundledore Mar 15, 2023
e483cd5
support configuration of gossip cluster port
humblebundledore Mar 15, 2023
84c1331
Merge branch 'master' into master
humblebundledore Mar 21, 2023
05b3403
WIP: configure alertmanager HA cluster mode for sts
nschad May 12, 2023
d29022e
Merge branch 'cortexproject:master' into master
humblebundledore Jun 28, 2023
8d22e3a
Merge remote-tracking branch 'upstream/feature/alertmanager-cluster'
humblebundledore Jun 28, 2023
1d830da
configure alertmanager cluster peers as comma seperated list
humblebundledore Jul 3, 2023
e909803
clarify values.yaml about cluster enable by default
humblebundledore Jul 4, 2023
b9efaee
replicaset should not set -alertmanager-cluster-peers when cluster is…
humblebundledore Jul 4, 2023
87e6eb4
fix wrong name for grpc targetPort
humblebundledore Jul 26, 2023
62c1222
re-introduce alertmanager-dep.yaml
humblebundledore Jul 26, 2023
5d9c403
Merge branch 'master' into alertmanager/expose-9094
locmai Apr 1, 2024
047bd29
docs: update READEME.md
locmai Apr 3, 2024
3236b34
update CHANGELOG.md
locmai Apr 3, 2024
bd4b07f
update CHANGELOG.md
locmai Apr 3, 2024
970ea28
update CHANGELOG.md
locmai Apr 5, 2024
355af4d
update CHANGELOG.md
locmai Apr 5, 2024
ce177c3
update CHANGELOG.md
locmai Apr 5, 2024
2f421f2
update CHANGELOG.md
locmai Apr 5, 2024
d75baf1
add Alertmanager scope
locmai Apr 8, 2024
56238b6
add http-metrics back
locmai Apr 8, 2024
a4f1c17
update CHANGELOG.md
locmai Apr 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## master / unreleased

* [CHANGE] Replace `http-metrics` port with `grpc` port #494
nschad marked this conversation as resolved.
Show resolved Hide resolved
* [CHANGE] Expose 9094 TCP and UDP for gossip cluster - #494
nschad marked this conversation as resolved.
Show resolved Hide resolved
* If the AlertManager headless service existed prior to applying the change, it will have only one port set, which is a known issue. See [kubernetes/kubernetes#39188](https://github.com/kubernetes/kubernetes/issues/39188). Re-creating the headless service can resolve this issue

## 2.2.0 / 2024-01-16

* [CHANGE] Removed `config.storage.engine` and any reference of it #488
Expand Down
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,7 @@ Kubernetes: `^1.19.0-0`
| compactor.​terminationGracePeriodSeconds | int | `240` | |
| compactor.​tolerations | list | `[]` | |
| compactor.​topologySpreadConstraints | list | `[]` | |
| config.​alertmanager.​cluster | object | `{"listen_address":"0.0.0.0:9094"}` | Disable alertmanager gossip cluster by setting empty listen_address to empty string |
| config.​alertmanager.​enable_api | bool | `false` | Enable the experimental alertmanager config api. |
| config.​alertmanager.​external_url | string | `"/api/prom/alertmanager"` | |
| config.​api.​prometheus_http_prefix | string | `"/prometheus"` | |
Expand Down
2 changes: 1 addition & 1 deletion ci/test-deployment-values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -125,7 +125,7 @@ runtimeconfigmap:
annotations:
foo: bar
alertmanager:
replicas: 1
replicas: 3
statefulSet:
enabled: false
extraVolumes:
Expand Down
2 changes: 1 addition & 1 deletion ci/test-sts-values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ runtimeconfigmap:
annotations:
foo: bar
alertmanager:
replicas: 1
replicas: 3
statefulSet:
enabled: true
extraVolumes:
Expand Down
20 changes: 20 additions & 0 deletions templates/alertmanager/alertmanager-statefulset.yaml
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
{{- $svcClusterAddress := ((.Values.config.alertmanager.cluster).listen_address) | default "0.0.0.0:9094" }}
{{- $svcClusterPort := (split ":" $svcClusterAddress)._1 }}
{{- if .Values.alertmanager.enabled -}}
{{- if .Values.alertmanager.statefulSet.enabled -}}
apiVersion: apps/v1
Expand Down Expand Up @@ -152,6 +154,15 @@ spec:
args:
- "-target=alertmanager"
- "-config.file=/etc/cortex/cortex.yaml"
{{- if and (gt (int .Values.alertmanager.replicas) 1) (ne .Values.config.alertmanager.cluster.listen_address "") }}
{{- $fullName := include "cortex.alertmanagerFullname" . }}
{{- $peers := list }}
{{- range $i := until (int .Values.alertmanager.replicas) }}
{{- $peer := printf "%s-%d.%s-headless.%s.svc.cluster.local:%s" $fullName $i $fullName $.Release.Namespace $svcClusterPort }}
{{- $peers = append $peers $peer }}
{{- end }}
- "-alertmanager.cluster.peers={{ join "," $peers }}"
{{- end }}
{{- range $key, $value := .Values.alertmanager.extraArgs }}
- "-{{ $key }}={{ $value }}"
{{- end }}
Expand All @@ -175,6 +186,15 @@ spec:
- name: gossip
containerPort: {{ .Values.config.memberlist.bind_port }}
protocol: TCP
- name: grpc
containerPort: {{ .Values.config.server.grpc_listen_port }}
protocol: TCP
- containerPort: {{ $svcClusterPort }}
name: alert-clu-tcp
protocol: TCP
- containerPort: {{ $svcClusterPort }}
name: alert-clu-udp
protocol: UDP
startupProbe:
{{- toYaml .Values.alertmanager.startupProbe | nindent 12 }}
livenessProbe:
Expand Down
16 changes: 11 additions & 5 deletions templates/alertmanager/alertmanager-svc-headless.yaml
Original file line number Diff line number Diff line change
@@ -1,5 +1,4 @@
{{- if .Values.alertmanager.enabled -}}
{{- if .Values.alertmanager.statefulSet.enabled -}}
nschad marked this conversation as resolved.
Show resolved Hide resolved
apiVersion: v1
kind: Service
metadata:
Expand All @@ -17,11 +16,18 @@ spec:
clusterIP: None
publishNotReadyAddresses: true
ports:
- port: {{ .Values.config.server.http_listen_port }}
- port: {{ .Values.config.server.grpc_listen_port }}
protocol: TCP
name: http-metrics
targetPort: http-metrics
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should inform the user about this breaking chance in CHANGELOG.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the CHANGELOG.md to make this clearer. Or do you think we should add a better indication that this may break if users were using the http-metrics port for their own usage? Similar to the old PR we have here #169

And by breaking - do you mean we should make it to v3 (too much I think?) or a line in the CHANGELOG should be good enough?

Copy link
Collaborator

@nschad nschad Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. I don't have a strong opinion about it. Technically it should be a major release because it will break if somebody depends on it (somehow).

And to be honest so #169 should have been.

CC: @kd7lxl Do you have an opinion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just not touch the http-metrics port, so that there is no breaking change. Adding gRPC has nothing to do with the http-metrics port.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Let me update that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

name: grpc
targetPort: grpc
- port: 9094
protocol: UDP
name: alert-clu-udp
targetPort: alert-clu-udp
- port: 9094
protocol: TCP
name: alert-clu-tcp
targetPort: alert-clu-tcp
nschad marked this conversation as resolved.
Show resolved Hide resolved
selector:
{{- include "cortex.alertmanagerSelectorLabels" . | nindent 4 }}
{{- end -}}
{{- end -}}
4 changes: 4 additions & 0 deletions values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -126,6 +126,10 @@ config:
runtime_config:
file: /etc/cortex-runtime-config/runtime_config.yaml
alertmanager:
# -- Enable alertmanager gossip cluster
# -- Disable alertmanager gossip cluster by setting empty listen_address to empty string
cluster:
listen_address: '0.0.0.0:9094'
# -- Enable the experimental alertmanager config api.
enable_api: false
external_url: '/api/prom/alertmanager'
Expand Down
Loading