Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert cs deployment to helm chart #922

Merged
merged 3 commits into from
Dec 13, 2024
Merged

Conversation

tony-schndr
Copy link
Collaborator

@tony-schndr tony-schndr commented Dec 3, 2024

What this PR does

Fixes https://issues.redhat.com/browse/ARO-11949, https://issues.redhat.com/browse/ARO-12801, & https://issues.redhat.com/browse/ARO-13258
Jira:
Link to demo recording:

Special notes for your reviewer

@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch 2 times, most recently from 090ef00 to bc89279 Compare December 4, 2024 14:36
Copy link

github-actions bot commented Dec 4, 2024

Please rebase pull request.

@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch 2 times, most recently from 6c9d7e3 to a7643b7 Compare December 4, 2024 18:49
Copy link

github-actions bot commented Dec 9, 2024

Please rebase pull request.

@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch from a7643b7 to 15c3c9c Compare December 9, 2024 16:31
@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch 7 times, most recently from 88cef58 to 0fb42dc Compare December 9, 2024 22:11
@tony-schndr tony-schndr marked this pull request as ready for review December 9, 2024 22:18
@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch from 2f2a25e to fbff667 Compare December 10, 2024 14:03
@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch 3 times, most recently from 40a0dcf to 9f209f5 Compare December 10, 2024 15:26
@tony-schndr tony-schndr force-pushed the cluster-service-helm-chart branch from 9f209f5 to 86c3852 Compare December 10, 2024 15:58
azureOperatorsManagedIdentitiesConfig: |
controlPlaneOperatorsIdentities:
cloud-controller-manager:
minOpenShiftVersion: 4.17
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this differ from Env to Env? Or will this see progressive updates? Just asking, cause this large key might make the config harder to read then (especially when thinking about later rollouts with more envs).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will differ from env to env and it may see updates if CS needs to add an identity.. I don't like it but I was having a hard time templatizing this object and making it work nice with config.mk. The problem I hit was that templatize was not creating a valid multi line string in the config.mk. To work around this I treat it as a string and base64 encode it so that it becomes a single line in config.mk and decode it in the helm template.


DEVOPS_MSI_ID ?= {{ .aroDevopsMsiId }}

# MGMT CLUSTER KVs
MGMT_RESOURCEGROUP ?= {{ .mgmt.rg }}
CX_SECRETS_KV_NAME ?= {{ .cxKeyVault.name }}
CX_MI_KV_NAME ?= {{ .msiKeyVault.name }}

AZURE_OPERATORS_MANAGED_IDENTITIES_CONFIG ?= {{ .clusterService.azureOperatorsManagedIdentitiesConfig | b64enc }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure how well this will translate to EV2 env vars.
but let's not block this PR because of that. we can revisite later when we deploy CS to INT.

btw. this has nothing to do with the translation to helm. we would have faced the same issue with oc templates.

Copy link
Collaborator

@geoberle geoberle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
let's tackle the EV2 related comments next sprint. this chart works nicely on DEV

databasePort: "5432"

# The name of the managed identities data plane audience resource.
managedIdentitiesDataPlaneAudienceResource: "https://dummy.org"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One constraint we've to take onto account regarding this field's value is that we cannot check it in code for higher envs (stage/prod). So when we get to those envs, please make sure that this is not checked in. @tony-schndr

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the values for those environments be?

Copy link
Collaborator

@machi1990 machi1990 Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know yet, but whatever the value of them, they shouldn't be checked in any repo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be specific,
The dev and int envs value can be any dummy https url; this can be checked in the repo.
For stage and prod envs, these will be real value of the msi dataplane. They can't be checked in any repo.

Comment on lines +1 to +4
# TODO: This parameter isn't currently used, but kept to avoid failures in the
# execution of saasherder. It will be removed once the version of the service
# that doesn't use it is deployed to all environments.
debugPort: ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used?

Comment on lines +18 to +25
# Duration since cluster creation after which the first notification for stale cluster should be sent.
firstStaleClusterNotification: "24h" # 1 day

# Duration since cluster creation after which the second notification for stale cluster should be sent.
secondStaleClusterNotification: "600h" # 25 days

# Duration after which a stale cluster can be cleaned up.
staleClusterAutocleanupWindow: "720h" # 30 days
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't apply to aro-hcp.
They can be deleted.

Comment on lines +77 to +82
# Period between executions of day-1 machine pool migration worker. Useful time units are "m" or "h".
machinePoolMigrationWorkerPeriod: "1h"

# The name of the DNS base domain for creating a user defined domains.
# Note: this is defaulted to a commercial value. This should be overriden in fedramp app-interface
userDefinedDnsBaseDomain: "i1.devshift.org"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two values don't apply in aro-hcp

Comment on lines +153 to +162
- --gateway-url={{ .Values.gatewayURL }}
- --client-id=@/secrets/service/client.id
- --client-secret=@/secrets/service/client.secret
- --client-scopes={{ .Values.clientScopes }}
- --user-defined-dns-base-domain={{ .Values.userDefinedDnsBaseDomain }}
- --jwks-url={{ .Values.jwksUrl }}
- --jwks-file=/configs/authentication/jwks.json
- --acl-file=/configs/authentication/acl.yml
- --token-url={{ .Values.tokenUrl }}
- --insecure={{ .Values.insecure }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags are perhaps not needed for aro-hcp.

We can clean up later.

- serve
- --log-level={{ .Values.logLevel }}
- --namespace={{ .Release.Namespace }}
- --runtime-mode={{ .Values.runtimeMode }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could hardcode the mode here as we are in the context of aro-hcp.

Suggested change
- --runtime-mode={{ .Values.runtimeMode }}
- --runtime-mode=aro-hcp

Comment on lines +144 to +145
- --default-expiration={{ .Values.defaultExpiration }}
- --maximum-expiration={{ .Values.maximumExpiration }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- --default-expiration={{ .Values.defaultExpiration }}
- --maximum-expiration={{ .Values.maximumExpiration }}

Probably not applicable for aro-hcp, but we can cleanup later on.

- --backplane-url={{ .Values.backplaneURL }}
- --provision-shards-config=/secrets/shards/config
- --proxy-config-file=/configs/proxy/config.yaml
- --aws-sts-policy-directory=/configs/policies
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag isn't applicable for aro-hcp.
We can clean up later;
At the moment if we don't define it, CS starts normally but we'll get a

2024-12-12T15:00:50Z ERROR loader/aws_sts_policy_loader.go:50 [opid='dc2b7851-64e2-4869-b48a-d4a21c5a407a'] Got error while trying to load policies: open : no such file or directory
gitlab.cee.redhat.com/service/uhc-clusters-service/pkg/aws/loader.(*AWSPolicyLoader).run
	/cs/pkg/aws/loader/aws_sts_policy_loader.go:50
gitlab.cee.redhat.com/service/uhc-clusters-service/pkg/aws/loader.(*AWSPolicyLoader).Start
	cs/pkg/aws/loader/aws_sts_policy_loader.go:43
gitlab.cee.redhat.com/service/uhc-clusters-service/cmd/clusters-service/servecmd.runServ

error message at start time, which can be confusing.

Copy link
Collaborator

@machi1990 machi1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left cleanup comments that can be addressed later on.
Many of the flags may not directly apply to aro-hcp.

clientScopes: "openid"

# Environment associated with this instance.
environment: "aro-hcp-dev"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geoberle @tony-schndr A note on the environment name.
It shouldn't be longer than 10 chars.
Otherwise, it breaks the cluster's base domain prefix limit.

For example "aro-hcp-dev" is 11 chars, this means that the cluster's base domain prefix can't be longer than 14 chars while CS allows for it to be <= 15 chars.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth noting that, this field is immutable and it is used in multiple other places that are critical to CS.
To change it, it implies re-creating the env with a new name.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address this in a followup. We can enforce the length in the config json schema.

@tony-schndr tony-schndr merged commit ea5be9f into main Dec 13, 2024
9 checks passed
@tony-schndr tony-schndr deleted the cluster-service-helm-chart branch December 13, 2024 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants