
Scripts out Deployment Validation #670

Closed · wants to merge 1 commit

Conversation


@iamkirkbater iamkirkbater commented Feb 9, 2022

This currently tests both non-CCS and CCS cluster creation using OCM.

This script assumes you're already logged into whatever OCM environment you are attempting to test.

This script also assumes you have a valid set of credentials for an IAM User: osdCCSAdmin in your ~/.aws/credentials file (you pass the profile name in as the parameter).

Example:

$ cat ~/.aws/credentials
[osd-staging-2-ccsadmin]
aws_access_key_id = myaccesskeyid
aws_secret_access_key = mysecretaccesskey

Given the above file, you would call the script as such: ./hack/scripts/validate_deployment.sh -p osd-staging-2-ccsadmin

This script runs the creations in parallel, so you can start it and just wait for the output to finish. It also only waits for the cluster to move from pending to installing, since the AWS Account Operator's work is complete once the cluster begins installing.
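The pending-to-installing wait described above could look roughly like this. This is a minimal sketch, not the script's actual code: the function and variable names are assumptions, and `get_status` is a stub standing in for the real `ocm get .../status | jq -r .state` call.

```shell
#!/usr/bin/env bash
# Sketch of a pending -> installing wait loop (hypothetical names).
# get_status stubs the real `ocm get .../status | jq -r .state` call.
get_status() { echo "installing"; }

TIMEOUT=60
CHECK_INTERVAL=10

wait_for_installing() {
  while (( TIMEOUT > 0 )); do
    local status
    status=$(get_status)
    if [[ $status == "installing" ]]; then
      # AWS Account Operator's work is done once the install begins.
      echo "cluster is installing"
      return 0
    fi
    sleep "$CHECK_INTERVAL"
    TIMEOUT=$(( TIMEOUT - CHECK_INTERVAL ))
  done
  echo "timed out before install started"
  return 1
}

wait_for_installing
```

Polling with a fixed interval and a decrementing timeout keeps each parallel scenario bounded, so a stuck cluster cannot hang the whole run.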

Future work: Script out STS creation.


openshift-ci bot commented Feb 9, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamkirkbater

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from dastergon and NautiluX February 9, 2022 22:37
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2022
@codecov-commenter

Codecov Report

Merging #670 (0edc607) into master (2afa2ae) will decrease coverage by 0.09%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #670      +/-   ##
==========================================
- Coverage   31.64%   31.55%   -0.10%     
==========================================
  Files          46       47       +1     
  Lines        4601     4649      +48     
==========================================
+ Hits         1456     1467      +11     
- Misses       3047     3084      +37     
  Partials       98       98              
Impacted Files Coverage Δ
pkg/controller/account/byoc.go 69.54% <0.00%> (-0.72%) ⬇️
pkg/totalaccountwatcher/totalaccountwatcher.go 61.95% <0.00%> (-0.69%) ⬇️
pkg/controller/account/account_controller.go 13.66% <0.00%> (-0.15%) ⬇️
...ountaccess/awsfederatedaccountaccess_controller.go 27.60% <0.00%> (-0.13%) ⬇️
...controller/accountclaim/accountclaim_controller.go 20.41% <0.00%> (-0.07%) ⬇️
cmd/manager/main.go 0.00% <0.00%> (ø)
pkg/controller/account/iam.go 65.18% <0.00%> (ø)
...er/awsfederatedrole/awsfederatedrole_controller.go 0.00% <0.00%> (ø)
config/config.go 47.82% <0.00%> (ø)

@macgregor macgregor left a comment


Looks pretty good so far. A few questions/asks.

hack/scripts/validate_deployment.sh

install_wait_loop $CLUSTER_ID $CREATE_TIMEOUT $CHECK_INTERVAL

ocm delete cluster $CLUSTER_ID

how paranoid do we want to be about ensuring tests clean up after themselves? You may want to consider using a trap (https://www.linuxjournal.com/content/bash-trap-command) to ensure clean up happens even in script failure scenarios. That may require more engineering than is necessary though.
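A minimal sketch of that trap-based cleanup, under stated assumptions: the names are hypothetical, and `cleanup` here just records that it ran where the real script would call `ocm delete cluster`.

```shell
#!/usr/bin/env bash
# Sketch of trap-based cleanup (hypothetical names; the real handler
# would run `ocm delete cluster "$CLUSTER_ID"` instead of setting a flag).
CLUSTER_ID="example-cluster-id"
CLEANED_UP="no"

cleanup() {
  # Real script: ocm delete cluster "$CLUSTER_ID"
  CLEANED_UP="yes"
  echo "cleaning up cluster $CLUSTER_ID"
}

# EXIT fires on any script exit; adding INT and TERM also covers
# Ctrl-C and external kills.
trap cleanup EXIT INT TERM
```

As the comment notes, the hard part is not the trap itself but knowing, per thread, whether a cluster was actually created and needs deleting.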

@iamkirkbater (author):

Yeah, I was thinking about that, but like you mentioned, the engineering there started to yak-shave.

I could add the trap, but then I'd need to know whether the clusters were already initiating before cleaning them up, then catch the signal in the subprocesses and ignore it in the delete wait loop, so I opted not to do it at all.

It might be worth adding a simple trap, though, that just tells you that depending on the state of each thread you might have to clean something up manually.

This script adds the ability to quickly validate an AWS Account Operator
deployment to various environments using OCM, to validate the same steps
a customer would take to request a cluster, but without using the UI or
having to wait for a full cluster install.

This script takes a multi-threaded approach in order to run the various
scenarios in parallel.
fi

# get the cluster ID from the OCM output
local CLUSTER_ID=$(echo $OCM_CREATE | grep '^ID:' | awk '{print $2}')


created an ocm-cli issue to make this less janky openshift-online/ocm-cli#383
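For illustration, here is the same grep/awk parse run against stubbed `ocm create cluster` output; the output format below is an assumption about what OCM prints, not captured output.

```shell
#!/usr/bin/env bash
# Stubbed `ocm create cluster` output (assumed format) to show the parse.
OCM_CREATE=$'ID:    1a2b3c4d5e\nName:  test-cluster\nState: pending'

# Same parse as the script: second field of the line starting with "ID:".
CLUSTER_ID=$(echo "$OCM_CREATE" | grep '^ID:' | awk '{print $2}')
echo "$CLUSTER_ID"   # 1a2b3c4d5e
```

Parsing human-readable output like this is what the linked ocm-cli issue calls janky; a machine-readable output mode would let the ID be extracted without the grep/awk pipeline.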

ACCOUNT=$(aws sts get-caller-identity | jq -r .Account)
echo -e "$PENDING AWS Account $ACCOUNT info collected."

OCM_CREATE=$(ocm create cluster --region us-east-1 --ccs --aws-access-key-id $AWS_ACCESS_KEY_ID --aws-secret-access-key $AWS_SECRET_ACCESS_KEY --aws-account-id $ACCOUNT $CLUSTER_NAME)


That is a very nice way to create a CCS cluster. I hope there is something similar for GCP as well @rafael-azevedo

TIMEOUT=$(( TIMEOUT - CHECK_INTERVAL ))
local STATUS=$(ocm get /api/clusters_mgmt/v1/clusters/$CLUSTER_ID/status | jq -r .state)
if [[ $STATUS == "installing" ]]; then
echo -e "$SUCCESS $TEST_NAME has passed. Beginning Cleanup..."


Maybe split this into 2: the echo here would just note that the test has passed, and the "Beginning Cleanup" message would move higher in the stack?


openshift-ci bot commented Jun 15, 2022

@iamkirkbater: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dastergon

@iamkirkbater is there anything blocking this PR?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 10, 2022

openshift-ci bot commented Dec 10, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
