
Scripts out Deployment Validation #670

Closed · wants to merge 1 commit

Conversation


@iamkirkbater iamkirkbater commented Feb 9, 2022

This currently tests both non-CCS and CCS cluster creation using OCM.

This script assumes you're already logged into whatever OCM environment you are attempting to test.

This script also assumes you have a valid set of credentials for an IAM User: osdCCSAdmin in your ~/.aws/credentials file (you pass the profile name in as the parameter).

Example:

$ cat ~/.aws/credentials
[osd-staging-2-ccsadmin]
aws_access_key_id = myaccesskeyid
aws_secret_access_key = mysecretaccesskey

Given the above file, you would call the script as such: ./hack/scripts/validate_deployment.sh -p osd-staging-2-ccsadmin

This script runs the creations in parallel, so you can start it and just wait for the output to finish. It also only waits for the cluster to move from pending to installing, since the AWS Account Operator's work is complete once the cluster begins installing.
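The pending-to-installing wait described above could look roughly like this. This is a minimal sketch, not the script's actual code: the function and variable names are assumptions, and `get_status` is a stub standing in for the real `ocm get .../status | jq -r .state` call.

```shell
#!/usr/bin/env bash
# Sketch of a pending -> installing wait loop (hypothetical names).
# get_status stubs the real `ocm get .../status | jq -r .state` call.
get_status() { echo "installing"; }

TIMEOUT=60
CHECK_INTERVAL=10

wait_for_installing() {
  while (( TIMEOUT > 0 )); do
    local status
    status=$(get_status)
    if [[ $status == "installing" ]]; then
      # AWS Account Operator's work is done once the install begins.
      echo "cluster is installing"
      return 0
    fi
    sleep "$CHECK_INTERVAL"
    TIMEOUT=$(( TIMEOUT - CHECK_INTERVAL ))
  done
  echo "timed out before install started"
  return 1
}

wait_for_installing
```

Polling with a fixed interval and a decrementing timeout keeps each parallel scenario bounded, so a stuck cluster cannot hang the whole run.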

Future work: Script out STS creation.


openshift-ci bot commented Feb 9, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: iamkirkbater

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot requested review from dastergon and NautiluX February 9, 2022 22:37
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 9, 2022
@codecov-commenter

Codecov Report

Merging #670 (0edc607) into master (2afa2ae) will decrease coverage by 0.09%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #670      +/-   ##
==========================================
- Coverage   31.64%   31.55%   -0.10%     
==========================================
  Files          46       47       +1     
  Lines        4601     4649      +48     
==========================================
+ Hits         1456     1467      +11     
- Misses       3047     3084      +37     
  Partials       98       98              
Impacted Files Coverage Δ
pkg/controller/account/byoc.go 69.54% <0.00%> (-0.72%) ⬇️
pkg/totalaccountwatcher/totalaccountwatcher.go 61.95% <0.00%> (-0.69%) ⬇️
pkg/controller/account/account_controller.go 13.66% <0.00%> (-0.15%) ⬇️
...ountaccess/awsfederatedaccountaccess_controller.go 27.60% <0.00%> (-0.13%) ⬇️
...controller/accountclaim/accountclaim_controller.go 20.41% <0.00%> (-0.07%) ⬇️
cmd/manager/main.go 0.00% <0.00%> (ø)
pkg/controller/account/iam.go 65.18% <0.00%> (ø)
...er/awsfederatedrole/awsfederatedrole_controller.go 0.00% <0.00%> (ø)
config/config.go 47.82% <0.00%> (ø)

@macgregor macgregor left a comment


Looks pretty good so far. A few questions/asks.

hack/scripts/validate_deployment.sh

install_wait_loop $CLUSTER_ID $CREATE_TIMEOUT $CHECK_INTERVAL

ocm delete cluster $CLUSTER_ID

how paranoid do we want to be about ensuring tests clean up after themselves? You may want to consider using a trap (https://www.linuxjournal.com/content/bash-trap-command) to ensure clean up happens even in script failure scenarios. That may require more engineering than is necessary though.
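A minimal sketch of that trap-based cleanup, under stated assumptions: the names are hypothetical, and `cleanup` here just records that it ran where the real script would call `ocm delete cluster`.

```shell
#!/usr/bin/env bash
# Sketch of trap-based cleanup (hypothetical names; the real handler
# would run `ocm delete cluster "$CLUSTER_ID"` instead of setting a flag).
CLUSTER_ID="example-cluster-id"
CLEANED_UP="no"

cleanup() {
  # Real script: ocm delete cluster "$CLUSTER_ID"
  CLEANED_UP="yes"
  echo "cleaning up cluster $CLUSTER_ID"
}

# EXIT fires on any script exit; adding INT and TERM also covers
# Ctrl-C and external kills.
trap cleanup EXIT INT TERM
```

As the comment notes, the hard part is not the trap itself but knowing, per thread, whether a cluster was actually created and needs deleting.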

@iamkirkbater (author):

Yeah, I was thinking about that, but like you mentioned, the engineering there started to yak-shave.

I could add the trap, but then I'd need to know whether the clusters were already initiating before cleaning them up, then catch the signal in the subprocesses and ignore it in the delete wait loop, so I opted not to do it at all.

It might be worth adding a simple trap, though, that just tells you that depending on the state of each thread you might have to clean something up manually.

This script adds the ability to quickly validate an AWS Account Operator
deployment to various environments using OCM, to validate the same steps
a customer would take to request a cluster, but without using the UI or
having to wait for a full cluster install.

This script takes a multi-threaded approach in order to run the various
scenarios in parallel.
fi

# get the cluster ID from the OCM output
local CLUSTER_ID=$(echo $OCM_CREATE | grep '^ID:' | awk '{print $2}')


created an ocm-cli issue to make this less janky openshift-online/ocm-cli#383
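For illustration, here is the same grep/awk parse run against stubbed `ocm create cluster` output; the output format below is an assumption about what OCM prints, not captured output.

```shell
#!/usr/bin/env bash
# Stubbed `ocm create cluster` output (assumed format) to show the parse.
OCM_CREATE=$'ID:    1a2b3c4d5e\nName:  test-cluster\nState: pending'

# Same parse as the script: second field of the line starting with "ID:".
CLUSTER_ID=$(echo "$OCM_CREATE" | grep '^ID:' | awk '{print $2}')
echo "$CLUSTER_ID"   # 1a2b3c4d5e
```

Parsing human-readable output like this is what the linked ocm-cli issue calls janky; a machine-readable output mode would let the ID be extracted without the grep/awk pipeline.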

ACCOUNT=$(aws sts get-caller-identity | jq -r .Account)
echo -e "$PENDING AWS Account $ACCOUNT info collected."

OCM_CREATE=$(ocm create cluster --region us-east-1 --ccs --aws-access-key-id $AWS_ACCESS_KEY_ID --aws-secret-access-key $AWS_SECRET_ACCESS_KEY --aws-account-id $ACCOUNT $CLUSTER_NAME)


That is a very nice way to create a CCS cluster. I hope there is something similar for GCP as well @rafael-azevedo

TIMEOUT=$(( TIMEOUT - CHECK_INTERVAL ))
local STATUS=$(ocm get /api/clusters_mgmt/v1/clusters/$CLUSTER_ID/status | jq -r .state)
if [[ $STATUS == "installing" ]]; then
echo -e "$SUCCESS $TEST_NAME has passed. Beginning Cleanup..."


Maybe split this into 2: the echo here would just note that the test has passed, and the "Beginning Cleanup" message would move higher in the stack?


openshift-ci bot commented Jun 15, 2022

@iamkirkbater: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@dastergon

@iamkirkbater is there anything blocking this PR?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 10, 2022
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 9, 2022
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Dec 10, 2022

openshift-ci bot commented Dec 10, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
