Update nightly release testing github action #1549

Merged
merged 1 commit into from
Dec 20, 2024
98 changes: 46 additions & 52 deletions .github/workflows/nightly_release_testing.yaml
@@ -35,16 +35,9 @@ jobs:
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "not cluster" --detached
run: pytest --level release tests -k "not clustertest and not ondemand and not multinode"
timeout-minutes: 60

- name: Teardown all clusters
if: always()
run: |
sky status
sky down --all -y
sky status

cluster-tests:
runs-on: ubuntu-latest
permissions:
@@ -74,17 +67,10 @@ jobs:
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "clustertest and not ondemand" --detached
run: pytest --level release tests -k "clustertest and not ondemand and not multinode"
timeout-minutes: 60

- name: Teardown all cluster-tests clusters
if: always()
run: |
sky status
sky down --all -y
sky status

ondemand-aws-tests:
ondemand-aws-tests-local-launcher:
runs-on: ubuntu-latest
permissions:
id-token: write
@@ -116,17 +102,10 @@ jobs:
# running with on-demand cluster that not using docker image, because the latter causing nightly tests in CI to
# run for a very long time (does not happen locally).
# TODO: [JL / SB]: check how we could make CI run with docker on-demand cluster
run: pytest --level release tests -k "ondemand_aws_https_cluster_with_auth" --detached
run: pytest --level release tests -k "ondemand_aws_https_cluster_with_auth"
timeout-minutes: 60

- name: Teardown all ondemand-aws-tests clusters
if: always()
run: |
sky status
sky down --all -y
sky status

ondemand-aws-multinode-tests:
ondemand-aws-tests-den-launcher:
runs-on: ubuntu-latest
permissions:
id-token: write
@@ -155,16 +134,43 @@ jobs:
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "TestMultiNodeCluster" --detached
# running with on-demand cluster that not using docker image, because the latter causing nightly tests in CI to
# run for a very long time (does not happen locally).
# TODO: [JL / SB]: check how we could make CI run with docker on-demand cluster
run: pytest --level release tests -k "den_launched_ondemand_aws_docker_cluster"
timeout-minutes: 60

- name: Teardown all ondemand-aws-multinode clusters
if: always()
run: |
sky status
sky down --all -y
sky status
ondemand-aws-multinode-tests:
runs-on: ubuntu-latest
permissions:
id-token: write
contents: read
steps:
- name: Check out repository code
uses: actions/checkout@v3

- name: Setup Release Testing
uses: ./.github/workflows/setup_release_testing
with:
AWS_OSS_ROLE_ARN: ${{ secrets.AWS_OSS_ROLE_ARN }}
DEV_AWS_ACCESS_KEY: ${{ secrets.DEV_AWS_ACCESS_KEY }}
DEV_AWS_SECRET_KEY: ${{ secrets.DEV_AWS_SECRET_KEY }}
KUBECONFIG: ${{ secrets.KUBECONFIG }}
GCP_SERVICE_ACCOUNT_KEY: ${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}
GCP_PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
DEN_TESTER_TOKEN: ${{ secrets.DEN_TESTER_PROD_TOKEN }}
DEN_TESTER_USERNAME: ${{ secrets.DEN_TESTER_USERNAME }}
API_SERVER_URL: ${{ env.API_SERVER_URL }}
EKS_ARN: ${{ secrets.EKS_ARN }}

- name: Run on-demand aws tests
env:
KITCHEN_TESTER_TOKEN: ${{ secrets.KITCHEN_TESTER_PROD_TOKEN }}
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "multinode"
timeout-minutes: 60

ondemand-gcp-tests:
runs-on: ubuntu-latest
@@ -195,16 +201,9 @@ jobs:
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "ondemand_gcp_cluster" --detached
run: pytest --level release tests -k "ondemand_gcp_cluster"
timeout-minutes: 60

- name: Teardown all ondemand-gcp-tests clusters
if: always()
run: |
sky status
sky down --all -y
sky status

kubernetes-tests:
runs-on: ubuntu-latest
permissions:
@@ -234,22 +233,17 @@ jobs:
KITCHEN_TESTER_USERNAME: ${{ secrets.KITCHEN_TESTER_USERNAME }}
ORG_MEMBER_TOKEN: ${{ secrets.ORG_MEMBER_PROD_TOKEN }}
ORG_MEMBER_USERNAME: ${{ secrets.ORG_MEMBER_USERNAME }}
run: pytest --level release tests -k "ondemand_k8s_cluster" --detached
run: pytest --level release tests -k "ondemand_k8s_cluster"
timeout-minutes: 60

- name: Teardown all kubernetes-tests clusters
if: always()
run: |
sky status
sky down --all -y
sky status

# making sure that the clusters were terminated after the test runs.
check-cluster-status:
if: always()
needs:
- not-cluster-tests
- cluster-tests
- ondemand-aws-tests
- ondemand-aws-tests-local-launcher
- ondemand-aws-tests-den-launcher
- ondemand-gcp-tests
- kubernetes-tests
- ondemand-aws-multinode-tests
@@ -276,7 +270,7 @@ jobs:
EKS_ARN: ${{ secrets.EKS_ARN }}

- name: Wait to check cluster status
run: sleep 600 # 10 minutes
run: sleep 300 # 5 minutes

- name: Check cluster status
run: sky status
run: runhouse cluster list
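
The `-k` expressions in this workflow split the release suite across jobs. As a rough illustration of how such expressions partition tests, here is a minimal Python sketch. It is not pytest's implementation: `matches_keyword_expr`, `JOB_FILTERS`, and `jobs_selecting` are hypothetical helpers, and it only assumes pytest's documented behavior that `-k` keywords are case-insensitive substring matches against a test's names and markers, combined with `and`, `or`, and `not`. The job names and expressions are copied from the new side of the diff.

```python
import re
from typing import Iterable, List

# Job name -> the -k expression it passes to pytest (new side of the diff).
JOB_FILTERS = {
    "not-cluster-tests": "not clustertest and not ondemand and not multinode",
    "cluster-tests": "clustertest and not ondemand and not multinode",
    "ondemand-aws-tests-local-launcher": "ondemand_aws_https_cluster_with_auth",
    "ondemand-aws-tests-den-launcher": "den_launched_ondemand_aws_docker_cluster",
    "ondemand-aws-multinode-tests": "multinode",
    "ondemand-gcp-tests": "ondemand_gcp_cluster",
    "kubernetes-tests": "ondemand_k8s_cluster",
}


def matches_keyword_expr(expr: str, names: Iterable[str]) -> bool:
    """Simplified stand-in for -k matching (an assumption, not pytest's code).

    `names` holds the test name, fixture ids, and marker names for one test.
    A keyword matches if it is a case-insensitive substring of any of them.
    """
    lowered = [name.lower() for name in names]

    def to_bool(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word in ("and", "or", "not"):
            return word  # keep the boolean operators as they are
        return str(any(word.lower() in name for name in lowered))

    # Turn every keyword into True/False, then evaluate the boolean expression.
    boolean_expr = re.sub(r"[A-Za-z0-9_]+", to_bool, expr)
    return bool(eval(boolean_expr, {"__builtins__": {}}, {}))


def jobs_selecting(names: Iterable[str]) -> List[str]:
    """Return every job whose -k expression would select this test."""
    names = list(names)
    return [job for job, expr in JOB_FILTERS.items() if matches_keyword_expr(expr, names)]
```

Calling `jobs_selecting` with a made-up test such as `["test_status", "multinode_gpu_cluster"]` shows which job would pick it up. An empty list means no job runs that test, and more than one entry means it runs in several jobs.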
2 changes: 1 addition & 1 deletion .github/workflows/setup_release_testing/action.yaml
@@ -75,7 +75,7 @@ runs:

- name: Install python packages & dependencies
run: |
pip install runhouse[aws,gcp,kubernetes]
pip install git+https://github.com/run-house/runhouse.git@main#egg=runhouse[aws,gcp,kubernetes]
Contributor Author

I think it is better to run this against the latest OSS main. Running against the latest release won't reveal new bugs, if any exist.

Collaborator

I think the step below, ./.github/workflows/setup_runhouse, already installs runhouse from the GH workspace / main. Maybe this line was more for making sure the dependencies are also installed? I think this change is still fine, though.
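
One way to confirm which runhouse install is active in CI is to inspect the metadata pip records for direct-URL installs. The following is a minimal sketch, not part of this PR: `classify_direct_url` and `install_source` are hypothetical helpers, and it assumes the installer writes a PEP 610 `direct_url.json` file for installs from a VCS URL or a local directory. A missing file is treated as "most likely installed from a package index", which is an assumption rather than a guarantee.

```python
import json
from importlib import metadata
from typing import Optional


def classify_direct_url(raw: Optional[str]) -> str:
    """Classify the contents of a PEP 610 direct_url.json file."""
    if raw is None:
        # No direct_url.json recorded: assumed to be an index (e.g. PyPI) install.
        return "index"
    info = json.loads(raw)
    if "vcs_info" in info:
        return "vcs"  # e.g. pip install git+https://...@main
    if "dir_info" in info:
        return "local"  # e.g. pip install . or pip install -e .
    return "archive"  # direct link to a wheel or sdist


def install_source(dist_name: str) -> str:
    """Return 'vcs', 'local', 'archive', 'index', or 'missing' for a distribution."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return "missing"
    # read_text returns None when the file is absent from the distribution.
    return classify_direct_url(dist.read_text("direct_url.json"))
```

Printing `install_source("runhouse")` in a debug step after setup would show whether the workspace install or the `git+https` install ended up active.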

pip install -r tests/requirements.txt
shell: bash

4 changes: 3 additions & 1 deletion tests/conftest.py
@@ -130,6 +130,7 @@ def pytest_addoption(parser):

def pytest_generate_tests(metafunc):
level = metafunc.config.getoption("level")

level_fixtures = getattr(
metafunc.cls or metafunc.module, level.upper(), default_fixtures[level]
)
@@ -368,6 +369,7 @@ def event_loop():
"ondemand_aws_https_cluster_with_auth",
"multinode_cpu_docker_conda_cluster",
"static_cpu_pwd_cluster",
"static_gpu_pwd_cluster_den_launcher", # for testing cluster status on single-node gpu.
"multinode_gpu_cluster", # for testing cluster status on multinode gpu.
]
],
}