Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate CI jobs to Github actions #239

Closed
wainersm opened this issue Jul 29, 2023 · 12 comments
Closed

Migrate CI jobs to Github actions #239

wainersm opened this issue Jul 29, 2023 · 12 comments
Labels

Comments

@wainersm
Copy link
Member

As the Kata Containers project has migrated their jobs out of Jenkins and we have ours running on their Jenkins instance, it makes sense to follow that movement (unless we want to host our own Jenkins which seems not the case).

The following jobs can be more easily migrated as long as we have self-hosted Azure runners enabled on this repository:

  • tests-e2e-ubuntu-20.04-x86_64-containerd_kata-clh
  • tests-e2e-ubuntu-20.04-x86_64-containerd_kata-qemu

The jobs for SEV, SNP and s390x can be also easily migrated but then we need to provide the bare-metal machines as self-hosted runners on this repository. Currently the TDX job run behind firewall on Intel's lab, they might want to keep the job running as is.

@fidencio
Copy link
Member

At least for the Intel case, I'd say we will only take this bullet after the merge to main is completed, as then we'll be able to use the exactly same machine that's been used by Kata Containers GHA, which has a different version of the artefacts being used than the one used in Jenkins.

@fidencio
Copy link
Member

I've put some though into this, and I sincerely thing that what we should test as part of the operator is:

  • Install
  • Removal
  • Reinstall
  • Ensure user-facing options are being propagated to the runtime payloads
    • This could even be done using a fake payload that just gets whatever is passed from the CRD and prints the info
  • Update (we're not there yet)
  • Upgrade (we're not there yet)

Then, finally, having a smoke test just to make sure the payload used can run one workload. Everything else should be done on the runtime side.

/cc @stevenhorsman

@wainersm, WDYT?

@stevenhorsman
Copy link
Member

Yeah - I think that we assume that we have tested and trust the kata payload (and if that's not the case then we need to fix that in kata-containers) and then the operator testing should be focussed on the specifics that it owns/set-ups. Having the smoke test/BVT to check that the e2e basics work would give extra confidence that we've integrated the payload correctly, but repeating all the kata tests feels like overkill and currently leads to some issues where we change configurations in test jobs that we don't really want in the operator install tests.

I do wonder if we could have a test that would check for the case that the payload is out of date/we have integration issues, but we don't have a proper CI system, so this is probably out of scope.

@fitzthum
Copy link
Member

fitzthum commented Aug 1, 2023

I think it is reasonable to have the operator tests focus on operator functionality and do a basic check of the payload. Does that mean that we would only test the most recent release bundle with the operator tests?

Also if we're going to rely on the kata tests for more in-depth testing of the payload, that would seem to suggest that we will be running a KBS in the Kata tests?

@fidencio
Copy link
Member

fidencio commented Aug 4, 2023

Does that mean that we would only test the most recent release bundle with the operator tests?

That's the current situation, Tobin.
Kata Containers tests do NOT test the bundle, as the deployment done in the current Kata Containers tests is basically:

  • Build the Kata Containers tarball that will be used in the payload
  • Unpack the tarball
  • Run the tests against the unpacked contents

Whatever happens in the way we build the payload and the scripts used to deploy it, are NOT currently tested on the Kata Containers side.

With the change in the CI that we've been working on main, we DO test the payload, with similar (if not the same) options that the CoCo Operator will be using. I have a PR opened to make the CoCo Operator "dumber", in a way it'll rely more on the payload, so we can improve the tests on that area as part of the specific projects providing the payload to be consumed by the Operator, and this is covered in the following topic "Ensure user-facing options are being propagated to the runtime payloads -- This could even be done using a fake payload that just gets whatever is passed from the CRD and prints the info"

Also if we're going to rely on the kata tests for more in-depth testing of the payload, that would seem to suggest that we will be running a KBS in the Kata tests?

Don't we already have one? I'm confused here. The operator tests running Today are exactly the same as the tests running on Kata Containers, using exactly the same infrastructure. Do we have a KBS running to run those tests? I think so, and I think absolutely nothing changes in that regard.

@wainersm
Copy link
Member Author

I've put some though into this, and I sincerely thing that what we should test as part of the operator is:

* Install

* Removal

* Reinstall

Right, we already have tests suite for the operator in https://github.com/confidential-containers/operator/blob/main/tests/e2e/operator_tests.bats

* Ensure user-facing options are being propagated to the runtime payloads
  
  * This could even be done using a fake payload that just gets whatever is passed from the CRD and prints the info

Opened an issue to track that idea: #247 . Please, fix and/or complement the description.

* Update (we're not there yet)

* Upgrade (we're not there yet)

Then, finally, having a smoke test just to make sure the payload used can run one workload. Everything else should be done on the runtime side.

IMHO when we started the CoCo project we didn't draw a clear line of separation between Kata and CoCo. As a result, we have things like running the Kata tests on CoCo's operator CI; building the CoCo's runtime-payload on github actions on Kata...etc. With the new initiatives to merge main/CCv0 and migrate CI to GHA we have the opportunity to review those decisions. That said, I agree with you @fidencio, the Operator CI should test the operator features only.

Use case: As a developer, opening a pull request to confidential-containers/operator, I want the CI to run tests for the operator features so that I get assured I didn't introduce regressions on the operator.

Note: The operator CI should run on pinned components, i.e., the runtime-payload should not be drifting as the goal is to test the operator, not the integration. Unless that you are adding a feature that required the bump of the runtime-payload, in this case you update the payload on the PR itself. @stevenhorsman knows very well my opinion on this regard :)

There should exist an 2nd CI on CoCo side to really test the kata <-> CoCo integration plus the CoCo use cases where:

  • Uses latest CI runtime-payload and operator
  • Run tests that are CoCo-specific, i.e., not covered on Kata' s CI.
    • Example 1: currently on CCv0 the SEV test uses simpleKBS, not clear to me if we will port that to main and if not, this would be a good candidate to run on CoCo CI.
    • Example 2: e2e SEV test with deploying the new KBS operator

/cc @stevenhorsman

@wainersm, WDYT?

@wainersm
Copy link
Member Author

Does that mean that we would only test the most recent release bundle with the operator tests?

That's the current situation, Tobin. Kata Containers tests do NOT test the bundle, as the deployment done in the current Kata Containers tests is basically:

* Build the Kata Containers tarball that will be used in the payload

* Unpack the tarball

* Run the tests against the unpacked contents

Whatever happens in the way we build the payload and the scripts used to deploy it, are NOT currently tested on the Kata Containers side.

With the change in the CI that we've been working on main, we DO test the payload, with similar (if not the same) options that the CoCo Operator will be using. I have a PR opened to make the CoCo Operator "dumber", in a way it'll rely more on the payload, so we can improve the tests on that area as part of the specific projects providing the payload to be consumed by the Operator, and this is covered in the following topic "Ensure user-facing options are being propagated to the runtime payloads -- This could even be done using a fake payload that just gets whatever is passed from the CRD and prints the info"

Also if we're going to rely on the kata tests for more in-depth testing of the payload, that would seem to suggest that we will be running a KBS in the Kata tests?

Don't we already have one? I'm confused here. The operator tests running Today are exactly the same as the tests running on Kata Containers, using exactly the same infrastructure. Do we have a KBS running to run those tests? I think so, and I think absolutely nothing changes in that regard.

I think @fitzthum meant, for example, the SEV test on the CCv0 branch (https://github.com/kata-containers/tests/blob/CCv0/integration/kubernetes/confidential/sev.bats) that leverages the simpleKBS among other stuffs (e.g. launch measurement). The confidential test for SEV being added on main (kata-containers/kata-containers#7185) does not test with simpleKBS nor launch measurements. For me, the question is does it belong to Kata or CoCo suite of tests? Again, as I mentioned before, we should draw a line between kata and CoCo, and TEE tests with KBS belongs to kata or CoCo?

@wainersm
Copy link
Member Author

@fidencio @fitzthum @stevenhorsman one last note, I really would like to close the scope of this so that @ldoktor could be working on it with my assistance. What do you suggest? A more formal plan? Discuss on CoCo community meeting? TSC?

@wainersm
Copy link
Member Author

I just created a self-managed runner for the operator repo on our GARM setup. In other words, workflows that runs-on: az-ubuntu-2204 will be executed on an Azure VM with nested virt enabled. So we are able to perform the migration of the non-tee jobs to github actions....

@wainersm
Copy link
Member Author

hi @ldoktor , those Jenkins job to be migrated are generate via jenkins-job-builder (JJB) and their sources can be found at https://github.com/kata-containers/ci/blob/main/jobs-builder/jobs/cc.yaml#L134

stevenhorsman added a commit to stevenhorsman/cc-operator that referenced this issue Nov 29, 2023
Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Signed-off-by: stevenhorsman <[email protected]>
stevenhorsman added a commit to stevenhorsman/cc-operator that referenced this issue Nov 29, 2023
Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Signed-off-by: stevenhorsman <[email protected]>
stevenhorsman added a commit to stevenhorsman/cc-operator that referenced this issue Nov 30, 2023
Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Signed-off-by: stevenhorsman <[email protected]>
stevenhorsman added a commit to stevenhorsman/cc-operator that referenced this issue Nov 30, 2023
Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Signed-off-by: stevenhorsman <[email protected]>
Co-authored-by: Wainer dos Santos Moschetta <[email protected]>
stevenhorsman added a commit to stevenhorsman/cc-operator that referenced this issue Nov 30, 2023
Remove the kata-containers `CCv0` e2e tests in the operator
as these tests are on a dead branch of a dead repo
We also have had issues with them not being suitable as they
programmatically change the configuration, which isn't likely to work
in GHA, but isn't what we want to test either.

Later we should add an e2e smoke test that ensures that the kata-payload
integrated with things like nydus-snapshotter than the operator
is responsible for setting up and configuring as discussed in
confidential-containers#239

Collapse all the runtime classes to do the same tests for now
and then when we add the smoke test we can make that have the
smarts to handle all the platforms

Signed-off-by: stevenhorsman <[email protected]>
Co-authored-by: Wainer dos Santos Moschetta <[email protected]>
@stevenhorsman
Copy link
Member

@wainersm @ldoktor - can this issue be closed as complete now as the basic jobs have been migrated and we have #309 to track improvements to it?

@wainersm
Copy link
Member Author

Hi @stevenhorsman !

@wainersm @ldoktor - can this issue be closed as complete now as the basic jobs have been migrated and we have #309 to track improvements to it?

Yes, I think it can be closed. I've created issues related to improvements and/or new jobs to the current set of GhA-based jobs. Getting them tagged with ci-next label: https://github.com/confidential-containers/operator/issues?q=is%3Aissue+is%3Aopen+label%3Aci-next

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants