Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI adjustments #3535

Merged
merged 2 commits into from
Oct 23, 2024
Merged

CI adjustments #3535

merged 2 commits into from
Oct 23, 2024

Conversation

apostasie
Copy link
Contributor

@apostasie apostasie commented Oct 14, 2024

NOTE: this is pending #3572, #3573, #3574, #3575

This is the next part of the test rework effort.

The most important change here

Is that we will now have two separate steps for every integration test run:

  • without retry (this is new) - running all tests that are deemed safe ONCE, and hard failing if one of them fails
  • with retry (this is the previous behavior) - running any other test

Note that the above split does NOT apply to ipv6, nor kube, tests. These are considered safe.

Right now, the balance is about 286 (rootful, no retry) vs. 500 (with retry).
Hopefully, as we fix and migrate more tests, and address bugs in nerdctl, it will shift.

Note that running legacy tests without retries is currently very painful and does surface a lot of conditions that have been previously unseen / ignored for a long time. As a lot of bugs have been recently fixed with regards to concurrency, underlying issues are also starting to crop up, and these very likely run deep / are involved to fix.
These are chiefly:

This is the reason why we are - conservatively - leaving the pre-existing, legacy tests in the "with retries" bucket.

There is now a helper script, test-integration.sh.

When ran without any argument, if will run both steps in order:

If you want to run ONLY the first step (aka "safe tests"), call:
test-integration.sh -test.only-flaky=false

If you want to run ONLY the second step, call:
test-integration.sh -test.only-flaky=true

That works as well with gotestsum, or go test, evidently:

Note though that the default in that case is -test.only-flaky=false:

go test ./cmd/nerdctl/image
go test ./cmd/nerdctl/image -test.only-flaky
# etc...

Other CI changes

  • cosmetic efforts on test display titles (more compact, more consistent)
  • removal of all nick/retries
  • minor tweaks to the Dockerfile to adapt for new script

@apostasie apostasie changed the title Test rework, part 5 [WIP] Test rework, part 5 Oct 14, 2024
.github/workflows/lint.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
.github/workflows/test.yml Outdated Show resolved Hide resolved
pkg/netutil/netutil.go Outdated Show resolved Hide resolved
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
@apostasie apostasie force-pushed the part5 branch 2 times, most recently from 127556a to 4650e5d Compare October 15, 2024 02:24
pkg/testutil/testutil.go Outdated Show resolved Hide resolved
@apostasie apostasie force-pushed the part5 branch 6 times, most recently from fdbf2ac to c32df15 Compare October 15, 2024 03:55

jobs:
test-unit:
# Supposed to work: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#example-returning-a-json-data-type
# Apparently does not
# timeout-minutes: ${{ fromJSON(env.SHORT_TIMEOUT) }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish that worked. Anyone has insight?

max_attempts: 2
retry_on: error
command: docker run -t --rm --privileged test-integration
run: docker run -t --rm --privileged test-integration ./hack/test-integration.sh -test.only-flaky=false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the two steps.

@@ -129,7 +133,7 @@ jobs:
echo '{"ipv6": true, "fixed-cidr-v6": "2001:db8:1::/64", "experimental": true, "ip6tables": true}' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker
- name: "Prepare integration test environment"
run: docker build -t test-integration-ipv6 --target test-integration-ipv6 --build-arg UBUNTU_VERSION=${UBUNTU_VERSION} --build-arg CONTAINERD_VERSION=${CONTAINERD_VERSION} .
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the extra stage that has nothing specific. It is just an extra arg.

max_attempts: 2
retry_on: error
command: docker run --network host -t --rm --privileged test-integration-ipv6
run: docker run --network host -t --rm --privileged test-integration ./hack/test-integration.sh -test.only-ipv6
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nick should go

@@ -248,12 +246,13 @@ jobs:
go-version: ${{ matrix.go-version }}
cache: true
check-latest: true
- name: "Cross"
- name: "build"
Copy link
Contributor Author

@apostasie apostasie Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better display. This is not "cross-compilation" it is building for different versions of go.

@@ -787,6 +787,9 @@ func newBase(t *testing.T, ns string, ipv6Compatible bool, kubernetesCompatible
} else if !base.EnableKubernetes && base.KubernetesCompatible {
t.Skip("runner skips Kubernetes compatible tests in the non-Kubernetes environment")
}
if !GetFlakyEnvironment() && !GetEnableKubernetes() && !GetEnableIPv6() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the part that keep the legacy tests into the "retry" part.

@apostasie
Copy link
Contributor Author

Oh docker, I...

@apostasie apostasie force-pushed the part5 branch 3 times, most recently from 33e89d5 to fd7528d Compare October 19, 2024 17:46
@apostasie apostasie mentioned this pull request Oct 19, 2024
@apostasie apostasie force-pushed the part5 branch 7 times, most recently from dc52901 to f4954f0 Compare October 19, 2024 19:37
@apostasie apostasie changed the title Part 5: CI adjustments CI adjustments Oct 19, 2024
@apostasie apostasie marked this pull request as ready for review October 19, 2024 22:26
@apostasie
Copy link
Contributor Author

That was quite a trek.

@AkihiroSuda @djdongjin at your convenience.
(See ticket about CI status / data for more info on where we stand)

fi
done

if [ "$needsudo" != "" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can cause a confusion when WITH_SUDO is set to "0", "false", etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I'll restrict it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda requested a review from djdongjin October 22, 2024 06:27
@AkihiroSuda AkihiroSuda merged commit 3dfe8d2 into containerd:main Oct 23, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci e.g., CI failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants