Re-enabling iso-offline-install-iscsi.bios #2816

Merged 1 commit into coreos:testing-devel on Feb 6, 2024

jbtrystram
Contributor

Following the merge of coreos/coreos-assembler#3702

@dustymabe
Member

I assume this passes on every architecture now?

@jbtrystram
Contributor Author

The target container seems to hang when killed after the test finishes, which causes kola to time out the test. I am investigating this and trying to fix it, but it's hard to reproduce.

@jlebon
Member

jlebon commented Jan 23, 2024

Anything in the journal logs now that we have them?

I wonder if systemd in the target container isn't tearing down cleanly. One thing to try is to make the container directly run targetclid instead and pass --init when you podman run. Effectively, this replaces systemd with catatonit.
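
As a rough sketch of that suggestion, the helper below builds a `podman run` command line that starts `targetclid` directly with `--init`. The image and container names are the ones that appear in the CI log later in this thread; the helper name is made up for illustration, and any extra flags the container needs (privileges, device or volume mounts) are left out because they are not specified here.

```python
import shlex


def targetclid_run_argv(image="quay.io/jbtrystram/targetcli:latest", name="target"):
    """Build a podman command that runs targetclid under podman's init (--init)."""
    return [
        "podman", "run",
        "--init",        # small init as PID 1: forwards signals and reaps children
        "--name", name,
        image,
        "targetclid",    # assumption: on the image's PATH and stays in the foreground
    ]


if __name__ == "__main__":
    # Print the command as a single shell-quoted line for inspection.
    print(shlex.join(targetclid_run_argv()))
```

With an init process as PID 1 instead of systemd, a stop signal sent to the container is forwarded straight to `targetclid`, which is the teardown behavior this suggestion is after.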

@jbtrystram
Contributor Author

jbtrystram commented Jan 24, 2024

> Anything in the journal logs now that we have them?

journal.txt

Edit: the associated console log is there.

@jlebon
Member

jlebon commented Jan 25, 2024

One thing I notice, at least in the CI logs here, is:

+ targetcli /backstores/block create name=coreos dev=/dev/disk/by-id/virtio-target
Rounding down aligned max_sectors from 4294967295 to 4294967288
g-io-error-quark: Could not connect: No such file or directory (1)
2024-01-23 16:44:49.613581372 +0000 UTC m=+0.578572190 container exec_died 2677d88dfe9cd25778370fdf1b043977a02a38874e30366fbe6b76010de7a007 (image=quay.io/jbtrystram/targetcli:latest, name=target, org.label-schema.license=GPLv2, org.label-schema.name=CentOS Stream 9 Base Image, org.label-schema.schema-version=1.0, org.label-schema.vendor=CentOS, PODMAN_SYSTEMD_UNIT=target.service, io.buildah.version=1.23.1, org.label-schema.build-date=20240116)
setup-targetcli.service: Main process exited, code=exited, status=255/EXCEPTION
setup-targetcli.service: Failed with result 'exit-code'.

which I think might be targetcli not being able to reach out to the targetclid socket? One way this could happen is if systemd in the container hasn't reached sockets.target by the time setup-targetcli.service starts running. I think changing the container to directly run targetclid would help with this. Alternatively, you can have the script wait until the socket is up before proceeding.
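
The "wait until the socket is up" option could be sketched as follows. This is a minimal illustration, not the change that landed in this PR: the function name and the timeout values are assumptions, and the real setup script may well do the equivalent in shell.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=30.0, interval=0.5):
    """Return True once a Unix stream socket at `path` accepts connections.

    Return False if nothing is listening there before `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Covers "No such file or directory" (socket not created yet)
            # and "Connection refused" (file exists, nobody listening).
            pass
        finally:
            sock.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A setup script would call this with whatever path targetclid is configured to listen on before issuing the first targetcli command, and exit non-zero if it returns False so the failure is reported instead of surfacing later as a connection error.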

I have a suspicion that this is also the issue in the logs from #2816 (comment). The journal logs there look truncated, but they stop just after that same line:

+ targetcli /backstores/block create name=coreos dev=/dev/disk/by-id/virtio-target

I think what happened there is that OnFailure=emergency.target triggered, which in turn triggered coreos-test-entered-emergency-target.service which explicitly does systemctl poweroff, which is why we see the system power off.

Additionally, https://github.com/coreos/coreos-assembler/blob/1982e0fb3e67e157c9fa5e87abd0793d6088be8d/mantle/cmd/kola/testiso.go#L1015 should be testisocompletion (and the Butane config updated accordingly) so that coreos-test-entered-emergency-target.service can actually write to it. That would let us detect the failure correctly instead of hanging after QEMU exits and erroring with the more abstract "QEMU exited; timed out waiting for completion" error.

@jbtrystram
Contributor Author

> which I think might be targetcli not being able to reach out to the targetclid socket?

Yeah, good point. I added a healthcheck and updated the way the service is marked ready. I'll test that and report back!

> Additionally, coreos/coreos-assembler@1982e0f/mantle/cmd/kola/testiso.go#L1015 should be testisocompletion ...

Good catch, thanks! Fixed.

@jbtrystram force-pushed the enable-iscsi-test branch 3 times, most recently from 6187602 to be2b868, on February 2, 2024 09:13
@jbtrystram force-pushed the enable-iscsi-test branch 2 times, most recently from 6b2a636 to 499b19d, on February 5, 2024 20:06
I have found that nested virtualization does not work on
arches other than x86, so the test is disabled for those
arches in the kola testiso code.
We can re-enable this so it runs on x86 at least in the meantime.

Tracker issue: coreos/fedora-coreos-tracker#1657
@jlebon enabled auto-merge (rebase) on February 6, 2024 15:19
@jlebon merged commit c2efb86 into coreos:testing-devel on Feb 6, 2024
3 checks passed