Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run kselftest-tpm2 on grunt #1025

Merged
merged 5 commits into from
Mar 18, 2022
Merged

Conversation

nfraprado
Copy link
Contributor

This PR gets kselftest-tpm2 running on grunt.

Lava log: https://lava.collabora.co.uk/scheduler/job/5608585

Basically the results are:

# selftests: tpm2: test_smoke.sh
# test_read_partial_overwrite (tpm2_tests.SmokeTest) ... ok
# test_read_partial_resp (tpm2_tests.SmokeTest) ... ok
# test_seal_with_auth (tpm2_tests.SmokeTest) ... ERROR
# test_seal_with_policy (tpm2_tests.SmokeTest) ... ERROR
# test_seal_with_too_long_auth (tpm2_tests.SmokeTest) ... FAIL
# test_send_two_cmds (tpm2_tests.SmokeTest) ... ok
# test_too_short_cmd (tpm2_tests.SmokeTest) ... ok
# test_unseal_with_wrong_auth (tpm2_tests.SmokeTest) ... ERROR
# test_unseal_with_wrong_policy (tpm2_tests.SmokeTest) ... ERROR

# test_async (tpm2_tests.AsyncTest) ... ok

# selftests: tpm2: test_space.sh
# test_flush_context (tpm2_tests.SpaceTest) ... /usr/lib/python3.7/unittest/case.py:615: ResourceWarning: unclosed file <_io.FileIO name='/dev/tpmrm0' mode='rb+' closefd=True>
#   testMethod()
# ResourceWarning: Enable tracemalloc to get the object allocation traceback
# ok
# test_get_handles (tpm2_tests.SpaceTest) ... ok
# test_invalid_cc (tpm2_tests.SpaceTest) ... ok
# test_make_two_spaces (tpm2_tests.SpaceTest) ... ok

The errors are caused by tpm2.ProtocolError: TPM_RC_LOCKOUT: cc=0x00000153, rc=0x00000921. There was a commit in the past fixing those, but it was later reverted. However the patch might make sense in KernelCI's usecase, where devices are there just for testing. That said, that patch requires the tpm2_clear tool, which is only available in bullseye.

I ran the tests using the bullseye-kselftest rootfs from #1024, with the addition of the tpm2-tools package, and that patch applied (although modified to clear before rather than after the tests are run), and the errors were gone:
https://lava.collabora.co.uk/scheduler/job/5616044. A single FAIL remained from one of the tests.

So one option would be to, after #1024 is merged, add the tpm2-tools package and carry a patch in KernelCI for kselftest. Ideally the change would be merged upstream, but given that it was reverted already, a different approach would be needed for upstream, and I'm not sure what that would be.

@gctucker gctucker requested a review from a team February 4, 2022 19:59
config/core/rootfs-configs.yaml Outdated Show resolved Hide resolved
@@ -1939,6 +1945,7 @@ test_configs:
- kselftest-lkdtm
- kselftest-rtc
- kselftest-seccomp
- kselftest-tpm2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about running it on other devices? Does it require particular hardware only available on grunt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I intend to. I was having a few issues getting the test running on a few other devices, and grunt was the first one I got it working on. But there should be a few other devices in KernelCI where we can run this test on. By the end of next week I should probably be able to get at least one other device running this test.

@gctucker
Copy link
Contributor

gctucker commented Feb 9, 2022

This is conflicting with #823 which is blocked on a new buildroot image to be produced.

@nfraprado
Copy link
Contributor Author

Oops, I totally missed that PR and ended up duplicating some of that work. In any case, that particular PR doesn't look blocked in any way (it's just adding configs). So I'll rebase this PR on top of that one, and then we can start by merging that one. I'll also comment on it to answer some questions and get it unblocked.

@gctucker
Copy link
Contributor

@nfraprado Please rebase one more time, I think this is conflicting with the kselftest-vm changes that have just been merged.

The tpm2 kselftest requires python3 unittest. Add it to the rootfs so we
can run the test.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
As per grunt's SSDT1 ACPI table, the TPM device requires a GPIO IRQ to
be configured, whose controller has HID AMD0030. The driver that matches
it is pinctrl-amd.

Enable the AMD pinctrl driver so the TPM probes in grunt and can be used
in the tpm2 kselftest.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Add the tpm2 collection to kselftest. It tests TPM devices that support
the TPM 2.0 specification.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
grunt has a cr50 TPM chip that is accessible through an i2c bus. By
running this test, we're checking that the tpm_tis_i2c_cr50 driver
interfaces correctly with the TPM chip.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
@nfraprado
Copy link
Contributor Author

@gctucker OK, rebased.

Copy link
Contributor

@gctucker gctucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the resulting job, manually submitted with the build and rootfs image mentioned above:
https://lava.collabora.co.uk/scheduler/job/5772426
https://staging.kernelci.org/test/plan/id/6218b696bb88e681ca4b318a/
which still showed the same errors as the kernel build didn't have the required patch.

Then the same job resubmitted with @nfraprado's kernel binary:
https://lava.collabora.co.uk/scheduler/job/5772538
which also shows the errors.

Could you please take a look and confirm which kernel changes are required to make this pass? We could then apply them on top of linux-next on a staging build to check the changes in this PR work with that kernel fix in place.

@@ -268,6 +268,7 @@ fragments:
- 'CONFIG_TCG_TIS_I2C_CR50=y'
- 'CONFIG_SPI=y'
- 'CONFIG_SPI_PXA2XX=y'
- 'CONFIG_PINCTRL_AMD=y'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# fragment from: x86-chromebook
CONFIG_SERIAL_8250_DW=y
CONFIG_X86_AMD_PLATFORM_DEVICE=y
CONFIG_MFD_INTEL_LPSS_PCI=y
CONFIG_USB_GADGET=y
CONFIG_USB_ETH=y
CONFIG_USB_RTL8152=y
CONFIG_MMC=y
CONFIG_MMC_SDHCI=y
CONFIG_MMC_SDHCI_PCI=y
CONFIG_MMC_SDHCI_ACPI=y
CONFIG_CHROME_PLATFORMS=y
CONFIG_CHROMEOS_LAPTOP=y
CONFIG_CHROMEOS_TBMC=y
CONFIG_CROS_EC=y
CONFIG_CROS_EC_LPC=y
CONFIG_CROS_EC_I2C=y
CONFIG_IIO=m
CONFIG_IIO_CROS_EC_SENSORS_CORE=m
CONFIG_IIO_CROS_EC_SENSORS=m
CONFIG_MFD_CROS_EC_DEV=m
CONFIG_I2C_DESIGNWARE_PLATFORM=y
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
CONFIG_TCG_TIS_SPI=y
CONFIG_TCG_TIS_SPI_CR50=y
CONFIG_TCG_TIS_I2C_CR50=y
CONFIG_SPI=y
CONFIG_SPI_PXA2XX=y
CONFIG_PINCTRL_AMD=y

Comment on lines +162 to +163
- python3-minimal
- python3-unittest2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked this has been included in the rootfs build:
https://bot.staging.kernelci.org/job/rootfs-builder/337/console

./kci_rootfs build --rootfs-config bullseye-kselftest --arch amd64 --data-path config/rootfs/debos --output 20220225-gtucker
Running /debos --artifactdir /data/workspace/bot.staging.kernelci.org/kernel-builder/workspace/rootfs-builder/kernelci-core/20220225-gtucker/_install_ --template-var crush_image_options:"" --template-var debian_mirror:"" --template-var keyring_package:"" --template-var suite:"bullseye" --template-var basename:"bullseye-kselftest/amd64" --template-var extra_packages_remove:"" --template-var extra_files_remove:"" --template-var test_overlay:"" --template-var keyring_file:"" --template-var architecture:"amd64" --template-var extra_packages:"bc ca-certificates iproute2 jdim libatm1 libcap2-bin libelf1 libgdbm-compat4 libgdbm6 libhugetlbfs0 libmnl0 libnuma1 libpam-cap libpcre2-8-0 libperl5.32 libpsl5 libxtables12 netbase openssl perl perl-modules-5.32 procps publicsuffix python3-minimal python3-unittest2 wget xz-utils" --template-var extra_firmware:"" --template-var script:"" /data/workspace/bot.staging.kernelci.org/kernel-builder/workspace/rootfs-builder/kernelci-core/config/rootfs/debos/rootfs.yaml --internal-image

The tpm2 kselftest currently throws TPM_RC_LOCKOUT errors. A workaround
for the issue is to patch the kselftest to run tpm2_clear before running
the tests. tpm2_clear is part of the tpm2-tools, so install this package
in order to be able to work around those errors.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
@nfraprado
Copy link
Contributor Author

@gctucker To get rid of the ERRORs, the commit I just pushed is needed in order to have tpm2-tools in the rootfs, and also the attached patch (renamed to a .txt extension since GitHub doesn't like .patch) needs to be applied to the kernel. Note that this patch is in kselftest, so it's the kselftest archive that will be affected rather than the kernel image.

0001-DONOTUPSTREAM-selftest-tpm-Clear-TPM-before-running-.txt

But there's one other thing, it seems like some of the grunt devices have trouble running this test. The very same test you ran here on cbg-7 I re-ran explicitly on cbg-4, and as you can see, the errors I get are the TPM_RC_LOCKOUT ones, which can be worked around with the measures on the previous paragraph, while the errors you got were Device or resource busy: '/dev/tpm0' and TPM_RC_BAD_AUTH. So far it seems like cbg-0 and cbg-7 have this issue, while cbg-1 and cbg-4 don't. I still need to test on others.

@gctucker
Copy link
Contributor

Thanks for the update. We've just identified the issue with different grunt variants in the lab, they should be split as different device types.

Meanwhile I'll get your patch applied on staging linux-next to confirm the rest is working with these changes in place. The kernel builds include the kselftest binaries, it's just part of the artifacts so that will all get tested together. I might also re-run some jobs on different grunt devices to compare the results.

@gctucker
Copy link
Contributor

Also, starting another rootfs build with the extra patckage.

@gctucker
Copy link
Contributor

Starting another amd64 bullseye-kselftest rootfs build on staging now as other things have moved since then, and we'll be able to test that on staging.

@gctucker
Copy link
Contributor

@gctucker
Copy link
Contributor

Previously:
https://lava.collabora.co.uk/scheduler/job/5857148

kselftest: Running tests in tpm2
TAP version 13
1..2
# selftests: tpm2: test_smoke.sh
# ./test_smoke.sh: 9: python3: not found
# ./test_smoke.sh: 10: python3: not found
not ok 1 selftests: tpm2: test_smoke.sh # exit=127
# selftests: tpm2: test_space.sh
# ./test_space.sh: 9: python3: not found

Now, with the updated rootfs image:
https://lava.collabora.co.uk/scheduler/job/5857151

[   31.995459] kselftest: Running tests in tpm2
TAP version 13
1..2
# selftests: tpm2: test_smoke.sh
# test_read_partial_overwrite (tpm2_tests.SmokeTest) ... ok
# test_read_partial_resp (tpm2_tests.SmokeTest) ... ok
# test_seal_with_auth (tpm2_tests.SmokeTest) ... ok
# test_seal_with_policy (tpm2_tests.SmokeTest) ... ok
# test_seal_with_too_long_auth (tpm2_tests.SmokeTest) ... FAIL
# test_send_two_cmds (tpm2_tests.SmokeTest) ... ok
# test_too_short_cmd (tpm2_tests.SmokeTest) ... ok
# test_unseal_with_wrong_auth (tpm2_tests.SmokeTest) ... ok
# test_unseal_with_wrong_policy (tpm2_tests.SmokeTest) ... ok
# 
# ======================================================================
# FAIL: test_seal_with_too_long_auth (tpm2_tests.SmokeTest)

So the test results reported to KernelCI are still the same in terms of pass/fail/skip but at least the tests are actually being run now. I guess there could be test sets defined to include the details of each individual test case, then we would have kselftest-tpm2.test_smoke.test_read_partial_overwrite etc. But that's for a follow-up potential improvement in the test-definitions repository.

Strangely, in the job you ran the errors occurred but the results reported to LAVA were all "ok":
https://lava.collabora.co.uk/scheduler/job/5608585

Out of interest, do you know what is causing the error here with test_seal_with_too_long_auth?

Also, do you reckon the skiplist is masking too many tests and some should be run on grunt and maybe other platforms?

Copy link
Contributor

@gctucker gctucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested OK on staging with a few cases failing, but they can be investigated like any kernel failures.

@gctucker gctucker merged commit 7e6737e into kernelci:main Mar 18, 2022
@nfraprado
Copy link
Contributor Author

Sorry for the delay in the feedback. It is indeed strange that my first run passed even with the tests failing, but re-running the tests, I couldn't reproduce that anymore.

And no, I haven't looked further into the test_seal_with_too_long_auth error but I think that would need getting deeper into the TPM 2.0 spec to understand what's going wrong there.

Also, I saw some timeouts happening during some more runs of the tests, but the run times were fluctuating so much (the smoke would take anything between 34s to 69s to run) that I think it might have been just some issue with the boards on the lab. With the test being merged now, we'll get much more runs, and if the timeouts keep happening then I can send a patch upstream setting the timeout configuration for the tpm selftest to a higher value (it's currently at the default 45s).

By the way, is there a way to see which patches are currently being applied by KernelCI to run the tests? Even though I created an issue, kernelci/kernelci-project#101, to track the TPM_RC_LOCKOUT issue that we needed a patch to workaround, it would be even better if it was very clear which patch is being used in KernelCI to workaround the issue. I'm just worried we might forget about this patch even. And since it's not upstreamable, we should have it at hand when looking into a proper solution upstream at a later point.

@gctucker
Copy link
Contributor

OK, if the test takes between 30 and 60s to run then the timeout should probably be like 120s anyway.

The kernel patches applied on staging can be see on the git branches, here's staging-next for example:
https://github.com/kernelci/linux/commits/staging-next

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants