Serial can't keep up on spherion and tomato (MediaTek Chromebooks) #366
Comments
Bumped into the same issue when checking preliminary results for the watchdog reset test: https://lava.collabora.dev/scheduler/job/14140804#L10766 This is another instance where missing characters can lead to a false regression (here |
Maybe add a shell script somewhere:
and trigger on a shorter string, "ERR-NODEV" (which also repeats, to increase the chance that it won't get mixed with other output)? |
@nuclearcat We need to fix the serial because we won't be able to change the code for the upstream tests, and we wouldn't want them to have to handle a flaky serial anyway. @laura-nao So far I've only seen it happen on these two platforms, but I'll add updates here if I see it elsewhere. I think adding |
I think trogdor also has flaky serial |
Add test_character_delay to the Spherion and Tomato platforms to work around the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in kernelci-project#366 [1]. The right place to do this change would be in the device-type template as described in LAVA's documentation [2]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] kernelci/kernelci-project#366. [2] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Add test_character_delay to the Spherion and Tomato platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in kernelci/kernelci-project#366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
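To illustrate what a per-character delay does, here is a minimal Python sketch of a sender that pauses after every character so a slow serial receiver can keep up. It is not LAVA's implementation: `send_with_delay`, its `write` and `sleep` parameters, and the assumption that the delay is given in milliseconds are all hypothetical and used only for illustration.

```python
import time
from typing import Callable


def send_with_delay(
    command: str,
    write: Callable[[str], None],
    delay_ms: int = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send `command` one character at a time.

    `write` stands in for the serial port's write function and
    `delay_ms` for a per-character delay (assumed to be milliseconds).
    Both names are hypothetical, not part of LAVA's API.
    """
    for char in command:
        write(char)
        if delay_ms > 0:
            # Give the receiving side time to consume the character
            # before the next one arrives.
            sleep(delay_ms / 1000.0)


# Usage: record what would be written and how long each pause is,
# without touching a real serial port or actually sleeping.
sent = []
pauses = []
send_with_delay("ls\n", sent.append, delay_ms=10, sleep=pauses.append)
```

With a delay of 0 the command is written back to back, which is the situation where a slow receiver may drop characters and mangle the command.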
@nuclearcat right, I remember trogdor has flaky serial, but I think it was the output, not the input, and in that case the delay wouldn't help. But I'll check it and the other platforms. For now I've created a PR for us to test whether this fixes it on spherion and tomato: kernelci/kernelci-pipeline#626. |
I noticed the |
Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in kernelci/kernelci-project#366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two clases (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for android-mainline branch Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` Currently set `60 min` timeout is not enough as some `kbuild` jobs and its sub-tests take around 2 hrs to complete after getting submitted to runtime. Here is an example from staging. See the information for a `checkout` and its child nodes: | id | name | created | updated | timeout | |--------------------------|---------------------|----------------------------|----------------------------|----------------------------| | 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 | | 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 | | 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 | Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations. 
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set `checkout` node result to `fail` if its state is `running` as it means code checkout is still going on and node timed-out. Set it to `pass` if its state is any other than `running`. Set `checkout` node result to `pass` if mode is `DONE` as it means once `checkout` has been in `available` or `closing` state and it could successfully complete source code checkout. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail finding the correct device type, and no job from the new system running on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree, let's not overload our system with unneeded builds. 
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error: ``` kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}") kernelci-pipeline-tarball | ^^^^^^^^^^ kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir' ``` Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service) Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In order to avoid confusion (including those wondering what the heck does `armel` mean), let's rename `armel` to `arm`. 
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state. As it denotes that the node timed-out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates source code repo and creates tarball. If update repo operation fails even with second attempt, it means it failed to checkout souce code. Hence, update `checkout` node with state `done` state and result `fail`. Also, set appropriate error information to the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for lot of empty files on Azure, that only complicate cleanup. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. Fox example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transited to `done` state when all its child nodes are completed. In case of closing `checkout`, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, available `checkout` for which holdoff is reached should be transited to `done` state only when all its child nodes are completed. In case of such `checkout` node, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. 
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results just replacing the name "failures" by "results". Makeing it easier to be re-used by communities that want to have pre-sets to list all results of the tests, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build android, add a preset to allow result-summary.yaml to list all build results from Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need not ToT, but specific commit, implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret tokens value which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much, using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device withouth having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due inrush of lava callbacks and slow Azure Files processing, we need to make sure we dont spawn too many threads. Also add hard limit of memory 1Gbyte Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presetes for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchers from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests would be ran. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet. 
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purposes * config: mt8183-kukui-jacuzzi-juniper-sku16: enable all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> jacuzzi * config: mt8186-corsola-steelix-sku131072: enable all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and create separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict list of running kernels on staging, we need to add option allowing that. Also it will be good to exclude staging kernels from production kernel list. So in case of staging we need to run kernels only from tree "kernelci" and sometimes something else, for example "mediatek". Option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci Purpose of this option is that our compiling capacity is limited, and right now staging and production both compiling very large set of kernels, we need to reduce this amount to drop costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in the both the kselftest-acpi test suites and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes due congestion node might be set to timeout, but then result might arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely problems lays in threading of flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate with transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
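As an illustration of the `log_excerpt` step described above, here is a minimal sketch, assuming the log has already been downloaded as gzip-compressed bytes. The function name `log_excerpt` and the constant `MAX_LOG_EXCERPT` are hypothetical and are not taken from `send_kcidb`; only the 16*1024 character limit comes from the commit message.

```python
import gzip

# Limit quoted in the commit message as the maximum length of the
# KCIDB `log_excerpt` field. The constant name is hypothetical.
MAX_LOG_EXCERPT = 16 * 1024


def log_excerpt(raw: bytes, limit: int = MAX_LOG_EXCERPT) -> str:
    """Return the last `limit` characters of a gzip-compressed log."""
    # Undecodable bytes are replaced rather than raising, since serial
    # logs may contain arbitrary data.
    text = gzip.decompress(raw).decode("utf-8", errors="replace")
    # Slicing with a negative start keeps the tail; shorter logs are
    # returned unchanged.
    return text[-limit:]


if __name__ == "__main__":
    sample = gzip.compress(("x" * 20000 + "END").encode("utf-8"))
    excerpt = log_excerpt(sample)
    print(len(excerpt), excerpt[-3:])
```

Keeping the tail rather than the head is what the commit message describes; the end of a log is usually where a failure shows up.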
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
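A check of this kind could look like the sketch below. It assumes the job definitions have already been parsed into a dictionary keyed by job name; the helper name `jobs_missing_kcidb_mapping` and the sample job and suite names are hypothetical, and the real `validate_yaml` test may exempt some job kinds that this sketch does not.

```python
def jobs_missing_kcidb_mapping(jobs: dict) -> list:
    """Return the names of job definitions lacking `kcidb_test_suite`."""
    return sorted(
        name
        for name, definition in jobs.items()
        # A missing key, an empty value, or an empty definition all count
        # as "no mapping".
        if not (definition or {}).get("kcidb_test_suite")
    )


if __name__ == "__main__":
    # Hypothetical job definitions, for illustration only.
    jobs = {
        "example-test-a": {"kind": "job", "kcidb_test_suite": "example.a"},
        "example-test-b": {"kind": "job"},
    }
    missing = jobs_missing_kcidb_mapping(jobs)
    if missing:
        print("Jobs without kcidb_test_suite:", ", ".join(missing))
```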
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees As per the Android team's request, it will be good to check UM builds for breakages as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring test hierarchy, `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests In case of submitting test hierarchy, child nodes by default inherit `kind` value from parent node. As we are re-structuring test hierarchy, test suite/job nodes will have `kind=job` where its child test nodes will have `kind=test`. Provide `kind` field explicitly to test result hierarchy to preserve different kind value than the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate job node result from child node results if `null` result is received from test result parser. 
For example nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed(gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes `setup` test suite has been introduced to store test results for environment setup checks before running actual test suite. KCIDB doesn't require `setup` test suite result as long as main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix log statement about submitting node to KCIDB, as we are not sending to KCIDB all the nodes we receive events for. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from parent node for skipped tests, as in practice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain. 
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug options to the kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need to update the gcc version. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programmatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use a container pulled from hub.docker.com. After the lab move, pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs in the Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that scheduler entries have a matching job entry (this is a critical validation) and that job entries have at least one entry in the scheduler. Fix one entry detected by this validation. Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined the same keys in different yaml files; this tool will ensure consistency of these entries. 
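The duplicate-key verification described above can be sketched as follows. This is only an illustration under stated assumptions: the files are taken to be already parsed into dictionaries, and the helper name `find_duplicate_keys`, the file names and the job names are all hypothetical rather than taken from `validate_yaml.py`.

```python
from collections import defaultdict


def find_duplicate_keys(parsed_files: dict, section: str) -> dict:
    """Map each key defined in more than one file to the files defining it.

    `parsed_files` maps a file name to its parsed YAML content (a dict);
    `section` is the top-level section to inspect, e.g. "jobs".
    """
    seen = defaultdict(list)
    for filename, content in parsed_files.items():
        # A file may lack the section or leave it empty.
        for key in (content.get(section) or {}):
            seen[key].append(filename)
    return {key: files for key, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    # Hypothetical parsed configuration files, for illustration only.
    parsed = {
        "a.yaml": {"jobs": {"job-one": {}, "job-two": {}}},
        "b.yaml": {"jobs": {"job-two": {}}},
    }
    for key, files in find_duplicate_keys(parsed, "jobs").items():
        print(f"'{key}' is defined in: {', '.join(files)}")
```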
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer to the new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but actual test suite couldn't run - this will have `MISS` status 2. When pre-check tests completed, actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes can occur at any point of execution such as `Cancelled` and `Test`. Such error codes are listed under the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need file that will manage deployment of kcidb bridge in kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore kernelci tree on production, as it is special "staging"-only tree, and read all /config directory, not just default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(), after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we disable temporary tast tests which unlikely will be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3Gbyte of RAM. 
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving callback URL to environment variable, updating config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generate artificial event with jobfilter, this is likely maintainer trying to repeat job. Treat this accordingly, and inject job filter to job node, so we will run only tests maintainer wants. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by the FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be one of two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke tests there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition with the rules format, and do a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appended. 
Remove the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation to config files of mandatory parameters. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation, each job must have kind parameter Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add label so broken PR won't go to staging Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A. Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasil…
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two clases (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
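As a rough illustration (not the actual result_summary code), the two options could be declared with argparse roughly as follows. The `build_parser` helper, the defaults and the help strings are assumptions made for this sketch; only the option names come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: only the two option names are from the commit."""
    # allow_abbrev=False means option names must be spelled in full.
    parser = argparse.ArgumentParser(prog="result_summary", allow_abbrev=False)
    parser.add_argument(
        "--output-dir",
        default=".",  # assumed default, for illustration only
        help="directory where generated reports are written",
    )
    parser.add_argument(
        "--output-file",
        default=None,  # assumed default, for illustration only
        help="name of the report file (formerly --output)",
    )
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--output-dir", "reports", "--output-file", "stable-rc-boot.html"]
)
```

With this layout, `args.output_dir` and `args.output_file` can be joined by the caller to get the final report path.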
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for android-mainline branch Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` Currently set `60 min` timeout is not enough as some `kbuild` jobs and its sub-tests take around 2 hrs to complete after getting submitted to runtime. Here is an example from staging. See the information for a `checkout` and its child nodes:

| id | name | created | updated | timeout |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations. 
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set `checkout` node result to `fail` if its state is `running` as it means code checkout is still going on and node timed-out. Set it to `pass` if its state is any other than `running`. Set `checkout` node result to `pass` if mode is `DONE` as it means once `checkout` has been in `available` or `closing` state and it could successfully complete source code checkout. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurrence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail finding the correct device type, and no job from the new system running on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree, let's not overload our system with unneeded builds. 
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error: ``` kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}") kernelci-pipeline-tarball | ^^^^^^^^^^ kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir' ``` Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service) Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In order to avoid confusion (including those wondering what the heck does `armel` mean), let's rename `armel` to `arm`. 
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state, as it denotes that the node timed out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates source code repo and creates tarball. If the update repo operation fails even on the second attempt, it means it failed to check out the source code. Hence, update `checkout` node with state `done` and result `fail`. Also, set appropriate error information to the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for a lot of empty files on Azure; they only complicate cleanup. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. For example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transitioned to `done` state when all its child nodes are completed. In case of closing `checkout`, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, available `checkout` for which holdoff is reached should be transitioned to `done` state only when all its child nodes are completed. In case of such `checkout` node, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. 
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results just replacing the name "failures" by "results". Makeing it easier to be re-used by communities that want to have pre-sets to list all results of the tests, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build android, add a preset to allow result-summary.yaml to list all build results from Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need not ToT, but specific commit, implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space. 
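A minimal sketch of the idea, assuming the callback data is a plain dict and that the full log sits under a `log` key. Both the key name and the `pack_callback_data` helper are hypothetical and not taken from lava_callback:

```python
import gzip
import json


def pack_callback_data(callback: dict, drop_keys=("log",)) -> bytes:
    """Return gzip-compressed JSON of the callback data minus bulky keys.

    `drop_keys` defaults to a hypothetical "log" key, standing in for the
    full log that is already uploaded separately.
    """
    slim = {key: value for key, value in callback.items() if key not in drop_keys}
    return gzip.compress(json.dumps(slim).encode("utf-8"))


# Illustrative payload: a small status field plus a large, repetitive log.
example = {"status": "complete", "log": "line\n" * 10000}
packed = pack_callback_data(example)
```

Reading the data back is the reverse: `json.loads(gzip.decompress(packed))`.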
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret token value which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much, using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device without having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due to the inrush of LAVA callbacks and slow Azure Files processing, we need to make sure we don't spawn too many threads. Also add a hard memory limit of 1 GB. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presets for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchers from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests would be ran. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet. 
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purpose * config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> jacuzzi * config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and carete separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict list of running kernels on staging, we need to add option allowing that. Also it will be good to exclude staging kernels from production kernel list. So in case of staging we need to run kernels only from tree "kernelci" and sometimes something else, for example "mediatek". Option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci Purpose of this option is that our compiling capacity is limited, and right now staging and production both compiling very large set of kernels, we need to reduce this amount to drop costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in the both the kselftest-acpi test suites and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes due congestion node might be set to timeout, but then result might arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely problems lays in threading of flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate with transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees At the request of the Android team, it would be good to check UM builds for breakages as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring test hierarchy, `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests In case of submitting test hierarchy, child nodes by default inherit `kind` value from parent node. As we are re-structuring test hierarchy, test suite/job nodes will have `kind=job` while their child test nodes will have `kind=test`. Provide `kind` field explicitly to test result hierarchy to preserve different kind value than the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate job node result from child node results if `null` result is received from test result parser.
For example nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed(gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes `setup` test suite has been introduced to store test results for environment setup checks before running actual test suite. KCIDB doesn't require `setup` test suite result as long as main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix log statement about submitting node to KCIDB as we are not sending all the nodes we receive events for to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from parent node for skipped tests as in practice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain.
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug options to the kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need to update the gcc version Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programmatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use container pulled from hub.docker.com. After the lab move pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs from Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that scheduler entries have a matching job entry (this is a critical validation) and that job entries have at least one entry in the scheduler. Fix one entry detected by this validation Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined the same keys in different yaml files; this tool will ensure consistency of these entries.
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer to the new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but actual test suite couldn't run - this will have `MISS` status 2. When pre-check tests completed, actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes can occur at any point of execution such as `Cancelled` and `Test`. Such error codes are listed under the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI.
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need file that will manage deployment of kcidb bridge in kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore kernelci tree on production, as it is special "staging"-only tree, and read all /config directory, not just default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(), after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we disable temporary tast tests which unlikely will be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3Gbyte of RAM. 
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving callback URL to environment variable, updating config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generate artificial event with jobfilter, this is likely maintainer trying to repeat job. Treat this accordingly, and inject job filter to job node, so we will run only tests maintainer wants. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: piepline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke test there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition with the rules format, and do a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation to config files of mandatory parameters. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation, each job must have kind parameter Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add a label so broken PRs won't go to staging Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A. 
Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Daniel Wagner <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]>
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two classes (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for android-mainline branch Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` The currently set `60 min` timeout is not enough, as some `kbuild` jobs and their sub-tests take around 2 hrs to complete after being submitted to the runtime. Here is an example from staging. See the information for a `checkout` and its child nodes:

| id | name | created | updated | timeout |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations. 
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set `checkout` node result to `fail` if its state is `running`, as it means code checkout is still going on and the node timed out. Set it to `pass` if its state is anything other than `running`. Set `checkout` node result to `pass` if mode is `DONE`, as it means `checkout` has been in the `available` or `closing` state at some point and therefore completed the source code checkout successfully. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurrence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail finding the correct device type, and no job from the new system running on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree, let's not overload our system with unneeded builds. 
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error:
```
kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball | ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```
Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service") Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bit ARM devices. In order to avoid confusion (including those wondering what the heck does `armel` mean), let's rename `armel` to `arm`. 
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state, as it denotes that the node timed out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates the source code repo and creates a tarball. If the update repo operation fails even on the second attempt, it means it failed to check out the source code. Hence, update the `checkout` node with state `done` and result `fail`. Also, set appropriate error information in the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for a lot of empty files on Azure, which only complicate cleanup. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. For example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transitioned to the `done` state when all its child nodes are completed. In case of a closing `checkout`, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned the `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, an available `checkout` for which holdoff is reached should be transitioned to the `done` state only when all its child nodes are completed. In case of such a `checkout` node, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned the `done` state even if some child jobs are still running. 
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results just by replacing the name "failures" with "results". This makes them easier to reuse for communities that want to have presets to list all results of the tests, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build android, add a preset to allow result-summary.yaml to list all build results from Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need a specific commit rather than ToT; implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space. 
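The size reduction described in this last commit can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from `lava_callback`: the helper name `pack_callback` and the `"log"` key are hypothetical, and the real callback payload layout may differ.

```python
import gzip
import json


def pack_callback(data: dict, drop_keys=("log",)) -> bytes:
    """Drop bulky fields from callback data and gzip the remaining JSON.

    `drop_keys` is a hypothetical parameter; "log" stands in for whatever
    key carries the full log in the real callback payload.
    """
    slim = {key: value for key, value in data.items() if key not in drop_keys}
    return gzip.compress(json.dumps(slim).encode("utf-8"))


# Usage: the full log is removed before the payload is compressed.
packed = pack_callback({"id": 1, "status": "complete", "log": "x" * 10000})
restored = json.loads(gzip.decompress(packed))
```

Dropping the log before compressing matters more than the compression itself, since the log is already uploaded separately.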
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret token's value, which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much, using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device without having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due to the inrush of LAVA callbacks and slow Azure Files processing, we need to make sure we don't spawn too many threads. Also add a hard memory limit of 1 GByte. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presets for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchors from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests will be run. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet. 
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purpose * config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and create separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict the list of kernels running on staging, we need to add an option allowing that. It will also be good to exclude staging kernels from the production kernel list. So in case of staging we need to run kernels only from tree "kernelci" and sometimes something else, for example "mediatek". The option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci The purpose of this option is that our compiling capacity is limited, and right now staging and production are both compiling a very large set of kernels, so we need to reduce this amount to cut costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in both the kselftest-acpi test suite and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes, due to congestion, a node might be set to timeout, but the result might then arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely problems lays in threading of flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate with transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
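The excerpt logic described in this last commit amounts to decompressing the log and keeping its tail. A minimal sketch, assuming the `.log.gz` content has already been fetched into memory; the function name `log_excerpt` and the constant `MAX_EXCERPT` are hypothetical and not taken from `send_kcidb`:

```python
import gzip

# Excerpt length stated in the commit message (16*1024 characters).
MAX_EXCERPT = 16 * 1024


def log_excerpt(gz_bytes: bytes, limit: int = MAX_EXCERPT) -> str:
    """Decompress a .log.gz payload and keep only its last `limit` characters."""
    text = gzip.decompress(gz_bytes).decode("utf-8", errors="replace")
    return text[-limit:]


# Usage: a log longer than the limit is truncated from the front,
# so the end of the log (usually where the failure is) is preserved.
sample = gzip.compress(("x" * 20000 + "END").encode("utf-8"))
excerpt = log_excerpt(sample)
```

Keeping the tail rather than the head is what the commit message describes ("send last 16*1024 characters").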
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees At the request of the Android team, it would be good to check UM builds for breakages as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring the test hierarchy, the `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests When submitting a test hierarchy, child nodes by default inherit the `kind` value from the parent node. As we are re-structuring the test hierarchy, test suite/job nodes will have `kind=job` while their child test nodes will have `kind=test`. Provide the `kind` field explicitly in the test result hierarchy so that child nodes keep a different kind value from the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate the job node result from child node results if a `null` result is received from the test result parser.
For example nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed(gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes The `setup` test suite has been introduced to store test results for environment setup checks before running the actual test suite. KCIDB doesn't require the `setup` test suite result as long as the main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix the log statement about submitting a node to KCIDB, as we do not send to KCIDB every node we receive an event for. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from the parent node for skipped tests, as in practice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain.
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug options to the kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need to update the gcc version. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use a container pulled from hub.docker.com. After the lab move, pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs in the Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that each scheduler entry has a matching job entry (this is a critical validation) and that each job entry has at least one entry in the scheduler. Fix one entry detected by this validation. Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined the same keys in different yaml files; this tool will ensure the consistency of these entries.
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer to the new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but the actual test suite couldn't run - this will have `MISS` status 2. When pre-check tests completed, the actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes, such as `Cancelled` and `Test`, can occur at any point of execution. Assign such error codes to the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI.
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need a file that will manage the deployment of the kcidb bridge in the Kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore the kernelci tree on production, as it is a special "staging"-only tree, and read the whole /config directory, not just the default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(): after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we temporarily disable tast tests which are unlikely to be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3 GB of RAM.
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is the legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving the callback URL to an environment variable, update config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generates an artificial event with a jobfilter, it is likely a maintainer trying to repeat a job. Treat it accordingly and inject the job filter into the job node, so that only the tests the maintainer wants are run.
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: piepline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke test there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition into the rules format, and make a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation of mandatory parameters to config files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation: each job must have a kind parameter. Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add a label so a broken PR won't go to staging. Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A.
Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Daniel Wagner <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]>
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two clases (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for android-mainline branch Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` Currently set `60 min` timeout is not enough as some `kbuild` jobs and its sub-tests take around 2 hrs to complete after getting submitted to runtime. Here is an example from staging. See the information for a `checkout` and its child nodes: | id | name | created | updated | timeout | |--------------------------|---------------------|----------------------------|----------------------------|----------------------------| | 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 | | 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 | | 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 | Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations. 
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set `checkout` node result to `fail` if its state is `running` as it means code checkout is still going on and node timed-out. Set it to `pass` if its state is anything other than `running`. Set `checkout` node result to `pass` if mode is `DONE` as it means once `checkout` has been in `available` or `closing` state and it could successfully complete source code checkout. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurrence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail finding the correct device type, and no job from the new system running on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree, let's not overload our system with unneeded builds.
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error: ``` kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}") kernelci-pipeline-tarball | ^^^^^^^^^^ kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir' ``` Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service) Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In order to avoid confusion (including those wondering what the heck does `armel` mean), let's rename `armel` to `arm`. 
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state, as it denotes that the node timed-out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates source code repo and creates tarball. If the update repo operation fails even on the second attempt, it means it failed to checkout source code. Hence, update `checkout` node with state `done` and result `fail`. Also, set appropriate error information to the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for a lot of empty files on Azure; they only complicate cleanup.
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. For example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transitioned to `done` state when all its child nodes are completed. In case of closing `checkout`, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, available `checkout` for which holdoff is reached should be transitioned to `done` state only when all its child nodes are completed. In case of such `checkout` node, take into account running child jobs of build nodes before transitioning its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running.
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results just by replacing the name "failures" with "results". This makes it easier to re-use for communities that want to have presets to list all results of the tests, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build android, add a preset to allow result-summary.yaml to list all build results from Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need not ToT but a specific commit; implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space.
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret token's value which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much, using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device without having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due to the inrush of LAVA callbacks and slow Azure Files processing, we need to make sure we don't spawn too many threads. Also add a hard memory limit of 1 GB. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presets for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added.
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchors from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests will be run. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet.
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purpose * config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> jacuzzi * config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and create separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system.
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict list of running kernels on staging, we need to add option allowing that. Also it will be good to exclude staging kernels from production kernel list. So in case of staging we need to run kernels only from tree "kernelci" and sometimes something else, for example "mediatek". Option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci Purpose of this option is that our compiling capacity is limited, and right now staging and production both compiling very large set of kernels, we need to reduce this amount to drop costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in the both the kselftest-acpi test suites and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes due congestion node might be set to timeout, but then result might arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely problems lays in threading of flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate with transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees As per the request of the Android team, it would be good to check UM builds for breakages as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring test hierarchy, `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests In case of submitting test hierarchy, child nodes by default inherit `kind` value from parent node. As we are re-structuring test hierarchy, test suite/job nodes will have `kind=job` where its child test nodes will have `kind=test`. Provide `kind` field explicitly to test result hierarchy to preserve different kind value than the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate job node result from child node results if `null` result is receive from test result parser.
For example, nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed (gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to work around the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab.
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes `setup` test suite has been introduced to store test results for environment setup checks before running actual test suite. KCIDB doesn't require `setup` test suite result as long as main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix log statement about submitting node to KCIDB as we are not sending all the nodes we receive event for to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from parent node for skipped tests as in pratice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain. 
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug option to kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need update gcc version Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use container pulled from hub.docker.com. After the lab move pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs from Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that scheduler entries have matching job entry, this is critical validation, and job entries have at least one entry in the scheduler. Fix one entry detected by this validation Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined same keys in different yaml files, this tool will ensure consistency of this entries. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but actual test suite coudln't run - this will have `MISS` status 2. When pre-check tests completed, actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes can occur at any point of execution such as `Cancelled` and `Test`. Listed such error codes to the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need file that will manage deployment of kcidb bridge in kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore kernelci tree on production, as it is special "staging"-only tree, and read all /config directory, not just default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(), after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we disable temporary tast tests which unlikely will be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3Gbyte of RAM. 
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving callback URL to environment variable, updating config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generate artificial event with jobfilter, this is likely maintainer trying to repeat job. Treat this accordingly, and inject job filter to job node, so we will run only tests maintainer wants. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: piepline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke test there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition with the rules format, and do a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appeneded. 
Remove the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation of mandatory parameters in config files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation: each job must have a kind parameter. Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add a label so that a broken PR won't go to staging. Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A. 
Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Daniel Wagner <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]>
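The `validate_yaml.py` entries in the commit list above describe three checks: every job must define `template` and `kind`, every scheduler entry must reference an existing job, and every job must appear in at least one scheduler entry. The following is a minimal sketch of those checks, not the actual `validate_yaml.py` code: the function name `validate_pipeline`, the `jobs`/`scheduler` dictionary layout, and the sample job names are assumptions made for illustration.

```python
def validate_pipeline(config):
    """Return a list of human-readable problems found in a pipeline config.

    `config` is assumed to be an already-parsed mapping with a `jobs` dict
    and a `scheduler` list whose items carry a `job` key (hypothetical layout,
    not necessarily the one used by the real validate_yaml.py).
    """
    errors = []
    jobs = config.get("jobs") or {}
    scheduler = config.get("scheduler") or []

    # Each job must declare the mandatory `template` and `kind` parameters.
    for name, job in jobs.items():
        for key in ("template", "kind"):
            if key not in (job or {}):
                errors.append(f"job '{name}' is missing '{key}'")

    # Each scheduler entry must point at a defined job.
    scheduled = set()
    for entry in scheduler:
        job_name = entry.get("job")
        scheduled.add(job_name)
        if job_name not in jobs:
            errors.append(f"scheduler entry references unknown job '{job_name}'")

    # Each job must be used by at least one scheduler entry.
    for name in jobs:
        if name not in scheduled:
            errors.append(f"job '{name}' has no scheduler entry")

    return errors


if __name__ == "__main__":
    # Hypothetical sample data: one valid job and one dangling scheduler entry.
    sample = {
        "jobs": {"baseline-example": {"template": "baseline.jinja2", "kind": "job"}},
        "scheduler": [{"job": "baseline-example"}, {"job": "missing-example"}],
    }
    for problem in validate_pipeline(sample):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a CI check report every broken entry in a single run, which fits the commit that labels PRs on CI check failures.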
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two clases (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for android-mainline branch Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` Currently set `60 min` timeout is not enough as some `kbuild` jobs and its sub-tests take around 2 hrs to complete after getting submitted to runtime. Here is an example from staging. See the information for a `checkout` and its child nodes: | id | name | created | updated | timeout | |--------------------------|---------------------|----------------------------|----------------------------|----------------------------| | 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 | | 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 | | 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 | Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations. 
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set `checkout` node result to `fail` if its state is `running`, as that means the code checkout was still going on when the node timed out. Set it to `pass` if its state is anything other than `running`. Set `checkout` node result to `pass` if mode is `DONE`, as that means `checkout` has reached the `available` or `closing` state and completed the source code checkout successfully. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurrence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, so LAVA failed to find the correct device type and no job from the new system ran on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree; let's not overload our system with unneeded builds. 
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error: ``` kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}") kernelci-pipeline-tarball | ^^^^^^^^^^ kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir' ``` Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service) Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In order to avoid confusion (including those wondering what the heck does `armel` mean), let's rename `armel` to `arm`. 
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state, as it denotes that the node timed out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates source code repo and creates tarball. If update repo operation fails even with second attempt, it means it failed to checkout source code. Hence, update `checkout` node with state `done` and result `fail`. Also, set appropriate error information to the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for lots of empty files on Azure; they only complicate cleanup. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. For example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transited to `done` state when all its child nodes are completed. In case of closing `checkout`, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, available `checkout` for which holdoff is reached should be transited to `done` state only when all its child nodes are completed. In case of such `checkout` node, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. 
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results by just replacing the name "failures" with "results". This makes it easier for communities that want presets listing all test results to reuse them, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build android, add a preset to allow result-summary.yaml to list all build results from Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need a specific commit rather than ToT; implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space. 
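A minimal sketch of the size reduction described above, assuming the callback payload is a plain dictionary and that the embedded log lives under a key named `log` (the key name and the helper `compress_callback_data` are hypothetical, not taken from `lava_callback`):

```python
import gzip
import json


def compress_callback_data(callback_data, log_key="log"):
    """Return gzip-compressed JSON bytes without the embedded log.

    `log_key` is an assumed field name; adjust it to the real payload.
    """
    # Build a filtered copy so the caller's dictionary is left untouched.
    reduced = {k: v for k, v in callback_data.items() if k != log_key}
    return gzip.compress(json.dumps(reduced).encode("utf-8"))
```

The same filtering step is a natural place to drop any other field that should not be stored, since it works on a copy rather than on the received data.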
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret token's value which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much, using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device without having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due to an inrush of LAVA callbacks and slow Azure Files processing, we need to make sure we don't spawn too many threads. Also add a hard memory limit of 1Gbyte. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presetes for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchors from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests would be run. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet. 
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purposes * config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and create separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict list of running kernels on staging, we need to add option allowing that. Also it will be good to exclude staging kernels from production kernel list. So in case of staging we need to run kernels only from tree "kernelci" and sometimes something else, for example "mediatek". Option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci Purpose of this option is that our compiling capacity is limited, and right now staging and production both compiling very large set of kernels, we need to reduce this amount to drop costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in the both the kselftest-acpi test suites and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes due congestion node might be set to timeout, but then result might arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely problems lays in threading of flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate with transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
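As a rough illustration of the excerpt logic described above, the following sketch decompresses an already-downloaded `*.log.gz` payload and keeps its last 16*1024 characters. The function name `log_excerpt_from_gzip` and the choice to replace undecodable bytes are assumptions made for the example, not details of `send_kcidb`:

```python
import gzip

# Maximum `log_excerpt` length quoted in the commit message.
MAX_LOG_EXCERPT = 16 * 1024


def log_excerpt_from_gzip(raw_bytes, limit=MAX_LOG_EXCERPT):
    """Return the last `limit` characters of a gzip-compressed log."""
    # Decode leniently so a stray non-UTF-8 byte does not abort the submission.
    text = gzip.decompress(raw_bytes).decode("utf-8", errors="replace")
    # Slicing from the end keeps the tail, where failures usually show up.
    return text[-limit:]
```

Logs shorter than the limit are returned whole, since slicing past the start of a string simply yields the full string.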
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees As per request of the Android team, it would be good to check UM builds for breakage as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring test hierarchy, `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests In case of submitting test hierarchy, child nodes by default inherit `kind` value from parent node. As we are re-structuring test hierarchy, test suite/job nodes will have `kind=job` where its child test nodes will have `kind=test`. Provide `kind` field explicitly to test result hierarchy to preserve different kind value than the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate job node result from child node results if `null` result is received from test result parser. 
For example nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed(gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes `setup` test suite has been introduced to store test results for environment setup checks before running actual test suite. KCIDB doesn't require `setup` test suite result as long as main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix log statement about submitting node to KCIDB as we are not sending all the nodes we receive event for to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from parent node for skipped tests as in practice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain. 
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug option to kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need to update the gcc version. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use container pulled from hub.docker.com. After the lab move pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs from Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that scheduler entries have a matching job entry (this is a critical validation) and that job entries have at least one entry in the scheduler. Fix one entry detected by this validation. Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined the same keys in different yaml files; this tool will ensure consistency of these entries. 
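The duplicate-key check can be pictured roughly as follows. This sketch assumes the YAML files have already been parsed into nested dictionaries of the form `{section: {key: value}}`; the function name `find_duplicate_keys` and that layout are illustrative assumptions, not the actual `validate_yaml.py` code:

```python
from collections import defaultdict


def find_duplicate_keys(sections_by_file):
    """Map each 'section.key' defined in several files to those files.

    `sections_by_file` is assumed to look like
    {"pipeline.yaml": {"jobs": {"baseline-x86": {...}}}, ...}.
    """
    seen = defaultdict(list)
    for filename, sections in sections_by_file.items():
        for section, entries in sections.items():
            for key in entries:
                # Remember every file that defines this section/key pair.
                seen[f"{section}.{key}"].append(filename)
    # Only keys that appear in more than one file are reported.
    return {name: files for name, files in seen.items() if len(files) > 1}
```

A validation script could fail the run whenever the returned dictionary is non-empty and print the offending files for each key.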
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but actual test suite couldn't run - this will have `MISS` status 2. When pre-check tests completed, actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes can occur at any point of execution such as `Cancelled` and `Test`. Listed such error codes to the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need file that will manage deployment of kcidb bridge in kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore kernelci tree on production, as it is special "staging"-only tree, and read all /config directory, not just default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(), after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we disable temporary tast tests which unlikely will be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3Gbyte of RAM. 
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving callback URL to environment variable, updating config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generate artificial event with jobfilter, this is likely maintainer trying to repeat job. Treat this accordingly, and inject job filter to job node, so we will run only tests maintainer wants. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: piepline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke test there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition with the rules format, and do a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appeneded. 
Remove the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation of mandatory parameters in config files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation: each job must have a kind parameter. Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add a label so that broken PRs won't go to staging. Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A.
Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Daniel Wagner <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]>
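The validate_yaml.py entries in the squashed commit above describe four checks: every job needs a `template` and a `kind` parameter, every scheduler entry must reference an existing job, and every job must appear in at least one scheduler entry. A minimal sketch of those checks follows. It is not the code from validate_yaml.py; the function name `validate_pipeline`, the sample job names, and the assumed layout (a top-level `jobs` mapping and a `scheduler` list whose entries carry a `job` key) are illustrative assumptions.

```python
def validate_pipeline(config):
    """Return a list of problems found in an already-loaded pipeline config.

    Assumed (hypothetical) layout: config["jobs"] maps job names to dicts,
    and config["scheduler"] is a list of dicts that each carry a "job" key.
    """
    errors = []
    jobs = config.get("jobs") or {}
    scheduler = config.get("scheduler") or []

    # Each job must define the mandatory 'template' and 'kind' parameters.
    for name, job in jobs.items():
        for key in ("template", "kind"):
            if not isinstance(job, dict) or key not in job:
                errors.append(f"job '{name}' is missing '{key}'")

    # Each scheduler entry must point at a job that actually exists.
    for entry in scheduler:
        if entry.get("job") not in jobs:
            errors.append(
                f"scheduler entry references unknown job '{entry.get('job')}'"
            )

    # Each job must be referenced by at least one scheduler entry.
    scheduled = {entry.get("job") for entry in scheduler}
    for name in jobs:
        if name not in scheduled:
            errors.append(f"job '{name}' has no scheduler entry")

    return errors


# Hypothetical sample data, only for illustration.
example = {
    "jobs": {
        "baseline-x86": {"template": "baseline.jinja2", "kind": "job"},
        "kbuild-gcc-12-x86": {"template": "kbuild.jinja2"},
    },
    "scheduler": [
        {"job": "baseline-x86"},
        {"job": "kunit"},
    ],
}

for problem in validate_pipeline(example):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets a CI check report every broken entry in a single run.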
* src/scheduler: store error message when job fails with "submit_error" It is helpful for debugging to catch error message when scheduler fails to submit job to runtime. Store the error message to `data.error_msg` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: Set minimum kernel version for DT kselftest to 6.7 The test was introduced upstream in version 6.7, so no point in trying to run it on earlier versions. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * configs/: Update volteer device Update volteer devices according lab availability Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary templates: detailed output for active/inactive regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new presets for active regressions Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: update CHANGELOG Signed-off-by: Ricardo Cañuelo <[email protected]> * data: chmod -R 777 ./data/output to avoid permission error Avoid errors like PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html' Signed-off-by: Helen Koike <[email protected]> * result_summary: move code to _get_logs Signed-off-by: Helen Koike <[email protected]> * result_summary: use ThreadPoolExecutor to fetch logs Fetching logs is the bottleneck of the script. Fetch them in parallel with ThreadPoolExecutor. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix result presets stable-rc-build-failures and stable-rc-boot-failures weren't querying specifically for test failures. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: rework regression detection Take into account "active" and "inactive" regressions when creating them and when processing new passed or failed nodes. When a node passes, it checks if it "inactivates" an existing "active" regression. 
When a node fails, it checks if it needs to create a new regression or update an existing "active" one. Signed-off-by: Ricardo Cañuelo <[email protected]> * src/regression_tracker: link failed nodes to active regressions When a failed node generates a regression, or when it's a re-run of a run that generated a still active regression, link the node to the regression id. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for date ranges for creation and update New command line options to let the user specify date ranges for node creation and last update: --created-from, --created-to, --last-updated-from, --last-updated-to Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: support for date ranges for creation and last update Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: support for extra query parameters in cmdline New command line option: --query-params to specify a set of extra query parameters to complete or override preset parameters. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: html markup in some preset titles Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: update and move to docs folder Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: move parameter loading and processing to 'setup' Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: refactor and split into two clases (single, run) Split the ResultSummary class into a base class and two child classes: ResultSummarySingle and ResultSummaryLoop (only a stub at this point). Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: WIP initial implementation of the "loop" command Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: huge refactoring Implement "summary" (single-shot) and "monitor" (loop) modes based on preset parameters instead of on the command-line main command. 
Split the logic into multiple files, move all monitor-specific and summary-specific code to independent files, common code in a separate file. Full of kludges, I don't like how this is looking so far, might consider reimplementing it without any dependencies on pipeline code. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix markup and indentation Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: new generic templates for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: examples for "monitor" and "summary" modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: summary and monitor modes Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: fix generic regression report Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: summary: fix last_updated option handling Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: embed css stylesheet in html files Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] make regression active by default Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "result" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * regression_tracker: [trivial] set default empty node sequence Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4 If the "node_sequence" field is ever made non-optional in the models we can probably remove this. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: add cmdline option --output-dir Introduce a new command-line option: --output-dir, and rename the old --output to --output-file. 
Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary changelog: command-line options change Signed-off-by: Ricardo Cañuelo <[email protected]> * config: jobs-chromeos: remove meaningless Tast tests Several Tast tests can only fail in the context of KernelCI: * `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist, causing the whole test job to fail * `platform.DLCService*` and `platform.Memd` rely on features only present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and b/244479619 for those having access to Google's issue tracker) * `kernel.ConfigVerify.chromeos` relies on downstream-only config options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones, and therefore can only fail when testing upstream kernels Signed-off-by: Arnaud Ferraris <[email protected]> * config: scheduler-chromeos: don't execute non-working Tast tests Currently, HEVC-related tests are known to either fail or be skipped as ChromeOS doesn't yet handle hardware decoding of HEVC media. This is expected to be fixed at some point though, so we're keeping the job definitions and only remove the corresponding scheduler entries in order to reinstate those jobs when relevant. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: exclude Tast tests known to always fail Several decoder tests always fail on all platforms where they're executed, adding only noise to otherwise useful test results. Disable those for improving the quality of the results. Signed-off-by: Arnaud Ferraris <[email protected]> * config: chromeos: add special case for pre-6.7 qcom codec tests On Qualcomm-based ChromeBooks (`trogdor` being the only model in Collabora's lab), we noticed systematic failures of all `vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to 6.6. With 6.7 and above, all of those tests (except one) now pass. 
It therefore makes sense to exclude those on pre-6.7 kernels so we don't report known failures and get rid of some noise. This involves "duplicating" affected test jobs (although I did my best to minimize that) and setting rules so only the working variant is executed, based on the version of the kernel being tested. Signed-off-by: Arnaud Ferraris <[email protected]> * lava_callback: Compress the log files to save storage space As storage space in cloud and egress have high costs, better to compress potentially large files. Signed-off-by: Denys Fedoryshchenko <[email protected]> * tests: Add basic yaml validation Add yaml load to figure out earlier issues with yaml Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in platforms anchors The "stoneyridge" and "pineview" naming used in the Chromebook platform anchors refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platform of all the devices listed. Use more generic names to distinguish amd and intel Chromebooks. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: rename test job anchors that use chromeos specific configs Rename test job anchors that use chromeos specific kernel configurations to include the 'chromeos' infix. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: add baseline tests Enable the baseline tests on all the supported Chromebooks with their default kernel configuration. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop stoneyridge/pineview naming in job defs The "stoneyridge" and "pineview" naming used in some Chromebook job definitions refers to ChromiumOS specific config fragments, but doesn't necessarily match the actual platforms targeted by the jobs. Replace all occurrences with more generic intel/amd naming. 
Signed-off-by: Laura Nao <[email protected]> * config: chromeos: drop chromeos infix from baseline jobs Keeping different job names for tests targeting different kernel configs might cause too much duplication. Drop the 'chromeos' infix from the job name for the tests using the chromeos config fragment. Users will be able to filter the results using the data.defconfig/data.config_full fields anyway. Signed-off-by: Laura Nao <[email protected]> * result_summary: post-process results for summary and monitor modes Split the post-processing of nodes to a common function that can be used for both summary and monitor modes. Currently, post-processing involves only the collection of logs. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: update and fix presets and templates Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/result-summary-CHANGELOG: update Signed-off-by: Ricardo Cañuelo <[email protected]> * config/pipeline.yaml: enable 'BayLibre' lab Add lab configuration for BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-baylibre` runtime Add runtime argument `lab-baylibre` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to BayLibre. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86-baylibre` job Add job configuration `baseline-x86-baylibre` for BayLibre. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-armel-baylibre` job Add job configuration `baseline-armel-baylibre` for BayLibre. Add scheduler entry and platform config as well. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline: enable `android` tree and build configs Monitor linux `android` tree. Add build configs for `android-mainline` branch. 
Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add kbuild definitions for android-mainline Add kbuild jobs to compile the kernel for the android-mainline branch. Signed-off-by: Helen Koike <[email protected]> * config/pipeline.yaml: add entries to schedule to build android-mainline Add entries to the `scheduler:` section to run the builds for android-mainline. Signed-off-by: Helen Koike <[email protected]> * result_summary: fix node filter in monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * kernelci.toml: set `checkout` node timeout to `180 min` The current `60 min` timeout is not enough, as some `kbuild` jobs and their sub-tests take around 2 hours to complete after being submitted to the runtime. Here is an example from staging. See the information for a `checkout` and its child nodes:

| id | name | created | updated | timeout |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64 | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]> * result_summary: add email report capabilities for monitor mode Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: plain text single report templates Signed-off-by: Ricardo Cañuelo <[email protected]> * config: chromeos: add baseline-nfs tests Enable the baseline-nfs tests on all the supported Chromebooks, with both the default and the chromeos kernel configurations.
Signed-off-by: Laura Nao <[email protected]> * src/timeout: set `checkout` result For `TIMEOUT` mode, set the `checkout` node result to `fail` if its state is `running`, as that means the code checkout was still going on when the node timed out. Set it to `pass` if its state is anything other than `running`. Set the `checkout` node result to `pass` if the mode is `DONE`, as that means `checkout` has been in the `available` or `closing` state and could successfully complete the source code checkout. Signed-off-by: Jeny Sadadia <[email protected]> * regression_tracker: bugfix, failed test with no prior runs Handle the case of a failed test run when it's the first occurrence of that test case. Consider it "not a regression" for now, since we're defining a regression as a "breaking point" between a success and a failure. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: platforms-chromeos: fix dalboz device type Due to a copy/paste mishap, the device type for `asus-CM1400CXA-dalboz` had a trailing `_chromeos`, so LAVA failed to find the correct device type and no job from the new system ran on this platform. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromes: run Tast tests only on 5.4+ Current ChromeOS images have `ext4` filesystems using options not present in 4.19. Therefore tests cannot run on kernels that old, and this leads to false positives in corrupt device identification, so we should only run those tests on 5.4 and later kernels. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromes: drop non-existent platform `hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in Collabora's LAVA lab, so let's drop its definition. Signed-off-by: Arnaud Ferraris <[email protected]> * config: exclude android tree from kbuild jobs Only Android-specific kbuild jobs should run for this tree, let's not overload our system with unneeded builds.
Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the earliest version that has upstream support for at least one of our devices. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: a bug fix in `_submit_lapsed_nodes` Fix a glitch in the code related to setting `checkout` node result. Fixes: 361fc0d ("src/timeout: set `checkout` result") Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update early access FQDN We are moving k8s from eastus to westus3 as it is cheaper. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/tarball: fix `_kdir` in `update_repo` Fix the below error:
```
kernelci-pipeline-tarball | File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball | kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball | ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```
Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service") Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: fix method to get child nodes recursively `TimeoutService._get_child_nodes_recursive` is used to get pending child nodes recursively for closing and timed-out nodes. It overwrites the result while being called recursively. Fix the method to make it work properly. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: rename "armel" arch to "arm" `armel` has various meanings depending on the system: for ChromeOS, it is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is *Kernel*CI and the kernel uses `arm` for all 32-bit ARM devices. In order to avoid confusion (including those wondering what the heck `armel` means), let's rename `armel` to `arm`.
Signed-off-by: Arnaud Ferraris <[email protected]> * config: use per-system arch property where relevant With the new `*arch` fields present in the platform configurations, we don't have to hardcode the architecture strings in some specific cases. Let's adapt the config files so we use `{cros,deb,k}arch` wherever it makes sense. Signed-off-by: Arnaud Ferraris <[email protected]> * src/timeout: set timed-out `checkout` result Set timed-out `checkout` node result to `incomplete` while in `running` state. As it denotes that the node timed-out while checkout was still going on. Also, set error related information i.e. `error_code` and `error_msg`. Signed-off-by: Jeny Sadadia <[email protected]> * src/tarball: update checkout node when update repo fails Tarball updates source code repo and creates tarball. If update repo operation fails even with second attempt, it means it failed to checkout souce code. Hence, update `checkout` node with state `done` state and result `fail`. Also, set appropriate error information to the `data` field. Signed-off-by: Jeny Sadadia <[email protected]> * config: pipeline: enable collabora-next tree and build config Monitor the collabora-next tree. Add build config for the for-kernelci branch. Signed-off-by: Laura Nao <[email protected]> * config: chromeos: enable acpi kselftest on collabora-next tree Run the ACPI kselftest on the for-kernelci branch of the collabora-next tree. See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t Signed-off-by: Laura Nao <[email protected]> * result_summary: restore missing split_query_params function Restore this function that was accidentally removed during the last refactoring. Signed-off-by: Ricardo Cañuelo <[email protected]> * lava_callback: Don't upload empty files to Azure There is no use for lot of empty files on Azure, that only complicate cleanup. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: unify preset and output names Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: update preset for aferraris Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for laura.nao Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fixes and new presets for nfraprado Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: fix arch query parameters Signed-off-by: Ricardo Cañuelo <[email protected]> * k8s: Lot of deployment tested fixes Fixes in yaml files for k8s production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result-summary presets: Fix build failure and regression monitors Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * result_summary: added debug traces to the monitor Show detailed info of the node filterings in real time. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary: fix corner case bug when no logs are found Cover rare case where neither the node nor any of its parents up to the checkout node have any log artifacts. Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: refine stable-rc presets Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: add regression info to test reports Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary templates: escape log snippets Signed-off-by: Ricardo Cañuelo <[email protected]> * src: lava_callback: add device ID to node data It can be useful to know the exact device on which a job ran, without having to open the LAVA job page. This is done by querying the device ID from the callback data and appending it to the node data. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: upload raw callback data as well Debugging callback issues is complex due to the raw data not being saved after processing. This change ensures we save the callback data as a JSON file in order to ease development. Signed-off-by: Arnaud Ferraris <[email protected]> * DONOTMERGE lava_callback: add debug statements Why the heck doesn't this just work??? Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary_templates: fix error 'node' is undefined The object is named test and not node, so s/node/test Signed-off-by: Helen Koike <[email protected]> * config/runtime/kunit: set architecture info Set architecture field for `kunit` test nodes. If no `arch` argument is supplied, kunit takes `um` (User Mode Linux) as architecture to run tests. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: count running child jobs of build nodes Add a method to count running jobs of `kbuild` nodes i.e. jobs being submitted after successful builds. Fox example `baseline` or `tast` jobs. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle closing `checkout` node differently Usually, `checkout` should be transited to `done` state when all its child nodes are completed. In case of closing `checkout`, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. Signed-off-by: Jeny Sadadia <[email protected]> * src/timeout: handle holdoff reached `checkout` node differently Usually, available `checkout` for which holdoff is reached should be transited to `done` state only when all its child nodes are completed. In case of such `checkout` node, take into account running child jobs of build nodes before transiting its state to `done`. Otherwise, `checkout` will be assigned to `done` state even if some child jobs are still running. 
Signed-off-by: Jeny Sadadia <[email protected]> * Revert "DONOTMERGE lava_callback: add debug statements" This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1. Signed-off-by: Arnaud Ferraris <[email protected]> * Create dependabot.yml * result_summary_templates: make generic-test-failures generic to all results The generic-test-failures templates can be used to show general results just by replacing the name "failures" with "results". This makes them easier to reuse by communities that want presets listing all test results, so: s/generic-test-failures/generic-test-results Signed-off-by: Helen Koike <[email protected]> * result-summary.yaml: add preset to list android build tests Since we now build Android, add a preset to allow result-summary.yaml to list all build results from the Android tree. Signed-off-by: Helen Koike <[email protected]> * tarball: Implement checkout for specific commit We often need a specific commit rather than ToT, so implement this. Signed-off-by: Denys Fedoryshchenko <[email protected]> * jobs-chromeos.yaml: Disable module compression for every kernel version Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"), introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression. Since module compression causes "Invalid ELF header magic: != ELF" errors during boot on the ChromeOS base config, add the missing config to disable module compression on kernels > v5.13 as well. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * src: lava_callback: reduce callback data size The callback data is quite large, especially as it includes the full log which we already upload separately. By dropping it and compressing the whole file with `gzip` we can avoid wasting too much storage space. 
Signed-off-by: Arnaud Ferraris <[email protected]> * src: lava_callback: don't leak secret token The callback data contains the secret token's value, which shouldn't be leaked. Ensure we drop it from the uploaded data. Signed-off-by: Arnaud Ferraris <[email protected]> * config: platforms-chromeos: use new cros-flash image This ensures we use the new version of the `install-modules` script. Signed-off-by: Arnaud Ferraris <[email protected]> * src: regression_tracker: add the "device" field to regression data This can be helpful. We're not using it as a search param though, as we don't want to narrow down the search that much; using the platform only is better. Signed-off-by: Arnaud Ferraris <[email protected]> * config: result_summary_templates: report device used for job This information is now available, and it can be useful to know the affected device without having to look at the LAVA job details. Signed-off-by: Arnaud Ferraris <[email protected]> * kubernetes: Update deployment recipe Update list of labs and add KCI_INSTANCE variable. Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava-callback: Limit threads of lava-callback Due to the inrush of LAVA callbacks and slow Azure Files processing, we need to make sure we don't spawn too many threads. Also add a hard memory limit of 1 Gbyte. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: add presetes for fluster test Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Make template generic for all v4l2 tests - Rebase on main * result_summary presets: make the name of fluster test generic Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: enable first fluster test for mt8195-cherry-tomato-r2 Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2. Run the test on mainline and next until more trees are added. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Create generic v4l2-decoder-conformance-job and use anchors from it - Update the rootfs address - Move anchor to _anchor - Update with nitpicks * config: jobs-chromeos: Add kernelci tree for testing purpose Remove this commit before merging. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Enable cpufreq kselftest Enable cpufreq kselftest on all the trees and branches. Signed-off-by: Shreeya Patel <[email protected]> * result_summary presets: fix preset for kselftest-dt failures monitor Signed-off-by: Ricardo Cañuelo <[email protected]> * result_summary presets: new presets for kselftest-cpufreq Signed-off-by: Ricardo Cañuelo <[email protected]> * config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches Add all the trees and branches on which the tests will be run. Enable all the tests for tomato. Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - The build config cannot be added yet. 
Just list the trees, it will only use the branches configured in build_configs: - mainline will use master - next will use master - collabora-chromeos-kernel will use for-kernelci - media will use master and fixes - Remove kernelci tree as it was added just for testing purposes * config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: mt8192-asurada-spherion-r0: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Don't specify the platforms manually as they are already mentioned in test-job-arm64-mediatek * config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Use test-job-arm64-qualcomm instead and create separate jobs for qualcomm devices - Don't specify platforms manually as they are already mentioned in test-job-arm64-qualcomm * build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22. --- updated-dependencies: - dependency-name: uwsgi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * pipeline.yaml: Add stable-rc build variants Add more build variants for stable-rc tree to match legacy system. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary: add error classification Classify errors according to patterns in the logs Signed-off-by: Helen Koike <[email protected]> * result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: Use media-stage instead of media-tree Signed-off-by: Muhammad Usama Anjum <[email protected]> * config/pipeline: enable android branches from legacy Enable all android branches from the legacy system Signed-off-by: Helen Koike <[email protected]> * trigger: Add exclude/include tree list for trigger As we need to restrict the list of kernels run on staging, we need to add an option allowing that. It will also be good to exclude staging kernels from the production kernel list. So in the case of staging we need to run kernels only from the tree "kernelci" and sometimes something else, for example "mediatek". The option will look like: --trees kernelci,mediatek or --trees kernelci On production we need to exclude the trees kernelci and buggytree: --trees !kernelci,buggytree or just kernelci: --trees !kernelci The purpose of this option is that our compiling capacity is limited, and right now staging and production are both compiling a very large set of kernels; we need to reduce this amount to cut costs. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: platforms-chromeos: use CrOS R124 files ChromeBooks were upgraded with a new image based on ChromiumOS R124, so we must use those files now. Signed-off-by: Arnaud Ferraris <[email protected]> * config: jobs-chromeos: drop non-existent Tast tests Those were removed between R120 and R124 and therefore cause test failures with the new images. Signed-off-by: Arnaud Ferraris <[email protected]> * result_summary presets: fix acpi kselftest presets We're interested in catching regressions and failures in both the kselftest-acpi test suites and its test cases. 
Match the nodes by group in the presets accordingly. Fix template used by the failure monitor preset. Signed-off-by: Laura Nao <[email protected]> * src: update return values of `APIHelper.receive_event_node` `APIHelper.receive_event_node` method is used to receive node data from PubSub event. The method has been updated to return `is_hierarchy` flag as well which represents events related to node hierarchy. Update pipeline services using the method accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: refine presets for v4l2-decoder-conformance Modify the regression preset to monitor regressions on both the v4l2-decoder-conformance test suites and its test cases, by matching the nodes by group instead of by name. Also, change the failure preset to monitor for all errors caused by runtime errors. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: add summary presets for v4l2-decoder-conformance Add summary presets to fetch regressions and failures on v4l2-decoder-conformance tests. Two of the presets are the same used by the monitor; add one additional preset to fetch all the failures on both the test suites and their test cases. Signed-off-by: Laura Nao <[email protected]> * lava_callback.py: Remove error_code/error_msg on lava-callback Sometimes, due to congestion, a node might be set to timeout, but then the result might arrive late and we need to use it properly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * result_summary presets: fix dt kselftest presets Fix the dt kselftest preset, just like was done for the acpi one, as the current preset doesn't match the actual results we're interested in. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * doc/connecting-lab: refine documentation Refine documentation for connecting LAVA labs and submitting jobs to the lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback: Sometimes we get totally invalid log file uploaded Most likely the problem lies in the threading of Flask, and possibly callbacks are getting mixed. This commit attempts to introduce several countermeasures against that. Signed-off-by: Denys Fedoryshchenko <[email protected]> * doc: add `_index.md` page Add index documentation page. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `pipeline-details` page Move `pipeline-details` documentation from the API repository to this repo to make it close to the source. Signed-off-by: Jeny Sadadia <[email protected]> * doc/connecting-lab: adjust `weight` property Change `weight` property of existing doc page to accommodate the transition of pipeline related docs to pipeline repo. Signed-off-by: Jeny Sadadia <[email protected]> * doc: add `developer-documentation` page Add developer manual documentation. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add lab config for Qualcomm Add an entry to `runtimes` section for Qualcomm lab configurations. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-x86` job for qualcomm Add job configuration `baseline-x86-qualcomm` for running baseline job in Qualcomm LAVA lab. Add scheduler entry as well. Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add lab-qualcomm runtime Add runtime argument `lab-qualcomm` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to Qualcomm LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add `baseline-arm64` job for qualcomm Add job configuration `baseline-arm64-qualcomm` for running baseline job for `arm64` in Qualcomm LAVA lab. Add scheduler entry as well. 
Signed-off-by: Jeny Sadadia <[email protected]> * pipeline.yaml: Update RISC-V configs 1)rv32 defconfig doesn't exist, remove 2)nommu_k210_defconfig have modules disabled Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback.py: Sanitize lava log data As we use this data in reports, lets remove all non-printable characters as they confuse grafana, browsers and others. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/runtime/kunit.jinja2: fix result map Fix result map for skipped tests. Initially, API didn't have `skip` available node result in the schema. That's why it was mapped to `None` result. But now API has `skip` result to denote skipped tests. Fix the result mapping accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * config: jobs-chromeos: Add lab-setup fragment Add the lab-setup fragment to the chromebook builds, which contains the architecture independent kernel configs needed to run tests on the platform. Notably this disables IP autoconfig by the kernel. The result of this change is that the 12 seconds boot delay and the consequent deferred probe pending warnings will no longer happen on any platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a different network adapter being used) on which it was still happening. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * lava_callback: bump up slightly threads number Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: enable watchdog reset test on Chromebooks Add a basic test to verify watchdog reset functionality. Enable the test on all ARM64 and AMD x86_64 Chromebooks. For Intel Chromebooks, enable the test only on octopus, as ACPI PM Timer on the other devices has been disabled in coreboot. Signed-off-by: Laura Nao <[email protected]> * src/send_kcidb: use schema version 4.3 Test status `MISS` was added to KCIDB in schema v4.2 and supported by the latest version i.e. v4.3. 
Hence, use the latest version for submission as API may send a few tests with "MISS" status. Signed-off-by: Jeny Sadadia <[email protected]> * send_kcidb: re-structure code for parsing checkout node Move code for parsing checkout node to a separate method. Add `valid` field to parsed checkout node. It denotes if source code was successfully checked out. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: print more information on invalid data Print details for invalid revision data for the sake of debugging. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: optimize `kcidb` import Remove redundant `kcidb` import and adjust kcidb Client call accordingly. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: remove keys with `None` values KCIDB doesn't allow `None` as field value. Remove all optional fields with `None` value to make it valid data for submitting to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * config: add `kcidb_test_suite` property Every KernelCI test will be mapped to a unified test suite for KCIDB data submission. Add `kcidb_test_suite` property to test job definitions in YAML configuration files. The added property will store the mapped KCIDB test suite name. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: parse and submit node test and build data Listen to all the node events with node state `done` or `available` and submit the node to KCIDB. Parse node received from the event and create KCIDB schema compatible object based on type of the node i.e. checkout, build or test. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: set `log_excerpt` for builds and tests Fetch logs from compressed log file(*.log.gz) URL and send last 16*1024 characters for setting `log_excerpt` field for build and test nodes as it is the max allowed length of the KCIDB field. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/jobs-chromes: add kcidb test suite property for watchdog test Add KCIDB test suite mapping for `watchdog_reset` test. Signed-off-by: Jeny Sadadia <[email protected]> * lava_callback.py: disable log removal from callback data We need it for investigations if we have any critical data loss during log sanitizing. Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: add error info to build nodes Add error metadata fields such as `error_code` and `error_msg` to `misc` field for build nodes. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: add watchdog-reset presets for mainline/next Add monitor and summary presets to track the results from the watchdog reset test on the mainline and next trees. Signed-off-by: Laura Nao <[email protected]> * pipeline.yaml: Fix fluster rootfs URL Signed-off-by: Denys Fedoryshchenko <[email protected]> * src/send_kcidb: get error metadata for failed/incomplete tests Tweak condition to get error metadata for test nodes. It should get error info for incomplete nodes as well and not just failed nodes. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: send tests only if KCIDB test mapping exists All test suite definitions must have `kcidb_test_suite` property i.e. KCIDB test suite mapping. Only send tests for those the mapping is found. Signed-off-by: Jeny Sadadia <[email protected]> * tests/validate_yaml: add validation for KCIDB mapping To submit KernelCI generated data to KCIDB, it is required to have a mapping for all the job definition with `kcidb_test_suite` property. Add validation to ensure all the jobs have a mapping present to avoid missing data submission. This check is to notify test authors trying to enable tests in maestro to include the required property for the mapping in their definition. 
Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add qcs6490-rb3gen2 boot test Signed-off-by: Milosz Wasilewski <[email protected]> * config: chromeos: Enable kselftest-dt on Qualcomm platforms Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * pipeline.yaml: Add one um build for android trees As per the Android team's request, it would be good to check UM builds for breakages as well. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: use `kind=job` for test suites As part of re-structuring the test hierarchy, the `Job` model has been introduced for test suite/job nodes. It uses node kind `job`. Update test configurations in `pipeline.yaml` and `jobs-chromeos.yaml` to use `kind=job` to generate job nodes. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: provide `kind` value for child tests When submitting a test hierarchy, child nodes by default inherit the `kind` value from the parent node. As we are re-structuring the test hierarchy, test suite/job nodes will have `kind=job` whereas their child test nodes will have `kind=test`. Provide the `kind` field explicitly in the test result hierarchy to preserve a different kind value than the parent node. Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: fix `NameError` Fix the below error in `_submit` method: ``` Traceback (most recent call last): File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main job.submit(results) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit self._submit(result) File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit return node NameError: name 'node' is not defined ``` Signed-off-by: Jeny Sadadia <[email protected]> * config/runtime/kunit.jinja2: evaluate job node result Evaluate job node result from child node results if a `null` result is received from the test result parser. 
For example nodes such as `fortify`: https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4 Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix parsing of KUnit log file Handle both compressed(gzip) and plain text log files for getting log excerpt. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: HTTP exception handling for log excerpt Add HTTP exception handling for getting log excerpt data. Signed-off-by: Jeny Sadadia <[email protected]> * config: platforms-chromeos: Add serial delay for some Mediatek platforms Add test_character_delay to the Spherion, Tomato and Steelix platforms to workaround the fact that they're sometimes unable to process serial input fast enough, resulting in mangled commands and consequently flaky test results, as described in https://github.com/kernelci/kernelci-project/issues/366. The right place to do this change would be in the device-type template as described in LAVA's documentation [1]. This overriding in KernelCI is meant only as a temporary workaround to verify whether this fixes the issue. If it does, then we'll do it in LAVA upstream instead. [1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks Run the error-logs kselftest on MediaTek Chromebooks. This test is currently under review upstream [1] so, in the meantime, it has been added to the collabora-next tree so it can prove its value by helping to detect issues upstream. [1] https://lore.kernel.org/all/[email protected] Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config/pipeline.yaml: enable CIP lab Add configuration for LAVA CIP lab. Signed-off-by: Jeny Sadadia <[email protected]> * config/pipeline.yaml: add baseline-x86 test for CIP Add `baseline-x86-cip` test to be submitted to CIP LAVA lab. 
Signed-off-by: Jeny Sadadia <[email protected]> * docker-compose.yaml: add `lab-cip` runtime Add runtime argument `lab-cip` to `scheduler-lava` container. This will enable the pipeline to run and submit jobs to CIP LAVA lab. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: enable `job` node submission to KCIDB Parse newly added job node and its child tests for KCIDB submission. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: don't submit `setup` test suite nodes `setup` test suite has been introduced to store test results for environment setup checks before running actual test suite. KCIDB doesn't require `setup` test suite result as long as main test job result is submitted. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: add a check before sending data Check if parsed data is available before sending revision data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: fix logs Fix log statement about submitting node to KCIDB, as we do not send to KCIDB all the nodes we receive events for. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: handle skipped tests Do not retrieve artifacts or metadata from parent node for skipped tests, as in practice only kernel revision, test runtime and platform will be available for skipped tests. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary/utils: ignore failures on log retrieval Make the script continue running if there was an error fetching a test log. Signed-off-by: Ricardo Cañuelo <[email protected]> * doc/developer-documentation: add docs for enabling new tests Add developer documentation for enabling new tests. Signed-off-by: Jeny Sadadia <[email protected]> * Fix links after docs page migration Documentation has been migrated to the "docs.*" subdomain. 
Signed-off-by: Paweł Wieczorek <[email protected]> * pipeline.yaml: Add kcidebug fragment Add useful low-overhead debug option to kernel, and test on most x86 boards we have available, with minimal baseline tests. Signed-off-by: Denys Fedoryshchenko <[email protected]> * configs: update gcc-10 to gcc-12 As we upgrade compiler images, we need update gcc version Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: workaround: match node paths programatically Don't use 'path' as an api search parameter. The use of lists as query parameters (path is a list) is undefined. Instead, do the filtering in code. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: remove qemu jobs from lab-qualcomm QEMU jobs use container pulled from hub.docker.com. After the lab move pulling from this registry is no longer possible at Qualcomm. This patch disables QEMU jobs from Qualcomm lab. Signed-off-by: Milosz Wasilewski <[email protected]> * validate_yaml.py: Improve pipeline validation Add validation that scheduler entries have matching job entry, this is critical validation, and job entries have at least one entry in the scheduler. Fix one entry detected by this validation Signed-off-by: Denys Fedoryshchenko <[email protected]> * pipeline.yaml: Add broonie(Mark Brown) trees to pipeline It is time to enable even more trees. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add additional verification for duplicate keys We might have redefined same keys in different yaml files, this tool will ensure consistency of this entries. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Remove path separator Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Rename variable to schedules Signed-off-by: Denys Fedoryshchenko <[email protected]> * config/kernelci.toml: update KCIDB origin name As we agreed to refer to the new KernelCI API & Pipeline as "maestro", use the new name while submitting data to KCIDB. Signed-off-by: Jeny Sadadia <[email protected]> * src/send_kcidb: update KCI result mapping with KCIDB status Update evaluation of KCIDB status from KCI result. Create 2 categories for error codes: 1. When pre-check tests completed but actual test suite couldn't run - this will have `MISS` status 2. When pre-check tests completed, actual test suite could run but somehow couldn't complete - this will have `ERROR` status Some LAVA error codes can occur at any point of execution such as `Cancelled` and `Test`. Such error codes are listed under the most relevant category based on analysis of available results. Signed-off-by: Jeny Sadadia <[email protected]> * result_summary presets: fix presets for v4l2-decoder-conformance Following recent updates to data representation on KernelCI nodes, the top-level nodes for tests now have their kind set to 'job' instead of 'test'. Update the presets for v4l2-decoder-conformance tests accordingly. Signed-off-by: Laura Nao <[email protected]> * result_summary presets: fix output file name in kselftest-acpi preset Signed-off-by: Laura Nao <[email protected]> * config: enable dmabuf-heaps, exec and iommu kselftest suites Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Add kcidb_test_suite * config: result-summary: add generic rule to monitor failures and regression Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Add rt-stable builds Copy rt-stable builds from legacy KernelCI. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes: - Major changes to move to new way of writing kbuild jobs * config: pipeline: Add v6.6-rt branch for builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: result-summary: add rt-stable kbuilds presets Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs The baseline test is currently run with both ramdisk and nfs rootfs. To distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB test suite name. Signed-off-by: Laura Nao <[email protected]> * aks: Add kubernetes kcidb deployment We need file that will manage deployment of kcidb bridge in kubernetes production deployment. Signed-off-by: Denys Fedoryshchenko <[email protected]> * kubernetes: Adjust trigger k8s options Ignore kernelci tree on production, as it is special "staging"-only tree, and read all /config directory, not just default pipeline.yaml. Signed-off-by: Denys Fedoryshchenko <[email protected]> * regression_tracker: bugfix: catch empty search condition Fix _get_last_matching_node(), after the previous change there was an unhandled scenario where nodes may be empty but the function wouldn't return None immediately. Signed-off-by: Ricardo Cañuelo <[email protected]> * config: pipeline: correct the kind of kselftest suites to job Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler-chromeos.yaml: Temporarily disable non-essential tast tests As per discussion, we disable temporary tast tests which unlikely will be reviewed. Signed-off-by: Denys Fedoryshchenko <[email protected]> * k8s/aks: Update deployment files 1)Update memory limit, as working with linux sources might require 3Gbyte of RAM. 
2)Update config file path 3)Add callback environment variable 4)Update image reference to fresh one Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android builds with gcc-12 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable android builds with clang-17 for all architectures Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: remove build_variants from android build_configs The build_variants is legacy way to specify the different variants. We have moved to the newer way to specify the variants. Hence remove the build_variants from android build_configs. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add android15-6.6-lts branch for build as well The android15-6.6-lts has been included recently in legacy KernelCI: https://github.com/kernelci/kernelci-core/pull/2597 Add the same in newer KernelCI. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add blocklist for riscv older kernels for android builds Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: update KCIDB test suite mapping for baseline Use `boot` as KCIDB test suite mapping for all baseline tests. Signed-off-by: Jeny Sadadia <[email protected]> * callback_url: Update config and README As we are moving callback URL to environment variable, updating config and README accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig Signed-off-by: Muhammad Usama Anjum <[email protected]> * scheduler.py: If event have jobfilter, inject it to the node data When someone generate artificial event with jobfilter, this is likely maintainer trying to repeat job. Treat this accordingly, and inject job filter to job node, so we will run only tests maintainer wants. 
Signed-off-by: Denys Fedoryshchenko <[email protected]> * lava_callback: migrate to fastapi It will be easier to maintain API and Pipeline, as both will be powered by FastAPI framework. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: chromeos: Update fluster rootfs URL Signed-off-by: Laura Nao <[email protected]> * config: pipeline: fix defconfigs in fragments Signed-off-by: Muhammad Usama Anjum <[email protected]> * kbuild.jinja2: support defconfig as list or str As required in https://github.com/kernelci/kernelci-core/pull/2608 defconfig might be two types. Support it in jinja2 accordingly. Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: piepline: add kbuilds of lee-mfd with default defconfigs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: enable baseline testing for mfd for one board of each arch Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: fix platform sections for Qualcomm and Android schedules Signed-off-by: Paweł Wieczorek <[email protected]> * k8s: Update deployment to uvicorn, as we use fastapi now Signed-off-by: Denys Fedoryshchenko <[email protected]> * config: pipeline: Unblock android runs on lava-collabora Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: Enable preempt-rt cyclictest test Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it on all platforms. Since these are all smoke test there is no point in running them too long. Thus reduce the runtime per test to one minute. This should keep the total preempt-rt runtime roughly in the same time frame. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * pipeline: add all the test jobs for all rt-test Add jobs definition of all the rt-tests. 
Enable cyclicdeadline and rtla tests to run on all targets. The changes have been ported from Daniel's PR [1]. [1] https://github.com/kernelci/kernelci-core/pull/2397 Signed-off-by: Daniel Wagner <[email protected]> Co-developed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add template and test properties for preempt_rt jobs Add template, job add kcidb_test_suite properties for all preempt-rt jobs Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: rename preempt-rt to rt-tests which is correct name of tests The legacy was using preempt-rt name of tests. But the repository has rt-tests name. We must use the same name to merge with execution results coming from other CIs in KCIDB. Suggested-by: Jeny Sadadia <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: add the correct nfsroot for rt-tests Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: Remove android's deprecated branches It has been confirmed with Todd that we should remove the deprecated branches. Hence remove those branches. Signed-off-by: Muhammad Usama Anjum <[email protected]> * config: pipeline: run baseline on non-allmodconfig The allmodconfig generates very large kernel image. It cannot be booted on the arm64 and arm targets as tftp errors out that size is too large. Reduce the kernel image size. Use the default defconfig. The same defconfigs have been booting for other trees. 
Signed-off-by: Muhammad Usama Anjum <[email protected]> * doc: developer-documentation: Update documentation by adding more details - Reorganize some things - Specify how to write different variants by removing old syntax - Give two separate templates for kbuild and test - Try to put more details for new contributors Signed-off-by: Muhammad Usama Anjum <[email protected]> --- Changes since v1: - Fix type - Apply suggestions from code review * doc/developer-documentation: fix a glitch in enabling new tree section Fix a minor bug in YAML block formatting. Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details") Signed-off-by: Jeny Sadadia <[email protected]> * doc/developer-documentation: update a section title Rename a section from "Enabling a new Kernel tree" to "Enabling new KernelCI trees, builds, and tests" as it explains enabling tests as well. Signed-off-by: Jeny Sadadia <[email protected]> * config: use the new `tree:branch` format for rules For cases where we want a single branch to be allowed for a given tree, we can now use the `tree:branch` format in rules. Convert existing rules accordingly. Signed-off-by: Arnaud Ferraris <[email protected]> * config: pipeline: fix improper use of "filters" attribute The `filters` param was used in the legacy system but has been replaced by `rules`, with a different syntax. For Android RISC-V builds, this was used to deny job execution on kernels < 4.19, so let's translate this condition with the rules format, and do a similar change for the `rt-tests`-based jobs. Signed-off-by: Arnaud Ferraris <[email protected]> * config/pipeline.yaml: Fix x86 typo in kcidebug job names The kcidebug jobs that run on MediaTek and Qualcomm platforms should have arm64 in the name rather than x86. Fix the typo. Signed-off-by: Nícolas F. R. A. Prado <[email protected]> * config: pipeline: remove params The parameters are only needed when they are changed or appeneded. 
Remvoe the parameters which aren't being modified. Signed-off-by: Muhammad Usama Anjum <[email protected]> * validate_yaml.py: Jobs are required to have template parameter Add more validation to config files of mandatory parameters. Signed-off-by: Denys Fedoryshchenko <[email protected]> * validate_yaml.py: Add more job validations Add basic validation, each job must have kind parameter Signed-off-by: Denys Fedoryshchenko <[email protected]> * workflows: Add label on CI check failures Automatically add label so broken PR wont go to staging Signed-off-by: Denys Fedoryshchenko <[email protected]> --------- Signed-off-by: Jeny Sadadia <[email protected]> Signed-off-by: Nícolas F. R. A. Prado <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Ricardo Cañuelo <[email protected]> Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Arnaud Ferraris <[email protected]> Signed-off-by: Laura Nao <[email protected]> Signed-off-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shreeya Patel <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Milosz Wasilewski <[email protected]> Signed-off-by: Paweł Wieczorek <[email protected]> Signed-off-by: Daniel Wagner <[email protected]> Co-authored-by: Jeny Sadadia <[email protected]> Co-authored-by: Nícolas F. R. A. 
Prado <[email protected]> Co-authored-by: Ricardo Cañuelo <[email protected]> Co-authored-by: Helen Koike <[email protected]> Co-authored-by: Arnaud Ferraris <[email protected]> Co-authored-by: Laura Nao <[email protected]> Co-authored-by: Muhammad Usama Anjum <[email protected]> Co-authored-by: Shreeya Patel <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Paweł Wieczorek <[email protected]> Co-authored-by: Milosz Wasilewski <[email protected]> Co-authored-by: Daniel Wagner <[email protected]> Signed-off-by: Denys Fedoryshchenko <[email protected]>
The spherion and tomato platforms experience serial issues sometimes. Example on Spherion:
https://lava.collabora.dev/scheduler/job/13962478
This issue happens at random, and by just re-running the job I got it to work: https://lava.collabora.dev/scheduler/job/13995464
Log snippet:
After that LAVA reports `lava-test-shell timed out after 60 seconds`.
As can be seen from the log, the `lava-test-runner` command gets messed up, and the shell replies with `not found`. There's also a `ttyS ttyS0: 1 input overrun(s)` message, which is worrying.
Example on Tomato: https://lava.collabora.dev/scheduler/job/13962479
It looks slightly different. It seems that the shell prompt itself got split, so LAVA didn't recognize it
And the LAVA error message is `wait for prompt timed out`.
Researching a bit, I found this LAVA documentation page, which suggests setting `boot_character_delay` and/or `test_character_delay` as ways to avoid missing characters in the serial when the device can't keep up.
Also, when the login prompt is reached, there are still many messages being printed by the kernel, which might be interfering, so another idea would be to add a delay so LAVA only sends the command once the serial output has settled down.
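To make the two ideas concrete, here is a minimal Python sketch of what a per-character delay and a "wait until the serial is quiet" step could look like on the sending side. This is not LAVA's implementation: the function names, the `write` and `read_available` callbacks, and the millisecond unit of `delay_ms` are all assumptions made for illustration.

```python
import time
from typing import Callable


def send_with_character_delay(
    write: Callable[[str], None],
    command: str,
    delay_ms: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write `command` one character at a time, pausing after each one.

    `write` is a hypothetical callback that pushes one character to the
    serial line; `delay_ms` is assumed to be in milliseconds.
    """
    for ch in command:
        write(ch)
        if delay_ms > 0:
            sleep(delay_ms / 1000.0)


def wait_until_quiet(
    read_available: Callable[[], str],
    quiet_s: float,
    timeout_s: float,
    poll_s: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once no output has been seen for `quiet_s` seconds.

    `read_available` is a hypothetical non-blocking read that returns an
    empty string when nothing is pending. Returns False if the line is
    still noisy after `timeout_s` seconds.
    """
    start = clock()
    last_output = start
    while True:
        now = clock()
        if read_available():
            # Kernel messages are still arriving: restart the quiet window.
            last_output = now
        elif now - last_output >= quiet_s:
            return True
        if now - start >= timeout_s:
            return False
        sleep(poll_s)
```

A caller would first call `wait_until_quiet(...)` after the login prompt and only then call `send_with_character_delay(...)` with the test command. The `sleep` and `clock` parameters are injectable only so the sketch can be exercised without real waiting.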