Serial can't keep up on spherion and tomato (MediaTek Chromebooks) #366

Open
nfraprado opened this issue May 24, 2024 · 6 comments

@nfraprado
Contributor

The spherion and tomato platforms sometimes experience serial issues. Example on Spherion:
https://lava.collabora.dev/scheduler/job/13962478

This issue happens at random, and by just re-running the job I got it to work: https://lava.collabora.dev/scheduler/job/13995464

Log snippet:

/ # /lava-13962478/bin/lava-test-runner /lava-13962478/0
[    9.705822] hub 1-1.4:1.0: USB hub found
/lava-13962478/bin/lava-tes[    9.706251] hub 1-1.4:1.0: 2 ports detected
[    9.782304] r8152-cfgselector 2-1.3: reset SuperSpeed USB device number 4 using xhci-mtk
[    9.811444] r8152 2-1.3:1.0: Direct firmware load for rtl_nic/rtl8153a-4.fw failed with error -2
[    9.811497] r8152 2-1.3:1.0: unable to load firmware patch rtl_nic/rtl8153a-4.fw (-2)
[    9.844050] r8152 2-1.3:1.0 eth0: v1.12.13
[    9.844408] r8152-cfgselector 2-1.3: USB disconnect, device number 4
[    9.919491] Console: switching to colour frame buffer device 240x67
t-runa-13962478/[    9.953121] ttyS ttyS0: 1 input overrun(s)
[   10.006751] mediatek-drm mediatek-drm.11.auto: [drm] fb0: mediatekdrmfb frame buffer device
0
/bin/sh: /lava-13962478/bin/lava-test-runa-13962478/0: not found
/ # [   10.034163] usb 2-1: reset SuperSpeed USB device number 2 using xhci-mtk

After that LAVA reports lava-test-shell timed out after 60 seconds.

As can be seen from the log, the lava-test-runner command gets messed up, and the shell replies with not found. There's also a ttyS ttyS0: 1 input overrun(s) message, which is worrying.

Example on Tomato: https://lava.collabora.dev/scheduler/job/13962479

It looks slightly different: the shell prompt itself seems to have been split, so LAVA didn't recognize it:

. /lava-13962479[   10.704192] mtk-mdp3 14001000.dma-controller: can't get SCP node
/environment
/ [   10.711195] mtk-mdp3 14001000.dma-controller: Driver registered as /dev/video2
[   10.711728] mtk-mdp3 1400c000.dma-controller: Adding to iommu group 1
# [   10.726424] mtk-mdp3 14f08000.dma-controller: Adding to iommu group 1

And the LAVA error message is wait for prompt timed out.

Researching a bit, I found this LAVA documentation page, which suggests setting boot_character_delay and/or test_character_delay to avoid dropped characters on the serial line when the device can't keep up with the input.

Also, when the login prompt is reached, there are still many messages being printed by the kernel, which might be interfering, so another idea would be to add a delay so LAVA only sends the command once the serial output has settled down.
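To make the per-character delay idea concrete, here is a minimal sketch of what such a delay amounts to on the sending side. It is not LAVA's implementation: the function name `send_with_char_delay`, the `stream` object and the 30 ms value are assumptions chosen only for illustration.

```python
import io
import time


def send_with_char_delay(stream, command, delay_ms, sleep=time.sleep):
    """Write `command` to `stream` one character at a time.

    After each character the sender pauses for `delay_ms` milliseconds,
    giving a slow receiver time to drain its input buffer.
    `sleep` is injectable so the pacing can be checked without waiting.
    """
    for ch in command:
        stream.write(ch)
        stream.flush()
        sleep(delay_ms / 1000.0)
    return len(command)


if __name__ == "__main__":
    # Stand-in for a serial port: an in-memory text buffer.
    port = io.StringIO()
    sent = send_with_char_delay(port, "/lava-123/bin/lava-test-runner\n", 30)
    print(f"sent {sent} characters")
```

The trade-off is throughput: every character sent costs one extra pause, so long commands take noticeably longer to type.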

@nfraprado nfraprado self-assigned this May 24, 2024
@laura-nao

Bumped into the same issue when checking preliminary results for the watchdog reset test: https://lava.collabora.dev/scheduler/job/14140804#L10766

This is another instance where missing characters can lead to a false regression (here /sys/class/watchdog0/timeout became /sys/class/wathdog0/timeout). We should consider adding test_character_delay to the respective device-type templates to see if it helps mitigate the issue. @nfraprado, based on your experience, does this only affect the spherion and tomato devices?

@nuclearcat
Member

Maybe add a shell script somewhere:

if [ ! -e /sys/class/watchdog0/timeout ]; then
  echo ERR-NODEV
  echo ERR-NODEV
  echo ERR-NODEV
fi

and trigger on the shorter string "ERR-NODEV" (which is also repeated, to increase the chance that at least one copy won't get mixed with other output)?
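A rough sketch of the matching side of this idea, assuming the harness simply searches the captured output for intact copies of the marker. The helper name `count_intact_markers` and the sample output string are made up for illustration; they are not taken from a real job log.

```python
def count_intact_markers(output, marker="ERR-NODEV"):
    """Return how many unbroken copies of `marker` appear in `output`."""
    return output.count(marker)


if __name__ == "__main__":
    # Hypothetical capture: the first copy is split by an interleaved
    # kernel message, the other two copies arrive intact.
    captured = (
        "ERR-N[   10.700000] usb 2-1: reset\n"
        "ODEV\n"
        "ERR-NODEV\n"
        "ERR-NODEV\n"
    )
    hits = count_intact_markers(captured)
    print("marker seen" if hits >= 1 else "marker lost", f"({hits} intact)")
```

Repeating a short marker only raises the odds that one copy survives; it does not help when the mangled text is the command itself.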

@nfraprado
Contributor Author

@nuclearcat We need to fix the serial because we won't be able to change the code for the upstream tests, and we wouldn't want them to have to handle a flaky serial anyway.

@laura-nao So far I've only seen it happen on these two platforms, but I'll add updates here if I see it elsewhere.

I think adding test_character_delay for these platforms will help, but every instance of this issue I've seen so far happened while the serial was still printing a lot of output, so I feel like adding an initial delay would also help a lot. The suggestion I got was to run udevadm settle to wait for the activity on the device to settle down, but that command will be run on the device and so it could itself get mangled... Ideally we would define something in the LAVA template that makes LAVA wait for the serial to settle down before continuing, but I don't know if anything like that exists.
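As a sketch of what "wait for the serial to settle down" could mean on the host side: keep reading until nothing has arrived for a quiet period, and give up after an overall limit. This is not an existing LAVA feature as far as this thread establishes; `wait_until_quiet`, `read_chunk` and all the timing values are hypothetical.

```python
import time


def wait_until_quiet(read_chunk, quiet_s, max_wait_s, poll_s=0.1,
                     clock=time.monotonic):
    """Block until the line has been silent for `quiet_s` seconds.

    `read_chunk(timeout)` is a caller-supplied function (hypothetical here)
    that waits up to `timeout` seconds and returns any bytes received,
    or an empty value if nothing arrived.
    Returns True once the line is quiet, False if `max_wait_s` elapses first.
    """
    start = clock()
    last_data = start
    while True:
        now = clock()
        if now - last_data >= quiet_s:
            return True
        if now - start >= max_wait_s:
            return False
        if read_chunk(poll_s):
            # Fresh output: restart the quiet-period countdown.
            last_data = clock()


if __name__ == "__main__":
    # Simulated line that prints three bursts and then goes silent.
    bursts = [b"[ 9.70] hub found\n", b"[ 9.78] reset\n", b"[ 9.81] fw\n"]

    def fake_read(timeout):
        time.sleep(timeout)
        return bursts.pop(0) if bursts else b""

    print(wait_until_quiet(fake_read, quiet_s=0.5, max_wait_s=5.0))
```

Running this on the host sidesteps the concern raised about udevadm settle, since nothing has to be typed into the device before its output has calmed down.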

@nuclearcat
Member

I think trogdor also has a flaky serial

nfraprado added a commit to nfraprado/kernelci-pipeline that referenced this issue Jun 3, 2024
Add test_character_delay to the Spherion and Tomato platforms to
work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in kernelci-project#366 [1].

The right place to do this change would be in the device-type template
as described in LAVA's documentation [2]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] kernelci/kernelci-project#366.
[2] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
nfraprado added a commit to nfraprado/kernelci-pipeline that referenced this issue Jun 3, 2024
Add test_character_delay to the Spherion and Tomato platforms to
work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
kernelci/kernelci-project#366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
@nfraprado
Contributor Author

@nuclearcat right, I remember trogdor has flaky serial, but I think it was the output, not the input, and in that case the delay wouldn't help. But I'll check it and the other platforms.

For now I've created a PR for us to test whether this fixes it on spherion and tomato: kernelci/kernelci-pipeline#626.

@nfraprado
Contributor Author

I noticed that the input overrun issue on Spherion only happened on baseline, never on baseline-nfs, which suggests that an initial delay, instead of the character delay, would probably also solve the issue. In any case, there hasn't been a failure since the PR with the character delay was opened, so it seems to be working so far. But let's give it a bit more time.

nfraprado added a commit to nfraprado/kernelci-pipeline that referenced this issue Jun 18, 2024
Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
kernelci/kernelci-project#366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
github-merge-queue bot pushed a commit to kernelci/kernelci-pipeline that referenced this issue Jun 20, 2024
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to catch error message when
scheduler fails to submit job to runtime.
Store the error message to `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those for improving the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>

* lava_callback: Compress the log files to save storage space

As storage space in cloud and egress have high costs,
better to compress potentially large files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* tests: Add basic yaml validation

Add a YAML load step to catch YAML issues earlier

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The currently set `60 min` timeout is not enough as some
`kbuild` jobs and its sub-tests take around 2 hrs to
complete after getting submitted to runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

For `TIMEOUT` mode, set `checkout` node result to `fail`
if its state is `running` as it means code checkout is still
going on and node timed-out. Set it to `pass` if its state
is any other than `running`.
Set `checkout` node result to `pass` if mode is `DONE` as
it means once `checkout` has been in `available` or `closing`
state and it could successfully complete source code checkout.

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail
finding the correct device type, and no job from the new system running
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromes: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromes: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In
order to avoid confusion (including those wondering what the heck does
`armel` mean), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set the result of a timed-out `checkout` node to `incomplete`
if it is in the `running` state, as that denotes that the node
timed out while checkout was still going on.
Also, set error related information i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

Tarball updates the source code repo and creates the tarball.
If the update repo operation fails even on the second attempt,
it means it failed to check out the source code.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should transition to the `done` state
when all its child nodes are completed.
When closing `checkout`, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are still
running.
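
A minimal sketch of the closing condition described above. The function
name and its arguments are hypothetical and are not taken from
`src/timeout`; it only illustrates that both conditions must hold:

```python
def can_close_checkout(child_states, running_build_jobs):
    """Return True when a checkout node may move to the `done` state."""
    # Every direct child node must be finished...
    children_done = all(state == "done" for state in child_states)
    # ...and no job spawned by a build node may still be running.
    return children_done and running_build_jobs == 0
```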

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle holdoff reached `checkout` node differently

Usually, an available `checkout` whose holdoff has been
reached should transition to the `done` state only when
all its child nodes are completed.
For such a `checkout` node, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
by just replacing the name "failures" with "results". This makes them
easier to reuse for communities that want presets listing all results of
the tests, so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than the tip of tree (ToT); implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels > v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.
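
A rough sketch of the idea, assuming the callback is a dict and that the
bulky field is stored under a `log` key. The function name and the key
name are assumptions, not the actual `lava_callback` code:

```python
import gzip
import json


def pack_callback_data(callback, drop_keys=("log",)):
    """Return gzip-compressed JSON of the callback without the dropped keys."""
    # drop_keys is an assumed list; the real service may drop other fields.
    trimmed = {k: v for k, v in callback.items() if k not in drop_keys}
    return gzip.compress(json.dumps(trimmed).encode("utf-8"))
```

The same `drop_keys` argument could carry any other field that should not
be stored.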

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: don't leak secret token

The callback data contains the secret token's value, which shouldn't be
leaked. Ensure we drop it from the uploaded data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though, as we
don't want to narrow down the search that much; using only the platform
is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 Gbyte.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8186-corsola-steelix-sku131072: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

Staging needs to restrict the list of kernels it runs, so add an option
that allows that. It is also useful for excluding staging kernels from
the production kernel list.

On staging we need to run kernels only from the "kernelci" tree, and
sometimes from another one, for example "mediatek".
The option looks like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude the kernelci and buggytree trees:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is to cut costs: our compile capacity is
limited, and staging and production currently both build a very large
set of kernels.
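
The include/exclude semantics of the `--trees` values shown above can be
sketched as follows. The helper names are hypothetical and the trigger
service may parse the option differently; the sketch assumes a leading
`!` turns the whole list into an exclude list:

```python
def parse_trees(option):
    """Split a --trees value into (exclude, names)."""
    exclude = option.startswith("!")
    names = {n.strip() for n in option.lstrip("!").split(",") if n.strip()}
    return exclude, names


def tree_allowed(tree, option):
    """Return True if the tree passes the filter; an empty option allows all."""
    if not option:
        return True
    exclude, names = parse_trees(option)
    return (tree not in names) if exclude else (tree in names)
```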

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: platforms-chromeos: use CrOS R124 files

ChromeBooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suite and its test cases. Match the nodes by group
in the presets accordingly.
Also fix the template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

The `APIHelper.receive_event_node` method is used to receive
node data from a PubSub event. The method has been updated
to also return an `is_hierarchy` flag, which marks
events related to node hierarchy.
Update the pipeline services that use the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
the result might then arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change the `weight` property of the existing doc page to
accommodate the transition of pipeline-related docs
to the pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) rv32 defconfig doesn't exist, remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters as they confuse Grafana, browsers and others.
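
One way such a filter could look, as a sketch only. The allowed character
set is an assumption (printable ASCII plus newline and tab), so non-ASCII
text is dropped too; the real `lava_callback.py` may keep more:

```python
import string

# Assumed allow-list: printable ASCII without \r, vertical tab and form feed.
_ALLOWED = set(string.printable) - set("\r\x0b\x0c")


def sanitize_log(text):
    """Drop every character that is not in the allowed printable set."""
    return "".join(ch for ch in text if ch in _ALLOWED)
```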

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/runtime/kunit.jinja2: fix result map

Fix result map for skipped tests. Initially, the API
schema didn't have `skip` as an available node result,
which is why it was mapped to the `None` result. But now the API
has a `skip` result to denote skipped tests.
Fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12 seconds boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform, in particular on mt8186-corsola-steelix-sku131072, where it
was still happening due to a different network adapter being used.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and supported by the latest version i.e. v4.3.
Hence, use the latest version for submission as
API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as field value.
Remove all optional fields with `None` value
to make it valid data for submitting to KCIDB.
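
A small sketch of such a cleanup, assuming the parsed data is made of
nested dicts and lists. The helper name is hypothetical, and the
recursion into nested structures is an assumption about what is needed:

```python
def drop_none(data):
    """Recursively remove dict keys whose value is None."""
    if isinstance(data, dict):
        return {k: drop_none(v) for k, v in data.items() if v is not None}
    if isinstance(data, list):
        return [drop_none(item) for item in data]
    return data
```

Falsy values such as `0` or an empty string are kept, since only `None`
is rejected.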

Signed-off-by: Jeny Sadadia <[email protected]>

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.
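
The truncation step can be sketched as below. Fetching the file from its
URL is left out, and the function and constant names are hypothetical;
only the 16*1024 limit comes from the description above:

```python
import gzip

MAX_EXCERPT = 16 * 1024  # limit quoted in the commit message


def log_excerpt(raw, limit=MAX_EXCERPT):
    """Decompress gzip log bytes and return the last `limit` characters."""
    text = gzip.decompress(raw).decode("utf-8", errors="replace")
    return text[-limit:]
```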

Signed-off-by: Jeny Sadadia <[email protected]>

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have `kcidb_test_suite`
property i.e. KCIDB test suite mapping.
Only send tests for which the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI-generated data to KCIDB, all job definitions
are required to have a mapping via the
`kcidb_test_suite` property.
Add validation to ensure all the jobs have a mapping
present to avoid missing data submission.
This check is to notify test authors trying to enable tests
in maestro to include the required property for the mapping
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

As requested by the Android team, it is good to check UM builds
for breakage as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring the test hierarchy, the `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

When submitting a test hierarchy, child nodes by default
inherit the `kind` value from the parent node.
As we are re-structuring the test hierarchy, test suite/job nodes
will have `kind=job` whereas their child test nodes will have
`kind=test`. Provide the `kind` field explicitly in the test result
hierarchy to preserve a kind value different from the parent
node's.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate the job node result from child node results if a
`null` result is received from the test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4
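
A sketch of one possible fallback rule. The rule itself (any failing
child fails the job, otherwise the job passes) is an assumption, as the
description above does not spell it out, and the function name is
hypothetical:

```python
def evaluate_job_result(parsed_result, child_results):
    """Fall back to child results when the parser returned no result."""
    if parsed_result is not None:
        return parsed_result
    if not child_results:
        return None
    # Assumed rule: any failing child fails the job, otherwise it passes.
    return "fail" if "fail" in child_results else "pass"
```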

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed(gzip) and plain text log files
for getting log excerpt.
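
One way to handle both cases is to check for the gzip magic bytes before
decoding. This is a sketch with hypothetical names, not necessarily how
`send_kcidb` detects the format:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_log(raw):
    """Return log text from bytes that are either gzip-compressed or plain."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")
```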

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix the log statement about submitting a node to KCIDB,
as we do not send to KCIDB every node we receive
an event for.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from the parent
node for skipped tests, as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.
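
The client-side filtering can be sketched as below, assuming nodes are
dicts whose `path` field is a list of strings. The helper name is
hypothetical:

```python
def filter_by_path(nodes, path):
    """Keep only the nodes whose 'path' list equals the wanted path."""
    return [node for node in nodes if node.get("path") == path]
```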

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use a container pulled from hub.docker.com. After the lab move,
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs in the Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that scheduler entries have a matching job entry
(this is a critical validation) and that job entries have at least
one entry in the scheduler.
Fix one entry detected by this validation.
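
The two checks can be sketched as follows, assuming `jobs` is a mapping
keyed by job name and `scheduler` is a list of entries that each carry a
`job` field. Treating the second check as a warning is an assumption;
the function name is hypothetical:

```python
def validate_jobs(jobs, scheduler):
    """Return (errors, warnings) for job/scheduler cross-references."""
    scheduled = {entry["job"] for entry in scheduler}
    # Critical: a scheduler entry points at a job that is not defined.
    errors = sorted(name for name in scheduled if name not in jobs)
    # Non-critical (assumed): a job is defined but never scheduled.
    warnings = sorted(name for name in jobs if name not in scheduled)
    return errors, warnings
```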

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

We might have redefined the same keys in different YAML files;
this tool will ensure the consistency of these entries.
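
A sketch of the duplicate detection, assuming each YAML file has already
been loaded into a mapping. The function name and the input shape are
hypothetical, not those of `validate_yaml.py`:

```python
from collections import defaultdict


def find_duplicate_keys(files):
    """Map each key defined in more than one file to the files defining it."""
    seen = defaultdict(list)
    for filename, keys in files.items():
        for key in keys:
            seen[key].append(filename)
    return {key: names for key, names in seen.items() if len(names) > 1}
```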

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but the actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed and the actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes can occur at any point of execution,
such as `Cancelled` and `Test`.
Such error codes are assigned to the most relevant category
based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of  'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need a file that manages the deployment of the kcidb bridge
in the Kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore the kernelci tree on production, as it is a special
"staging"-only tree, and read the whole /config directory, not just the
default pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As per discussion, temporarily disable tast tests that are unlikely
to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1) Update memory limit, as working with Linux sources might require 3 Gbyte of RAM.
2) Update config file path
3) Add callback environment variable
4) Update image reference to a fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way of specifying variants, so remove
build_variants from the android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts has been included recently in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving the callback URL to an environment variable,
update the config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event has jobfilter, inject it into the node data

When someone generates an artificial event with a jobfilter, it is
likely a maintainer trying to repeat a job. Treat this accordingly
and inject the job filter into the job node, so that we run only the
tests the maintainer wants.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608
defconfig may be one of two types. Support both in jinja2 accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
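
The "list or str" handling described in the kbuild.jinja2 commit above can be sketched in Python. This is only an illustration of the idea, not the template's actual code; the helper name `normalize_defconfig` is hypothetical.

```python
def normalize_defconfig(defconfig):
    """Return defconfig as a list of strings.

    Accepts either a single string or a list of strings, so callers
    can iterate over the result without checking the type themselves.
    """
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Normalizing once at the boundary keeps the rest of the code free of type checks.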

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests there is no point in running them too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add jobs definition of all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system used the preempt-rt name for these tests, but the
repository is named rt-tests. We must use the same name so results can
be merged with execution results coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

The allmodconfig generates very large kernel image. It cannot be booted
on the arm64 and arm targets as tftp errors out that size is too large.
Reduce the kernel image size. Use the default defconfig. The same
defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix typo
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>
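
As a rough illustration of the `tree:branch` idea in the commit above, the sketch below matches a job's tree and branch against a single rule entry. The function name and the exact matching semantics (a bare `tree` entry allowing every branch) are assumptions made for this example; the real rule handling lives in kernelci-core and may differ.

```python
def matches_tree_rule(rule, tree, branch):
    """Check a tree/branch pair against one 'tree' or 'tree:branch' entry.

    Assumed semantics: 'tree' alone allows every branch of that tree,
    while 'tree:branch' allows only the named branch.
    """
    rule_tree, _, rule_branch = rule.partition(":")
    if rule_tree != tree:
        return False
    return not rule_branch or rule_branch == branch
```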

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>
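
The "deny kernels < 4.19" condition mentioned above boils down to a numeric version comparison. A minimal sketch follows, assuming plain `major.minor` version strings; the helper names are hypothetical and this is not the rules implementation itself.

```python
def parse_version(text):
    """Turn 'major.minor[...]' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def version_allowed(kernel_version, min_version="4.19"):
    """Return True when kernel_version is at least min_version."""
    return parse_version(kernel_version) >= parse_version(min_version)
```

Comparing integer tuples rather than strings matters here: as strings, "4.9" sorts after "4.19", which would wrongly let a 4.9 kernel through.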

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation to config files of mandatory parameters.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation, each job must have kind parameter

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so that a broken PR won't go to staging

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasil…
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to catch error message when
scheduler fails to submit job to runtime.
Store the error message to `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.

Signed-off-by: Helen Koike <[email protected]>
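
A minimal sketch of fetching logs in parallel with `ThreadPoolExecutor`, as the commit above describes. The `fetch` callable and the `max_workers` default are placeholders for this example; the actual result_summary code is not shown in this log.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_logs(urls, fetch, max_workers=8):
    """Fetch every URL in parallel and return a {url: content} dict.

    `fetch` is any callable taking a URL and returning its content;
    it is passed in so this sketch stays independent of the HTTP library.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so zip pairs results correctly.
        return dict(zip(urls, executor.map(fetch, urls)))
```

Threads suit this workload because the time is spent waiting on the network rather than on computation.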

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.

Signed-off-by: Ricardo Cañuelo <[email protected]>
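
The active/inactive handling described above can be sketched as a small state update. The data layout (`last_result`, `regressions`, `node_sequence`) and the function name are assumptions for illustration, and a first-ever failure is deliberately not treated as a regression; the real tracker works on API nodes.

```python
def process_node(state, key, node_id, result):
    """Update regression state for one completed test node.

    state: {"last_result": {}, "regressions": {}} (hypothetical layout)
    key:   identifies one test on a given tree/branch/platform
    """
    regression = state["regressions"].get(key)
    is_active = regression is not None and regression["active"]

    if result == "pass" and is_active:
        # A passing run inactivates the existing active regression.
        regression["active"] = False
    elif result == "fail":
        if is_active:
            # Still failing: link this node to the same regression.
            regression["node_sequence"].append(node_id)
        elif state["last_result"].get(key) == "pass":
            # Pass -> fail transition: record a new active regression.
            state["regressions"][key] = {
                "active": True,
                "node_sequence": [node_id],
            }
        # A failure with no prior run is not treated as a regression.

    state["last_result"][key] = result
```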

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those for improving the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>

* lava_callback: Compress the log files to save storage space

As storage space in cloud and egress have high costs,
better to compress potentially large files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* tests: Add basic yaml validation

Add a YAML load check to catch issues with YAML files earlier

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The current `60 min` timeout is not enough, as some
`kbuild` jobs and their sub-tests take around 2 hrs to
complete after being submitted to the runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

For `TIMEOUT` mode, set `checkout` node result to `fail`
if its state is `running` as it means code checkout is still
going on and node timed-out. Set it to `pass` if its state
is any other than `running`.
Set `checkout` node result to `pass` if mode is `DONE` as
it means once `checkout` has been in `available` or `closing`
state and it could successfully complete source code checkout.

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail
finding the correct device type, and no job from the new system running
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.

Signed-off-by: Jeny Sadadia <[email protected]>
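
The bug class described above (a recursive helper overwriting its result on each call) is usually avoided by threading one accumulator through the recursion. A minimal sketch, using a plain dict of parent-to-children IDs in place of the API queries the real `TimeoutService` would make:

```python
def get_child_nodes_recursive(children_of, node_id, found=None):
    """Collect all descendants of node_id, depth-first.

    children_of maps a node ID to the list of its direct child IDs
    (a stand-in for an API lookup in this sketch).
    """
    if found is None:
        found = []
    for child in children_of.get(node_id, []):
        found.append(child)
        # Pass the same list down so nested results accumulate
        # instead of replacing what was collected so far.
        get_child_nodes_recursive(children_of, child, found)
    return found
```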

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In
order to avoid confusion (including those wondering what the heck does
`armel` mean), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set timed-out `checkout` node result to `incomplete`
while in `running` state. As it denotes that the node
timed-out while checkout was still going on.
Also, set error related information i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

Tarball updates the source code repo and creates a tarball.
If the update repo operation fails even on the second attempt,
it means it failed to check out the source code.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should be transitioned to the `done` state
when all its child nodes are completed.
In the case of a closing `checkout`, take into account
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
set to the `done` state even if some child jobs are still
running.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle holdoff reached `checkout` node differently

Usually, an available `checkout` for which the holdoff is
reached should be transitioned to the `done` state only when
all its child nodes are completed.
For such a `checkout` node, take into account
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
set to the `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
by just replacing the name "failures" with "results", making them easier
to reuse by communities that want presets listing all results of
the tests, so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than ToT; implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels >= v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
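
To make the version split above concrete, the sketch below picks the option that disables module compression for a given kernel version, using v5.13 as the cut-over since that is where the commit message places the upstream change. The function name is hypothetical, plain `major.minor` version strings are assumed, and the actual jobs-chromeos.yaml change may simply list both options rather than choose one.

```python
def module_compress_disable_option(kernel_version):
    """Return the config line that disables module compression.

    Assumes kernel_version looks like 'major.minor[.patch]'.
    """
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    if (major, minor) >= (5, 13):
        # CONFIG_MODULE_COMPRESS was removed; NONE=y is the replacement.
        return "CONFIG_MODULE_COMPRESS_NONE=y"
    return "CONFIG_MODULE_COMPRESS=n"
```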

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: don't leak secret token

The callback data contains the secret token's value, which shouldn't be
leaked. Ensure we drop it from the uploaded data.

Signed-off-by: Arnaud Ferraris <[email protected]>
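
The two lava_callback changes above (dropping the embedded log and the secret token, then compressing with gzip) can be sketched as follows. The key names `log` and `token` and the function name are assumptions for illustration; the real callback payload layout is not shown in this log.

```python
import gzip
import json


def pack_callback_data(data, drop_keys=("log", "token")):
    """Strip bulky or sensitive keys, then gzip the JSON encoding.

    drop_keys lists top-level keys to remove before upload
    (hypothetical names in this sketch).
    """
    trimmed = {key: value for key, value in data.items()
               if key not in drop_keys}
    return gzip.compress(json.dumps(trimmed).encode("utf-8"))
```

Filtering before serializing ensures the secret never reaches the stored file, compressed or not.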

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though: we
don't want to narrow down the search that much, so using the platform
only is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 GB.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

jacuzzi

* config: mt8186-corsola-steelix-sku131072: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

As we need to restrict the list of kernels running on staging,
we need to add an option allowing that.
It would also be good to exclude staging kernels from the production
kernel list.

So on staging we need to run kernels only from the tree "kernelci"
and sometimes something else, for example "mediatek".
The option will look like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude trees kernelci and buggytree:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is that our compile capacity is limited,
and right now staging and production both compile a very large set
of kernels; we need to reduce this amount to cut costs.
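
The include/exclude semantics described above can be sketched roughly as
follows. This is only an illustration, not the trigger's actual code: the
function names are hypothetical, and it assumes that a leading `!` negates
the whole comma-separated list, as the `!kernelci,buggytree` example
suggests.

```python
def parse_trees_filter(spec):
    """Split a --trees value into (mode, names).

    Hypothetical helper: mode is "all", "include" or "exclude".
    Assumption: a leading "!" applies to every tree in the list.
    """
    if not spec:
        return ("all", set())
    exclude = spec.startswith("!")
    if exclude:
        spec = spec[1:]
    names = {name.strip() for name in spec.split(",") if name.strip()}
    return ("exclude" if exclude else "include", names)


def tree_allowed(tree, spec):
    """Return True if kernels from `tree` should be triggered."""
    mode, names = parse_trees_filter(spec)
    if mode == "all":
        return True
    if mode == "include":
        return tree in names
    return tree not in names
```

With this sketch, `tree_allowed("mainline", "kernelci,mediatek")` is false
on staging, while `tree_allowed("mainline", "!kernelci")` is true on
production.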

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: platforms-chromeos: use CrOS R124 files

Chromebooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suite and its test cases. Match the nodes by group
in the presets accordingly.
Fix template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

The `APIHelper.receive_event_node` method is used to receive
node data from a PubSub event. The method has been updated
to also return an `is_hierarchy` flag, which marks
events related to node hierarchy.
Update the pipeline services using the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
the result might then arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change the `weight` property of the existing doc page to
accommodate the transition of pipeline-related docs
to the pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) The rv32 defconfig doesn't exist, so remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters, as they confuse Grafana, browsers and others.
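
A minimal sketch of this kind of sanitizing is shown below. It is not the
code from `lava_callback.py`: the function name is hypothetical, and
keeping newlines and tabs is an assumption made so that the log stays
readable line by line.

```python
def sanitize_log(text):
    """Drop non-printable characters from a log string.

    Hypothetical helper. Assumption: newlines and tabs are kept, since
    str.isprintable() treats them as non-printable.
    """
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
```

For example, an ANSI escape byte such as `\x1b` or a stray `\x00` is
removed, while the visible characters around it are left untouched.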

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/runtime/kunit.jinja2: fix result map

Fix result map for skipped tests. Initially, the API
didn't have `skip` as an available node result in the schema.
That's why it was mapped to the `None` result. But now the API
has a `skip` result to denote skipped tests.
Fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12 seconds boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform, particularly on mt8186-corsola-steelix-sku131072, on which it
was still happening due to a different network adapter being used.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and supported by the latest version i.e. v4.3.
Hence, use the latest version for submission as
API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as field value.
Remove all optional fields with `None` value
to make it valid data for submitting to KCIDB.
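
The idea can be illustrated with the sketch below. It is an assumption
about the shape of the data, not the actual `send_kcidb` code: the helper
name is hypothetical, and recursing into nested dictionaries and lists is
a choice made here for illustration.

```python
def drop_none_fields(data):
    """Return a copy of `data` without dictionary keys whose value is None.

    Hypothetical helper. Assumption: nested dicts (and dicts inside
    lists) are cleaned too; list items themselves are never dropped.
    """
    if isinstance(data, dict):
        return {
            key: drop_none_fields(value)
            for key, value in data.items()
            if value is not None
        }
    if isinstance(data, list):
        return [drop_none_fields(item) for item in data]
    return data
```

Falsy but meaningful values such as `0`, `False` or an empty string are
kept, because only `None` is tested for.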

Signed-off-by: Jeny Sadadia <[email protected]>

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.
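
As a rough sketch of the truncation step, assuming the log has already
been downloaded as bytes: the function and constant names below are
hypothetical, and falling back to plain text when the gzip magic bytes are
absent is an assumption rather than part of this commit.

```python
import gzip

# Maximum length of the excerpt, as stated in the commit message.
MAX_EXCERPT_CHARS = 16 * 1024


def log_excerpt(raw):
    """Return the last MAX_EXCERPT_CHARS characters of a log.

    Hypothetical helper. `raw` is the downloaded file content as bytes.
    Assumption: data not starting with the gzip magic bytes is plain text.
    """
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8", errors="replace")
    return text[-MAX_EXCERPT_CHARS:]
```

Taking the tail rather than the head keeps the part of the log where a
failure is most likely to be reported.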

Signed-off-by: Jeny Sadadia <[email protected]>

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have `kcidb_test_suite`
property i.e. KCIDB test suite mapping.
Only send tests for which the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI generated data to KCIDB, it is required
to have a mapping for all the job definitions with
`kcidb_test_suite` property.
Add validation to ensure all the jobs have a mapping
present to avoid missing data submission.
This check is to notify test authors trying to enable tests
in maestro to include the required property for the mapping
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

As requested by the Android team, it would be good to check UM builds
for breakages as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring the test hierarchy, the `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

In case of submitting test hierarchy, child nodes by default
inherit `kind` value from parent node.
As we are re-structuring test hierarchy, test suite/job nodes
will have `kind=job` where its child test nodes will have
`kind=test`. Provide `kind` field explicitly to test result
hierarchy to preserve different kind value than the parent
node.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate job node result from child node results if
`null` result is received from test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed (gzip) and plain text log files
for getting log excerpt.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix log statement about submitting node to KCIDB,
as not every node we receive an event for is
sent to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from parent
node for skipped tests as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.
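
Filtering in code could look roughly like the sketch below. The function
name and the node layout (a dictionary with a `path` list) are assumptions
for illustration, not the regression tracker's actual implementation.

```python
def filter_nodes_by_path(nodes, path):
    """Keep only the nodes whose 'path' list equals `path`.

    Hypothetical helper. Assumption: each node is a dict and 'path' is a
    list of strings; nodes without a 'path' key never match.
    """
    return [node for node in nodes if node.get("path") == path]
```

The API query is then made without the `path` parameter and the returned
nodes are narrowed down locally.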

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use a container pulled from hub.docker.com. After the lab move,
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs in the Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that every scheduler entry has a matching job entry
(this is the critical validation) and that every job entry has at
least one entry in the scheduler.
Fix one entry detected by this validation.
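
The two checks can be sketched as below. This is not the code from
`validate_yaml.py`: the function name is hypothetical, and it assumes that
`jobs` is a mapping keyed by job name and that each scheduler entry is a
dictionary with a `job` key.

```python
def validate_scheduler(jobs, scheduler):
    """Cross-check job definitions against scheduler entries.

    Hypothetical helper. Assumptions: `jobs` maps job names to their
    definitions and each scheduler entry is a dict with a 'job' key.
    Returns (errors, warnings) as two lists of messages.
    """
    defined = set(jobs)
    scheduled = {entry["job"] for entry in scheduler}
    # Critical check: a scheduler entry must point at an existing job.
    errors = [
        f"scheduler entry references unknown job: {name}"
        for name in sorted(scheduled - defined)
    ]
    # Secondary check: a job that is never scheduled never runs.
    warnings = [
        f"job has no scheduler entry: {name}"
        for name in sorted(defined - scheduled)
    ]
    return errors, warnings
```

Treating the second check as a warning rather than an error is a design
choice of this sketch only.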

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

We might have redefined the same keys in different YAML files;
this tool will ensure the consistency of these entries.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed, actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes can occur at any point of execution,
such as `Cancelled` and `Test`.
Such error codes are assigned to the most relevant category
based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of 'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need a file to manage the deployment of the kcidb bridge
in the kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore kernelci tree on production, as it is special
"staging"-only tree, and read all /config directory, not just default
pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As discussed, temporarily disable the tast tests which are unlikely
to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1) Update memory limit, as working with Linux sources might require 3 GB of RAM.
2) Update config file path
3) Add callback environment variable
4) Update image reference to a fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way to specify the variants. Hence remove the
build_variants from android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts has been included recently in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving callback URL to environment variable,
updating config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event has jobfilter, inject it into the node data

When someone generates an artificial event with jobfilter, this is
likely a maintainer trying to repeat a job. Treat this accordingly,
and inject the job filter into the job node, so we will run only the
tests the maintainer wants.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608
defconfig might be one of two types. Support it in jinja2 accordingly.
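
One common way to handle such a value is to normalize it to a list before
use. The sketch below is written in Python for illustration only; the
function name is hypothetical and the actual change lives in the jinja2
template.

```python
def defconfig_as_list(defconfig):
    """Normalize a defconfig value to a list of strings.

    Hypothetical helper. Assumption: the value is either a single string
    or a list of strings, as the commit title describes.
    """
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Checking for `str` first matters, because a string is itself iterable and
would otherwise be split into single characters.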

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests there is no point in running them too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add jobs definition of all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system was using the preempt-rt name for the tests, but the
repository has the rt-tests name. We must use the same name to merge with
execution results coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

allmodconfig generates a very large kernel image. It cannot be booted
on the arm64 and arm targets because tftp errors out, reporting that the
size is too large. Reduce the kernel image size by using the default
defconfig. The same defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix typo
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation to config files of mandatory parameters.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation: each job must have a kind parameter.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so that broken PRs won't go to staging.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Daniel Wagner <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to capture the error message when the
scheduler fails to submit a job to the runtime.
Store the error message in the `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.
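
The parallel fetch described here can be sketched roughly as follows. This
is a minimal illustration and not the actual result_summary code:
`fetch_log`, `fetch_logs` and the `max_workers` value are hypothetical
names and settings, and the stand-in fetch function does no real network
I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_log(url):
    """Hypothetical stand-in for a real download; returns fake log text."""
    return f"log contents for {url}"


def fetch_logs(urls, fetch=fetch_log, max_workers=8):
    """Fetch every log concurrently and return a {url: text} mapping."""
    if not urls:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the same order as the inputs,
        # so zipping them back with the URLs is safe.
        return dict(zip(urls, executor.map(fetch, urls)))
```

Threads suit this case because the work is I/O-bound: each worker spends
most of its time waiting on the network rather than holding the
interpreter.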

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.
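
A rough sketch of these transitions, for illustration only. The data
layout (`state`, `last_result`, `nodes`) and the function name are
hypothetical, and it assumes a regression is only opened when the previous
run of the same test passed; the real tracker works on API nodes rather
than an in-memory dict.

```python
def process_result(state, test_key, node_id, result):
    """Update the regression bookkeeping for one finished test node.

    `state` maps a test key to its last result and its current regression
    (or None). Returns the regression entry for that test, if any.
    """
    entry = state.setdefault(test_key, {"last_result": None, "regression": None})
    regression = entry["regression"]
    if result == "pass":
        # A passing node "inactivates" an existing active regression.
        if regression and regression["active"]:
            regression["active"] = False
    elif result == "fail":
        if regression and regression["active"]:
            # Re-run of a still failing test: update the active regression.
            regression["nodes"].append(node_id)
        elif entry["last_result"] == "pass":
            # Pass followed by fail: create a new active regression.
            entry["regression"] = {"active": True, "nodes": [node_id]}
    entry["last_result"] = result
    return entry["regression"]
```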

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to
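
As an illustration, the four options could be declared with argparse along
these lines. Only the option names come from this commit; the ISO 8601
date format, the `parse_date` helper and the `None` defaults are
assumptions, not the script's actual behaviour.

```python
import argparse
from datetime import datetime

DATE_OPTIONS = (
    "--created-from",
    "--created-to",
    "--last-updated-from",
    "--last-updated-to",
)


def parse_date(value):
    """Parse an ISO 8601 date or datetime string (assumed format)."""
    return datetime.fromisoformat(value)


def build_parser():
    """Build a parser exposing the four date-range options."""
    parser = argparse.ArgumentParser(description="date range sketch")
    for option in DATE_OPTIONS:
        # An unset bound stays None, meaning "no limit" on that side.
        parser.add_argument(option, type=parse_date, default=None)
    return parser
```

For example, `build_parser().parse_args(["--created-from", "2024-04-15"])`
yields a namespace whose `created_from` is a `datetime` and whose other
three bounds are `None`.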

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those for improving the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>

* lava_callback: Compress the log files to save storage space

As cloud storage space and egress are costly, it is
better to compress potentially large files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* tests: Add basic yaml validation

Add a YAML load step to catch issues with the YAML files earlier.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The current `60 min` timeout is not enough, as some
`kbuild` jobs and their sub-tests take around 2 hrs to
complete after being submitted to the runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

In `TIMEOUT` mode, set the `checkout` node result to `fail`
if its state is `running`, as this means the code checkout was
still in progress when the node timed out. Set it to `pass` if
its state is anything other than `running`.
In `DONE` mode, set the `checkout` node result to `pass`, as
this means `checkout` has been in the `available` or `closing`
state and the source code checkout completed successfully.
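
The rule described in this commit message boils down to a small decision
function. The sketch below is only an illustration: the function name and
the plain-string modes and states are hypothetical, and the real service
updates API nodes instead of returning a value.

```python
def checkout_result(mode, state):
    """Return the result to assign to a `checkout` node.

    `mode` is "TIMEOUT" or "DONE"; `state` is the node state at that time.
    """
    if mode == "DONE":
        # The checkout reached `available` or `closing`, so it succeeded.
        return "pass"
    if mode == "TIMEOUT":
        # Timing out while still `running` means the checkout never finished.
        return "fail" if state == "running" else "pass"
    raise ValueError(f"unknown mode: {mode}")
```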

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, so LAVA failed to
find the correct device type and no job from the new system ran
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.
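
A typical way to avoid this kind of overwrite is to pass one accumulator
through every recursive call. The sketch below illustrates that idea
only; it uses a hypothetical in-memory `children_of` mapping instead of
API queries and is not the actual `TimeoutService` code.

```python
def get_child_nodes_recursive(children_of, node_id, found=None):
    """Collect all descendants of `node_id` in depth-first order.

    `children_of` maps a node id to the list of its direct child ids.
    """
    if found is None:
        # Create the list once at the top-level call; recursive calls
        # receive and extend the same list instead of replacing it.
        found = []
    for child_id in children_of.get(node_id, []):
        found.append(child_id)
        get_child_nodes_recursive(children_of, child_id, found)
    return found
```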

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bit ARM devices. In
order to avoid confusion (including for those wondering what the heck
`armel` means), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set the result of a timed-out `checkout` node to `incomplete`
if it is still in the `running` state, as this denotes that the
node timed out while the checkout was still in progress.
Also set the error-related information, i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

The tarball service updates the source code repo and creates a tarball.
If the repo update fails even on the second attempt,
it means the source code checkout failed.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should transition to the `done` state
when all its child nodes are completed.
For a closing `checkout`, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
moved to the `done` state even if some child jobs are still
running.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle holdoff reached `checkout` node differently

Usually, an available `checkout` for which the holdoff is
reached should transition to the `done` state only when
all its child nodes are completed.
For such a `checkout` node, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
moved to the `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
by just replacing the name "failures" with "results". This makes them
easier to reuse by communities that want presets listing all results of
the tests, so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than ToT (tip of tree); implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels > v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.
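
The idea can be sketched as follows. This is an illustration rather than
the actual `lava_callback` code: the function name and the `"log"` key are
assumptions about the callback payload layout.

```python
import gzip
import json


def pack_callback_data(callback, drop_keys=("log",)):
    """Drop bulky keys from the callback dict and gzip the JSON dump."""
    # The log is uploaded separately, so it is left out of this file.
    trimmed = {key: value for key, value in callback.items()
               if key not in drop_keys}
    return gzip.compress(json.dumps(trimmed).encode("utf-8"))
```

Reading the file back is then a matter of
`json.loads(gzip.decompress(data))`.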

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: don't leak secret token

The callback data contains the secret token's value, which shouldn't be
leaked. Ensure we drop it from the uploaded data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though, as we
don't want to narrow down the search that much, using the platform only
is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 GB.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

jacuzzi

* config: mt8186-corsola-steelix-sku131072: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

As we need to restrict the list of kernels built on staging,
we need an option that allows that.
It would also be good to exclude staging kernels from the production
kernel list.

So on staging we need to run kernels only from the tree "kernelci"
and sometimes something else, for example "mediatek".
The option will look like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude trees kernelci and buggytree:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is that our compile capacity is limited,
and right now staging and production both compile a very large set
of kernels; we need to reduce this amount to cut costs.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
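
For illustration, the `--trees` include/exclude semantics described in the
trigger commit above could be handled roughly as follows. This is a minimal
sketch, not the code from the commit: the function names are hypothetical, and
it assumes that a leading `!` turns the whole comma-separated list into an
exclude list, as the `--trees !kernelci,buggytree` example suggests.

```python
def parse_trees_option(value):
    """Split a --trees value into (mode, set of tree names).

    Assumption: a leading '!' makes the whole list an exclude list.
    """
    exclude = value.startswith("!")
    names = {name.strip() for name in value.lstrip("!").split(",") if name.strip()}
    return ("exclude" if exclude else "include"), names


def tree_allowed(tree, trees_option=None):
    """Return True if kernels from `tree` should be triggered."""
    if not trees_option:
        return True  # no option given: no restriction
    mode, names = parse_trees_option(trees_option)
    if mode == "exclude":
        return tree not in names
    return tree in names
```

With this sketch, staging would call `tree_allowed(tree, "kernelci,mediatek")`
and production `tree_allowed(tree, "!kernelci")`.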

* config: platforms-chromeos: use CrOS R124 files

ChromeBooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suites and its test cases. Match the nodes by group
in the presets accordingly.
Fix template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

`APIHelper.receive_event_node` method is used to receive
node data from PubSub event. The method has been updated
to return `is_hierarchy` flag as well which represents
events related to node hierarchy.
Update pipeline services using the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
then the result might arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change `weight` property of existing doc page to
accommodate with transition of pipeline related docs
to pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) rv32 defconfig doesn't exist, remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters as they confuse Grafana, browsers and others.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
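
A minimal sketch of the kind of log sanitizing described above, assuming that
"non-printable" means anything Python's `str.isprintable()` rejects, with
newlines and tabs kept so the log stays readable. The function name is
hypothetical and the actual rules used in `lava_callback.py` may differ.

```python
# Whitespace that is not "printable" for str.isprintable() but that a log needs.
KEEP_WHITESPACE = "\n\t"


def sanitize_log(text):
    """Drop control characters (NUL, ESC, ...) from a log, keeping line breaks."""
    return "".join(
        ch for ch in text if ch.isprintable() or ch in KEEP_WHITESPACE
    )
```

Note that only the control characters themselves are dropped: in an ANSI colour
sequence the ESC byte goes away but the printable remainder (such as `[0m`)
stays in the text.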

* config/runtime/kunit.jinja2: fix result map

Fix result map for skipped tests. Initially, the API
didn't have a `skip` node result available in the schema.
That's why it was mapped to a `None` result. But now the API
has a `skip` result to denote skipped tests.
Fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12 seconds boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a
different network adapter being used) on which it was still happening.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and supported by the latest version i.e. v4.3.
Hence, use the latest version for submission as
API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as a field value.
Remove all optional fields with a `None` value
to make the data valid for submission to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>
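
As an illustration of the `None` removal described above, a recursive filter
over a parsed object could look like the sketch below. The function name is
hypothetical, and it strips every `None`-valued key rather than only the
optional ones, which is a simplifying assumption.

```python
def drop_none_fields(obj):
    """Return a copy of `obj` with all None-valued dict keys removed."""
    if isinstance(obj, dict):
        return {
            key: drop_none_fields(value)
            for key, value in obj.items()
            if value is not None
        }
    if isinstance(obj, list):
        # Lists keep their length; only dicts inside them are cleaned.
        return [drop_none_fields(item) for item in obj]
    return obj
```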

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.

Signed-off-by: Jeny Sadadia <[email protected]>
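
The excerpt logic above amounts to decompressing the log and keeping its tail.
A minimal sketch, assuming the compressed bytes have already been downloaded
(the HTTP fetch is left out, and the function name is hypothetical):

```python
import gzip

# Limit quoted in the commit message above.
MAX_LOG_EXCERPT = 16 * 1024


def log_excerpt_from_gz(compressed, limit=MAX_LOG_EXCERPT):
    """Decompress a *.log.gz payload and return its last `limit` characters."""
    text = gzip.decompress(compressed).decode("utf-8", errors="replace")
    return text[-limit:]
```

Taking the tail rather than the head is what the commit describes; the end of a
build or test log is usually where the failure shows up.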

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have `kcidb_test_suite`
property i.e. KCIDB test suite mapping.
Only send tests for which the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI generated data to KCIDB, all job definitions
are required to have a mapping via the
`kcidb_test_suite` property.
Add validation to ensure all the jobs have a mapping
present to avoid missing data submission.
This check is to notify test authors trying to enable tests
in maestro to include the required property for the mapping
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

As per the Android team's request, it would be good to check UM builds
for breakage as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring the test hierarchy, `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

In case of submitting test hierarchy, child nodes by default
inherit `kind` value from parent node.
As we are re-structuring test hierarchy, test suite/job nodes
will have `kind=job` where its child test nodes will have
`kind=test`. Provide `kind` field explicitly to test result
hierarchy to preserve different kind value than the parent
node.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate job node result from child node results if
`null` result is received from the test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed(gzip) and plain text log files
for getting log excerpt.

Signed-off-by: Jeny Sadadia <[email protected]>
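
One way to handle both compressed and plain-text logs, as the commit above
describes, is to check for the gzip magic bytes before decoding. This is a
sketch under that assumption; the function name is hypothetical and the real
code may instead rely on the file name or HTTP headers.

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_log(raw):
    """Return log text from bytes that are either gzip-compressed or plain."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")
```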

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix log statement about submitting node to KCIDB
as we do not send to KCIDB every node we
receive an event for.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from parent
node for skipped tests as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.

Signed-off-by: Ricardo Cañuelo <[email protected]>
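
The workaround above moves the `path` comparison out of the API query and into
client code. A minimal sketch, assuming nodes are plain dictionaries with a
list-valued `path` field (the function name and sample node names are
hypothetical):

```python
def filter_nodes_by_path(nodes, path):
    """Keep only the nodes whose 'path' list equals `path` exactly."""
    wanted = list(path)
    return [node for node in nodes if node.get("path") == wanted]
```

The query sent to the API would then use only scalar parameters, and the
result would be narrowed with
`filter_nodes_by_path(nodes, ["checkout", "kbuild-gcc-12-x86", "baseline-x86"])`.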

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use a container pulled from hub.docker.com. After the lab move,
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs in the Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that every scheduler entry has a matching job entry
(this is a critical validation) and that every job entry has at least
one entry in the scheduler.
Fix one entry detected by this validation.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
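
The two checks described above can be sketched as set comparisons. This
assumes the loaded configuration is a dictionary with a `jobs` mapping and a
`scheduler` list whose entries carry a `job` key; the function name is
hypothetical and this is not the code from `validate_yaml.py`.

```python
def check_scheduler_jobs(config):
    """Return (errors, warnings) for job/scheduler consistency."""
    jobs = set(config.get("jobs", {}))
    scheduled = {
        entry["job"] for entry in config.get("scheduler", []) if "job" in entry
    }
    # Critical: a scheduler entry pointing at a job that does not exist.
    errors = [
        f"scheduler entry references unknown job: {name}"
        for name in sorted(scheduled - jobs)
    ]
    # Less severe: a job that nothing ever schedules.
    warnings = [
        f"job has no scheduler entry: {name}"
        for name in sorted(jobs - scheduled)
    ]
    return errors, warnings
```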

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

We might have redefined the same keys in different YAML files;
this tool will ensure consistency of these entries.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
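
A sketch of how keys redefined across several YAML files could be detected,
assuming each file has already been loaded into a dictionary and that the
check runs per top-level section (for example `jobs`). The function name and
the per-section approach are assumptions, not the tool's actual design.

```python
def find_duplicate_keys(configs, section):
    """Map each key defined in more than one file to the files defining it.

    `configs` maps a file name to its parsed YAML content.
    """
    first_seen = {}
    duplicates = {}
    for filename, data in configs.items():
        for key in data.get(section, {}):
            if key in first_seen:
                duplicates.setdefault(key, [first_seen[key]]).append(filename)
            else:
                first_seen[key] = filename
    return duplicates
```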

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed, actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes can occur at any point of execution
such as `Cancelled` and `Test`.
Such error codes are listed under the most relevant category
based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of 'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need a file that will manage the deployment of the kcidb bridge
in the Kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore the kernelci tree on production, as it is a special
"staging"-only tree, and read the whole /config directory, not just the default
pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As per discussion, we temporarily disable tast tests which are unlikely
to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1)Update memory limit, as working with linux sources might require 3Gbyte of RAM.
2)Update config file path
3)Add callback environment variable
4)Update image reference to fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way of specifying variants. Hence remove
build_variants from the android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts branch was recently included in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving the callback URL to an environment variable,
update config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event have jobfilter, inject it to the node data

When someone generates an artificial event with a jobfilter, it is
likely a maintainer trying to repeat a job. Treat this accordingly
and inject the job filter into the job node, so we run only the tests
the maintainer wants.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608
defconfig might be one of two types. Support it in jinja2 accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
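
The commit above handles this inside the jinja2 template. The same idea,
written as a plain Python helper for illustration (the function name is
hypothetical), is to normalize the value to a list before using it:

```python
def normalize_defconfig(defconfig):
    """Return `defconfig` as a list, whether it was given as a str or a list."""
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Code that consumes the value can then always iterate over it, for example
`"+".join(normalize_defconfig(value))`, without checking the type again.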

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests there is no point in running them too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add job definitions for all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system was using the preempt-rt name for the tests, but the
repository is named rt-tests. We must use the same name to merge with execution results
coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

allmodconfig generates a very large kernel image. It cannot be booted
on the arm64 and arm targets as tftp errors out because the size is too large.
Reduce the kernel image size by using the default defconfig. The same
defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix typo
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>
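
To illustrate the `tree:branch` format mentioned above, a single rule entry
could be matched as in the sketch below. The actual rule evaluation is not
shown in this log and may differ; the function name is hypothetical, and the
sketch assumes that an entry without `:` allows every branch of the tree.

```python
def tree_rule_matches(rule, tree, branch):
    """Check one rule entry ('tree' or 'tree:branch') against a tree/branch."""
    rule_tree, sep, rule_branch = rule.partition(":")
    if rule_tree != tree:
        return False
    # No ':' in the rule means any branch of that tree is allowed.
    return not sep or rule_branch == branch
```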

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation of mandatory parameters in config files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation: each job must have a `kind` parameter.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so broken PRs won't go to staging

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Daniel Wagner <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to catch error message when
scheduler fails to submit job to runtime.
Store the error message to `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.

Signed-off-by: Helen Koike <[email protected]>
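
A minimal sketch of the parallel-fetch pattern described above, using the standard library `ThreadPoolExecutor`. The `fetch_log` function is a hypothetical placeholder for the real download step, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_log(url):
    # Hypothetical stand-in for an HTTP download of one log file.
    return f"log contents for {url}"


def fetch_logs(urls, max_workers=8):
    """Fetch all logs concurrently and return a {url: contents} mapping."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map() yields results in the same order as its input.
        return dict(zip(urls, executor.map(fetch_log, urls)))


logs = fetch_logs(["job-1/log.txt", "job-2/log.txt"])
print(sorted(logs))
```

Threads suit this kind of work because each fetch spends most of its time waiting on I/O rather than computing.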

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.

Signed-off-by: Ricardo Cañuelo <[email protected]>
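
The pass/fail handling described above can be sketched roughly as follows. The record layout (`active`, `nodes`), the function name, and the rule that a new regression is only created when the previous result was a pass are assumptions for illustration, not the tracker's actual code.

```python
from typing import Optional


def process_result(result: str, previous_result: Optional[str],
                   regression: Optional[dict], node_id: str) -> Optional[dict]:
    """Return the regression record after processing one node result."""
    if result == "pass":
        # A passing node inactivates an existing active regression.
        if regression is not None and regression["active"]:
            regression["active"] = False
        return regression
    if result == "fail":
        # A failing node updates an existing active regression...
        if regression is not None and regression["active"]:
            regression["nodes"].append(node_id)
            return regression
        # ...or creates a new one if the previous run passed (assumption).
        if previous_result == "pass":
            return {"active": True, "nodes": [node_id]}
    return regression


reg = process_result("fail", "pass", None, "n1")
reg = process_result("fail", "fail", reg, "n2")
reg = process_result("pass", "fail", reg, "n3")
print(reg)
```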

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to

Signed-off-by: Ricardo Cañuelo <[email protected]>
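
For illustration only, the four options named above could be declared with `argparse` along these lines. The program name, the ISO 8601 value format, and the `None` defaults are assumptions; the real tool may parse these values differently.

```python
import argparse
from datetime import datetime


def build_parser():
    parser = argparse.ArgumentParser(prog="result_summary")  # name assumed
    for name in ("created-from", "created-to",
                 "last-updated-from", "last-updated-to"):
        # Values are assumed to be ISO 8601 dates or datetimes.
        parser.add_argument(f"--{name}", type=datetime.fromisoformat,
                            default=None)
    return parser


args = build_parser().parse_args(
    ["--created-from", "2024-04-15", "--created-to", "2024-04-16T12:00:00"])
print(args.created_from, args.created_to, args.last_updated_from)
```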

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those for improving the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>
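
The version-based selection described above can be sketched as a simple comparison. The job variant names and helper functions below are hypothetical; only the 6.7 threshold comes from the commit message.

```python
import re


def kernel_at_least(version, minimum):
    """Compare the major.minor part of a kernel version with a (major, minor) tuple."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def select_codec_job(version):
    # Hypothetical variant names: the full job on 6.7+, a reduced one before.
    if kernel_at_least(version, (6, 7)):
        return "codec-tests-full"
    return "codec-tests-pre-6.7"


for v in ("6.6.30", "6.7", "v6.10-rc1"):
    print(v, select_codec_job(v))
```

Comparing integer tuples rather than strings avoids ordering "6.10" before "6.7".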

* lava_callback: Compress the log files to save storage space

As storage space in cloud and egress have high costs,
better to compress potentially large files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
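
A minimal sketch of compressing a log before storing it, using the standard `gzip` module. The function name is hypothetical, and the upload step itself is left out.

```python
import gzip


def compress_log(text):
    """Return the gzip-compressed UTF-8 bytes of a log."""
    return gzip.compress(text.encode("utf-8"))


sample = "[    9.705822] hub 1-1.4:1.0: USB hub found\n" * 200
packed = compress_log(sample)
print(len(sample.encode("utf-8")), len(packed))
```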

* tests: Add basic yaml validation

Add a YAML load step to catch YAML issues earlier

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The currently set `60 min` timeout is not enough, as some
`kbuild` jobs and their sub-tests take around 2 hrs to
complete after being submitted to the runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]>
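
The timing in the table above can be checked with a few lines of arithmetic. The timestamps are copied from the table; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamps taken from the table above.
checkout_created = datetime.fromisoformat("2024-04-15T03:22:01.317000")
baseline_updated = datetime.fromisoformat("2024-04-15T05:09:45.247000")

elapsed = baseline_updated - checkout_created
print(elapsed)                             # 1:47:43.930000
print(elapsed > timedelta(minutes=60))     # True: the old timeout is exceeded
print(elapsed <= timedelta(minutes=180))   # True: the new timeout covers it
```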

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

For `TIMEOUT` mode, set the `checkout` node result to `fail`
if its state is `running`, as that means the code checkout was
still in progress when the node timed out. Set it to `pass` if
its state is anything other than `running`.
Set the `checkout` node result to `pass` if the mode is `DONE`, as
that means the `checkout` has been in the `available` or `closing`
state and successfully completed the source code checkout.

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail
finding the correct device type, and no job from the new system running
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromes: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromes: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.

Signed-off-by: Jeny Sadadia <[email protected]>
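
The kind of bug described above, where a recursive call overwrites the result instead of extending it, is commonly avoided by threading a single accumulator through the recursion. The sketch below uses a plain dictionary as a hypothetical stand-in for the API query that returns a node's children; it is not the actual `TimeoutService` code.

```python
def get_child_nodes_recursive(node_id, children_of, acc=None):
    """Collect every descendant of node_id, depth first."""
    if acc is None:
        acc = []
    for child in children_of.get(node_id, []):
        acc.append(child)
        # Pass the same list down so nested results extend it
        # rather than replacing it.
        get_child_nodes_recursive(child, children_of, acc)
    return acc


tree = {"checkout": ["kbuild"], "kbuild": ["baseline", "tast"]}
print(get_child_nodes_recursive("checkout", tree))
```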

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In
order to avoid confusion (including those wondering what the heck does
`armel` mean), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set the timed-out `checkout` node result to `incomplete`
while in `running` state, as this denotes that the node
timed out while checkout was still going on.
Also, set error-related information, i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

Tarball updates the source code repo and creates a tarball.
If the repo update fails even on the second attempt,
it means the source code checkout failed.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should be transited to `done` state
when all its child nodes are completed.
In case of closing `checkout`, take into account
running child jobs of build nodes before transiting
its state to `done`. Otherwise, `checkout` will be
assigned to `done` state even if some child jobs are still
running.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle holdoff reached `checkout` node differently

Usually, available `checkout` for which holdoff is
reached should be transited to `done` state only when
all its child nodes are completed.
In case of such `checkout` node, take into account
running child jobs of build nodes before transiting
its state to `done`. Otherwise, `checkout` will be
assigned to `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
by just replacing the name "failures" with "results". This makes them easier
to reuse for communities that want presets listing all results of
the tests, so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than ToT; implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels > v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.

Signed-off-by: Arnaud Ferraris <[email protected]>
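
A rough sketch of the size reduction described above: drop the embedded log, then gzip the remaining JSON. The `log` key name and the function name are assumptions about the callback layout, used here only for illustration.

```python
import gzip
import json


def pack_callback(data, drop_keys=("log",)):
    """Serialise callback data without the given keys, gzip-compressed."""
    slim = {key: value for key, value in data.items() if key not in drop_keys}
    return gzip.compress(json.dumps(slim).encode("utf-8"))


callback = {"id": 13962478, "status": "complete", "log": "x" * 100000}
packed = pack_callback(callback)
print(json.loads(gzip.decompress(packed)))
```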

* src: lava_callback: don't leak secret token

The callback data contains the secret tokens value which shouldn't be
leaked. Ensure we drop it from the uploaded data.

Signed-off-by: Arnaud Ferraris <[email protected]>
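
One way to keep a secret out of uploaded data is to strip it from a copy before serialising. The key name `token` and the helper below are hypothetical; the actual field in the callback payload may be named differently.

```python
import copy


def scrub_callback(data, secret_keys=("token",)):
    """Return a deep copy of data with the secret keys removed."""
    cleaned = copy.deepcopy(data)
    for key in secret_keys:
        cleaned.pop(key, None)
    return cleaned


raw = {"id": 1, "token": "s3cr3t", "results": {"boot": "pass"}}
print(scrub_callback(raw))
```

Working on a copy leaves the original payload untouched for any code that still needs the token.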

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though, as we
don't want to narrow down the search that much, using the platform only
is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 GB.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

jacuzzi

* config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

As we need to restrict the list of kernels built on staging,
we need an option that allows it.
It would also be good to exclude staging kernels from the production
kernel list.

So on staging we need to run kernels only from the "kernelci" tree
and occasionally from another one, for example "mediatek".
The option will look like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude trees kernelci and buggytree:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is that our build capacity is limited,
and right now staging and production both compile a very large set
of kernels, so we need to reduce this amount to cut costs.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
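
The include/exclude semantics described in this commit message can be sketched roughly as follows. This is only an illustration of the `--trees` syntax shown above, not the trigger service's actual code; the function names `parse_trees_option` and `tree_allowed` are hypothetical.

```python
def parse_trees_option(value):
    """Split a --trees value into (exclude, names).

    A leading "!" turns the whole comma-separated list into an exclude list,
    matching the "--trees !kernelci,buggytree" example above.
    """
    exclude = value.startswith("!")
    names = {name.strip() for name in value.lstrip("!").split(",") if name.strip()}
    return exclude, names


def tree_allowed(tree, trees_option):
    """Return True if kernels from `tree` should be triggered."""
    if not trees_option:
        # Assumption: without the option, every tree is allowed.
        return True
    exclude, names = parse_trees_option(trees_option)
    return (tree not in names) if exclude else (tree in names)
```

With this sketch, `tree_allowed("mediatek", "kernelci,mediatek")` is true on staging, while `tree_allowed("kernelci", "!kernelci")` is false on production.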

* config: platforms-chromeos: use CrOS R124 files

ChromeBooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suites and its test cases. Match the nodes by group
in the presets accordingly.
Fix template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

`APIHelper.receive_event_node` method is used to receive
node data from PubSub event. The method has been updated
to return `is_hierarchy` flag as well which represents
events related to node hierarchy.
Update pipeline services using the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
the result might then arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change `weight` property of existing doc page to
accommodate with transition of pipeline related docs
to pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) rv32 defconfig doesn't exist, remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters as they confuse Grafana, browsers and others.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
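
A minimal sketch of this kind of sanitizing is shown below. It is not the code from `lava_callback.py`; the function name `sanitize_log` and the choice to keep newlines and tabs are assumptions made for illustration.

```python
def sanitize_log(text):
    """Drop non-printable characters from a log string.

    Assumption: newlines and tabs are kept so the log stays readable;
    every other character must satisfy str.isprintable().
    """
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
```

For example, an escape character or a NUL byte embedded in serial output is removed, while the surrounding text is left untouched.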

* config/runtime/kunit.jinja2: fix result map

Fix result map for skipped tests. Initially, API
didn't have `skip` available node result in the schema.
That's why it was mapped to `None` result. But now API
has `skip` result to denote skipped tests.
Fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12 seconds boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform. Particularly on mt8186-corsola-steelix-sku131072 (due to a
different network adapter being used) on which it was still happening.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and supported by the latest version i.e. v4.3.
Hence, use the latest version for submission as
API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as field value.
Remove all optional fields with `None` value
to make it valid data for submitting to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>
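
The idea can be sketched as a small recursive helper. This is an illustration only: the name `drop_none` is hypothetical, and unlike the commit, which targets optional fields, this version removes every key whose value is `None`.

```python
def drop_none(obj):
    """Return a copy of obj with all dict keys holding None removed."""
    if isinstance(obj, dict):
        return {key: drop_none(val) for key, val in obj.items() if val is not None}
    if isinstance(obj, list):
        return [drop_none(item) for item in obj]
    return obj
```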

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.

Signed-off-by: Jeny Sadadia <[email protected]>
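
A rough sketch of the truncation step, assuming the log has already been downloaded as bytes. The helper name `log_excerpt` and the constant name are hypothetical, and accepting plain-text input alongside gzip is a convenience of this sketch.

```python
import gzip

# 16*1024 characters, the limit stated in the commit message above.
MAX_LOG_EXCERPT = 16 * 1024


def log_excerpt(raw):
    """Return the last MAX_LOG_EXCERPT characters of a log given as bytes."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8", errors="replace")
    return text[-MAX_LOG_EXCERPT:]
```

Keeping the tail rather than the head means the excerpt usually contains the lines closest to the failure.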

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have `kcidb_test_suite`
property i.e. KCIDB test suite mapping.
Only send tests for those the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI generated data to KCIDB, it is required
to have a mapping for all the job definition with
`kcidb_test_suite` property.
Add validation to ensure all the jobs have a mapping
present to avoid missing data submission.
This check is to notify test authors trying to enable tests
in maestro to include the required property for the mapping
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

At the request of the Android team, it would be good to check UM builds
for breakage as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring test hierarchy, `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

In case of submitting test hierarchy, child nodes by default
inherit `kind` value from parent node.
As we are re-structuring test hierarchy, test suite/job nodes
will have `kind=job` where its child test nodes will have
`kind=test`. Provide `kind` field explicitly to test result
hierarchy to preserve different kind value than the parent
node.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate job node result from child node results if
`null` result is received from test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed(gzip) and plain text log files
for getting log excerpt.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
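
To illustrate what a per-character delay does, here is a small sketch that writes a command one character at a time with a pause in between. It is not LAVA's implementation; `send_with_delay` and its `write` callback are hypothetical, and the delay is assumed to be in milliseconds.

```python
import time


def send_with_delay(write, command, delay_ms):
    """Send `command` through `write` one character at a time.

    `write` is any callable accepting a single character, for example a
    thin wrapper around a serial connection (hypothetical here).
    """
    for ch in command:
        write(ch)
        # Give the receiving UART time to drain before the next character.
        time.sleep(delay_ms / 1000.0)
```

Slowing the sender this way trades a slightly longer test start for fewer input overruns, which is the failure mode shown in the log at the top of this issue.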

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix log statement about submitting node to KCIDB
as we are not sending all the nodes we receive
event for to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from parent
node for skipped tests as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.

Signed-off-by: Ricardo Cañuelo <[email protected]>
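
Filtering in code instead of in the query could look roughly like this. It assumes nodes are dictionaries with a `path` list, as the commit message states; the helper name `filter_by_path` is hypothetical.

```python
def filter_by_path(nodes, path):
    """Keep only the nodes whose 'path' list equals `path` exactly."""
    wanted = list(path)
    return [node for node in nodes if node.get("path") == wanted]
```

The API query then uses only the well-defined scalar parameters, and the list comparison happens locally on the returned nodes.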

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use container pulled from hub.docker.com. After the lab move
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs from Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that every scheduler entry has a matching job entry
(this is a critical check) and that every job entry has at least
one entry in the scheduler.
Fix one entry detected by this validation.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
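
The two checks can be sketched on already-loaded YAML data as below. This is an illustration, not the code in `validate_yaml.py`; it assumes `jobs` is a mapping keyed by job name and that each scheduler entry is a mapping with a `job` key, and the function name is hypothetical.

```python
def check_jobs_and_scheduler(jobs, scheduler):
    """Cross-check job definitions against scheduler entries.

    Returns (missing_jobs, unscheduled_jobs):
    - missing_jobs: referenced by the scheduler but not defined as a job
    - unscheduled_jobs: defined as a job but never referenced by the scheduler
    """
    scheduled = {entry["job"] for entry in scheduler}
    defined = set(jobs)
    return sorted(scheduled - defined), sorted(defined - scheduled)
```

A validator built on this would fail when the first list is non-empty, since a scheduler entry without a job definition cannot run.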

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

The same keys might be redefined in different YAML files;
this tool ensures the consistency of these entries.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed, actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes, such as `Cancelled` and `Test`, can occur at any
point of execution. Such error codes are assigned to the most relevant
category based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>
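
The two-category scheme can be sketched as a lookup. Everything concrete here is an assumption for illustration: the function name, the direct pass/fail/skip mapping, and above all which error codes belong to which category, which this commit message does not list. The category sets are therefore passed in by the caller.

```python
def kcidb_status(result, error_code=None, miss_codes=(), error_codes=()):
    """Map a KernelCI result (plus optional error code) to a KCIDB status.

    miss_codes:  error codes meaning the test suite could not run at all
    error_codes: error codes meaning the suite ran but could not complete
    """
    # Assumed direct mapping for results that carry no error code.
    direct = {"pass": "PASS", "fail": "FAIL", "skip": "SKIP"}
    if result in direct:
        return direct[result]
    if error_code in miss_codes:
        return "MISS"
    if error_code in error_codes:
        return "ERROR"
    # Unknown combination: leave the status unset.
    return None
```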

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of 'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need file that will manage deployment of kcidb bridge
in kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore kernelci tree on production, as it is special
"staging"-only tree, and read all /config directory, not just default
pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As per discussion, we temporarily disable Tast tests that are
unlikely to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1)Update memory limit, as working with linux sources might require 3Gbyte of RAM.
2)Update config file path
3)Add callback environment variable
4)Update image reference to fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way of specifying variants. Hence remove
build_variants from android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts has been included recently in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving callback URL to environment variable,
updating config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event have jobfilter, inject it to the node data

When someone generates an artificial event with a jobfilter, it is
likely a maintainer trying to repeat a job. Treat this accordingly
and inject the job filter into the job node, so that we run only the
tests the maintainer wants.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608
defconfig might be two types. Support it in jinja2 accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
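
The actual change lives in a Jinja2 template, but the normalization it implies can be sketched in Python. The helper name `normalize_defconfig` is hypothetical.

```python
def normalize_defconfig(defconfig):
    """Return defconfig as a list, whether it was given as a str or a list."""
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Normalizing once at the boundary lets the rest of the build logic iterate over a list without checking the type again.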

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests there is no point in running them too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add jobs definition of all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system used the preempt-rt name for these tests, but the
repository is named rt-tests. We must use the same name to merge with
execution results coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

allmodconfig generates a very large kernel image. It cannot be booted
on the arm64 and arm targets because TFTP errors out when the size is
too large.
Reduce the kernel image size. Use the default defconfig. The same
defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix type
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation to config files of mandatory parameters.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation, each job must have kind parameter

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so a broken PR won't go to staging

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Daniel Wagner <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to catch error message when
scheduler fails to submit job to runtime.
Store the error message to `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.

Signed-off-by: Helen Koike <[email protected]>
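
As a rough illustration of the parallel fetch described in the commit above, here is a minimal sketch. `fetch_log` is a hypothetical stand-in for the real download, and the worker count is an assumption, not the value used in result_summary.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_log(url):
    """Hypothetical stand-in for downloading one log file."""
    return f"contents of {url}"


def fetch_logs(urls, fetch=fetch_log, max_workers=8):
    """Fetch all logs concurrently and return a {url: log_text} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so results line up with urls.
        return dict(zip(urls, executor.map(fetch, urls)))


logs = fetch_logs(["log-a.txt", "log-b.txt"])
```

Threads suit this case because the work is I/O-bound: each worker spends most of its time waiting on the network.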

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.

Signed-off-by: Ricardo Cañuelo <[email protected]>
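
The pass/fail handling described in the commit above can be sketched roughly as follows. The `active` dictionary layout and the returned strings are assumptions made for illustration and do not reflect the actual regression_tracker data model.

```python
def process_node(active, key, result, node_id):
    """Update the `active` regressions dict for one finished node.

    `active` maps a test key to the list of failing node ids that belong
    to its currently active regression (hypothetical layout).
    """
    if result == "pass":
        if key in active:
            # A passing run inactivates the existing active regression.
            del active[key]
            return "inactivated"
        return "nothing to do"
    if key in active:
        # Another failure of the same test updates the active regression.
        active[key].append(node_id)
        return "updated"
    # Simplification: the check that the previous run passed is omitted.
    active[key] = [node_id]
    return "created"
```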

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to

Signed-off-by: Ricardo Cañuelo <[email protected]>
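
A minimal sketch of how the four options named in the commit above could be declared with `argparse`. The ISO 8601 date format and the `prog` name are assumptions; the format accepted by the real tool is not stated here.

```python
import argparse
from datetime import datetime


def parse_date(value):
    """Accept ISO 8601 dates or timestamps (assumed format)."""
    return datetime.fromisoformat(value)


def build_parser():
    parser = argparse.ArgumentParser(prog="result-summary-sketch")
    for option in ("--created-from", "--created-to",
                   "--last-updated-from", "--last-updated-to"):
        # Options that are not given stay None, meaning "no bound".
        parser.add_argument(option, type=parse_date, default=None)
    return parser


args = build_parser().parse_args(
    ["--created-from", "2024-04-15", "--created-to", "2024-04-16T12:00:00"])
```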

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those for improving the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>

* lava_callback: Compress the log files to save storage space

As cloud storage space and egress have high costs,
it is better to compress potentially large files.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* tests: Add basic yaml validation

Add a YAML load step to catch YAML issues earlier

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The current `60 min` timeout is not enough, as some
`kbuild` jobs and their sub-tests take around 2 hrs to
complete after getting submitted to the runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |

Signed-off-by: Jeny Sadadia <[email protected]>
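
The timestamps in the table above can be checked with a few lines of arithmetic. This sketch only uses values copied from the `checkout` and `baseline-arm64` rows.

```python
from datetime import datetime, timedelta

checkout_created = datetime.fromisoformat("2024-04-15T03:22:01.317000")
checkout_timeout = datetime.fromisoformat("2024-04-15T04:22:01.284000")
baseline_updated = datetime.fromisoformat("2024-04-15T05:09:45.247000")

# Time from the checkout being created to its last child node finishing.
elapsed = baseline_updated - checkout_created

print(elapsed)                           # 1:47:43.930000
print(baseline_updated > checkout_timeout)   # True: child outlived the 60 min timeout
print(elapsed < timedelta(minutes=180))      # True: a 180 min timeout covers it
```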

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

For `TIMEOUT` mode, set the `checkout` node result to `fail`
if its state is `running`, as that means the code checkout was
still in progress when the node timed out. Set it to `pass` if
its state is anything other than `running`.
Set the `checkout` node result to `pass` if the mode is `DONE`, as
that means the `checkout` has been in the `available` or `closing`
state and successfully completed the source code checkout.

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.

Signed-off-by: Ricardo Cañuelo <[email protected]>
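
The "breaking point" definition in the commit above boils down to a small predicate. This is a sketch under the assumption that results are the plain strings `pass` and `fail` and that a missing previous run is represented as `None`.

```python
def is_regression(previous_result, current_result):
    """Return True only for a pass -> fail transition.

    `previous_result` is None when the test case has never run before.
    """
    if current_result != "fail":
        return False
    if previous_result is None:
        # First occurrence of the test: there is no breaking point yet.
        return False
    return previous_result == "pass"
```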

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail
finding the correct device type, and no job from the new system running
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.

Signed-off-by: Jeny Sadadia <[email protected]>
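
The kind of bug described in the commit above (a recursive helper overwriting its result) is commonly avoided by threading one accumulator through the recursion. The sketch below is not the actual `TimeoutService` code; `children_of` is a hypothetical mapping standing in for an API query.

```python
def get_child_nodes_recursive(children_of, node_id, found=None):
    """Collect all descendants of `node_id` in depth-first order.

    `children_of` is a hypothetical {parent_id: [child_id, ...]} mapping.
    """
    if found is None:
        found = []
    for child in children_of.get(node_id, []):
        found.append(child)
        # Pass the same list down so deeper levels extend it
        # instead of replacing what was already collected.
        get_child_nodes_recursive(children_of, child, found)
    return found


tree = {"checkout": ["kbuild"], "kbuild": ["baseline", "tast"]}
descendants = get_child_nodes_recursive(tree, "checkout")
```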

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bits ARM devices. In
order to avoid confusion (including those wondering what the heck does
`armel` mean), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set the result of a timed-out `checkout` node to `incomplete`
if it is still in the `running` state, as this denotes that the
node timed out while the checkout was still going on.
Also, set error-related information, i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

Tarball updates the source code repo and creates the tarball.
If the repo update fails even on the second attempt,
it means the source code checkout failed.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should be transitioned to the `done` state
when all its child nodes are completed.
In the case of a closing `checkout`, take into account
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are still
running.

Signed-off-by: Jeny Sadadia <[email protected]>
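
The rule in the commit above can be sketched as a single check. The function name and its inputs are hypothetical; the real service derives them from API queries that are not shown here.

```python
def can_close_checkout(child_states, running_build_jobs):
    """Decide whether a closing `checkout` may move to `done`.

    child_states: states of the direct child nodes (hypothetical input).
    running_build_jobs: number of jobs still running under build nodes.
    """
    all_children_done = all(state == "done" for state in child_states)
    # Jobs submitted after successful builds must also have finished.
    return all_children_done and running_build_jobs == 0
```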

* src/timeout: handle holdoff reached `checkout` node differently

Usually, an available `checkout` for which holdoff is
reached should be transitioned to the `done` state only when
all its child nodes are completed.
For such a `checkout` node, take into account
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
just by replacing the name "failures" with "results". This makes them
easier to reuse by communities that want presets listing all results of
the tests, so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than ToT, so implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels > v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.

Signed-off-by: Arnaud Ferraris <[email protected]>
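
A minimal sketch of the idea in the commit above: drop the bulky field, then gzip the remaining JSON. The key name `log` and the sample payload are assumptions about the callback layout, not taken from lava_callback.

```python
import gzip
import json


def pack_callback_data(callback, drop_keys=("log",)):
    """Return gzip-compressed JSON without the keys in `drop_keys`."""
    slim = {key: value for key, value in callback.items()
            if key not in drop_keys}
    return gzip.compress(json.dumps(slim).encode("utf-8"))


def unpack_callback_data(blob):
    """Inverse of pack_callback_data, handy when debugging."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


packed = pack_callback_data(
    {"id": 13962478, "status": "complete", "log": "x" * 100000})
```

The same `drop_keys` parameter could also carry the name of a secret field that must not be uploaded.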

* src: lava_callback: don't leak secret token

The callback data contains the secret tokens value which shouldn't be
leaked. Ensure we drop it from the uploaded data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though, as we
don't want to narrow down the search that much, using the platform only
is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 GB.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
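
One generic way to cap the number of threads handling callbacks is a fixed-size pool, sketched below. This is an illustration only: the limit of 2, the `process_callback` body, and the use of `ThreadPoolExecutor` are assumptions, and the memory limit mentioned above is not covered.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 2  # hypothetical limit, not the value used by the commit

stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def process_callback(job_id):
    """Stand-in for handling one callback; records peak concurrency."""
    with stats_lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.01)  # placeholder for a slow upload
    with stats_lock:
        stats["active"] -= 1
    return job_id


# The pool never runs more than MAX_THREADS workers; extra work queues up.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
    handled = list(pool.map(process_callback, range(6)))
```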

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

jacuzzi

* config: mt8186-corsola-steelix-sku131072: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable add all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>
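
Pattern-based classification as described in the commit above might look like the following sketch. The category names and regular expressions are made up for illustration; the patterns actually used by result_summary are not shown here.

```python
import re

# Hypothetical categories and patterns, checked in order.
ERROR_PATTERNS = [
    ("permission", re.compile(r"PermissionError|Permission denied")),
    ("timeout", re.compile(r"timed out after \d+ seconds")),
    ("serial", re.compile(r"input overrun\(s\)")),
]


def classify_error(log_text):
    """Return the first matching category, or "unknown"."""
    for category, pattern in ERROR_PATTERNS:
        if pattern.search(log_text):
            return category
    return "unknown"
```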

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

As we need to restrict the list of kernels running on staging,
we need to add an option allowing that.
It will also be good to exclude staging kernels from the production
kernel list.

So in case of staging we need to run kernels only from tree "kernelci"
and sometimes something else, for example "mediatek".
Option will look like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude trees kernelci and buggytree:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is that our compiling capacity is limited,
and right now staging and production are both compiling a very large set
of kernels, so we need to reduce this amount to cut costs.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
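
The `--trees` syntax shown in the commit above can be interpreted with a small helper. This is a sketch of one plausible reading, in which a leading `!` turns the whole comma-separated list into an exclude list; the function names are hypothetical.

```python
def parse_trees(value):
    """Split a --trees value into (exclude, set_of_tree_names)."""
    exclude = value.startswith("!")
    names = {name.strip()
             for name in value.lstrip("!").split(",") if name.strip()}
    return exclude, names


def tree_allowed(tree, value):
    """Return True if `tree` should be processed for this --trees value."""
    exclude, names = parse_trees(value)
    return (tree not in names) if exclude else (tree in names)
```

With this reading, `tree_allowed("mainline", "!kernelci")` is true on production, while on staging `tree_allowed("mainline", "kernelci,mediatek")` is false.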

* config: platforms-chromeos: use CrOS R124 files

ChromeBooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suites and its test cases. Match the nodes by group
in the presets accordingly.
Fix template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

`APIHelper.receive_event_node` method is used to receive
node data from PubSub event. The method has been updated
to return `is_hierarchy` flag as well which represents
events related to node hierarchy.
Update pipeline services using the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
then the result might arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change the `weight` property of the existing doc page to
accommodate the transition of pipeline-related docs
to the pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) The rv32 defconfig doesn't exist, so remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters, as they confuse Grafana, browsers and other tools.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
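
A minimal sketch of this kind of sanitizing, assuming the goal is to keep
printable characters plus newlines and tabs. `sanitize_log` is a hypothetical
helper for illustration, not the function used in `lava_callback.py`:

```python
def sanitize_log(text: str) -> str:
    """Drop non-printable characters, keeping newlines and tabs.

    Assumption: newlines and tabs are worth preserving so the log
    stays readable; everything else non-printable is removed.
    """
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
```

Filtering per character keeps the rest of the log byte-for-byte intact, which
matters when the log is later used for debugging.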

* config/runtime/kunit.jinja2: fix result map

Fix the result map for skipped tests. Initially, the API
didn't have `skip` as an available node result in the schema,
so skipped tests were mapped to the `None` result. The API now
has a `skip` result to denote skipped tests.
Fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12-second boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform, including mt8186-corsola-steelix-sku131072, on which they were
still happening because a different network adapter is used.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and supported by the latest version i.e. v4.3.
Hence, use the latest version for submission as
API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as field value.
Remove all optional fields with `None` value
to make it valid data for submitting to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>
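
A minimal sketch of the idea, assuming the parsed object is a flat dict.
`drop_none_fields` and the field names in the example are illustrative only:

```python
def drop_none_fields(obj: dict) -> dict:
    """Return a copy of obj without the keys whose value is None.

    Falsy values such as False, 0 or "" are kept on purpose: only
    None is treated as "field not set".
    """
    return {key: value for key, value in obj.items() if value is not None}
```

For example, `drop_none_fields({"id": "a", "log_url": None, "valid": False})`
keeps `id` and `valid` and drops `log_url`.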

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.

Signed-off-by: Jeny Sadadia <[email protected]>
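
A rough sketch of the truncation step, assuming the compressed log has
already been downloaded as bytes. `log_excerpt_from_gz` is a hypothetical
name; only the 16*1024 limit comes from the message above:

```python
import gzip

MAX_LOG_EXCERPT = 16 * 1024  # maximum excerpt length stated above


def log_excerpt_from_gz(data: bytes) -> str:
    """Decompress a *.log.gz payload and return its last characters."""
    # errors="replace" is an assumption: a damaged byte should not
    # prevent the rest of the log from being submitted.
    text = gzip.decompress(data).decode("utf-8", errors="replace")
    return text[-MAX_LOG_EXCERPT:]
```

Taking the tail rather than the head keeps the part of the log where a
failure is most likely to be reported.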

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have `kcidb_test_suite`
property i.e. KCIDB test suite mapping.
Only send tests for which the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI generated data to KCIDB, it is required
to have a mapping for all the job definitions with
`kcidb_test_suite` property.
Add validation to ensure all the jobs have a mapping
present to avoid missing data submission.
This check is to notify test authors trying to enable tests
in maestro to include the required property for the mapping
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>
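
The check can be sketched as below, assuming the job definitions are already
loaded into a dict keyed by job name. `jobs_missing_kcidb_mapping` is a
hypothetical helper, not the code in `tests/validate_yaml`:

```python
def jobs_missing_kcidb_mapping(jobs: dict) -> list:
    """Return the sorted names of jobs without a `kcidb_test_suite` value."""
    return sorted(
        name
        for name, definition in jobs.items()
        # `definition or {}` tolerates an empty YAML entry (loaded as None)
        if not (definition or {}).get("kcidb_test_suite")
    )
```

A validation script could fail the CI run whenever the returned list is not
empty and print the offending job names.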

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

As requested by the Android team, it would be good to check UM builds
for breakage as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring the test hierarchy, the `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

In case of submitting test hierarchy, child nodes by default
inherit `kind` value from parent node.
As we are re-structuring the test hierarchy, test suite/job nodes
will have `kind=job` where its child test nodes will have
`kind=test`. Provide `kind` field explicitly to test result
hierarchy to preserve different kind value than the parent
node.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate job node result from child node results if
`null` result is received from the test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4

Signed-off-by: Jeny Sadadia <[email protected]>
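
One possible evaluation rule, shown only as a sketch: the exact rule used by
the template is not stated here, so "any failed child fails the job" is an
assumption, and `evaluate_job_result` is a hypothetical name:

```python
def evaluate_job_result(result, child_results):
    """Derive a job result from its children when the parser gave none.

    Assumed rule: keep an explicit result; otherwise "fail" if any
    child failed, "pass" if there are children and none failed, and
    None when there are no children to look at.
    """
    if result is not None:
        return result
    if not child_results:
        return None
    return "fail" if "fail" in child_results else "pass"
```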

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed(gzip) and plain text log files
for getting log excerpt.

Signed-off-by: Jeny Sadadia <[email protected]>
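
A sketch of handling both formats, assuming the log bytes are already in
memory. Detecting gzip by its two magic bytes is one way to do it and is an
assumption about the approach; `decode_log` is a hypothetical helper:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_log(data: bytes) -> str:
    """Return log text from either gzip-compressed or plain-text bytes."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data.decode("utf-8", errors="replace")
```

Checking the content instead of the file extension also covers logs that are
served under an unexpected name.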

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to work around the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
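
Conceptually, a per-character delay trades input speed for reliability: the
sender pauses after every character so the device's UART is not overrun. The
sketch below only illustrates that idea and is not LAVA's implementation;
`send_with_delay`, `write` and the millisecond unit are assumptions:

```python
import time


def send_with_delay(write, text, delay_ms):
    """Send text one character at a time, pausing after each write.

    `write` is any callable that transmits a single character,
    for example a wrapper around a serial port's write method.
    """
    for ch in text:
        write(ch)
        time.sleep(delay_ms / 1000.0)
```

With a delay of a few milliseconds, a command line still arrives in well
under a second while leaving the receiver time to drain its FIFO.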

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix log statement about submitting node to KCIDB
as we are not sending all the nodes we receive
event for to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from parent
node for skipped tests, as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.

Signed-off-by: Ricardo Cañuelo <[email protected]>
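
Filtering in code can be as simple as the sketch below, assuming the nodes
were already fetched without the `path` parameter and each node is a dict
whose `path` is a list. `filter_by_path` is a hypothetical helper:

```python
def filter_by_path(nodes, path):
    """Keep only the nodes whose `path` list equals the wanted one."""
    return [node for node in nodes if node.get("path") == path]
```

This fetches more nodes than strictly needed, which is the price of not
relying on how list-valued query parameters are interpreted.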

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use a container pulled from hub.docker.com. After the lab move,
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs in the Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that scheduler entries have a matching job entry
(this is the critical validation) and that job entries have at least
one entry in the scheduler.
Fix one entry detected by this validation.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
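
Both checks reduce to set differences. The sketch assumes `jobs` is a dict
keyed by job name and that each scheduler entry is a dict with a `job` key;
`validate_scheduler_jobs` is a hypothetical name, not the function in
`validate_yaml.py`:

```python
def validate_scheduler_jobs(jobs: dict, scheduler: list):
    """Return (unknown_jobs, unscheduled_jobs) as sorted lists.

    unknown_jobs: referenced by the scheduler but not defined (critical).
    unscheduled_jobs: defined but never referenced by the scheduler.
    """
    scheduled = {entry["job"] for entry in scheduler}
    defined = set(jobs)
    return sorted(scheduled - defined), sorted(defined - scheduled)
```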

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

The same keys might have been redefined in different YAML files;
this tool ensures the consistency of these entries.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
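
A sketch of such a check, assuming every YAML file has already been loaded
into a dict and passed in as `{filename: content}`. `find_duplicate_keys` and
the `section` argument are illustrative, not the actual tool:

```python
def find_duplicate_keys(files: dict, section: str) -> dict:
    """Map each key defined in `section` of several files to those files."""
    seen = {}
    for filename, content in files.items():
        # A missing or empty section is treated as having no keys
        for key in (content.get(section) or {}):
            seen.setdefault(key, []).append(filename)
    return {key: names for key, names in seen.items() if len(names) > 1}
```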

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed, actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes can occur at any point of execution
such as `Cancelled` and `Test`.
Listed such error codes to the most relevant category
based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of 'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need a file that will manage the deployment of the kcidb bridge
in the Kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore the kernelci tree on production, as it is a special
"staging"-only tree, and read the whole /config directory, not just the
default pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As per discussion, temporarily disable Tast tests that are unlikely
to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1) Update the memory limit, as working with Linux sources might require 3 GB of RAM.
2) Update the config file path
3) Add the callback environment variable
4) Update the image reference to a fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way to specify the variants. Hence remove the
build_variants from android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts has been included recently in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving callback URL to environment variable,
updating config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event has jobfilter, inject it into the node data

When someone generates an artificial event with a jobfilter, it is
likely a maintainer trying to repeat a job. Treat this accordingly
and inject the job filter into the job node, so that only the tests
the maintainer wants are run.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608
defconfig may be one of two types (list or str). Support both in jinja2 accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
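
The same normalization expressed in Python, as a sketch only (the actual
change lives in the jinja2 template, and `normalize_defconfig` is a
hypothetical helper):

```python
def normalize_defconfig(defconfig):
    """Return defconfig as a list, whether it was given as str or list."""
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Normalizing once at the top lets the rest of the code iterate over a list
without checking the type again.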

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests, there is no point in running them too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add jobs definition of all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system used the preempt-rt name for these tests, but the
repository is named rt-tests. We must use the same name so that results
can be merged with execution results coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

allmodconfig generates a very large kernel image. It cannot be booted
on the arm64 and arm targets, as tftp errors out because the size is too large.
Reduce the kernel image size. Use the default defconfig. The same
defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix type
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation to config files of mandatory parameters.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation: each job must have a `kind` parameter.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so that a broken PR won't go to staging.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Daniel Wagner <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
nuclearcat added a commit to nuclearcat/kernelci-pipeline that referenced this issue Jul 24, 2024
* src/scheduler: store error message when job fails with "submit_error"

It is helpful for debugging to capture the error message when
the scheduler fails to submit a job to the runtime.
Store the error message to `data.error_msg` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: Set minimum kernel version for DT kselftest to 6.7

The test was introduced upstream in version 6.7, so no point in trying
to run it on earlier versions.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* configs/: Update volteer device

Update volteer devices according to lab availability.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary templates: detailed output for active/inactive regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new presets for active regressions

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: update CHANGELOG

Signed-off-by: Ricardo Cañuelo <[email protected]>

* data: chmod -R 777 ./data/output to avoid permission error

Avoid errors like

PermissionError: [Errno 13] Permission denied: '/home/kernelci/data/output/stable-rc-boot.html'

Signed-off-by: Helen Koike <[email protected]>

* result_summary: move code to _get_logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary: use ThreadPoolExecutor to fetch logs

Fetching logs is the bottleneck of the script. Fetch them in parallel
with ThreadPoolExecutor.

Signed-off-by: Helen Koike <[email protected]>
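
A sketch of the pattern, assuming `urls` is a list and `fetch` is any
callable that downloads one log and returns its text. `fetch_logs`, `fetch`
and the default of 8 workers are illustrative, not the script's actual code:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_logs(urls, fetch, max_workers=8):
    """Fetch every URL concurrently and return {url: log text}."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as `urls`,
        # so zipping them back together pairs each URL with its log.
        return dict(zip(urls, executor.map(fetch, urls)))
```

Threads suit this job because the work is network-bound: while one download
waits on the server, the others keep making progress.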

* result_summary: fix result presets

stable-rc-build-failures and stable-rc-boot-failures weren't querying
specifically for test failures.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src/regression_tracker: rework regression detection

Take into account "active" and "inactive" regressions when creating them
and when processing new passed or failed nodes.

When a node passes, it checks if it "inactivates" an existing "active"
regression. When a node fails, it checks if it needs to create a new
regression or update an existing "active" one.

Signed-off-by: Ricardo Cañuelo <[email protected]>
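
The pass/fail handling can be pictured as a small state update. This sketch
is a simplification under stated assumptions: regressions are kept in a dict
keyed by test identity, any failure without an active regression creates one,
and `process_node` is a hypothetical name rather than the tracker's code:

```python
def process_node(regressions, key, result, node_id):
    """Update `regressions` in place for one finished node."""
    current = regressions.get(key)
    active = current if current is not None and current["active"] else None
    if result == "pass":
        if active is not None:
            active["active"] = False  # a pass inactivates the regression
    elif result == "fail":
        if active is None:
            # No active regression for this test: create a new one
            regressions[key] = {"active": True, "nodes": [node_id]}
        else:
            # Still failing: record the node on the active regression
            active["nodes"].append(node_id)
```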

* src/regression_tracker: link failed nodes to active regressions

When a failed node generates a regression, or when it's a re-run of a
run that generated a still active regression, link the node to the
regression id.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for date ranges for creation and update

New command line options to let the user specify date ranges for node
creation and last update: --created-from, --created-to,
--last-updated-from, --last-updated-to

Signed-off-by: Ricardo Cañuelo <[email protected]>
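
The four options could be declared with `argparse` roughly as follows. Only
the option names come from the message above; `build_parser`, the help text
and the accepted date format are assumptions:

```python
import argparse


def build_parser():
    """Build a parser that accepts the four date-range options."""
    parser = argparse.ArgumentParser(description="date-range filters (sketch)")
    for option in ("--created-from", "--created-to",
                   "--last-updated-from", "--last-updated-to"):
        # Values are kept as plain strings; the date format is an assumption
        parser.add_argument(option, help="date bound for the node query")
    return parser
```

`argparse` turns the dashes into underscores, so `--created-from 2024-06-01`
is read back as `args.created_from`.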

* result_summary templates: support for date ranges for creation and last update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: support for extra query parameters in cmdline

New command line option: --query-params to specify a set of extra query
parameters to complete or override preset parameters.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: html markup in some preset titles

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: update and move to docs folder

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: move parameter loading and processing to 'setup'

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: refactor and split into two classes (single, run)

Split the ResultSummary class into a base class and two child classes:
ResultSummarySingle and ResultSummaryLoop (only a stub at this point).

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: WIP initial implementation of the "loop" command

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: huge refactoring

Implement "summary" (single-shot) and "monitor" (loop) modes based on
preset parameters instead of on the command-line main command.

Split the logic into multiple files, move all monitor-specific and
summary-specific code to independent files, common code in a separate
file.

Full of kludges, I don't like how this is looking so far, might consider
reimplementing it without any dependencies on pipeline code.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix markup and indentation

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: new generic templates for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: examples for "monitor" and "summary" modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: summary and monitor modes

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: fix generic regression report

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: summary: fix last_updated option handling

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: embed css stylesheet in html files

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] make regression active by default

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "result" field is ever made non-optional in the models we can
probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* regression_tracker: [trivial] set default empty node sequence

Fixup for commit fcb29501663d78920bcd129bd57c36b9af624bc4
If the "node_sequence" field is ever made non-optional in the models we
can probably remove this.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: add cmdline option --output-dir

Introduce a new command-line option: --output-dir, and rename the old
--output to --output-file.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary changelog: command-line options change

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: jobs-chromeos: remove meaningless Tast tests

Several Tast tests can only fail in the context of KernelCI:
* `video.PlatformDecoding.v4l2_state*_vp9_0_svc` do not actually exist,
  causing the whole test job to fail
* `platform.DLCService*` and `platform.Memd` rely on features only
  present in the downstream Chrom{e,ium}OS kernel (see b/247467814 and
  b/244479619 for those having access to Google's issue tracker)
* `kernel.ConfigVerify.chromeos` relies on downstream-only config
  options such as `CONFIG_SECURITY_CHROMIUMOS` and other similar ones,
  and therefore can only fail when testing upstream kernels

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: scheduler-chromeos: don't execute non-working Tast tests

Currently, HEVC-related tests are known to either fail or be skipped as
ChromeOS doesn't yet handle hardware decoding of HEVC media. This is
expected to be fixed at some point though, so we're keeping the job
definitions and only remove the corresponding scheduler entries in order
to reinstate those jobs when relevant.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: exclude Tast tests known to always fail

Several decoder tests always fail on all platforms where they're
executed, adding only noise to otherwise useful test results. Disable
those to improve the quality of the results.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: chromeos: add special case for pre-6.7 qcom codec tests

On Qualcomm-based ChromeBooks (`trogdor` being the only model in
Collabora's lab), we noticed systematic failures of all
`vp9_*_frm_resize` and `vp9_*_sub8x8_sf` tests when using a kernel up to
6.6. With 6.7 and above, all of those tests (except one) now pass. It
therefore makes sense to exclude those on pre-6.7 kernels so we don't
report known failures and get rid of some noise.

This involves "duplicating" affected test jobs (although I did my best
to minimize that) and setting rules so only the working variant is
executed, based on the version of the kernel being tested.

Signed-off-by: Arnaud Ferraris <[email protected]>

* lava_callback: Compress the log files to save storage space

Cloud storage space and egress are costly, so it is
better to compress potentially large files.
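
A minimal sketch of the compression step, assuming the log is available as a string and gzip is the chosen format. `compress_log` is a hypothetical helper name, not the callback's actual function:

```python
import gzip


def compress_log(text):
    """Return the gzip-compressed UTF-8 encoding of a log."""
    return gzip.compress(text.encode("utf-8"))
```

Kernel logs are highly repetitive text, so they usually compress well; `gzip.decompress(blob).decode("utf-8")` restores the original.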

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* tests: Add basic yaml validation

Add a YAML load test to catch issues in the YAML files earlier.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in platforms anchors

The "stoneyridge" and "pineview" naming used in the Chromebook platform
anchors refers to ChromiumOS specific config fragments, but doesn't
necessarily match the actual platform of all the devices listed.
Use more generic names to distinguish amd and intel Chromebooks.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: rename test job anchors that use chromeos specific configs

Rename test job anchors that use chromeos specific kernel configurations
to include the 'chromeos' infix.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: add baseline tests

Enable the baseline tests on all the supported Chromebooks with their
default kernel configuration.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop stoneyridge/pineview naming in job defs

The "stoneyridge" and "pineview" naming used in some Chromebook job
definitions refers to ChromiumOS specific config fragments, but
doesn't necessarily match the actual platforms targeted by the jobs.
Replace all occurrences with more generic intel/amd naming.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: drop chromeos infix from baseline jobs

Keeping different job names for tests targeting different kernel configs
might cause too much duplication. Drop the 'chromeos' infix from the job
name for the tests using the chromeos config fragment. Users will be
able to filter the results using the data.defconfig/data.config_full
fields anyway.

Signed-off-by: Laura Nao <[email protected]>

* result_summary: post-process results for summary and monitor modes

Split the post-processing of nodes to a common function that can be used
for both summary and monitor modes. Currently, post-processing involves
only the collection of logs.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: update and fix presets and templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/result-summary-CHANGELOG: update

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config/pipeline.yaml: enable 'BayLibre' lab

Add lab configuration for BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-baylibre` runtime

Add runtime argument `lab-baylibre` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to BayLibre.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86-baylibre` job

Add job configuration `baseline-x86-baylibre` for BayLibre.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-armel-baylibre` job

Add job configuration `baseline-armel-baylibre` for BayLibre.
Add scheduler entry and platform config as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline: enable `android` tree and build configs

Monitor linux `android` tree. Add build configs for `android-mainline`
branch.

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add kbuild definitions for android-mainline

Add kbuild jobs to compile the kernel for android-mainline branch

Signed-off-by: Helen Koike <[email protected]>

* config/pipeline.yaml: add entries to schedule to build android-mainline

Add entries to `scheduler:` section to run the builds for
android-mainline.

Signed-off-by: Helen Koike <[email protected]>

* result_summary: fix node filter in monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* kernelci.toml: set `checkout` node timeout to `180 min`

The current `60 min` timeout is not enough, as some
`kbuild` jobs and their sub-tests take around 2 hrs to
complete after getting submitted to runtime.

Here is an example from staging. See the information
for a `checkout` and its child nodes:

| id                       | name                | created                    | updated                    | timeout                    |
|--------------------------|---------------------|----------------------------|----------------------------|----------------------------|
| 661c9d59b60b785eb9fc42b0 | checkout            | 2024-04-15T03:22:01.317000 | 2024-04-15T03:51:03.870000 | 2024-04-15T04:22:01.284000 |
| 661c9d97b60b785eb9fc42b4 | kbuild-gcc-10-arm64 | 2024-04-15T03:23:03.399000 | 2024-04-15T03:50:15.031000 | 2024-04-15T09:23:03.399000 |
| 661ca3f7b60b785eb9fc4ead | baseline-arm64      | 2024-04-15T03:50:15.304000 | 2024-04-15T05:09:45.247000 | 2024-04-15T09:50:15.304000 |
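
The elapsed time can be checked directly from the table, taking the `created` timestamp of `checkout` and the `updated` timestamp of `baseline-arm64`. The variable names are only illustrative:

```python
from datetime import datetime, timedelta

# Timestamps copied from the table above.
checkout_created = datetime.fromisoformat("2024-04-15T03:22:01.317000")
last_child_updated = datetime.fromisoformat("2024-04-15T05:09:45.247000")

elapsed = last_child_updated - checkout_created
print(elapsed)  # 1:47:43.930000

# The whole chain overruns the old 60 min limit but fits in 180 min.
exceeds_old = elapsed > timedelta(minutes=60)
fits_new = elapsed < timedelta(minutes=180)
```

So the last child node finished almost 48 minutes after the `checkout` node's 60 min timeout had expired.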

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary: add email report capabilities for monitor mode

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: plain text single report templates

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: chromeos: add baseline-nfs tests

Enable the baseline-nfs tests on all the supported Chromebooks, with
both the default and the chromeos kernel configurations.

Signed-off-by: Laura Nao <[email protected]>

* src/timeout: set `checkout` result

For `TIMEOUT` mode, set `checkout` node result to `fail`
if its state is `running` as it means code checkout is still
going on and node timed-out. Set it to `pass` if its state
is any other than `running`.
Set `checkout` node result to `pass` if mode is `DONE` as
it means once `checkout` has been in `available` or `closing`
state and it could successfully complete source code checkout.

Signed-off-by: Jeny Sadadia <[email protected]>

* regression_tracker: bugfix, failed test with no prior runs

Handle the case of a failed test run when it's the first occurrence of
that test case. Consider it "not a regression" for now, since we're
defining a regression as a "breaking point" between a success and a
failure.
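
That definition can be sketched as follows. This is a simplified illustration, not the tracker's actual code: `is_regression` is a hypothetical helper that receives earlier results as a plain list, oldest first, and it ignores the handling of already-active regressions:

```python
def is_regression(previous_results, current_result):
    """Return True only for a pass-to-fail breaking point."""
    if current_result != "fail":
        return False
    if not previous_results:
        # First occurrence of the test case: nothing to regress from.
        return False
    # A failure only counts when the run right before it passed.
    return previous_results[-1] == "pass"
```

With this rule, `is_regression([], "fail")` is `False`, which is the case this fix covers.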

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: platforms-chromeos: fix dalboz device type

Due to a copy/paste mishap, the device type for
`asus-CM1400CXA-dalboz` had a trailing `_chromeos`, leading LAVA to fail
finding the correct device type, and no job from the new system running
on this platform.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: run Tast tests only on 5.4+

Current ChromeOS images have `ext4` filesystems using options not
present in 4.19. Therefore tests cannot run on kernels that old, and
this leads to false positives in corrupt device identification, so we
should only run those tests on 5.4 and later kernels.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: drop non-existent platform

`hp-x360-12b-ca0500na-n4000-octopus` isn't a device type available in
Collabora's LAVA lab, so let's drop its definition.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: exclude android tree from kbuild jobs

Only Android-specific kbuild jobs should run for this tree, let's not
overload our system with unneeded builds.

Take this opportunity to limit mediatek kbuilds to 6.1+ as that's the
earliest version that has upstream support for at least one of our
devices.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: a bug fix in `_submit_lapsed_nodes`

Fix a glitch in the code related to setting `checkout`
node result.

Fixes: 361fc0d ("src/timeout: set `checkout` result")
Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update early access FQDN

We are moving k8s from eastus to westus3 as it is cheaper

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/tarball: fix `_kdir` in `update_repo`

Fix the below error:
```
kernelci-pipeline-tarball |   File "/home/kernelci/./pipeline/tarball.py", line 79, in _update_repo
kernelci-pipeline-tarball |     kernelci.shell_cmd(f"rm -rf {self._kdir}")
kernelci-pipeline-tarball |                                  ^^^^^^^^^^
kernelci-pipeline-tarball | AttributeError: 'Tarball' object has no attribute '_kdir'
```

Fixes: 0a2fe9c ("src/patchset.py: Implement Patchset service")
Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: fix method to get child nodes recursively

`TimeoutService._get_child_nodes_recursive` is used to get
pending child nodes recursively for closing and timed-out
nodes. It overwrites the result while being called recursively.
Fix the method to make it work properly.
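
One common way to avoid losing results in a recursive walk is to pass a single accumulator down the recursion. The sketch below illustrates that pattern only. It is not the service's actual code: the real method queries the API, while here `children_of` is a hypothetical in-memory mapping from node ID to child IDs:

```python
def get_child_nodes_recursive(node_id, children_of, result=None):
    """Collect all descendants of node_id, in depth-first order."""
    if result is None:
        # Create the accumulator once, at the top-level call.
        result = []
    for child in children_of.get(node_id, []):
        result.append(child)
        # Pass the same list down so nested calls extend it
        # instead of replacing what was already collected.
        get_child_nodes_recursive(child, children_of, result)
    return result
```

For a tree like `{"checkout": ["kbuild"], "kbuild": ["baseline", "tast"]}`, the call on `"checkout"` returns all three descendants.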

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: rename "armel" arch to "arm"

`armel` has various meanings depending on the system: for ChromeOS, it
is ARMv7, while in Debian it's ARMv{5T,6}. Moreover, this project is
*Kernel*CI and the kernel uses `arm` for all 32-bit ARM devices. In
order to avoid confusion (including those wondering what the heck does
`armel` mean), let's rename `armel` to `arm`.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: use per-system arch property where relevant

With the new `*arch` fields present in the platform configurations, we
don't have to hardcode the architecture strings in some specific cases.
Let's adapt the config files so we use `{cros,deb,k}arch` wherever it
makes sense.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src/timeout: set timed-out `checkout` result

Set the result of a timed-out `checkout` node to `incomplete`
if it is still in the `running` state, as this denotes that
the node timed out while the checkout was still going on.
Also, set the error-related information, i.e. `error_code`
and `error_msg`.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/tarball: update checkout node when update repo fails

The tarball service updates the source code repo and creates a tarball.
If the repo update fails even on the second attempt,
it means the source code checkout failed.
Hence, update the `checkout` node with state `done` and
result `fail`. Also, set appropriate error information
in the `data` field.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: pipeline: enable collabora-next tree and build config

Monitor the collabora-next tree. Add build config for the for-kernelci
branch.

Signed-off-by: Laura Nao <[email protected]>

* config: chromeos: enable acpi kselftest on collabora-next tree

Run the ACPI kselftest on the for-kernelci branch of the collabora-next
tree.

See: https://lore.kernel.org/linux-kselftest/[email protected]/T/#t

Signed-off-by: Laura Nao <[email protected]>

* result_summary: restore missing split_query_params function

Restore this function that was accidentally removed during the last
refactoring.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* lava_callback: Don't upload empty files to Azure

There is no use for a lot of empty files on Azure;
they only complicate cleanup.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: unify preset and output names

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: update preset for aferraris

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for laura.nao

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fixes and new presets for nfraprado

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: fix arch query parameters

Signed-off-by: Ricardo Cañuelo <[email protected]>

* k8s: Lot of deployment tested fixes

Fixes in yaml files for k8s production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result-summary presets: Fix build failure and regression monitors

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* result_summary: added debug traces to the monitor

Show detailed info of the node filterings in real time.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary: fix corner case bug when no logs are found

Cover rare case where neither the node nor any of its parents up to the
checkout node have any log artifacts.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: refine stable-rc presets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: add regression info to test reports

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary templates: escape log snippets

Signed-off-by: Ricardo Cañuelo <[email protected]>

* src: lava_callback: add device ID to node data

It can be useful to know the exact device on which a job ran, without
having to open the LAVA job page. This is done by querying the device ID
from the callback data and appending it to the node data.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: upload raw callback data as well

Debugging callback issues is complex due to the raw data not being saved
after processing. This change ensures we save the callback data as a
JSON file in order to ease development.

Signed-off-by: Arnaud Ferraris <[email protected]>

* DONOTMERGE lava_callback: add debug statements

Why the heck doesn't this just work???

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary_templates: fix error 'node' is undefined

The object is named test and not node, so s/node/test

Signed-off-by: Helen Koike <[email protected]>

* config/runtime/kunit: set architecture info

Set architecture field for `kunit` test
nodes.
If no `arch` argument is supplied, kunit takes
`um` (User Mode Linux) as architecture to run
tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: count running child jobs of build nodes

Add a method to count running jobs of `kbuild`
nodes i.e. jobs being submitted after successful
builds. For example `baseline` or `tast` jobs.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle closing `checkout` node differently

Usually, `checkout` should transition to the `done` state
when all its child nodes are completed.
When closing a `checkout`, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are still
running.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/timeout: handle holdoff reached `checkout` node differently

Usually, an available `checkout` for which holdoff is
reached should transition to the `done` state only when
all its child nodes are completed.
For such a `checkout` node, take into account the
running child jobs of build nodes before transitioning
its state to `done`. Otherwise, `checkout` will be
assigned the `done` state even if some child jobs are
still running.

Signed-off-by: Jeny Sadadia <[email protected]>

* Revert "DONOTMERGE lava_callback: add debug statements"

This reverts commit 5ed8218d99840373bbba5830b1976813b52bf4b1.

Signed-off-by: Arnaud Ferraris <[email protected]>

* Create dependabot.yml

* result_summary_templates: make generic-test-failures generic to all
results

The generic-test-failures templates can be used to show general results
by just replacing the name "failures" with "results", making them easier
to reuse for communities that want presets listing all test results,
so:

	s/generic-test-failures/generic-test-results

Signed-off-by: Helen Koike <[email protected]>

* result-summary.yaml: add preset to list android build tests

Since we now build android, add a preset to allow result-summary.yaml to
list all build results from Android tree.

Signed-off-by: Helen Koike <[email protected]>

* tarball: Implement checkout for specific commit

We often need a specific commit rather than the tip of the tree (ToT); implement this.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* jobs-chromeos.yaml: Disable module compression for every kernel version

Commit d4bbe942098b ("kbuild: remove CONFIG_MODULE_COMPRESS"),
introduced in kernel v5.13, substituted CONFIG_MODULE_COMPRESS=n for
CONFIG_MODULE_COMPRESS_NONE=y as the way to disable module compression.
Since module compression causes "Invalid ELF header magic: != ELF"
errors during boot on the ChromeOS base config, add the missing config
to disable module compression on kernels > v5.13 as well.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* src: lava_callback: reduce callback data size

The callback data is quite large, especially as it includes the full log
which we already upload separately. By dropping it and compressing the
whole file with `gzip` we can avoid wasting too much storage space.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: lava_callback: don't leak secret token

The callback data contains the secret token's value which shouldn't be
leaked. Ensure we drop it from the uploaded data.
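
A minimal sketch of such a redaction step, assuming the callback data is a dict and the secret sits under a top-level key. The key name `token` and the `redact_callback` helper are assumptions for illustration:

```python
import copy


def redact_callback(data, secret_keys=("token",)):
    """Return a copy of the callback data without the secret fields."""
    # Work on a deep copy so the caller can still use the original
    # (for example, to authenticate the callback) after redaction.
    redacted = copy.deepcopy(data)
    for key in secret_keys:
        redacted.pop(key, None)
    return redacted
```

Redacting a copy rather than the original keeps the upload path separate from any code that still needs the token.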

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: platforms-chromeos: use new cros-flash image

This ensures we use the new version of the `install-modules` script.

Signed-off-by: Arnaud Ferraris <[email protected]>

* src: regression_tracker: add the "device" field to regression data

This can be helpful. We're not using it as a search param though, as we
don't want to narrow down the search that much, using the platform only
is better.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: result_summary_templates: report device used for job

This information is now available, and it can be useful to know the
affected device without having to look at the LAVA job details.

Signed-off-by: Arnaud Ferraris <[email protected]>

* kubernetes: Update deployment recipe

Update list of labs and add KCI_INSTANCE variable.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava-callback: Limit threads of lava-callback

Due to the inrush of LAVA callbacks and slow Azure Files
processing, we need to make sure we don't spawn too many
threads.
Also add a hard memory limit of 1 Gbyte.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: add presets for fluster test

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Make template generic for all v4l2 tests
- Rebase on main

* result_summary presets: make the name of fluster test generic

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: enable first fluster test for mt8195-cherry-tomato-r2

Enable first fluster test, AV1-TEST-VECTORS for mt8195-cherry-tomato-r2.
Run the test on mainline and next until more trees are added.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Create generic v4l2-decoder-conformance-job and use anchors from it
- Update the rootfs address
- Move anchor to _anchor
- Update with nitpicks

* config: jobs-chromeos: Add kernelci tree for testing purpose

Remove this commit before merging.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Enable cpufreq kselftest

Enable cpufreq kselftest on all the trees and branches.

Signed-off-by: Shreeya Patel <[email protected]>

* result_summary presets: fix preset for kselftest-dt failures monitor

Signed-off-by: Ricardo Cañuelo <[email protected]>

* result_summary presets: new presets for kselftest-cpufreq

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: mt8195-cherry-tomato-r2: enable all fluster tests for all branches

Add all the trees and branches on which the tests will be run. Enable
all the tests for tomato.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- The build config cannot be added yet. Just list the trees, it will only use
  the branches configured in build_configs:
  - mainline will use master
  - next will use master
  - collabora-chromeos-kernel will use for-kernelci
  - media will use master and fixes
- Remove kernelci tree as it was added just for testing purpose

* config: mt8183-kukui-jacuzzi-juniper-sku16: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

jacuzzi

* config: mt8186-corsola-steelix-sku131072: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: mt8192-asurada-spherion-r0: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Don't specify the platforms manually as they are already mentioned in
  test-job-arm64-mediatek

* config: sc7180-trogdor-kingoftown/lazor-limozeen: enable all supported fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Use test-job-arm64-qualcomm instead and create separate jobs for
  qualcomm devices
- Don't specify platforms manually as they are already mentioned in
  test-job-arm64-qualcomm

* build(deps): bump uwsgi from 2.0.21 to 2.0.22 in /docker/lava-callback

Bumps [uwsgi](https://uwsgi-docs.readthedocs.io/en/latest/) from 2.0.21 to 2.0.22.

---
updated-dependencies:
- dependency-name: uwsgi
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* pipeline.yaml: Add stable-rc build variants

Add more build variants for stable-rc tree to match legacy system.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary: add error classification

Classify errors according to patterns in the logs

Signed-off-by: Helen Koike <[email protected]>

* result_summary presets: add collabora-chromeos-kernel and media trees for fluster tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: Use media-stage instead of media-tree

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config/pipeline: enable android branches from legacy

Enable all android branches from the legacy system

Signed-off-by: Helen Koike <[email protected]>

* trigger: Add exclude/include tree list for trigger

As we need to restrict the list of kernels running on staging,
we need an option that allows it.
It would also be good to exclude staging kernels from the production
kernel list.

So in case of staging we need to run kernels only from tree "kernelci"
and sometimes something else, for example "mediatek".
Option will look like:

--trees kernelci,mediatek
or
--trees kernelci

On production we need to exclude trees kernelci and buggytree:
--trees !kernelci,buggytree
or just kernelci:
--trees !kernelci

The purpose of this option is that our compile capacity is limited,
and right now staging and production are both compiling a very large set
of kernels, we need to reduce this amount to drop costs.
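
The `--trees` syntax described above could be interpreted roughly as follows. This is a sketch under the assumption that a leading `!` turns the whole comma-separated list into an exclude list; `parse_trees_option` and `tree_allowed` are hypothetical helper names:

```python
def parse_trees_option(value):
    """Return (mode, names), where mode is 'include' or 'exclude'."""
    exclude = value.startswith("!")
    # Drop the leading "!" and ignore empty entries and stray spaces.
    names = {
        name.strip()
        for name in value.lstrip("!").split(",")
        if name.strip()
    }
    return ("exclude" if exclude else "include"), names


def tree_allowed(tree, option):
    """Decide whether the trigger should handle the given tree."""
    if not option:
        # No --trees option: every tree is allowed.
        return True
    mode, names = parse_trees_option(option)
    if mode == "include":
        return tree in names
    return tree not in names
```

With `--trees kernelci,mediatek` only those two trees pass the filter, while `--trees !kernelci,buggytree` lets everything through except those two.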

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: platforms-chromeos: use CrOS R124 files

ChromeBooks were upgraded with a new image based on ChromiumOS R124, so
we must use those files now.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config: jobs-chromeos: drop non-existent Tast tests

Those were removed between R120 and R124 and therefore cause test
failures with the new images.

Signed-off-by: Arnaud Ferraris <[email protected]>

* result_summary presets: fix acpi kselftest presets

We're interested in catching regressions and failures in both the
kselftest-acpi test suites and its test cases. Match the nodes by group
in the presets accordingly.
Fix template used by the failure monitor preset.

Signed-off-by: Laura Nao <[email protected]>

* src: update return values of `APIHelper.receive_event_node`

`APIHelper.receive_event_node` method is used to receive
node data from PubSub event. The method has been updated
to return `is_hierarchy` flag as well which represents
events related to node hierarchy.
Update pipeline services using the method accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: refine presets for v4l2-decoder-conformance

Modify the regression preset to monitor regressions on both the
v4l2-decoder-conformance test suites and its test cases, by matching the
nodes by group instead of by name.
Also, change the failure preset to monitor for all errors caused by
runtime errors.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: add summary presets for v4l2-decoder-conformance

Add summary presets to fetch regressions and failures on
v4l2-decoder-conformance tests. Two of the presets are the same used by
the monitor; add one additional preset to fetch all the failures on
both the test suites and their test cases.

Signed-off-by: Laura Nao <[email protected]>

* lava_callback.py: Remove error_code/error_msg on lava-callback

Sometimes, due to congestion, a node might be set to timeout, but
then the result might arrive late and we need to use it properly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* result_summary presets: fix dt kselftest presets

Fix the dt kselftest preset, just like was done for the acpi one, as the
current preset doesn't match the actual results we're interested in.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* doc/connecting-lab: refine documentation

Refine documentation for connecting LAVA labs
and submitting jobs to the lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback: Sometimes we get totally invalid log file uploaded

Most likely the problem lies in Flask's threading, and callbacks
are possibly getting mixed up. This commit attempts to introduce
several countermeasures against that.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* doc: add `_index.md` page

Add index documentation page.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `pipeline-details` page

Move `pipeline-details` documentation from the API
repository to this repo to make it close to the source.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc/connecting-lab: adjust `weight` property

Change the `weight` property of the existing doc page to
accommodate the transition of pipeline-related docs
to the pipeline repo.

Signed-off-by: Jeny Sadadia <[email protected]>

* doc: add `developer-documentation` page

Add developer manual documentation.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add lab config for Qualcomm

Add an entry to `runtimes` section for Qualcomm
lab configurations.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-x86` job for qualcomm

Add job configuration `baseline-x86-qualcomm` for
running baseline job in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add lab-qualcomm runtime

Add runtime argument `lab-qualcomm` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to Qualcomm LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add `baseline-arm64` job for qualcomm

Add job configuration `baseline-arm64-qualcomm` for
running baseline job for `arm64` in Qualcomm LAVA lab.
Add scheduler entry as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* pipeline.yaml: Update RISC-V configs

1) rv32 defconfig doesn't exist, remove it
2) nommu_k210_defconfig has modules disabled

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback.py: Sanitize lava log data

As we use this data in reports, let's remove all
non-printable characters as they confuse grafana, browsers and others.
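
A minimal sketch of such a filter, assuming the log is already decoded to a string and that newlines and tabs should survive. `sanitize_log` is a hypothetical helper, and the actual rules in the callback may differ:

```python
def sanitize_log(text):
    """Drop non-printable characters, keeping newlines and tabs."""
    return "".join(
        ch for ch in text
        # str.isprintable() is False for control characters such as
        # NUL and ESC, and also for "\n" and "\t", which are kept.
        if ch.isprintable() or ch in "\n\t"
    )
```

Note that this removes only the escape byte of an ANSI colour sequence; the printable remainder (such as `[0m`) stays in the text.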

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/runtime/kunit.jinja2: fix result map

Fix result map for skipped tests. Initially, the API
schema didn't have a `skip` node result available,
which is why it was mapped to a `None` result. The API
now has a `skip` result to denote skipped tests, so
fix the result mapping accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: jobs-chromeos: Add lab-setup fragment

Add the lab-setup fragment to the chromebook builds, which contains the
architecture independent kernel configs needed to run tests on the
platform. Notably this disables IP autoconfig by the kernel.

The result of this change is that the 12 seconds boot delay and the
consequent deferred probe pending warnings will no longer happen on any
platform. This applies in particular to mt8186-corsola-steelix-sku131072,
on which it was still happening due to a different network adapter being used.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* lava_callback: bump up slightly threads number

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: enable watchdog reset test on Chromebooks

Add a basic test to verify watchdog reset functionality. Enable the
test on all ARM64 and AMD x86_64 Chromebooks. For Intel
Chromebooks, enable the test only on octopus, as ACPI PM Timer on the
other devices has been disabled in coreboot.

Signed-off-by: Laura Nao <[email protected]>

* src/send_kcidb: use schema version 4.3

Test status `MISS` was added to KCIDB in schema
v4.2 and is supported by the latest version, i.e. v4.3.
Hence, use the latest version for submission, as
the API may send a few tests with "MISS" status.

Signed-off-by: Jeny Sadadia <[email protected]>

* send_kcidb: re-structure code for parsing checkout node

Move code for parsing checkout node to a separate
method.
Add `valid` field to parsed checkout node. It denotes
if source code was successfully checked out.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: print more information on invalid data

Print details for invalid revision data for the
sake of debugging.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: optimize `kcidb` import

Remove redundant `kcidb` import and adjust
kcidb Client call accordingly.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: remove keys with `None` values

KCIDB doesn't allow `None` as field value.
Remove all optional fields with `None` value
to make it valid data for submitting to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>
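
A minimal sketch of the `None` removal described above, assuming the parsed node is a plain nested `dict`. The helper name `drop_none` is hypothetical and not taken from `send_kcidb`.

```python
def drop_none(value):
    """Return a copy of `value` with every dict key whose value is None removed."""
    if isinstance(value, dict):
        return {k: drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Lists keep their items; only dicts nested inside them are cleaned.
        return [drop_none(v) for v in value]
    return value
```

For example, `drop_none({"id": "a", "comment": None})` yields `{"id": "a"}`.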

* config: add `kcidb_test_suite` property

Every KernelCI test will be mapped to a unified
test suite for KCIDB data submission.
Add `kcidb_test_suite` property to test job
definitions in YAML configuration files.
The added property will store the mapped
KCIDB test suite name.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: parse and submit node test and build data

Listen to all the node events with node state
`done` or `available` and submit the node to KCIDB.
Parse node received from the event and create KCIDB
schema compatible object based on type of the node
i.e. checkout, build or test.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: set `log_excerpt` for builds and tests

Fetch logs from compressed log file(*.log.gz) URL
and send last 16*1024 characters for setting `log_excerpt`
field for build and test nodes as it is the max allowed
length of the KCIDB field.

Signed-off-by: Jeny Sadadia <[email protected]>
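
The excerpt logic above can be sketched as follows, assuming the compressed log has already been downloaded into memory. The names `MAX_LOG_EXCERPT` and `log_excerpt` are illustrative, not the ones used in `send_kcidb`.

```python
import gzip

# Limit quoted in the commit message: 16*1024 characters.
MAX_LOG_EXCERPT = 16 * 1024


def log_excerpt(compressed: bytes, limit: int = MAX_LOG_EXCERPT) -> str:
    """Decompress a gzip log and return its last `limit` characters."""
    text = gzip.decompress(compressed).decode("utf-8", errors="replace")
    return text[-limit:]
```

Slicing from the end keeps the tail of the log, which is usually where a failure is reported.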

* config/jobs-chromeos: add kcidb test suite property for watchdog test

Add KCIDB test suite mapping for `watchdog_reset` test.

Signed-off-by: Jeny Sadadia <[email protected]>

* lava_callback.py: disable log removal from callback data

We need it for investigations if we have any critical data
loss during log sanitizing.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: add error info to build nodes

Add error metadata fields such as `error_code` and
`error_msg` to `misc` field for build nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: add watchdog-reset presets for mainline/next

Add monitor and summary presets to track the results from the watchdog
reset test on the mainline and next trees.

Signed-off-by: Laura Nao <[email protected]>

* pipeline.yaml: Fix fluster rootfs URL

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* src/send_kcidb: get error metadata for failed/incomplete tests

Tweak condition to get error metadata for test nodes.
It should get error info for incomplete nodes as well
and not just failed nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: send tests only if KCIDB test mapping exists

All test suite definitions must have the `kcidb_test_suite`
property, i.e. a KCIDB test suite mapping.
Only send tests for which the mapping is found.

Signed-off-by: Jeny Sadadia <[email protected]>

* tests/validate_yaml: add validation for KCIDB mapping

To submit KernelCI generated data to KCIDB, every job
definition must provide a mapping through the
`kcidb_test_suite` property.
Add validation to ensure all jobs have a mapping
present, to avoid missing data submissions.
This check notifies test authors enabling tests
in maestro that they must include the required mapping property
in their definition.

Signed-off-by: Jeny Sadadia <[email protected]>
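
A rough sketch of such a check, assuming the `jobs` section has already been loaded into a `dict` keyed by job name and that `kcidb_test_suite` sits at the top level of each job definition. The function name is hypothetical, and the real validator may exempt some job kinds (for instance builds).

```python
def jobs_missing_kcidb_mapping(jobs: dict) -> list:
    """Return the sorted names of jobs that lack a `kcidb_test_suite` property."""
    return sorted(
        name
        for name, job in jobs.items()
        if not (job or {}).get("kcidb_test_suite")
    )
```

A CI check could fail whenever the returned list is non-empty and print the offending job names.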

* config/pipeline.yaml: add qcs6490-rb3gen2 boot test

Signed-off-by: Milosz Wasilewski <[email protected]>

* config: chromeos: Enable kselftest-dt on Qualcomm platforms

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* pipeline.yaml: Add one um build for android trees

As requested by the Android team, it is good to check UM builds
for breakage as well.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: use `kind=job` for test suites

As part of re-structuring the test hierarchy, the `Job` model
has been introduced for test suite/job nodes.
It uses node kind `job`.
Update test configurations in `pipeline.yaml` and
`jobs-chromeos.yaml` to use `kind=job` to
generate job nodes.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: provide `kind` value for child tests

When submitting a test hierarchy, child nodes by default
inherit the `kind` value from the parent node.
As we are re-structuring the test hierarchy, test suite/job nodes
will have `kind=job` while their child test nodes will have
`kind=test`. Provide the `kind` field explicitly in the test result
hierarchy to preserve a kind value different from the parent
node.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: fix `NameError`

Fix the below error in `_submit` method:
```
Traceback (most recent call last):
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 287, in main
    job.submit(results)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 138, in submit
    self._submit(result)
  File "/home/kernelci/data/output/tmp94nrvsvs/kunit-x86_64", line 265, in _submit
    return node
NameError: name 'node' is not defined
```

Signed-off-by: Jeny Sadadia <[email protected]>

* config/runtime/kunit.jinja2: evaluate job node result

Evaluate job node result from child node results if a
`null` result is received from the test result parser.
For example nodes such as `fortify`:
https://staging.kernelci.org:9000/viewer?node_id=6670ab43d0b7694b399897c4

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix parsing of KUnit log file

Handle both compressed(gzip) and plain text log files
for getting log excerpt.

Signed-off-by: Jeny Sadadia <[email protected]>
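
One way to handle both kinds of log file is to look at the gzip magic bytes before decoding. This is a sketch under that assumption; the commit may detect the format differently (for example from the URL suffix), and `decode_log` is a hypothetical name.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_log(raw: bytes) -> str:
    """Return log text from either gzip-compressed or plain-text bytes."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")
```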

* src/send_kcidb: HTTP exception handling for log excerpt

Add HTTP exception handling for getting
log excerpt data.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: platforms-chromeos: Add serial delay for some Mediatek platforms

Add test_character_delay to the Spherion, Tomato and Steelix platforms
to workaround the fact that they're sometimes unable to process serial
input fast enough, resulting in mangled commands and consequently flaky
test results, as described in
https://github.com/kernelci/kernelci-project/issues/366.

The right place to do this change would be in the device-type template
as described in LAVA's documentation [1]. This overriding in KernelCI is
meant only as a temporary workaround to verify whether this fixes the
issue. If it does, then we'll do it in LAVA upstream instead.

[1] https://docs.lavasoftware.org/lava/debugging.html#differences-in-input-speeds
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
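
Conceptually, a per-character delay paces the input so that a slow serial receiver is not overrun. The sketch below only illustrates that idea and is not LAVA's implementation: `send_with_delay`, its `write` callback and the millisecond unit of `delay_ms` are assumptions, so check the LAVA documentation linked above for how `test_character_delay` is actually defined.

```python
import time


def send_with_delay(write, command: str, delay_ms: int, sleep=time.sleep) -> None:
    """Send `command` one character at a time, pausing after each character.

    `write` is any callable that transmits a single character, e.g. a
    serial port writer. `sleep` is injectable so the pacing can be tested.
    """
    for ch in command:
        write(ch)
        sleep(delay_ms / 1000.0)
```

With a delay in place, a long command such as the `lava-test-runner` invocation from the issue reaches the device more slowly but intact, at the cost of a longer overall send time.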

* config: chromeos: Enable error-logs kselftest for MediaTek Chromebooks

Run the error-logs kselftest on MediaTek Chromebooks. This test is
currently under review upstream [1] so, in the meantime, it has been
added to the collabora-next tree so it can prove its value by helping to
detect issues upstream.

[1] https://lore.kernel.org/all/[email protected]

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config/pipeline.yaml: enable CIP lab

Add configuration for LAVA CIP lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* config/pipeline.yaml: add baseline-x86 test for CIP

Add `baseline-x86-cip` test to be submitted to CIP
LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* docker-compose.yaml: add `lab-cip` runtime

Add runtime argument `lab-cip` to `scheduler-lava`
container. This will enable the pipeline to run and
submit jobs to CIP LAVA lab.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: enable `job` node submission to KCIDB

Parse newly added job node and its child tests
for KCIDB submission.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: don't submit `setup` test suite nodes

`setup` test suite has been introduced to store test results
for environment setup checks before running actual test suite.
KCIDB doesn't require `setup` test suite result as long as
main test job result is submitted.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: add a check before sending data

Check if parsed data is available before
sending revision data to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: fix logs

Fix log statement about submitting node to KCIDB
as we are not sending all the nodes we receive
event for to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: handle skipped tests

Do not retrieve artifacts or metadata from the parent
node for skipped tests, as in practice only kernel
revision, test runtime and platform will be
available for skipped tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary/utils: ignore failures on log retrieval

Make the script continue running if there was an error fetching a test
log.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* doc/developer-documentation: add docs for enabling new tests

Add developer documentation for enabling new tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* Fix links after docs page migration

Documentation has been migrated to the "docs.*" subdomain.

Signed-off-by: Paweł Wieczorek <[email protected]>

* pipeline.yaml: Add kcidebug fragment

Add useful low-overhead debug option to kernel,
and test on most x86 boards we have available,
with minimal baseline tests.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* configs: update gcc-10 to gcc-12

As we upgrade compiler images, we need to update the gcc version.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: workaround: match node paths programmatically

Don't use 'path' as an api search parameter. The use of lists as query
parameters (path is a list) is undefined. Instead, do the filtering in
code.

Signed-off-by: Ricardo Cañuelo <[email protected]>
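
Filtering in code, as described above, might look like this. The sketch assumes each node is a `dict` whose `path` field is a list of strings; `filter_by_path` is a hypothetical helper rather than the tracker's own function.

```python
def filter_by_path(nodes: list, path: list) -> list:
    """Keep only the nodes whose `path` list equals `path` exactly."""
    return [node for node in nodes if node.get("path") == path]
```

The API query is then issued without the `path` parameter, and the returned nodes are narrowed down locally.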

* config: remove qemu jobs from lab-qualcomm

QEMU jobs use container pulled from hub.docker.com. After the lab move
pulling from this registry is no longer possible at Qualcomm. This patch
disables QEMU jobs from Qualcomm lab.

Signed-off-by: Milosz Wasilewski <[email protected]>

* validate_yaml.py: Improve pipeline validation

Add validation that every scheduler entry has a matching job entry
(this is a critical validation) and that every job entry has at least
one entry in the scheduler.
Fix one entry detected by this validation.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
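
The two checks above can be sketched as follows, assuming `jobs` is a `dict` keyed by job name and `scheduler` is a list of entries that each name their job under a `job` key. The function name and return shape are illustrative, not those of `validate_yaml.py`.

```python
def validate_scheduler(jobs: dict, scheduler: list) -> tuple:
    """Cross-check scheduler entries against job definitions.

    Returns (missing_jobs, unscheduled_jobs):
    - missing_jobs: job names used by the scheduler but never defined
      (the critical case);
    - unscheduled_jobs: defined jobs with no scheduler entry.
    """
    scheduled = {entry["job"] for entry in scheduler if "job" in entry}
    missing_jobs = sorted(name for name in scheduled if name not in jobs)
    unscheduled_jobs = sorted(name for name in jobs if name not in scheduled)
    return missing_jobs, unscheduled_jobs
```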

* pipeline.yaml: Add broonie(Mark Brown) trees to pipeline

It is time to enable even more trees.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add additional verification for duplicate keys

The same keys might be redefined in different YAML files;
this tool will ensure the consistency of these entries.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
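
A duplicate-key check across files could be sketched like this, assuming every YAML file has already been parsed into a `dict` and collected under its file name. `find_duplicate_keys` and its arguments are hypothetical names for illustration.

```python
def find_duplicate_keys(files: dict, section: str) -> dict:
    """Map each key defined in `section` by more than one file to those files."""
    seen = {}
    for filename, data in files.items():
        for key in (data.get(section) or {}):
            seen.setdefault(key, []).append(filename)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

An empty result means no key in that section is defined twice.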

* validate_yaml.py: Remove path separator

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Rename variable to schedules

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config/kernelci.toml: update KCIDB origin name

As we agreed to refer to the new KernelCI API & Pipeline as
"maestro", use the new name while submitting data
to KCIDB.

Signed-off-by: Jeny Sadadia <[email protected]>

* src/send_kcidb: update KCI result mapping with KCIDB status

Update evaluation of KCIDB status from KCI result.

Create 2 categories for error codes:
1. When pre-check tests completed but the actual test suite
couldn't run - this will have `MISS` status
2. When pre-check tests completed and the actual test suite could
run but somehow couldn't complete - this will have `ERROR` status

Some LAVA error codes can occur at any point of execution,
such as `Cancelled` and `Test`.
Such error codes are listed under the most relevant category
based on analysis of available results.

Signed-off-by: Jeny Sadadia <[email protected]>

* result_summary presets: fix presets for v4l2-decoder-conformance

Following recent updates to data representation on KernelCI nodes,
the top-level nodes for tests now have their kind set to 'job' instead
of  'test'. Update the presets for v4l2-decoder-conformance tests
accordingly.

Signed-off-by: Laura Nao <[email protected]>

* result_summary presets: fix output file name in kselftest-acpi preset

Signed-off-by: Laura Nao <[email protected]>

* config: enable dmabuf-heaps, exec and iommu kselftest suites

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Add kcidb_test_suite

* config: result-summary: add generic rule to monitor failures and regression

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Add rt-stable builds

Copy rt-stable builds from legacy KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes:
- Major changes to move to new way of writing kbuild jobs

* config: pipeline: Add v6.6-rt branch for builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: result-summary: add rt-stable kbuilds presets

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: chromeos: Add 'nfs' suffix to KCIDB suite name for baseline-nfs

The baseline test is currently run with both ramdisk and nfs rootfs. To
distinguish baseline-nfs tests in KCIDB, add an 'nfs' suffix to the KCIDB
test suite name.

Signed-off-by: Laura Nao <[email protected]>

* aks: Add kubernetes kcidb deployment

We need a file that manages the deployment of the kcidb bridge
in the Kubernetes production deployment.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* kubernetes: Adjust trigger k8s options

Ignore the kernelci tree on production, as it is a special
"staging"-only tree, and read the whole /config directory, not just the
default pipeline.yaml.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* regression_tracker: bugfix: catch empty search condition

Fix _get_last_matching_node(), after the previous change there was an
unhandled scenario where nodes may be empty but the function wouldn't
return None immediately.

Signed-off-by: Ricardo Cañuelo <[email protected]>

* config: pipeline: correct the kind of kselftest suites to job

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler-chromeos.yaml: Temporarily disable non-essential tast tests

As per discussion, temporarily disable tast tests that are unlikely
to be reviewed.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* k8s/aks: Update deployment files

1) Update memory limit, as working with Linux sources might require 3 GB of RAM
2) Update config file path
3) Add callback environment variable
4) Update image reference to a fresh one

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android builds with gcc-12 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable android builds with clang-17 for all architectures

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: remove build_variants from android build_configs

build_variants is the legacy way to specify the different variants. We
have moved to the newer way of specifying variants. Hence remove
build_variants from the android build_configs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add android15-6.6-lts branch for build as well

The android15-6.6-lts has been included recently in legacy KernelCI:
https://github.com/kernelci/kernelci-core/pull/2597

Add the same in newer KernelCI.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add blocklist for riscv older kernels for android builds

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: update KCIDB test suite mapping for baseline

Use `boot` as KCIDB test suite mapping for all
baseline tests.

Signed-off-by: Jeny Sadadia <[email protected]>

* callback_url: Update config and README

As we are moving the callback URL to an environment variable,
update the config and README accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: enable android baseline (boot) testing for arm and arm64 in only allmodconfig

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* scheduler.py: If event has jobfilter, inject it into the node data

When someone generates an artificial event with a jobfilter, it is
likely a maintainer trying to repeat a job. Treat this accordingly
and inject the job filter into the job node, so we run only the tests
the maintainer wants.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* lava_callback: migrate to fastapi

It will be easier to maintain API and Pipeline, as
both will be powered by FastAPI framework.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: chromeos: Update fluster rootfs URL

Signed-off-by: Laura Nao <[email protected]>

* config: pipeline: fix defconfigs in fragments

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* kbuild.jinja2: support defconfig as list or str

As required in https://github.com/kernelci/kernelci-core/pull/2608,
defconfig might be of two types. Support both in jinja2 accordingly.

Signed-off-by: Denys Fedoryshchenko <[email protected]>
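
The commit handles this inside the jinja2 template. The same idea, expressed in Python for illustration only, is to normalize the value to a list before using it; `normalize_defconfig` is a hypothetical helper.

```python
def normalize_defconfig(defconfig) -> list:
    """Accept a defconfig given as a single string or as a list of strings."""
    if isinstance(defconfig, str):
        return [defconfig]
    return list(defconfig)
```

Checking for `str` first matters, because a string is itself iterable and would otherwise be split into single characters.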

* config: pipeline: add kbuilds of lee-mfd with default defconfigs

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: enable baseline testing for mfd for one board of each arch

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: fix platform sections for Qualcomm and Android schedules

Signed-off-by: Paweł Wieczorek <[email protected]>

* k8s: Update deployment to uvicorn, as we use fastapi now

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* config: pipeline: Unblock android runs on lava-collabora

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: Enable preempt-rt cyclictest test

Enable the first preempt-rt test, cyclictest in new KernelCI. Enable it
on all platforms.

Since these are all smoke tests, there is no point in running them for too
long. Thus reduce the runtime per test to one minute. This should keep
the total preempt-rt runtime roughly in the same time frame.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* pipeline: add all the test jobs for all rt-test

Add jobs definition of all the rt-tests. Enable cyclicdeadline and rtla
tests to run on all targets.

The changes have been ported from Daniel's PR [1].

[1] https://github.com/kernelci/kernelci-core/pull/2397

Signed-off-by: Daniel Wagner <[email protected]>
Co-developed-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add template and test properties for preempt_rt jobs

Add template, job and kcidb_test_suite properties for all preempt-rt jobs.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: rename preempt-rt to rt-tests which is correct name of tests

The legacy system used the preempt-rt name for these tests, but the
repository is named rt-tests. We must use the same name so that results
merge with those coming from other CIs in KCIDB.

Suggested-by: Jeny Sadadia <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: add the correct nfsroot for rt-tests

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: Remove android's deprecated branches

It has been confirmed with Todd that we should remove the deprecated
branches. Hence remove those branches.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* config: pipeline: run baseline on non-allmodconfig

allmodconfig generates a very large kernel image. It cannot be booted
on the arm64 and arm targets, as tftp errors out because the size is too
large. Reduce the kernel image size by using the default defconfig. The same
defconfigs have been booting for other trees.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* doc: developer-documentation: Update documentation by adding more details

- Reorganize some things
- Specify how to write different variants by removing old syntax
- Give two separate templates for kbuild and test
- Try to put more details for new contributors

Signed-off-by: Muhammad Usama Anjum <[email protected]>
---
Changes since v1:
- Fix typo
- Apply suggestions from code review

* doc/developer-documentation: fix a glitch in enabling new tree section

Fix a minor bug in YAML block formatting.

Fixes: f5f57de ("doc: developer-documentation: Update documentation by adding more details")
Signed-off-by: Jeny Sadadia <[email protected]>

* doc/developer-documentation: update a section title

Rename a section from "Enabling a new Kernel tree" to
"Enabling new KernelCI trees, builds, and tests" as it explains
enabling tests as well.

Signed-off-by: Jeny Sadadia <[email protected]>

* config: use the new `tree:branch` format for rules

For cases where we want a single branch to be allowed for a given tree,
we can now use the `tree:branch` format in rules. Convert existing rules
accordingly.

Signed-off-by: Arnaud Ferraris <[email protected]>
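
As an illustration of the `tree:branch` format, the sketch below checks one rule entry against a node's tree and branch. It assumes that an entry without a colon allows every branch of that tree; the actual rule evaluation lives in KernelCI's code and may differ, and `rule_allows` is a hypothetical name.

```python
def rule_allows(entry: str, tree: str, branch: str) -> bool:
    """Check a single `tree` or `tree:branch` rule entry.

    Assumption: a bare `tree` entry matches any branch of that tree.
    """
    rule_tree, sep, rule_branch = entry.partition(":")
    if rule_tree != tree:
        return False
    return not sep or rule_branch == branch
```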

* config: pipeline: fix improper use of "filters" attribute

The `filters` param was used in the legacy system but has been replaced
by `rules`, with a different syntax.

For Android RISC-V builds, this was used to deny job execution on
kernels < 4.19, so let's translate this condition with the rules format,
and do a similar change for the `rt-tests`-based jobs.

Signed-off-by: Arnaud Ferraris <[email protected]>

* config/pipeline.yaml: Fix x86 typo in kcidebug job names

The kcidebug jobs that run on MediaTek and Qualcomm platforms should
have arm64 in the name rather than x86. Fix the typo.

Signed-off-by: Nícolas F. R. A. Prado <[email protected]>

* config: pipeline: remove params

The parameters are only needed when they are changed or appended.
Remove the parameters which aren't being modified.

Signed-off-by: Muhammad Usama Anjum <[email protected]>

* validate_yaml.py: Jobs are required to have template parameter

Add more validation to config files of mandatory parameters.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* validate_yaml.py: Add more job validations

Add basic validation: each job must have a kind parameter.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

* workflows: Add label on CI check failures

Automatically add a label so a broken PR won't go to staging.

Signed-off-by: Denys Fedoryshchenko <[email protected]>

---------

Signed-off-by: Jeny Sadadia <[email protected]>
Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: Ricardo Cañuelo <[email protected]>
Signed-off-by: Helen Koike <[email protected]>
Signed-off-by: Arnaud Ferraris <[email protected]>
Signed-off-by: Laura Nao <[email protected]>
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Signed-off-by: Shreeya Patel <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Milosz Wasilewski <[email protected]>
Signed-off-by: Paweł Wieczorek <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Co-authored-by: Jeny Sadadia <[email protected]>
Co-authored-by: Nícolas F. R. A. Prado <[email protected]>
Co-authored-by: Ricardo Cañuelo <[email protected]>
Co-authored-by: Helen Koike <[email protected]>
Co-authored-by: Arnaud Ferraris <[email protected]>
Co-authored-by: Laura Nao <[email protected]>
Co-authored-by: Muhammad Usama Anjum <[email protected]>
Co-authored-by: Shreeya Patel <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Paweł Wieczorek <[email protected]>
Co-authored-by: Milosz Wasilewski <[email protected]>
Co-authored-by: Daniel Wagner <[email protected]>
Signed-off-by: Denys Fedoryshchenko <[email protected]>