Replies: 4 comments
Runtime selection

A very common scenario is when there is a requirement to run a job in a particular type of runtime, but exactly which one to pick is not specified. The scheduler then has to pick one arbitrarily. For example, the "baseline on QEMU" config from the original post says that the runtime should be of the "lava" type but it doesn't specify which LAVA lab, so the scheduler needs to rely on some logic to pick one and run the job there. A trivial implementation would be to randomly pick one from the pool of available labs.
The next step is about providing additional clues to the scheduler so it can make a better guess when picking one runtime within the specified type. For example, we may have 10 LAVA labs and 5 Kubernetes clusters, and the Runtime class implementation could have methods to help with this kind of logic too, for example to report how busy or available each runtime currently is. That is however still a bit simplistic, as the same job can cause a different load in different runtimes: a Kubernetes cluster might be tuned for a particular kind of job (high RAM, CPU or storage) and LAVA labs may have a different number of platform instances (QEMU or any other type of hardware). An ideal, generic way to handle this might be to have another method to estimate the load a particular job would cause on each runtime. For example, with a pool of 3 runtimes, the scheduler could compare the estimates and pick the one where the job would have the least impact.
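As a rough sketch of what this could look like in the scheduler (the class and method names below, such as get_load() and estimate_job_load(), are hypothetical and not an existing KernelCI API):

import random


class Runtime:
    """Hypothetical base class; the method names are assumptions for this sketch."""

    kind = None  # runtime type, e.g. "lava" or "kubernetes"

    def get_load(self):
        """Return a number describing how busy this runtime currently is."""
        raise NotImplementedError

    def estimate_job_load(self, job):
        """Return an estimate of the extra load a given job would cause here."""
        raise NotImplementedError


def pick_runtime(runtimes, job, runtime_type):
    """Pick a runtime of the requested type, preferring the least loaded one.

    When no load information is available, fall back to a random choice,
    which is the trivial implementation described above.
    """
    candidates = [runtime for runtime in runtimes if runtime.kind == runtime_type]
    if not candidates:
        return None
    try:
        return min(candidates,
                   key=lambda r: r.get_load() + r.estimate_job_load(job))
    except NotImplementedError:
        return random.choice(candidates)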
Platform criteria

Within a particular type of runtime, each job can have some criteria for which "platform" to run on. The example in this discussion just mentions the CPU architecture (arch: x86_64). It would then be up to each runtime implementation to find out the criteria for its platforms. A simple approach for LAVA labs would be to keep the YAML device types configuration from the legacy system, which contains the CPU architecture for each device type; this config would probably be loaded by the Runtime itself rather than by a global KernelCI core config. The Runtime could also query the LAVA API and keep a cache of the online devices etc. like the legacy system does, but this would all be abstracted behind the Runtime implementation. For Kubernetes, the nodes information could be retrieved dynamically, and some YAML config could also be added if needed.
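For example, a minimal sketch of a LAVA runtime resolving such criteria against a device types config (the YAML layout, the entries and the find_platforms() method are assumptions for this sketch, not the legacy format verbatim):

import yaml

# Hypothetical device types description, loosely modelled on the legacy
# YAML config; the exact layout and entries are assumptions.
DEVICE_TYPES_YAML = """
device_types:
  qemu-x86:
    arch: x86_64
  minnowboard-turbot:
    arch: x86_64
  bcm2836-rpi-2-b:
    arch: arm
"""


class LavaRuntime:
    """Sketch of a LAVA runtime resolving platform criteria locally."""

    def __init__(self, device_types_yaml=DEVICE_TYPES_YAML):
        self._device_types = yaml.safe_load(device_types_yaml)['device_types']

    def find_platforms(self, criteria):
        """Return the device types matching all the criteria, e.g. {'arch': 'x86_64'}."""
        return [
            name for name, attrs in self._device_types.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        ]


print(LavaRuntime().find_platforms({'arch': 'x86_64'}))
# ['qemu-x86', 'minnowboard-turbot']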
Multiple job runs

Adding redundancy to the jobs can be very useful: showing the differences between results when running the same job several times, either in exactly the same environment or in multiple different ones, can greatly help when investigating issues. It also makes the system more robust against infrastructure issues; while we could have some mechanism to resubmit jobs, it's not always obvious when they fail to run, and they may time out much later. This could be done simply by adding a field in the scheduler config to tell it to schedule the same job multiple times.

The intention when specifying "loose" criteria is that the job can be run on a variety of platforms, so I would expect the scheduler implementation to try and pick very different ones. If the intention was to run a particular job multiple times in exactly the same way, then the runtime name could be specified along with very narrow criteria (e.g. an exact device type name). However, with the ideas suggested in the previous topics, the tendency would be to just pick the runtime and platform that would result in the lowest load, so potentially always the same one. For example, if the job needs to be run on any x86 platform and one lab has lots of a particular kind of x86 hardware, then it'll most likely end up running all these jobs. One way to deal with this would be to pass the criteria associated with the platforms used in previous jobs to the Runtime implementation, and to have a way to specify a constraint on the number of runs. Based on the original example:

scheduler:
  - job: baseline-x86
    event:
      channel: node
      name: kbuild-gcc-10-x86
      result: pass
    runtime:
      type: lava
      criteria:
        arch: x86_64
    runs:
      number: 3
      variant: device_type

The scheduler would schedule the first job like before, then for the second run it would add a constraint that the device_type should be different from the one used in the first run, and likewise for the third run. In this case, one grey area is if no Runtime can schedule jobs with a different device type but could run it again on the same one. I guess we might have "strict" variations where it's required to have different ones or an error is raised, and "permissive" variations where, if only one device type is available, it runs all the jobs anyway.
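A rough sketch of the scheduler side of this, reusing the hypothetical find_platforms() from the platform criteria sketch above together with an equally hypothetical submit() method:

def schedule_runs(runtime, job, criteria, number, strict=False):
    """Schedule `number` runs of a job, trying to pick a different
    platform (e.g. device type) for each run.

    With strict=True an error is raised when not enough different
    platforms match the criteria; otherwise the remaining runs reuse
    platforms already picked ("permissive" behaviour).
    """
    used = []  # platforms picked for previous runs
    for _ in range(number):
        candidates = runtime.find_platforms(criteria)
        if not candidates:
            raise RuntimeError("no platform matches the criteria")
        fresh = [platform for platform in candidates if platform not in used]
        if fresh:
            platform = fresh[0]
        elif strict:
            raise RuntimeError(
                f"only {len(set(used))} platform(s) available for {number} runs")
        else:
            platform = candidates[0]
        used.append(platform)
        runtime.submit(job, platform)  # hypothetical submission method
    return used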
Job statistics

In addition to the Runtime-specific implementations, the scheduler itself could keep track of when it submits a job, when it starts running and when it completes, using node state changes. It could then accumulate this data for each job signature, essentially the set of parameters used when generating and submitting the job. This could be stored in the API too, or in a local database used by the scheduler, or added directly as meta-data to the nodes. The scheduler could then estimate how long a job would wait until it gets started and how long it would take to complete, and maybe also get some indication of the load caused by the job if this could be retrieved from the Runtime. It could then combine this with the other suggested ways of dynamically assessing the availability of each runtime when picking one. This may in fact be more helpful when Runtimes themselves can't get this information; in the worst case where no information can be retrieved from a Runtime at all, these accumulated statistics would be the only data the scheduler has to go on.
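As a sketch of the bookkeeping this would involve (the event fields, state names and the notion of a job signature string below are assumptions, not the actual API schema):

from collections import defaultdict
from statistics import mean


class JobStats:
    """Accumulate per-job-signature timings from node state change events."""

    def __init__(self):
        self._pending = {}  # node id -> signature and per-state timestamps
        self._history = defaultdict(list)  # signature -> [(wait, duration), ...]

    def on_event(self, event):
        """Record a node state change, e.g.
        {'id': '123', 'signature': 'baseline-x86/lava/qemu-x86',
         'state': 'running', 'time': 1700000000.0}"""
        node = self._pending.setdefault(
            event['id'], {'signature': event['signature'], 'times': {}})
        node['times'][event['state']] = event['time']
        if event['state'] == 'done' and {'submitted', 'running'} <= node['times'].keys():
            wait = node['times']['running'] - node['times']['submitted']
            duration = node['times']['done'] - node['times']['running']
            self._history[node['signature']].append((wait, duration))
            del self._pending[event['id']]

    def estimate(self, signature):
        """Return (average wait, average duration) for a job signature, or None."""
        runs = self._history.get(signature)
        if not runs:
            return None
        waits, durations = zip(*runs)
        return mean(waits), mean(durations)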
The legacy pipeline system had everything hard-coded in YAML configuration: which git branches to monitor, which kernels to build for each branch, which tests to run for each build on which device and in which lab. While this makes the load very deterministic, it also means a lot of manual curation and a sub-optimal use of the available resources. It's also rather difficult to comprehend and maintain.
It wasn't really designed to be like this; rather, it evolved from a smaller objective of building just some plain defconfigs and running some boot tests in LAVA. Then we added support for multiple compilers, extended the test coverage, added various config fragments for specialised builds, and added filters to deal with issues in some corner cases. So it "works", but it really needs a fresh start.
Transitioning to the new API & Pipeline provides a natural opportunity to come up with a better configuration mechanism. Now, we don't just have fixed concepts of builds and runtime tests but generic "jobs". And instead of filters with implicit dependencies between builds and tests we have a "scheduler" which has its own YAML configuration to describe such things. This is the part of the pipeline that decides which jobs get run in which runtimes, and we can implement any logic we want there.
The initial inputs are events received from the API, typically whenever some node data changes, and the YAML scheduler configuration. For example, the "baseline on QEMU" entry discussed in the comments tells the scheduler that, when receiving an event about a successful x86 build, it should run a baseline test on QEMU in a LAVA lab.
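A minimal sketch of that matching step, with the config entry embedded as a YAML string; the exact keys, in particular the platforms one, are assumptions loosely based on the extended entry shown in the "Multiple job runs" comment above:

import yaml

# Hypothetical reconstruction of the "baseline on QEMU" entry; the
# `platforms` key is an assumption for this sketch.
SCHEDULER_CONFIG = yaml.safe_load("""
scheduler:
  - job: baseline-x86
    event:
      channel: node
      name: kbuild-gcc-10-x86
      result: pass
    runtime:
      type: lava
      platforms:
        - qemu-x86
""")


def jobs_for_event(config, channel, event):
    """Yield the (job, runtime) pairs to schedule for an incoming event."""
    for entry in config['scheduler']:
        wanted = entry['event']
        if wanted.get('channel') != channel:
            continue
        # All the remaining keys must match fields of the event payload.
        if all(event.get(key) == value
               for key, value in wanted.items() if key != 'channel'):
            yield entry['job'], entry['runtime']


# A node event about a successful x86 kernel build.
event = {'name': 'kbuild-gcc-10-x86', 'result': 'pass'}
for job, runtime in jobs_for_event(SCHEDULER_CONFIG, 'node', event):
    print(job, runtime['type'], runtime.get('platforms'))
# baseline-x86 lava ['qemu-x86']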
Now we can discuss how to extend things based on that. Some of it will be required in order to match the legacy system's coverage (e.g. run that test on multiple platforms and not just QEMU without having to duplicate the whole config entry), but I think that's already well understood. The really important aspect which the new API enables is to go beyond this and have a more effective way of achieving test coverage.
API Roadmap issue: kernelci/kernelci-api#349