-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Install RT kernel from RPMs included in RHCOS #34
Install RT kernel from RPMs included in RHCOS #34
Conversation
hack/label-worker-rt.sh
Outdated
echo "[INFO]:labeling worker nodes" | ||
for node in $(${OC_TOOL} get nodes --selector='!node-role.kubernetes.io/master' -o name); do | ||
${OC_TOOL} label $node node-role.kubernetes.io/worker-rt="" | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
ignitionConfig.Systemd = igntypes.Systemd{ | ||
Units: []igntypes.Unit{ | ||
{ | ||
Contents: rtKernelService, | ||
Enabled: pointer.BoolPtr(true), | ||
Enabled: &enableRTKernel, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create at all RT kernel unit if the RT disabled, also pre-tunning also depends on the RT kernel unit, so I am interesting if it will work as expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, good point
I'm wondering if we need the option to not install the RT kernel at all, when other performance features depend on it (no clue if all of them do?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other than that, pre-tuning uses wants
and not requires
, so it would be started without rt-kernel (to my best understanding)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK it is correct, if an unit wants
another unit and the latter can't be started, the former should run anyway (see: https://unix.stackexchange.com/questions/388586/systemd-requires-vs-wants - which in turn quotes manpages).
Question is however is if makes sense to do pretuning without rt-kernel, like Marc mentionedin on his grandparent comment.
Signed-off-by: Marc Sluiter <[email protected]>
weird, the e2e test failed, but it actually looks like it happened after our e2e test suite successfully completed. I'll run it again.
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good. I just want to see e2e pass before merging.
functests/performance/rt-kernel.go
Outdated
if err != nil { | ||
return "" | ||
} | ||
defer func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use an AfterEach() instead. I'm not 100% sure what happens if you do a ctrl-c
to kill execution when a defer is in use. I know that ginkgo will attempt to run the AfterEach() functions even if you try to kill the test execution early though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair point, will change it
trap exit_handler EXIT | ||
# create and mount new container | ||
podman create --net=none --name "$osContainerName" "$imageID" > /dev/null | ||
kernelDir=$(podman mount "$osContainerName") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow, so we're delivering kernel pkgs in containers? interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU this is used to access the packages already shipped in the system image (so we are not shipping anything anymore). But I may have out of date information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is complicated, let me try to summarize:
- the installer's bootstrap node installs an "old" os bootimage on cluster nodes, which is just new enough to be able to do everything which is needed on first boot
- then the MCO installs a newer version of the os ("early pivot"). That one is delivered via container image. Maybe you noticed that OCP comes with a
machine-os-content
image: that one contains the os - with RHCOS version 44 the image contains the RT kernel RPMs. They were just put into the root directory of the image
- so what I do here (stolen from MCO code): mount the image (which is not possible directly, so create (but don't run) a container from it) which contains the os version which is currently booted, and install the RPMs from the mount path.
longer version: https://github.com/openshift/machine-config-operator/blob/master/docs/OSUpgrades.md and others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @slintes great summary. I think it would be great to record this as comment in the rt-kernel.sh
script itself. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought REPO_URL was a workaround in 4.3 until we have machineconfig in 4.4 and I see you have a note here as "TEMPORARY WORKAROUND".
Are we having 2 workarounds now for 4.3 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operator is for 4.4 only. There we have the RT kernel RPMs already available in the RHCOS image, so no need for a yum repo anymore, but the MCO is not ready to use it yet, see WIP PR [0]. So this is a new temporary workaround for 4.4, unrelated to 4.3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fromanirh ah missed your comment
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fromanirh done
@@ -0,0 +1,12 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
who calls this? only humans or also the operator proper? I guess the former, asking to be sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use a MachineConfig
for installing this script and a systemd unit, and systemd will execute the script then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry, just noticed I was looking at the wrong script
I called this one manually on test clusters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At glance looks ok, only clarification questions inside
Signed-off-by: Marc Sluiter <[email protected]>
Signed-off-by: Marc Sluiter <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidvossel, slintes The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
remove unused references to worker-rt label
With this PR we install the RT kernel from the RPMs provided in RHCOS 44.81.
This removes the need for a yum repo.
Not sure if the installation of the RT kernel should be default, but can be disabled on the CRD? Atm it's the other way around, but probably in most cases we want the RT kernel.
TODO: