-
Notifications
You must be signed in to change notification settings - Fork 40.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubeadm: make kubeadm init and join output the same error #130040
kubeadm: make kubeadm init and join output the same error #130040
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
After this PR, kubeadm init will throw the following error:
After this PR, kubeadm join will throw the following error:
|
/cc @pacoxu @neolit123 |
78afe9a
to
bd36af9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR @HirazawaUi
here is how i think this old code should be refactored.
please, always prefix notes with
|
bd36af9
to
dcb0d13
Compare
All suggestions fixed. |
LGTM, mostly. should have a second reviewer too. could you show one error output from e.g. |
63be289
to
14aa3ce
Compare
ok, it would be nice to have another new line here:
|
this dosn't seem correct, any idea why here is a trailing |
This error string is an error I injected. :)
Yes, I will debug and modify it further tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Only one nit.
waiter.SetTimeout(data.Cfg().Timeouts.KubeletHealthCheck.Duration) | ||
kubeletConfig := data.Cfg().ClusterConfiguration.ComponentConfigs[componentconfigs.KubeletGroup].Get() | ||
kubeletConfigTyped, ok := kubeletConfig.(*kubeletconfig.KubeletConfiguration) | ||
if !ok { | ||
return errors.New("could not convert the KubeletConfiguration to a typed object") | ||
} | ||
if err := waiter.WaitForKubelet(kubeletConfigTyped.HealthzBindAddress, *kubeletConfigTyped.HealthzPort); err != nil { | ||
return handleError(err) | ||
kubelet.PrintKubeletErrorHelpScreen(data.OutputWriter(), data.Cfg().NodeRegistration.CRISocket, true) | ||
return errors.Wrap(err, "failed while waiting for the kubelet to start") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wrapped message use kubelet
, but the printer is controlPlane=true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Initially, I just wanted to use the controlPlane
parameter to determine whether to display the full content of kubeletFailTempl
as before. However, we can make it more granular:
- If kubelet fails to start, we should only display the kubelet startup failure error and guidance.
- If the failure occurs while waiting for
WaitForControlPlaneComponents
, we should display both the kubelet startup failure information and container troubleshooting logs.
|
||
fmt.Fprintln(outputWriter, kubeletFailMsg) | ||
if controlPlane { | ||
_ = controlPlaneFailTempl.Execute(outputWriter, context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't print handle the error(probably print it) before. Not sure if we should do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other parts of kubeadm, calls to the Execute method similarly do not return errors, so I chose to ignore the error here.
14aa3ce
to
ae78ec1
Compare
kubeadm init:
kubeadm join:
|
ae78ec1
to
8095cc3
Compare
/lgtm |
LGTM label has been added. Git tree hash: cf082b5632cfc470ce786396ae36ae647c3b6796
|
@@ -45,6 +68,21 @@ func TryStartKubelet() { | |||
} | |||
} | |||
|
|||
// PrintKubeletErrorHelpScreen prints help text on kubelet errors. | |||
func PrintKubeletErrorHelpScreen(outputWriter io.Writer, criSocket string, waitControlPlaneComponents bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some thinking let's split this func into two;
PrintKubeletErrorHelpScreen
PrintControlPlaneErrorHelpScreen
this is nicer than using the same function and having a flag waitControlPlaneComponents
also, when a control plane fails we already run the kubelet check earlier which passed. so the problem at that point seems like failing CP components, not failing kublet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, I would prefer to split this function only after the WaitForAllControlPlaneComponents
feature gate reaches the GA stage. If users manually disable this feature gate, the error reporting could become confusing as we would lose troubleshooting hints for checking controlPlane components failures.
Alternatively, we could implement a more laborious approach now:
- When the
WaitForAllControlPlaneComponents
feature gate is enabled, we would output separate troubleshooting hints for kubelet and controlPlane components respectively. - If it's disabled, we would maintain the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WaitForControlPlaneComponents is basically the new WaitForAPI which used to wait for only apiserver. the help text to debug a control pod is relevant for both cases.
i think the two errors are distinctive. we shouldn't show an error that the kubelet has failed if the control plane (or just the apiserver) could not start. that's a separate problem.
to summarize, i think we should;
- show a kubelet help text on WaitForKubeletErrors
- show a control plan help text if WaitForAPI or WaitForControlPlaneComponents helped.
if the user hits the kubelet error they must first resolve that problem. it's blocking the workflow to continue further, then later they might also hit the CP error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair.
- 'journalctl -xeu kubelet'`) | ||
|
||
controlPlaneFailTempl = template.Must(template.New("init").Parse(dedent.Dedent(` | ||
Additionally, a control plane component may have crashed or exited when started by the container runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, a control plane component may have crashed or exited when started by the container runtime. | |
A control plane component may have crashed or exited when started by the container runtime. |
- 'journalctl -xeu kubelet'`) | ||
|
||
controlPlaneFailTempl = template.Must(template.New("init").Parse(dedent.Dedent(` | ||
Additionally, a control plane component may have crashed or exited when started by the container runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving the text like that even if the wiatforcontrolplanecomponents FG is disabled, still makes sense.
the old error will actually say that it couldn't probe kub-apiserver at healthz for n minutes.
8095cc3
to
f99456e
Compare
f99456e
to
8f78cdf
Compare
@@ -52,6 +54,17 @@ const ( | |||
argAdvertiseAddress = "advertise-address" | |||
) | |||
|
|||
var ( | |||
controlPlaneFailTempl = template.Must(template.New("init").Parse(dedent.Dedent(` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved controlPlaneFailTempl
here. After the split, putting it in the kubelet's file seems a bit unreasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please also move KubeletFailMsg here and make it private var.
add exported function PrintKubeletErrorHelpScreen that just calls the fmt.Fprintln(data.OutputWriter(), kubeletFailMsg)
consistency would be nicer that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
8f78cdf
to
f66211b
Compare
f66211b
to
ab02cda
Compare
I used the latest code and manually injected errors. kubeadm output the following error log: kubeadm join:
CP components failed during kubeadm init:
kubelet startup failed during kubeadm init:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
thanks
LGTM label has been added. Git tree hash: 8c9433873aa0473f3334c458c13ae61b87f9f873
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: HirazawaUi, neolit123 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Currently, during kubeadm init, if waiting for the control plane components to start fails, we output a templated error message.
However, during kubeadm join, if waiting for the control plane components to start fails, we output a raw error message.
This PR aim to ensure that kubeadm init and kubeadm join output the same error message format when waiting for the control plane components to start fails.
Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#3149
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: