feat: add timeoutSeconds to ManagedChart #929

Open
wants to merge 1 commit into
base: master

FrankYang0529
Member

Problem:

Solution:

Related Issue:
harvester/harvester#7375

Test plan:

Contributor

@tserong tserong left a comment

How does adding a timeout help to fix the related issue? I'm sure I'm personally missing some knowledge here about managed charts, so please feel free to tell me to RTFM ;-)

@albinsun

I tried the workaround: with it, the upgrade can be triggered and completes.

  1. Cluster setup

    3-node cluster with a witness node; hits this issue as expected.

    • chart
      [image]
    • managedChart
      [image]
  2. Apply the workaround (edited with k9s)

    Add timeoutSeconds: 600 to the spec, as this PR does.

    [image]

    • chart
      [image]
    • managedChart
      [image]
  3. Create storage class sc2 and set it as the default
    [image]

  4. Start an air-gapped upgrade to v1.4.1-rc1
    [image]

    [image]
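As a rough illustration of what step 2 changes, the sketch below serializes a trimmed-down spec with and without the timeout. The `managedChartSpec` type and `renderSpec` helper are hypothetical stand-ins written for this example, not the real ManagedChart types; only the `timeoutSeconds: 600` value comes from the steps above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// managedChartSpec is a hypothetical, trimmed-down stand-in for the real
// ManagedChart spec; only the field discussed in this PR is modeled.
type managedChartSpec struct {
	Chart          string `json:"chart"`
	TimeoutSeconds int    `json:"timeoutSeconds,omitempty"`
}

// renderSpec returns the JSON form of the spec, or the error text if
// marshaling fails.
func renderSpec(s managedChartSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Without the workaround the field is simply absent from the spec.
	fmt.Println(renderSpec(managedChartSpec{Chart: "harvester"}))
	// With the workaround applied, the spec carries timeoutSeconds: 600.
	fmt.Println(renderSpec(managedChartSpec{Chart: "harvester", TimeoutSeconds: 600}))
}
```

In this sketch an unset field is omitted entirely, so any consumer reading it back would see the zero value unless it applies its own default.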

@FrankYang0529
Member Author

How does adding a timeout help to fix the related issue?

The error message comes from the Fleet bundle, and Fleet is what runs the helm command. The default timeout of the helm command is 5 minutes, so increasing the timeout may help. This approach may not solve the root cause, but it may be a good workaround that reduces the probability of the problem occurring.
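To put the numbers side by side, here is a minimal sketch that converts a seconds value into a Go duration and compares it with the 5-minute default mentioned above. `secondsToDuration` and `helmDefaultTimeout` are names invented for this example; how Fleet actually converts the setting is not shown in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// helmDefaultTimeout mirrors the 5-minute default mentioned in the comment
// above; the constant name is an assumption made for this sketch.
const helmDefaultTimeout = 5 * time.Minute

// secondsToDuration converts a whole number of seconds into a time.Duration.
func secondsToDuration(seconds int) time.Duration {
	return time.Duration(seconds) * time.Second
}

func main() {
	t := secondsToDuration(600)
	// Prints the converted duration and whether it exceeds the default.
	fmt.Println(t, t > helmDefaultTimeout) // 10m0s true
}
```

With 600 seconds, hooks would get twice as long as the 5-minute default before a timeout is reported.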

Collaborator

@innobead innobead left a comment

LGTM

@w13915984028
Member

I can confirm that this PR and this setting are essential.

With debugging enabled on fleet-agent, I found that if the ManagedChart does not set a timeout, a value of 0 is used; when there are any hooks, it fails in the function below.

// execHook executes all of the hooks for the given hook event.
func (cfg *Configuration) execHook(rl *release.Release, hook release.HookEvent, timeout time.Duration) error {

We had the PRs below on the upgrade path; with #929, the issue should finally be solved.

harvester/harvester#6608
harvester/harvester#7386

ErrApplied(1) [Cluster fleet-local/local: post-upgrade hooks failed, debugging, timeout:0s : context deadline exceeded]

-> with the timeout parameter forced to 5

NotReady(1) [Cluster fleet-local/local: not installed: post-upgrade hooks failed, debugging, timeout:5ns : warning: Hook longhorn-post-upgrade Job harvester/charts/longhorn/templates/postupgrade-job.yaml failed with timeout 5ns: 1 error occurred:...

harv21:/home/rancher # kk get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 0/1                       ErrApplied(1) [Cluster fleet-local/local: post-upgrade hooks failed, debugging, timeout:0s : context deadline exceeded]
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1   



harv21:/home/rancher # kk get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 0/1                       NotReady(1) [Cluster fleet-local/local: not installed: post-upgrade hooks failed, debugging, timeout:5ns : warning: Hook longhorn-post-upgrade Job harvester/charts/longhorn/templates/postupgrade-job.yaml failed with timeout 5ns: 1 error occurred:...
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1                       

