feat: add timeoutSeconds to ManagedChart #929

FrankYang0529 · 2025-01-15T07:09:06Z

Problem:

Solution:

Related Issue:
harvester/harvester#7375

Test plan:

Signed-off-by: PoAn Yang <[email protected]>

tserong

How does adding a timeout help to fix the related issue? I'm sure I'm personally missing some knowledge here about managed charts, so please feel free to tell me to RTFM ;-)

albinsun · 2025-01-18T06:03:23Z

Tried with the workaround: the upgrade can be triggered and completes.

Cluster setup

- 3-node witness cluster; it hits this issue as expected.
  - chart
  - managedChart
- Apply the workaround (edited with k9s): add timeoutSeconds: 600 to the spec, as this PR does.
  - chart
  - managedChart
- Create storage class sc2 and set it as the default.
- Start an air-gapped upgrade to v1.4.1-rc1.
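The workaround above was applied interactively in k9s. As a non-interactive alternative, the same spec change can be written as a JSON merge patch. The following is a minimal Go sketch that only builds the patch body, assuming (as in this PR) that timeoutSeconds sits directly under spec; mergePatchForTimeout is a hypothetical helper, not part of Harvester or Fleet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatchForTimeout (hypothetical helper) builds a JSON merge patch body
// that sets spec.timeoutSeconds on a ManagedChart. Only the field name comes
// from this PR; the helper itself is illustrative.
func mergePatchForTimeout(seconds int) (string, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"timeoutSeconds": seconds,
		},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p, err := mergePatchForTimeout(600)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // {"spec":{"timeoutSeconds":600}}
}
```

The resulting body could then be passed to kubectl patch managedchart <name> -n <namespace> --type merge -p '<body>'; the name and namespace are placeholders, not values taken from this thread.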

FrankYang0529 · 2025-01-21T04:48:47Z

How does adding a timeout help to fix the related issue?

The error message comes from the Fleet bundle, and Fleet runs Helm under the hood. The Helm CLI's default timeout is 5 minutes, so increasing the timeout may help. This approach may not solve the root cause, but it is a reasonable workaround that reduces the likelihood of the problem occurring.

innobead

LGTM

w13915984028 · 2025-01-24T12:00:55Z

I can confirm that this PR and the setting are essential.

While debugging fleet-agent, I found that if the ManagedChart does not set a timeout, a timeout of 0 is used, and when the chart has any hooks, the upgrade fails in Helm's function below:

// execHook executes all of the hooks for the given hook event.
func (cfg *Configuration) execHook(rl *release.Release, hook release.HookEvent, timeout time.Duration) error {

We had the PRs below on the upgrade path; with #929, the issue should finally be solved.

harvester/harvester#6608
harvester/harvester#7386

ErrApplied(1) [Cluster fleet-local/local: post-upgrade hooks failed, debugging, timeout:0s : context deadline exceeded]

-> after forcing the timeout parameter to 5:

NotReady(1) [Cluster fleet-local/local: not installed: post-upgrade hooks failed, debugging, timeout:5ns : warning: Hook longhorn-post-upgrade Job harvester/charts/longhorn/templates/postupgrade-job.yaml failed with timeout 5ns: 1 error occurred:...

harv21:/home/rancher # kk get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 0/1                       ErrApplied(1) [Cluster fleet-local/local: post-upgrade hooks failed, debugging, timeout:0s : context deadline exceeded]
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1   



harv21:/home/rancher # kk get bundle -A
NAMESPACE     NAME                                          BUNDLEDEPLOYMENTS-READY   STATUS
fleet-local   fleet-agent-local                             1/1                       
fleet-local   local-managed-system-agent                    1/1                       
fleet-local   mcc-harvester                                 0/1                       NotReady(1) [Cluster fleet-local/local: not installed: post-upgrade hooks failed, debugging, timeout:5ns : warning: Hook longhorn-post-upgrade Job harvester/charts/longhorn/templates/postupgrade-job.yaml failed with timeout 5ns: 1 error occurred:...
fleet-local   mcc-harvester-crd                             1/1                       
fleet-local   mcc-local-managed-system-upgrade-controller   1/1                       
fleet-local   mcc-rancher-logging-crd                       1/1                       
fleet-local   mcc-rancher-monitoring-crd                    1/1

FrankYang0529 requested a review from bk201 January 16, 2025 02:24

bk201 requested review from ibrokethecloud and tserong January 16, 2025 03:15

feat: add timeoutSeconds to ManagedChart

45b6c9b

Signed-off-by: PoAn Yang <[email protected]>

FrankYang0529 force-pushed the HARV-7375 branch from 13b14ce to 45b6c9b Compare January 16, 2025 09:05

tserong reviewed Jan 17, 2025


innobead approved these changes Jan 21, 2025


This was referenced Jan 21, 2025

[BUG] Fail to trigger v1.4.0 to v1.4.1-rc1 upgrade on witness cluster due to managed chart harvester is not ready harvester/harvester#7375 (Open)

feat: add upgrade with another default storage class known issue harvester/docs#710 (Merged)

This was referenced Jan 24, 2025

[BUG] Harvester managedchart report post-upgrade hooks (longhorn-post-upgrade) failed: context deadline exceeded harvester/harvester#7441 (Open)

Drop longhorn chart post-upgrade hook #940 (Closed)

w13915984028 mentioned this pull request Jan 24, 2025

chore: change timeoutSeconds from 60 to 600 harvester/harvester#7386 (Merged)
