Upgrade/Rollout System of Clusters #204

Open
DinaBelova opened this issue Aug 18, 2024 · 0 comments
Labels
epic Large body of work, can be broken down into individual issues

DinaBelova commented Aug 18, 2024

Goals

  • Cluster management and operations cover not only the creation and deletion of clusters but also their upgrade.
  • Upgrades can be a major source of stress for any platform engineering team, so we should make them as easy and automated as possible while giving the team that performs them as much insight as possible.
    While fully automated upgrades require the least interaction and seem the easiest, they do not fit the operational procedures of enterprise customers, who want to trigger upgrades of production clusters in a controlled way.

Major deliverables

  • ability to upgrade clusters

Who it benefits

  • Customer Business: Plannable, controlled cluster upgrades that fit the needs of enterprise k8s cluster management
  • Platform: Stress-free upgrades without a massive amount of manual work
  • Mirantis: Great customer experience and happy customers

Acceptance criteria

  • Upgrading a cluster involves three steps:
    • Update the Helm chart with the changes and push it to an OCI registry as a new chart version
    • Create a new Template object with a new name that references the pushed version of the Helm chart
    • Update the Deployment object to point to the new Template name, which actually triggers the upgrade of the cluster
  • The Deployment object shows status information similar to what CAPI itself provides
    • The expected statuses are: Upgrade in Progress, Upgrade successful, Upgrade failed
  • Failed upgrades are clearly marked in the Deployment object
  • Changes to the template variables and to the template name of the Deployment object are treated the same way, since either can trigger cluster changes (e.g. changing the AWS instance type requires replacing all k8s nodes, just as a template name change that upgrades the k0s version does)
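A minimal Python sketch of the last two acceptance criteria, under stated assumptions: the Deployment spec is reduced to a hypothetical `template` name plus a `config` mapping of template variables, and the CAPI side is reduced to a few inputs, namely the Ready condition, a failure reason (CAPI v1beta1 Cluster status carries, e.g., `failureReason`/`failureMessage`), and `metadata.generation` versus `status.observedGeneration`. The names `DeploymentSpec`, `needs_rollout`, and `upgrade_status` are illustrative only and are not part of HMC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional


class UpgradeStatus(str, Enum):
    """The three statuses named in the acceptance criteria."""
    IN_PROGRESS = "Upgrade in Progress"
    SUCCEEDED = "Upgrade successful"
    FAILED = "Upgrade failed"


@dataclass(frozen=True)
class DeploymentSpec:
    """Hypothetical, simplified view of the Deployment object's spec."""
    template: str              # name of the Template object in use
    config: Mapping[str, Any]  # template variables


def needs_rollout(applied: DeploymentSpec, desired: DeploymentSpec) -> bool:
    """Treat a template-name change and a template-variable change the same way:
    either one may make CAPI modify or replace cluster resources."""
    return applied.template != desired.template or dict(applied.config) != dict(desired.config)


def upgrade_status(
    generation: int,
    observed_generation: int,
    ready: Optional[bool],
    failure_reason: Optional[str] = None,
) -> UpgradeStatus:
    """Derive the Deployment's upgrade status purely from what CAPI reports.

    generation / observed_generation: assumed to mirror metadata.generation and
    status.observedGeneration of the CAPI Cluster.
    ready: CAPI's Ready condition (None if not reported yet).
    failure_reason: non-empty when CAPI reports a terminal failure.
    """
    if failure_reason:
        return UpgradeStatus.FAILED
    # A Ready=True that predates the new spec must not be reported as success.
    if observed_generation < generation:
        return UpgradeStatus.IN_PROGRESS
    if ready:
        return UpgradeStatus.SUCCEEDED
    return UpgradeStatus.IN_PROGRESS
```

The generation check matters because, right after the Deployment is pointed at a new template, CAPI may still report Ready=True for the old spec; only once CAPI has observed the new generation does Ready=True mean the upgrade succeeded. Since the status is derived only from what CAPI reports, HMC stays in the observe-only role described under Out of scope.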

Assumptions

  • CAPI performs the actual upgrades in an enterprise-grade way

Telemetry & Success Criteria

  • Each upgrade triggers a telemetry event with the following fields once the upgrade is completed:
    • cluster_id
    • target_infrastructure
    • New template name
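A minimal sketch of the telemetry requirement, assuming events are serialized as JSON and handed to a caller-supplied `send` function. The class names, the `new_template_name` key, and the use of a per-Deployment generation counter to identify a single upgrade are hypothetical. Keying on the upgrade rather than on the template name means that a change of template variables alone, which the acceptance criteria also treat as an upgrade, still produces its own event.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Set, Tuple


@dataclass(frozen=True)
class UpgradeTelemetryEvent:
    cluster_id: str
    target_infrastructure: str
    new_template_name: str  # hypothetical key for "New template name"


class UpgradeTelemetry:
    """Emits exactly one event per completed upgrade.

    An upgrade is identified by (cluster_id, generation), where generation is a
    hypothetical counter that increases with every spec change of the
    Deployment object.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self._reported: Set[Tuple[str, int]] = set()

    def on_upgrade_completed(
        self,
        cluster_id: str,
        target_infrastructure: str,
        new_template_name: str,
        generation: int,
    ) -> bool:
        """Send the event unless this upgrade was already reported; return True if sent."""
        key = (cluster_id, generation)
        if key in self._reported:
            return False  # repeated reconcile of an already-reported upgrade
        event = UpgradeTelemetryEvent(cluster_id, target_infrastructure, new_template_name)
        # Send first, record afterwards: if send raises, the next call retries.
        self._send(json.dumps(asdict(event), sort_keys=True))
        self._reported.add(key)
        return True
```

The set of reported upgrades lives in memory here for brevity; a real controller would need to persist that marker (for example in the Deployment status) so that a restart does not re-send events.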

Out of scope

  • The actual upgrade of the cluster is handled by CAPI, and we should not write any code in the HMC repo that upgrades clusters. HMC code should only observe the actual upgrade and propagate as much information from CAPI into the Deployment object as needed. Any bugs we find that prevent upgrades should be fixed in CAPI or in the affected CAPI providers.
  • CAPI is sometimes a bit finicky about which objects can be upgraded in place and which must be changed via a rolling replacement (new ones added, then old ones removed). In this epic we don't want to worry about this yet and assume that the templates themselves don't modify parts of CAPI objects that cannot be modified in place.
  • Multi-cluster upgrades will be implemented later
  • Automatic cluster upgrades will be implemented later
  • Upgrading Mirantis templates and the management control plane itself is not part of this epic

related issues:

@DinaBelova DinaBelova added the epic Large body of work, can be broken down into individual issues label Aug 18, 2024
@DinaBelova DinaBelova moved this to Todo in Project 2A Aug 19, 2024
@DinaBelova DinaBelova moved this from Todo to In Progress in Project 2A Sep 4, 2024
@alex-shl alex-shl added this to K0rdent Jan 3, 2025
@alex-shl alex-shl moved this to In Progress in K0rdent Jan 3, 2025