Overview

As part of the effort to provide high availability (HA) for the SAP BTP Kyma runtime (AWS, Azure, and GCP service plans), we have enabled multi-availability-zone worker groups.

Note: "Zone" here refers to a hyperscaler's availability zone, for example, Azure Availability Zones.

The worker nodes (virtual machines) with the provided machine type and autoscaler parameters are provisioned across three availability zones of the selected region.

The NAT gateway created to route public outbound traffic is provisioned in a zone-redundant manner for all Azure, AWS, and GCP paid service plans.

Critical Kyma-managed components hosted on the runtime are configured to be resilient to node and zone failures.

So what does this mean for you as a customer?

  • You can keep your applications highly available and resilient to hyperscaler zone failures.
  • You can deploy multiple replicas of your applications so that they are distributed across multiple availability zones.
  • If one zone goes down, the replicas running in the other zones continue to serve your users without disruption.
  • Kyma components such as eventing are also deployed with zone redundancy, so business scenarios that rely on them are likewise shielded from zone and node failures.
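You can verify the zone spread of the worker nodes on your own cluster. A quick check using standard kubectl (the -L flag adds the given node labels as extra columns):

kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone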

Figure: Workloads distributed across zones (all zones up)

Figure: Highly available despite zone 3 going down

Example

Prerequisites

Steps

Let's deploy a sample application and see how to configure it to be highly available and resilient to zone failures.

We will deploy a simple httpbin example. The key part is to specify topologySpreadConstraints; a sketch of a complete manifest follows the snippet below.

Here we can specify various configuration parameters that will influence how the replicas are distributed. The most important ones are:

  • topologyKey is the key of the node label that contains the zone information; Kubernetes uses it to identify the zones and spread the replicas across them.
  • whenUnsatisfiable indicates how to deal with a Pod if it doesn't satisfy the spread constraint.
  • maxSkew describes the degree to which Pods may be unevenly distributed.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: multi-zone-ha-httpbin
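The sample's k8s/httpbin.yaml is not reproduced here; a minimal Deployment matching the label, replica count, and constraint used in the steps below might look roughly like this (the image and container port are illustrative assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multi-zone-ha-httpbin
spec:
  replicas: 3                              # one replica per availability zone
  selector:
    matchLabels:
      app: multi-zone-ha-httpbin
  template:
    metadata:
      labels:
        app: multi-zone-ha-httpbin
    spec:
      topologySpreadConstraints:           # spread the replicas evenly across zones
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: multi-zone-ha-httpbin
      containers:
      - name: httpbin
        image: kennethreitz/httpbin        # illustrative image; the sample may use another
        ports:
        - containerPort: 80

The Deployment name multi-zone-ha-httpbin is inferred from the pod names shown further below; check the sample's manifest for the exact definition.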
  • Set up environment variables

    • OSX

      export NS={your-namespace}
    • Windows PowerShell

      $NS="{your-namespace}"
  • Go ahead and deploy the application. It has 3 replicas specified.

kubectl -n $NS apply -f k8s/httpbin.yaml
  • Observe the nodes on which pods (replicas) are deployed
kubectl -n $NS get po -l app=multi-zone-ha-httpbin -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
NAME                                    NODE
multi-zone-ha-httpbin-579594d44-7m6nk   shoot--kyma-stage--ed72781-cpu-worker-0-z2-796d8-4bwfv
multi-zone-ha-httpbin-579594d44-9dv5r   shoot--kyma-stage--ed72781-cpu-worker-0-z3-69c77-p4565
multi-zone-ha-httpbin-579594d44-zjscn   shoot--kyma-stage--ed72781-cpu-worker-0-z1-b859d-7fcm5
  • Observe the zones in which the respective nodes are deployed
kubectl get nodes -o custom-columns=NAME:.metadata.name,REGION:".metadata.labels.topology\.kubernetes\.io/region",ZONE:".metadata.labels.topology\.kubernetes\.io/zone"
NAME                                                     REGION        ZONE
shoot--kyma-stage--ed72781-cpu-worker-0-z1-b859d-7fcm5   northeurope   northeurope-1
shoot--kyma-stage--ed72781-cpu-worker-0-z2-796d8-4bwfv   northeurope   northeurope-2
shoot--kyma-stage--ed72781-cpu-worker-0-z2-796d8-8sdx6   northeurope   northeurope-2
shoot--kyma-stage--ed72781-cpu-worker-0-z3-69c77-p4565   northeurope   northeurope-3

As you can see, the nodes are in different availability zones. If a zone fails or a zone or node becomes unreachable, the replicas in the remaining zones continue to serve, keeping your workloads highly available and resilient.
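The spread constraint also keeps the distribution even as you scale. A quick way to observe this (assuming the Deployment is named multi-zone-ha-httpbin, as the pod names above suggest):

kubectl -n $NS scale deployment multi-zone-ha-httpbin --replicas=6
kubectl -n $NS get po -l app=multi-zone-ha-httpbin -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

With maxSkew: 1 and whenUnsatisfiable: DoNotSchedule, the six replicas end up with two per zone, provided each zone has capacity.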

Key Takeaways

  • It is possible to deploy applications on SAP BTP Kyma runtime that are highly available and resilient to zone/node failures.
  • When planning your workload deployment strategy, check whether your workloads need to be highly available. If they do, it is recommended to use topologySpreadConstraints.

References