From 9e397dd9b7d70cfce308dcf39e18a01bb3e7b4ad Mon Sep 17 00:00:00 2001
From: Paul Lorenz
Date: Fri, 31 Jan 2025 18:17:40 -0500
Subject: [PATCH] Add Controller HA reference material. Fixes #929

---
 docusaurus/docs/reference/ha/routers.md  | 90 ++++++++++++++++++++++++
 docusaurus/docs/reference/ha/topology.md | 81 +++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 docusaurus/docs/reference/ha/routers.md
 create mode 100644 docusaurus/docs/reference/ha/topology.md

diff --git a/docusaurus/docs/reference/ha/routers.md b/docusaurus/docs/reference/ha/routers.md
new file mode 100644
index 00000000..f9a7b8bc
--- /dev/null
+++ b/docusaurus/docs/reference/ha/routers.md
@@ -0,0 +1,90 @@
+---
+sidebar_label: Routers
+sidebar_position: 40
+---
+
+# Routers in Controller HA
+
+There are only a few differences in how routers work in an HA cluster.
+
+## Configuration
+
+When enrolling a router, the JWT for the new router contains the list of
+controllers. When the router is enrolled, the controller endpoints
+configuration file is initialized with that list.
+
+This means that manually configuring the controllers for a router should
+no longer be required.
+
+### Endpoints File
+
+The router stores the currently known controllers in an endpoints
+configuration file.
+
+Note that:
+
+* The endpoints file will be written whenever the router is notified of changes
+  to the controller cluster.
+* The file is only read at router startup.
+* The file is not monitored, so changes made by administrators while the router
+  is running won't take effect until the router is restarted, and may be
+  overwritten by the router before it is restarted. Make sure the router is
+  stopped before manually editing the file.
+* The endpoints file is only generated by enrollment and when the endpoints
+  change. For an existing configuration with the controllers specified in the
+  router config, if the endpoints never change, the endpoints file will never
+  be generated.
+
+#### Location
+
+By default the endpoints file will be named `endpoints` and will be placed
+in the same directory as the router config file.
+
+However, the file name and location can be customized using a config file
+setting.
+
+```yaml
+ctrl:
+  endpoints:
+    - tls:ctrl1.ziti.example.com:1280
+  endpointsFile: /var/run/ziti/endpoints.yaml
+```
+
+### Manual Controller Configuration
+
+Instead of specifying a single controller, multiple controllers can be
+specified in the router configuration.
+
+```yaml
+ctrl:
+  endpoints:
+    - tls:ctrl1.ziti.example.com:1280
+    - tls:ctrl2.ziti.example.com:1280
+    - tls:ctrl3.ziti.example.com:1280
+```
+
+If the controller cluster changes, it will notify routers of the updated
+controller endpoints.
+
+## Router Data Model
+
+The router receives a stripped-down version of the controller data model.
+
+While the router data model can be disabled on the controller using a config
+setting in standalone mode, it is required for controller clusters, so that
+setting will be ignored.
+
+The data model on the router is periodically snapshotted, so it doesn't need to
+be fully restored from a controller on every restart.
+
+The location and frequency of snapshotting can be
+[configured using the db and dbSaveIntervalSeconds properties](../configuration/router#edge).
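+
+As a minimal sketch, assuming these properties live under the `edge` section of
+the router config, as the linked configuration reference suggests, the snapshot
+settings might look like this (the path and interval are illustrative values,
+not defaults):
+
+```yaml
+edge:
+  # hypothetical location for the router data model snapshot
+  db: /var/lib/ziti/router-data-model.db
+  # how often, in seconds, to snapshot the in-memory data model
+  dbSaveIntervalSeconds: 30
+```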
+
+## Controller Selection
+
+When creating [circuits](/learn/core-concepts/security/SessionsAndConnections.md#data-plane),
+routers will choose the most responsive controller, based on latency. Network
+operators will want to monitor controllers to make sure they can keep up with
+the circuit creation load they receive.
+
+When managing terminators, routers will try to talk directly to the current
+cluster leader, since updates have to go through the leader.
diff --git a/docusaurus/docs/reference/ha/topology.md b/docusaurus/docs/reference/ha/topology.md
new file mode 100644
index 00000000..57934a08
--- /dev/null
+++ b/docusaurus/docs/reference/ha/topology.md
@@ -0,0 +1,81 @@
+---
+sidebar_label: Topology
+sidebar_position: 60
+---
+
+# Controller Topology
+
+This document discusses how many controllers a network might need and how to
+place them geographically.
+
+## Number of Controllers
+
+### Management
+
+The first consideration is how many controllers the network should be able to
+lose without losing functionality. A cluster of size N needs (N/2) + 1
+controllers (using integer division) active and connected to be able to accept
+model updates, such as provisioning identities, adding or changing services,
+and updating policies.
+
+Since a two-node cluster will lose some functionality if either node becomes
+unavailable, a minimum of 3 nodes is recommended.
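+
+As a quick reference, the quorum math works out as follows. This table is
+illustrative, derived directly from the (N/2) + 1 formula above:
+
+| Cluster size | Quorum needed | Tolerable failures |
+|--------------|---------------|--------------------|
+| 1            | 1             | 0                  |
+| 2            | 2             | 0                  |
+| 3            | 2             | 1                  |
+| 4            | 3             | 1                  |
+| 5            | 3             | 2                  |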
+
+### Clients
+
+The functionality that controllers provide to clients doesn't require any
+specific number of controllers. A network manager will want to scale the number
+of controllers based on client demand and may want to place additional
+controllers geographically close to clusters of clients for better performance.
+
+## Voting vs Non-Voting Members
+
+Because every model update must be approved by a quorum of voting members,
+adding a large number of voting members can add a lot of latency to model
+changes.
+
+If more controllers are desired to scale out to meet client needs, only as many
+controllers as are needed to meet the availability requirements for management
+should be made voting members.
+
+Additionally, keeping a quorum of voting members geographically close together
+will reduce latency without necessarily reducing availability.
+
+### Example
+
+**Requirements**
+
+1. The network should be able to withstand the loss of 1 voting member.
+1. Controllers should exist in the US, EU and Asia, with 2 in each region.
+
+To be able to lose one voting member, we need 3 voting nodes, with 6 nodes
+total.
+
+We should place 2 voting members in the same region, but in different
+availability zones/data centers. The third voting member should be in a
+different region. The rest of the controllers should be non-voting.
+
+**Proposed Layout**
+
+So, using AWS regions, we might have:
+
+* 1 in us-east-1 (voting)
+* 1 in us-west-2 (voting)
+* 1 in eu-west-3 (voting)
+* 1 in eu-south-1 (non-voting)
+* 1 in ap-southeast-4 (non-voting)
+* 1 in ap-south-2 (non-voting)
+
+Assuming the leader is one of us-east-1 or us-west-2, a model update only needs
+to be acknowledged by one relatively close node before being accepted. All
+other controllers will receive the updates as well, but updates won't be gated
+on communication with all of them.
+
+**Alternate**
+
+For even faster updates, at the cost of an extra controller, two voting members
+could be in us-east, one in us-east-1 and one in us-east-2. The third voting
+member could be in the EU. Updates would then only need to be approved by two
+very close controllers. If one of them went down, updates would slow down,
+since they would need to travel over higher-latency links, but they would
+still work.
+
+* 1 in us-east-1 (voting)
+* 1 in us-east-2 (voting)
+* 1 in us-west-2 (non-voting)
+* 1 in eu-west-3 (voting)
+* 1 in eu-south-1 (non-voting)
+* 1 in ap-southeast-4 (non-voting)
+* 1 in ap-south-2 (non-voting)