Add Controller HA reference material. Fixes #929
Showing 4 changed files with 323 additions and 0 deletions.
@@ -0,0 +1,135 @@
---
sidebar_label: Data Model
sidebar_position: 80
---

# Controller HA Data Model

:::info

This document is likely most interesting for developers working on OpenZiti,
those curious about how distributed systems work in general, or those curious
about how data is distributed in OpenZiti.

:::

## Model Data

### Model Data Characteristics

* All data required on every controller
* Read characteristics
  * Reads happen all the time, from every client as well as admins
  * Speed is very important. Reads affect how every client perceives the system.
  * Availability is very important. Without reading definitions, clients can't create new connections.
  * Can be against stale data, if we get consistency within a reasonable timeframe (seconds to minutes)
* Write characteristics
  * Writes only happen from administrators
  * Speed needs to be reasonable, but doesn't need to be blazing fast
  * Write availability can be interrupted, since it primarily affects management operations
  * Must be consistent. Write validation can't happen with stale data. We don't want to have to deal
    with reconciling concurrent, contradictory write operations.
  * Generally involves controller to controller coordination

Of the distribution mechanisms we looked at, Raft had the best fit.

### Raft Resources

For a more in-depth look at Raft, see:

* https://raft.github.io/
* http://thesecretlivesofdata.com/raft/

### Raft Characteristics

* Writes
  * Consistency over availability
  * Good, but not stellar, performance
* Reads
  * Every node has full state
  * Local state is always internally consistent, but may be slightly behind the leader
  * No coordination required for reads
  * Fast reads
  * Reads work even when other nodes are unavailable
  * If the latest data is desired, reads can be forwarded to the current leader

So the OpenZiti controller uses Raft to distribute the data model. Specifically, it uses the
[HashiCorp Raft Library](https://github.com/hashicorp/raft/).

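For readers who want a concrete feel for the library, here is a minimal, self-contained sketch of standing up a
single node with hashicorp/raft. It is not OpenZiti code: `counterFSM` is a toy state machine that just counts
committed entries, whereas the real controller's FSM applies model changes to its bolt database, and a real
cluster uses durable stores and a network transport instead of the in-memory ones shown here.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"

	"github.com/hashicorp/raft"
)

// counterFSM is a toy finite state machine: it just counts applied log entries.
type counterFSM struct{ applied uint64 }

func (f *counterFSM) Apply(l *raft.Log) interface{} {
	return atomic.AddUint64(&f.applied, 1)
}

func (f *counterFSM) Snapshot() (raft.FSMSnapshot, error) { return noopSnapshot{}, nil }
func (f *counterFSM) Restore(rc io.ReadCloser) error      { return rc.Close() }

type noopSnapshot struct{}

func (noopSnapshot) Persist(sink raft.SnapshotSink) error { return sink.Close() }
func (noopSnapshot) Release()                             {}

func main() {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID("node1")

	// In-memory stores and transport keep the example self-contained.
	logStore := raft.NewInmemStore()
	stableStore := raft.NewInmemStore()
	snapshots := raft.NewInmemSnapshotStore()
	addr, transport := raft.NewInmemTransport("")

	r, err := raft.NewRaft(conf, &counterFSM{}, logStore, stableStore, snapshots, transport)
	if err != nil {
		panic(err)
	}

	// Bootstrap a single-node cluster so this node can elect itself leader.
	r.BootstrapCluster(raft.Configuration{
		Servers: []raft.Server{{ID: conf.LocalID, Address: addr}},
	})

	fmt.Println("raft node started:", addr)
}
```
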
### Updates

The basic flow for model updates is as follows:

1. A client requests a model update via the REST API.
2. The controller checks if it is the raft cluster leader. If it is not, it forwards the request to
   the leader.
3. Once the request is on the leader, it applies the model update to the raft log. This involves
   getting a quorum of the controllers to accept the update.
4. Once the update has been accepted, it will be executed on each node of the cluster. This will
   generate one or more changes to the bolt database.
5. The results of the operation (success or failure) are returned to the controller which received
   the original REST request.
6. The controller waits until the operation has been applied locally.
7. The result is returned to the REST client.

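As an illustration of that flow (not the actual controller code; `Cluster`, `HandleUpdate`, and the other
names are hypothetical), a Go sketch might look like this:

```go
package model

import (
	"errors"
	"time"
)

// ModelUpdate is a hypothetical stand-in for any model change (identity, service, policy, ...).
type ModelUpdate struct {
	EntityType string
	Data       []byte
}

// Cluster is a hypothetical view of the raft cluster from one controller.
type Cluster interface {
	// IsLeader reports whether this controller is the current raft leader.
	IsLeader() bool
	// ForwardToLeader sends the update to the current leader (step 2).
	ForwardToLeader(update ModelUpdate) error
	// ApplyToLog appends the update to the raft log and blocks until a quorum accepts it (steps 3-4).
	ApplyToLog(update ModelUpdate) error
	// AwaitLocalApply waits until the committed update has been applied to the local bolt db (step 6).
	AwaitLocalApply(update ModelUpdate, timeout time.Duration) error
}

// HandleUpdate mirrors the flow described above: forward to the leader if needed,
// get the change committed via raft, then wait for it to be applied locally before
// answering the REST client (step 7).
func HandleUpdate(c Cluster, update ModelUpdate) error {
	if !c.IsLeader() {
		return c.ForwardToLeader(update)
	}

	if err := c.ApplyToLog(update); err != nil {
		return err
	}

	if err := c.AwaitLocalApply(update, 5*time.Second); err != nil {
		return errors.New("update committed but not yet applied locally: " + err.Error())
	}
	return nil
}
```
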
### Reads

Reads are always done against the local bolt database for performance. The assumption is that if
something like a policy change is delayed, it may temporarily allow a circuit to be created, but as
soon as the policy update is applied, the controller will make changes to affected circuits as necessary.

## Runtime Data

In addition to model data, the controller also manages some amount of runtime data. This data is used
for running OpenZiti's core functions, i.e. managing the flow of data across the mesh, along with
related authentication data. This includes things like:

* Links
* Circuits
* API Sessions
* Sessions
* Posture Data

### Runtime Data Characteristics

Runtime data has different characteristics than the model data does:

* Not necessarily shared across controllers
* Reads **and** writes must be very fast
* Generally involves SDK to controller or controller to router coordination

Because writes must also be fast, Raft is not a good candidate for storing this data. Good
performance is critical for these components, so they are each evaluated individually.

### Links

Each controller currently needs to know about links so that it can make routing decisions. However,
links exist on routers. So, routers are the source of record for links. When a router connects to a
controller, the router will tell the controller about any links that it already has. The controller
will ask the router to fill in any missing links, and the router will ensure that it doesn't create
duplicate links if multiple controllers request the same link be created. If there are duplicates,
the router will inform the controller of the existing link.

This allows the routers to properly handle link dials from multiple routers and keep controllers up
to date with the current known links.

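A hypothetical Go sketch of that dedup behavior on the router side (the `registry` type and `EnsureLink`
function are illustrative, not the real router code):

```go
package links

import "sync"

// Link is a hypothetical record of an established link to another router.
type Link struct {
	ID         string
	DestRouter string
}

// registry sketches the idea that the router is the source of record for links:
// it refuses to dial a second link to the same destination and instead reports
// the existing one back to whichever controller asked for it.
type registry struct {
	mu    sync.Mutex
	links map[string]*Link // keyed by destination router id
}

// EnsureLink returns the existing link for dest if there is one, or dials a new
// link otherwise. The boolean reports whether an existing link was reused, which
// is what the router would communicate back to the requesting controller.
// (A real implementation would not hold the lock for the duration of the dial.)
func (r *registry) EnsureLink(dest string, dial func(dest string) (*Link, error)) (*Link, bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if existing, ok := r.links[dest]; ok {
		return existing, true, nil
	}

	link, err := dial(dest)
	if err != nil {
		return nil, false, err
	}
	if r.links == nil {
		r.links = map[string]*Link{}
	}
	r.links[dest] = link
	return link, false, nil
}
```
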
### Circuits

Circuits were, and continue to be, stored in memory for both standalone and HA mode
controllers. Circuits are not distributed. Rather, each controller remains responsible for any
circuits that it created.

When a router needs to initiate circuit creation, it will pick the controller with the lowest response
time and send the circuit creation request to that controller. The controller will establish a route.
Route tables as well as the xgress endpoints now track which controller is responsible for the
associated circuit. This way, when failures or other notifications need to be sent, the router knows
which controller to talk to.

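Schematically, that ownership tracking might look like the following sketch; the types and field names are
illustrative, not the actual router data structures:

```go
package circuits

import "fmt"

// Circuit is an illustrative in-memory circuit record. The key detail for HA is
// OwnerController: the controller that created the circuit and manages its lifecycle.
type Circuit struct {
	ID              string
	Path            []string // router ids along the circuit
	OwnerController string
}

// routeTable maps circuit ids to their records on a router.
type routeTable map[string]*Circuit

// notifyFailure shows how a router would pick the controller to inform when a
// circuit fails: always the owning controller, not just any controller it knows.
func (rt routeTable) notifyFailure(circuitID string, send func(ctrlID, msg string) error) error {
	c, ok := rt[circuitID]
	if !ok {
		return fmt.Errorf("unknown circuit %s", circuitID)
	}
	return send(c.OwnerController, "circuit failed: "+circuitID)
}
```
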
This gets routing working with multiple controllers without a major refactor. Future work will
likely delegate more routing control to the routers, so routing should get more robust and
distributed over time.

### API Sessions, Sessions, Posture Data

API Sessions and Sessions are moving to bearer tokens. Posture Data is now handled in the routers.
@@ -0,0 +1,45 @@
---
sidebar_label: Migrating
sidebar_position: 30
---

# Migrating Controllers

A controller can be moved from standalone mode to HA mode. It can also be returned
from HA mode back to standalone mode.

## Standalone to HA

### Requirements

First, ensure that the controller's certificates and configuration meet the requirements
in [Bootstrapping](./bootstrapping.md).

### Data Model Migration

The controller's data can be imported in one of two ways:

**Using Config**

Leave the `db: </path/to/ctrl.db/>` setting in the controller config. When the controller
starts up, it will see that it's running in HA mode, but isn't initialized yet. It will
try to use the database in the configuration to initialize its data model.

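For example, the relevant fragment of the controller config might look something like the following.
The `raft` keys shown here are placeholders; use whatever cluster settings your HA controllers already
have (see [Bootstrapping](./bootstrapping.md)). The point is only that the legacy `db:` entry stays in
place alongside them for the first startup.

```yaml
# HA cluster settings (placeholder keys; match your existing HA configuration)
raft:
  dataDir: /var/lib/ziti/raft

# Existing standalone database, left in place so the controller can
# initialize its HA data model from it on first startup.
db: /path/to/ctrl.db
```
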
**Using the Agent**

The agent can also be used to provide the controller database to the controller.

```
ziti agent controller init-from-db path/to/source.db
```

Once the controller is initialized, it should start up as normal and be usable.
The cluster can now be expanded as explained in [Bootstrapping](./bootstrapping.md).

## HA to Standalone

This assumes that you have a database snapshot from an HA cluster. This could either
be the ctrl-ha.db from the `dataDir`, or a snapshot created using the snapshot
CLI command.

To revert to standalone mode, remove the `raft` section from the config file and add
the `db:` section back, pointing at the snapshot from the HA cluster. When started,
the controller should come up in standalone mode.
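Schematically, the resulting standalone config fragment (paths are placeholders) looks like this:

```yaml
# the raft section has been removed entirely

# db points at the snapshot taken from the HA cluster
db: /path/to/ctrl-ha-snapshot.db
```
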
@@ -0,0 +1,62 @@
---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

Instead of specifying a single controller, you can specify multiple controllers
in the router configuration.

```yaml
ctrl:
  endpoints:
    - tls:192.168.3.100:6262
    - tls:192.168.3.101:6262
    - tls:192.168.3.102:6262
```

If the controller cluster changes, the cluster will notify routers of the updated
controller endpoints. By default these will be stored in a file named `endpoints`
in the same directory as the router config file.

However, the location of the endpoints file can be customized using a config file setting.

```yaml
ctrl:
  endpoints:
    - tls:192.168.3.100:6262
  endpointsFile: /var/run/ziti/endpoints.yaml
```

In general, a router should only need one or two controllers to bootstrap itself,
and thereafter should be able to keep the endpoints list up to date with help
from the controller.

## Router Data Model

In order to enable HA functionality, the router now receives a stripped down
version of the controller data model. While required for controller HA, this
also enables other optimizations, so use of the router data model is also enabled
by default when running in standalone mode.

The router data model can be disabled on the controller using a config setting,
but since it is required for HA, that flag will be ignored if the controllers
are running in a cluster.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart.

The location and frequency of snapshotting can be [configured](../configuration/router#edge).

## Controller Selection

When creating circuits, routers will choose the most responsive controller, based on latency.
When doing model updates, such as managing terminators, they will try to talk directly to
the current cluster leader, since updates have to go through the leader in any case.
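The selection logic amounts to tracking a latency estimate per controller connection and picking the
minimum, or the known leader for model updates. A hypothetical sketch, not the actual router code:

```go
package router

import (
	"errors"
	"time"
)

// ctrlChannel is a hypothetical view of one controller connection from a router.
type ctrlChannel struct {
	id       string
	latency  time.Duration // smoothed round-trip latency estimate
	isLeader bool
}

// forCircuits picks the most responsive controller for circuit creation.
func forCircuits(ctrls []ctrlChannel) (string, error) {
	if len(ctrls) == 0 {
		return "", errors.New("no controllers available")
	}
	best := ctrls[0]
	for _, c := range ctrls[1:] {
		if c.latency < best.latency {
			best = c
		}
	}
	return best.id, nil
}

// forModelUpdates prefers the current cluster leader, since updates must go
// through the leader anyway; otherwise it falls back to the lowest-latency controller.
func forModelUpdates(ctrls []ctrlChannel) (string, error) {
	for _, c := range ctrls {
		if c.isLeader {
			return c.id, nil
		}
	}
	return forCircuits(ctrls)
}
```
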
@@ -0,0 +1,81 @@
---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses considerations for how many controllers a network might
need and how to place them geographically.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 controllers (using integer division) active and connected
to be able to accept model updates, such as provisioning identities, adding/changing services and updating
policies.

Since a two node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.

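To make the arithmetic concrete, here is a small helper (illustrative only) for computing the quorum size
and how many voting members a cluster can lose:

```go
package main

import "fmt"

// quorum is the number of voting members that must be active and connected
// before the cluster can accept a model update: (n / 2) + 1, using integer division.
func quorum(votingMembers int) int {
	return votingMembers/2 + 1
}

// tolerableFailures is how many voting members can be lost while a quorum remains.
func tolerableFailures(votingMembers int) int {
	return votingMembers - quorum(votingMembers)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voting members: %d, quorum: %d, can lose: %d\n",
			n, quorum(n), tolerableFailures(n))
	}
	// A 2-node cluster has a quorum of 2 and can lose nothing, while a 3-node
	// cluster has a quorum of 2 and can lose 1 node, hence the recommended minimum of 3.
}
```
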
### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes.

If more controllers are desired to scale out to meet client needs, only as many controllers as are needed
to meet availability requirements for management needs should be made into voting members.

Additionally, having a quorum of controllers be geographically close will reduce latency without necessarily
reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of 1 voting member.
2. Controllers should exist in the US, EU and Asia, with 2 in each region.

To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.

We should place 2 voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, we might have:

* 1 in us-east-1 (voting)
* 1 in us-west-2 (voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be accepted by
one relatively close node before being committed. All other controllers will receive the updates as well,
but updates won't be gated on communications with all of them.

**Alternate**

For even faster updates at the cost of an extra controller, two controllers could be in the US east, one in
us-east-1 and one in us-east-2. The third voting member could be in the EU. Updates would now only need to be
approved by two very close controllers. If one of them went down, updates would slow down, since they would
need to be approved over longer latencies, but they would still work.

* 1 in us-east-1 (voting)
* 1 in us-east-2 (voting)
* 1 in us-west-2 (non-voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)