Add Controller HA reference material. Fixes #929
plorenz committed Feb 7, 2025
1 parent bb7accd commit 9e397dd
Showing 2 changed files with 171 additions and 0 deletions.
90 changes: 90 additions & 0 deletions docusaurus/docs/reference/ha/routers.md
@@ -0,0 +1,90 @@
---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

When enrolling routers, the enrollment JWT for a new router contains the list of
controllers. During enrollment, the router's controller endpoints configuration
file is initialized with that list.

This means that manually configuring the controllers for a router should
no longer be required.

### Endpoints File

The router stores the currently known controllers in an endpoints configuration
file.

Note that:

* The endpoints file will be written whenever the router is notified of changes
to the controller cluster.
* The file is only read at router startup.
* The file is not monitored, so changes made by administrators while the router
is running won't take effect until the router is restarted, and may be
overwritten by the router before it is restarted. Make sure the router is
stopped before manually editing the file.
* The endpoints file is only generated by enrollment and when the endpoints
change. For an existing configuration with the controller endpoints specified
in the router config, if the endpoints never change, the endpoints file will
never be generated.

#### Location

By default, the endpoints file will be named `endpoints` and will be placed
in the same directory as the router config file.

However, the file name and location can be customized using a config file setting.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
  endpointsFile: /var/run/ziti/endpoints.yaml
```
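
The endpoints file itself is written and owned by the router. As a rough,
non-authoritative sketch, assuming its contents mirror the `ctrl.endpoints` list
from the router configuration, it might look something like this:

```yaml
# Illustrative sketch only - this file is generated by the router and the
# actual format may differ. Do not hand-edit it while the router is running.
endpoints:
  - tls:ctrl1.ziti.example.com:1280
  - tls:ctrl2.ziti.example.com:1280
  - tls:ctrl3.ziti.example.com:1280
```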

### Manual Controller Configuration

Instead of specifying a single controller, multiple controllers can be specified
in the router configuration.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
    - tls:ctrl2.ziti.example.com:1280
    - tls:ctrl3.ziti.example.com:1280
```

If the controller cluster changes, it will notify routers of the updated
controller endpoints.

## Router Data Model

The router receives a stripped-down version of the controller data model. While
the router data model can be disabled on the controller using a config setting
in standalone mode, it is required for controller clusters, so that setting will
be ignored.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart. The location and frequency
of snapshotting can be
[configured using the db and dbSaveIntervalSeconds properties](../configuration/router#edge).
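
As a rough sketch of what this looks like in the router configuration (the path
and interval values below are illustrative assumptions, not documented defaults;
see the linked router configuration reference), the properties live in the
`edge` section:

```yaml
edge:
  # File where the router's copy of the data model is snapshotted.
  # This path is an illustrative example.
  db: /var/lib/ziti/router-data-model.db
  # How often, in seconds, the data model is snapshotted to disk.
  # This value is an assumed example.
  dbSaveIntervalSeconds: 30
```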

## Controller Selection

When creating [circuits](/learn/core-concepts/security/SessionsAndConnections.md#data-plane),
routers will choose the most responsive controller, based on latency. Network operators will
want to keep an eye on controllers to make sure they can keep up with the circuit creation
load they receive.

When managing terminators, routers will try to talk directly to the current
cluster leader, since updates have to go through the leader.
81 changes: 81 additions & 0 deletions docusaurus/docs/reference/ha/topology.md
@@ -0,0 +1,81 @@
---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses considerations for how many controllers a network might
need and how to place them geographically.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 controllers (using integer division) active and
connected to be able to accept model updates, such as provisioning identities, adding or changing
services, and updating policies.

Since a two-node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.
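
To make the quorum arithmetic concrete, here is how (N/2) + 1 (with integer division) works out
for small cluster sizes:

| Cluster size (N) | Quorum ((N/2) + 1) | Failures tolerated |
|------------------|--------------------|--------------------|
| 1                | 1                  | 0                  |
| 2                | 2                  | 0                  |
| 3                | 2                  | 1                  |
| 4                | 3                  | 1                  |
| 5                | 3                  | 2                  |
| 7                | 4                  | 3                  |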

### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes.

If more controllers are desired to scale out to meet client needs, only as many controllers as are
needed to meet availability requirements for management should be made voting members.

Additionally, having a quorum of controllers geographically close together will reduce latency
without necessarily reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of 1 voting member
1. Controllers should exist in the US, EU and Asia, with 2 in each region.

To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.

We should place 2 voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, we might have:

* 1 in us-east-1 (voting)
* 1 in us-west-2 (voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be acknowledged by
one relatively close voting node before being accepted. All other controllers will receive the updates as
well, but updates won't be gated on communication with all of them.

**Alternate**

For even faster updates, at the cost of an extra controller, two voting controllers could be placed in the
eastern US, one in us-east-1 and one in us-east-2. The third voting member could be in the EU. Updates would
now only need to be approved by two controllers that are very close to each other. If one of them went down,
updates would slow down, since they would need to travel over higher-latency links, but they would still work.

* 1 in us-east-1 (voting)
* 1 in us-east-2 (voting)
* 1 in us-west-2 (non-voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

