
Commit

Final pieces of controller clustering documentation. Fixes #929
plorenz committed Feb 7, 2025
1 parent c022517 commit cada138
Showing 3 changed files with 177 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docusaurus/docs/reference/ha/overview.md
@@ -131,3 +131,8 @@ The following limitations currently apply:

Improving routing is an ongoing focus for the OpenZiti project.
Issues related to routing improvements can be found on the [Routing Project Board](https://github.com/orgs/openziti/projects/13/views/1).

## Quickstart

The quickstart supports running in clustered mode; see
[this guide](https://github.com/openziti/ziti/blob/main/doc/ha/quickstart.md) for more information.
90 changes: 90 additions & 0 deletions docusaurus/docs/reference/ha/routers.md
@@ -0,0 +1,90 @@
---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

When enrolling routers, the JWT for a new router contains the list of
controllers in the cluster. When the router is enrolled, its controller endpoints
configuration file is initialized with that list.

This means that manually configuring the controllers for a router should
no longer be required.

### Endpoints File

The router stores the currently known controllers in an endpoints configuration
file.

Note that:

* The endpoints file will be written whenever the router is notified of changes
to the controller cluster.
* The file is only read at router startup.
* The file is not monitored, so changes made by administrators while the router
is running won't take effect until the router is restarted, and may be
overwritten by the router before it is restarted. Make sure the router is
stopped before manually editing the file.
* The endpoints file is only generated by enrollment and when the endpoints
change. For an existing configuration with the controllers specified in the
router config, if the endpoints never change, the endpoints file will never
be generated.
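
As a rough illustration, the file holds the list of controller endpoints the router
currently knows about. The exact schema is managed by the router and may differ; this
sketch only assumes a simple YAML list:

```yaml
# Illustrative sketch only: the router writes and maintains this file itself.
# Do not hand-edit it while the router is running.
endpoints:
  - tls:ctrl1.ziti.example.com:1280
  - tls:ctrl2.ziti.example.com:1280
  - tls:ctrl3.ziti.example.com:1280
```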

#### Location

By default, the endpoints file will be named `endpoints` and will be placed
in the same directory as the router config file.

However, the file location can be customized using a config file setting.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
  endpointsFile: /var/run/ziti/endpoints.yaml
```
### Manual Controller Configuration

Instead of specifying a single controller, multiple controllers can be specified
in the router configuration.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
    - tls:ctrl2.ziti.example.com:1280
    - tls:ctrl3.ziti.example.com:1280
```

If the controller cluster changes, it will notify routers of the updated
controller endpoints.
## Router Data Model

The router receives a stripped down version of the controller data model.

While the router data model can be disabled on the controller using a config
setting in standalone mode, it is required for controller clusters, so that
setting will be ignored.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart. The location and frequency
of snapshotting can be
[configured using the db and dbSaveIntervalSeconds properties](../configuration/router#edge).
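
A minimal sketch of how those properties might look in the router config, assuming
illustrative values (the path and interval below are examples, not documented defaults):

```yaml
edge:
  # where the router's data model snapshot is stored (illustrative path)
  db: /var/lib/ziti/router-data-model.db
  # how often, in seconds, the in-memory data model is snapshotted to disk
  dbSaveIntervalSeconds: 30
```
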
## Controller Selection

When creating [circuits](/learn/core-concepts/security/SessionsAndConnections.md#data-plane),
routers will choose the most responsive controller, based on latency. Network operators will
want to keep an eye on controllers to make sure they can keep up with the circuit creation
load they receive.

When managing terminators, routers will try to talk directly to the current
cluster leader, since updates have to go through the leader.
82 changes: 82 additions & 0 deletions docusaurus/docs/reference/ha/topology.md
@@ -0,0 +1,82 @@
---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses cluster size and member placement.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 voting members connected to be able
to accept model updates, such as provisioning identities, adding or changing services, and updating policies.

Since a two-node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.
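
For reference, working out the majority requirement for small clusters:

| Voting members | Quorum required | Voting members that can be lost |
|----------------|-----------------|---------------------------------|
| 1              | 1               | 0                               |
| 2              | 2               | 0                               |
| 3              | 2               | 1                               |
| 5              | 3               | 2                               |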

### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes. A three-node cluster in the same data center would
likely need a few tens of milliseconds. A cluster with a quorum spanning a single continent might take a hundred
milliseconds, and one that had to traverse large portions of the globe might take half a second.

If the network has enough voting members to meet availability needs, then additional controllers added
for performance reasons should be added as non-voting members.

Additionally, having a quorum of controllers be geographically close will reduce latency without necessarily
reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of one voting member.
1. Controllers should exist in the US, EU and Asia, with two in each region.

To be able to lose one voting member, we need three voting nodes, with six nodes total.

We should place two voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, the network might have:

* One in us-east-1 (voting)
* One in us-west-2 (voting)
* One in eu-west-3 (voting)
* One in eu-south-1 (non-voting)
* One in ap-southeast-4 (non-voting)
* One in ap-south-2 (non-voting)

Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be acknowledged by
one other relatively close node before being accepted. All other controllers will receive the updates as well,
but updates won't be gated on communications with all of them.

**Alternate**

For even faster updates at the cost of an extra controller, two voting controllers could be placed close together
in the eastern US: one in us-east-1 and one in us-east-2. The third voting member could be in the EU. Updates would
now only need to be approved by two very close controllers. If one of them went down, updates would slow down, since
they would need to be acknowledged over longer latencies, but they would still work.

* One in us-east-1 (voting)
* One in us-east-2 (voting)
* One in us-west-2 (non-voting)
* One in eu-west-3 (voting)
* One in eu-south-1 (non-voting)
* One in ap-southeast-4 (non-voting)
* One in ap-south-2 (non-voting)

