Add Controller HA reference material. Fixes #929

---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

The enrollment JWT for a new router contains the list of controllers. When the
router is enrolled, the controller endpoints configuration file is initialized
with that list.

This means that manually configuring the controllers for a router should no
longer be required.

### Endpoints File

The router stores the currently known controllers in an endpoints configuration
file.

Note that:

* The endpoints file will be written whenever the router is notified of changes
to the controller cluster.
* The file is only read at router startup.
* The file is not monitored, so changes made by administrators while the router
is running won't take effect until the router is restarted, and may be
overwritten by the router before it is restarted. Make sure the router is
stopped before manually editing the file.
* The endpoints file is only generated by enrollment and when the endpoints
change. For an existing configuration with the controllers specified in the
router config, if the endpoints never change, the endpoints file will never
be generated.
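
The file is generated and maintained by the router itself, so its exact schema
isn't reproduced here. As a rough sketch only, assuming it mirrors the
`ctrl.endpoints` list from the router config, its contents might look something
like this:

```yaml
# hypothetical endpoints file contents; the router regenerates this file
# whenever it is notified of controller cluster changes
endpoints:
  - tls:ctrl1.ziti.example.com:1280
  - tls:ctrl2.ziti.example.com:1280
  - tls:ctrl3.ziti.example.com:1280
```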

#### Location

By default the endpoints file will be named `endpoints` and will be placed
in the same directory as the router config file.

However, the file location can be customized using a config file setting.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
  endpointsFile: /var/run/ziti/endpoints.yaml
```

### Manual Controller Configuration

Instead of specifying a single controller, multiple controllers can be specified
in the router configuration.

```yaml
ctrl:
  endpoints:
    - tls:ctrl1.ziti.example.com:1280
    - tls:ctrl2.ziti.example.com:1280
    - tls:ctrl3.ziti.example.com:1280
```

If the controller cluster changes, it will notify routers of the updated
controller endpoints.

## Router Data Model

The router receives a stripped-down version of the controller data model.
While the router data model can be disabled on the controller using a config
setting in standalone mode, it is required for controller clusters, so that
setting will be ignored.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart.

The location and frequency of snapshotting can be
[configured using the db and dbSaveIntervalSeconds properties](../configuration/router#edge).
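
As a hedged illustration, assuming the `db` and `dbSaveIntervalSeconds`
properties sit under the `edge` section as the linked configuration reference
describes, the snapshot settings might look like this (the path and interval
shown are placeholders, not defaults):

```yaml
edge:
  # illustrative location for the router data model snapshot file
  db: /var/lib/ziti/router-data-model.db
  # illustrative interval; snapshot the data model every 30 seconds
  dbSaveIntervalSeconds: 30
```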

## Controller Selection

When creating [circuits](/learn/core-concepts/security/SessionsAndConnections.md#data-plane),
routers will choose the most responsive controller, based on latency. Network operators will
want to keep an eye on controllers to make sure they can keep up with the circuit creation
load they receive.

When managing terminators, routers will try to talk directly to the current
cluster leader, since updates have to go through the leader.

---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses considerations for how many controllers a network might
need and how to place them geographically.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 controllers (rounding N/2 down) active and connected to be able
to accept model updates, such as provisioning identities, adding/changing services and updating policies.

Since a two node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.

### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes.

If more controllers are desired to scale out to meet client needs, only as many controllers as are needed
to meet management availability requirements should be made voting members.

Additionally, having a quorum of controllers be geographically close will reduce latency without necessarily
reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of 1 voting member.
1. Controllers should exist in the US, EU and Asia, with 2 in each region.

To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.

We should place 2 voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, we might have:

* 1 in us-east-1 (voting)
* 1 in us-west-2 (voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be accepted by
one other relatively close node before being applied. All other controllers will receive the updates as well,
but updates won't be gated on communications with all of them.

**Alternate**

For even faster updates at the cost of an extra controller, two voting controllers could be in us-east, one in us-east-1
and one in us-east-2. The third voting member could be in the EU. Updates would now only need to be approved by two
very close controllers. If one of them went down, updates would slow down, since updates would need to be done
over longer latencies, but they would still work.

* 1 in us-east-1 (voting)
* 1 in us-east-2 (voting)
* 1 in us-west-2 (non-voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)