Add Controller HA reference material. Fixes #929
Showing 4 changed files with 323 additions and 0 deletions.
@@ -0,0 +1,135 @@
---
sidebar_label: Data Model
sidebar_position: 80
---

# Controller HA Data Model

:::info

This document is likely most interesting for developers working on OpenZiti,
those curious about how distributed systems work in general, or those curious
about how data is distributed in OpenZiti.

:::

## Model Data

### Model Data Characteristics

* All data required on every controller
* Read characteristics
  * Reads happen all the time, from every client as well as admins
  * Speed is very important. Reads affect how every client perceives the system.
  * Availability is very important. Without reading definitions, clients can't create new connections.
  * Can be against stale data, if we get consistency within a reasonable timeframe (seconds to minutes)
* Write characteristics
  * Writes only happen from administrators
  * Speed needs to be reasonable, but doesn't need to be blazing fast
  * Write availability can be interrupted, since it primarily affects management operations
  * Must be consistent. Write validation can't happen with stale data. We don't want to have to deal
    with reconciling concurrent, contradictory write operations.
  * Generally involves controller to controller coordination

Of the distribution mechanisms we looked at, Raft had the best fit.

### Raft Resources

For a more in-depth look at Raft, see:

* https://raft.github.io/
* http://thesecretlivesofdata.com/raft/

### Raft Characteristics

* Writes
  * Consistency over availability
  * Good, but not stellar, performance
* Reads
  * Every node has full state
  * Local state is always internally consistent, but may be slightly behind the leader
  * No coordination required for reads
  * Fast reads
  * Reads work even when other nodes are unavailable
  * If the latest data is desired, reads can be forwarded to the current leader

So the OpenZiti controller uses Raft to distribute the data model. Specifically, it uses the
[HashiCorp Raft Library](https://github.com/hashicorp/raft/).

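For readers who want a concrete feel for the library, here is a minimal, self-contained sketch of standing up a
single node with hashicorp/raft. It is not OpenZiti code: `counterFSM` is a toy state machine that just counts
committed entries, whereas the real controller's FSM applies model changes to its bolt database, and a real
cluster uses durable stores and a network transport instead of the in-memory ones shown here.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"

	"github.com/hashicorp/raft"
)

// counterFSM is a toy finite state machine: it just counts applied log entries.
type counterFSM struct{ applied uint64 }

func (f *counterFSM) Apply(l *raft.Log) interface{} {
	return atomic.AddUint64(&f.applied, 1)
}

func (f *counterFSM) Snapshot() (raft.FSMSnapshot, error) { return noopSnapshot{}, nil }
func (f *counterFSM) Restore(rc io.ReadCloser) error      { return rc.Close() }

type noopSnapshot struct{}

func (noopSnapshot) Persist(sink raft.SnapshotSink) error { return sink.Close() }
func (noopSnapshot) Release()                             {}

func main() {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID("node1")

	// In-memory stores and transport keep the example self-contained.
	logStore := raft.NewInmemStore()
	stableStore := raft.NewInmemStore()
	snapshots := raft.NewInmemSnapshotStore()
	addr, transport := raft.NewInmemTransport("")

	r, err := raft.NewRaft(conf, &counterFSM{}, logStore, stableStore, snapshots, transport)
	if err != nil {
		panic(err)
	}

	// Bootstrap a single-node cluster so this node can elect itself leader.
	r.BootstrapCluster(raft.Configuration{
		Servers: []raft.Server{{ID: conf.LocalID, Address: addr}},
	})

	fmt.Println("raft node started:", addr)
}
```
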
### Updates

The basic flow for model updates is as follows:

1. A client requests a model update via the REST API.
2. The controller checks if it is the raft cluster leader. If it is not, it forwards the request to
   the leader.
3. Once the request is on the leader, it applies the model update to the raft log. This involves
   getting a quorum of the controllers to accept the update.
4. Once the update has been accepted, it will be executed on each node of the cluster. This will
   generate one or more changes to the bolt database.
5. The results of the operation (success or failure) are returned to the controller which received
   the original REST request.
6. The controller waits until the operation has been applied locally.
7. The result is returned to the REST client.

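As an illustration of that flow (not the actual controller code; `Cluster`, `HandleUpdate`, and the other
names are hypothetical), a Go sketch might look like this:

```go
package model

import (
	"errors"
	"time"
)

// ModelUpdate is a hypothetical stand-in for any model change (identity, service, policy, ...).
type ModelUpdate struct {
	EntityType string
	Data       []byte
}

// Cluster is a hypothetical view of the raft cluster from one controller.
type Cluster interface {
	// IsLeader reports whether this controller is the current raft leader.
	IsLeader() bool
	// ForwardToLeader sends the update to the current leader (step 2).
	ForwardToLeader(update ModelUpdate) error
	// ApplyToLog appends the update to the raft log and blocks until a quorum accepts it (steps 3-4).
	ApplyToLog(update ModelUpdate) error
	// AwaitLocalApply waits until the committed update has been applied to the local bolt db (step 6).
	AwaitLocalApply(update ModelUpdate, timeout time.Duration) error
}

// HandleUpdate mirrors the flow described above: forward to the leader if needed,
// get the change committed via raft, then wait for it to be applied locally before
// answering the REST client (step 7).
func HandleUpdate(c Cluster, update ModelUpdate) error {
	if !c.IsLeader() {
		return c.ForwardToLeader(update)
	}

	if err := c.ApplyToLog(update); err != nil {
		return err
	}

	if err := c.AwaitLocalApply(update, 5*time.Second); err != nil {
		return errors.New("update committed but not yet applied locally: " + err.Error())
	}
	return nil
}
```
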
### Reads

Reads are always done against the local bolt database for performance. The assumption is that if
something like a policy change is delayed, it may temporarily allow a circuit to be created, but as
soon as the policy update is applied, the controller will make changes to affected circuits as necessary.

## Runtime Data

In addition to model data, the controller also manages some amount of runtime data. This data is used
for running OpenZiti's core functions, i.e. managing the flow of data across the mesh, along with
related authentication data. This includes things like:

* Links
* Circuits
* API Sessions
* Sessions
* Posture Data

### Runtime Data Characteristics

Runtime data has different characteristics than the model data does:

* Not necessarily shared across controllers
* Reads **and** writes must be very fast
* Generally involves SDK to controller or controller to router coordination

Because writes must also be fast, Raft is not a good candidate for storing this data. Good
performance is critical for these components, so they are each evaluated individually.

### Links

Each controller currently needs to know about links so that it can make routing decisions. However,
links exist on routers. So, routers are the source of record for links. When a router connects to a
controller, the router will tell the controller about any links that it already has. The controller
will ask the router to fill in any missing links, and the router will ensure that it doesn't create
duplicate links if multiple controllers request the same link be created. If there are duplicates,
the router will inform the controller of the existing link.

This allows the routers to properly handle link dials from multiple routers and keep controllers up
to date with the current known links.

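A hypothetical Go sketch of that dedup behavior on the router side (the `registry` type and `EnsureLink`
function are illustrative, not the real router code):

```go
package links

import "sync"

// Link is a hypothetical record of an established link to another router.
type Link struct {
	ID         string
	DestRouter string
}

// registry sketches the idea that the router is the source of record for links:
// it refuses to dial a second link to the same destination and instead reports
// the existing one back to whichever controller asked for it.
type registry struct {
	mu    sync.Mutex
	links map[string]*Link // keyed by destination router id
}

// EnsureLink returns the existing link for dest if there is one, or dials a new
// link otherwise. The boolean reports whether an existing link was reused, which
// is what the router would communicate back to the requesting controller.
// (A real implementation would not hold the lock for the duration of the dial.)
func (r *registry) EnsureLink(dest string, dial func(dest string) (*Link, error)) (*Link, bool, error) {
	r.mu.Lock()
	defer r.mu.Unlock()

	if existing, ok := r.links[dest]; ok {
		return existing, true, nil
	}

	link, err := dial(dest)
	if err != nil {
		return nil, false, err
	}
	if r.links == nil {
		r.links = map[string]*Link{}
	}
	r.links[dest] = link
	return link, false, nil
}
```
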
### Circuits

Circuits were, and continue to be, stored in memory for both standalone and HA mode
controllers. Circuits are not distributed. Rather, each controller remains responsible for any
circuits that it created.

When a router needs to initiate circuit creation, it will pick the controller with the lowest response
time and send the circuit creation request to that controller. The controller will establish a route.
Route tables as well as the xgress endpoints now track which controller is responsible for the
associated circuit. This way, when failures or other notifications need to be sent, the router knows
which controller to talk to.

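Schematically, that ownership tracking might look like the following sketch; the types and field names are
illustrative, not the actual router data structures:

```go
package circuits

import "fmt"

// Circuit is an illustrative in-memory circuit record. The key detail for HA is
// OwnerController: the controller that created the circuit and manages its lifecycle.
type Circuit struct {
	ID              string
	Path            []string // router ids along the circuit
	OwnerController string
}

// routeTable maps circuit ids to their records on a router.
type routeTable map[string]*Circuit

// notifyFailure shows how a router would pick the controller to inform when a
// circuit fails: always the owning controller, not just any controller it knows.
func (rt routeTable) notifyFailure(circuitID string, send func(ctrlID, msg string) error) error {
	c, ok := rt[circuitID]
	if !ok {
		return fmt.Errorf("unknown circuit %s", circuitID)
	}
	return send(c.OwnerController, "circuit failed: "+circuitID)
}
```
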
This gets routing working with multiple controllers without a major refactor. Future work will
likely delegate more routing control to the routers, so routing should get more robust and
distributed over time.

### API Sessions, Sessions, Posture Data

API Sessions and Sessions are moving to bearer tokens. Posture Data is now handled in the routers.
@@ -0,0 +1,45 @@
---
sidebar_label: Migrating
sidebar_position: 30
---

# Migrating Controllers

A controller can be moved from standalone mode to HA mode. It can also be returned
from HA mode back to standalone mode.

## Standalone to HA

### Requirements

First, ensure that the controller's certificates and configuration meet the requirements
in [Bootstrapping](./bootstrapping.md).

### Data Model Migration

The controller's data can be imported in one of two ways:

**Using Config**

Leave the `db: </path/to/ctrl.db/>` setting in the controller config. When the controller
starts up, it will see that it's running in HA mode, but isn't initialized yet. It will
try to use the database in the configuration to initialize its data model.

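For example, the relevant fragment of the controller config might look something like the following.
The `raft` keys shown here are placeholders; use whatever cluster settings your HA controllers already
have (see [Bootstrapping](./bootstrapping.md)). The point is only that the legacy `db:` entry stays in
place alongside them for the first startup.

```yaml
# HA cluster settings (placeholder keys; match your existing HA configuration)
raft:
  dataDir: /var/lib/ziti/raft

# Existing standalone database, left in place so the controller can
# initialize its HA data model from it on first startup.
db: /path/to/ctrl.db
```
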
**Using the Agent**

The agent can also be used to provide the controller database to the controller.

```
ziti agent controller init-from-db path/to/source.db
```

Once the controller is initialized, it should start up as normal and be usable.
The cluster can now be expanded as explained in [Bootstrapping](./bootstrapping.md).

## HA to Standalone

This assumes that you have a database snapshot from an HA cluster. This could either
be the ctrl-ha.db from the `dataDir`, or a snapshot created using the snapshot
CLI command.

To revert to standalone mode, remove the `raft` section from the config file and add
the `db:` section back, pointing at the snapshot from the HA cluster. When started,
the controller should come up in standalone mode.
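Schematically, the resulting standalone config fragment (paths are placeholders) looks like this:

```yaml
# the raft section has been removed entirely

# db points at the snapshot taken from the HA cluster
db: /path/to/ctrl-ha-snapshot.db
```
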
@@ -0,0 +1,62 @@
---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

Instead of specifying a single controller, you can specify multiple controllers
in the router configuration.

```yaml
ctrl:
  endpoints:
    - tls:192.168.3.100:6262
    - tls:192.168.3.101:6262
    - tls:192.168.3.102:6262
```

If the controller cluster changes, the cluster will notify routers of the updated
controller endpoints. By default these will be stored in a file named `endpoints`
in the same directory as the router config file.

However, the location of the endpoints file can be customized using a config file setting.

```yaml
ctrl:
  endpoints:
    - tls:192.168.3.100:6262
  endpointsFile: /var/run/ziti/endpoints.yaml
```

In general, a router should only need one or two controllers to bootstrap itself,
and thereafter should be able to keep the endpoints list up to date with help
from the controller.

## Router Data Model

In order to enable HA functionality, the router now receives a stripped down
version of the controller data model. While required for controller HA, this
also enables other optimizations, so use of the router data model is also enabled
by default when running in standalone mode.

The router data model can be disabled on the controller using a config setting,
but since it is required for HA, that flag will be ignored if the controllers
are running in a cluster.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart.

The location and frequency of snapshotting can be [configured](../configuration/router#edge).

## Controller Selection

When creating circuits, routers will choose the most responsive controller, based on latency.
When doing model updates, such as managing terminators, they will try to talk directly to
the current cluster leader, since updates have to go through the leader in any case.
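The selection logic amounts to tracking a latency estimate per controller connection and picking the
minimum, or the known leader for model updates. A hypothetical sketch, not the actual router code:

```go
package router

import (
	"errors"
	"time"
)

// ctrlChannel is a hypothetical view of one controller connection from a router.
type ctrlChannel struct {
	id       string
	latency  time.Duration // smoothed round-trip latency estimate
	isLeader bool
}

// forCircuits picks the most responsive controller for circuit creation.
func forCircuits(ctrls []ctrlChannel) (string, error) {
	if len(ctrls) == 0 {
		return "", errors.New("no controllers available")
	}
	best := ctrls[0]
	for _, c := range ctrls[1:] {
		if c.latency < best.latency {
			best = c
		}
	}
	return best.id, nil
}

// forModelUpdates prefers the current cluster leader, since updates must go
// through the leader anyway; otherwise it falls back to the lowest-latency controller.
func forModelUpdates(ctrls []ctrlChannel) (string, error) {
	for _, c := range ctrls {
		if c.isLeader {
			return c.id, nil
		}
	}
	return forCircuits(ctrls)
}
```
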
@@ -0,0 +1,81 @@
---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses considerations for how many controllers a network might
need and how to place them geographically.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 controllers (using integer division) active and connected
to be able to accept model updates, such as provisioning identities, adding/changing services and updating
policies.

Since a two node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.

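To make the arithmetic concrete, here is a small helper (illustrative only) for computing the quorum size
and how many voting members a cluster can lose:

```go
package main

import "fmt"

// quorum is the number of voting members that must be active and connected
// before the cluster can accept a model update: (n / 2) + 1, using integer division.
func quorum(votingMembers int) int {
	return votingMembers/2 + 1
}

// tolerableFailures is how many voting members can be lost while a quorum remains.
func tolerableFailures(votingMembers int) int {
	return votingMembers - quorum(votingMembers)
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voting members: %d, quorum: %d, can lose: %d\n",
			n, quorum(n), tolerableFailures(n))
	}
	// A 2-node cluster has a quorum of 2 and can lose nothing, while a 3-node
	// cluster has a quorum of 2 and can lose 1 node, hence the recommended minimum of 3.
}
```
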
### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes.

If more controllers are desired to scale out to meet client needs, only as many controllers as are needed
to meet availability requirements for management needs should be made into voting members.

Additionally, having a quorum of controllers be geographically close will reduce latency without necessarily
reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of 1 voting member.
2. Controllers should exist in the US, EU and Asia, with 2 in each region.

To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.

We should place 2 voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, we might have:

* 1 in us-east-1 (voting)
* 1 in us-west-2 (voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be accepted by
one relatively close node before being committed. All other controllers will receive the updates as well,
but updates won't be gated on communications with all of them.

**Alternate**

For even faster updates at the cost of an extra controller, two controllers could be in the US east, one in
us-east-1 and one in us-east-2. The third voting member could be in the EU. Updates would now only need to be
approved by two very close controllers. If one of them went down, updates would slow down, since they would
need to be approved over longer latencies, but they would still work.

* 1 in us-east-1 (voting)
* 1 in us-east-2 (voting)
* 1 in us-west-2 (non-voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)