Add Controller HA reference material. Fixes #929
plorenz committed Feb 6, 2025
1 parent b2a25fa commit 5d5328c
Showing 4 changed files with 323 additions and 0 deletions.
135 changes: 135 additions & 0 deletions docusaurus/docs/reference/ha/data-model.md
@@ -0,0 +1,135 @@
---
sidebar_label: Data Model
sidebar_position: 80
---

# Controller HA Data Model

:::info

This document is likely most interesting for developers working on OpenZiti,
those curious about how distributed systems work in general, or curious
about how data is distributed in OpenZiti.

:::

## Model Data

### Model Data Characteristics

* All data required on every controller
* Read characteristics
* Reads happen all the time, from every client as well as from admins
* Speed is very important. They affect how every client perceives the system.
* Availability is very important. Without the ability to read definitions, clients can't create new connections
* Can be against stale data, if we get consistency within a reasonable timeframe (seconds to
minutes)
* Write characteristics
* Writes only happen from administrators
* Speed needs to be reasonable, but doesn't need to be blazing fast
* Write availability can be interrupted, since it primarily affects management operations
* Must be consistent. Write validation can’t happen with stale data. Don’t want to have to deal
with reconciling concurrent, contradictory write operations.
* Generally involves controller to controller coordination

Of the distribution mechanisms we evaluated, Raft was the best fit.

### Raft Resources

For a more in-depth look at Raft, see

* https://raft.github.io/
* http://thesecretlivesofdata.com/raft/

### Raft Characteristics

* Writes
* Consistency over availability
* Good but not stellar performance
* Reads
* Every node has full state
* Local state is always internally consistent, but may be slightly behind the leader
* No coordination required for reads
* Fast reads
* Reads work even when other nodes are unavailable
* If latest data is desired, reads can be forwarded to the current leader

So the OpenZiti controller uses Raft to distribute the data model. Specifically, it uses the
[HashiCorp Raft Library](https://github.com/hashicorp/raft/).

### Updates

The basic flow for model updates is as follows:

1. A client requests a model update via the REST API.
2. The controller checks if it is the raft cluster leader. If it is not, it forwards the request to
the leader.
3. Once the request is on the leader, it applies the model update to the raft log. This involves
getting a quorum of the controllers to accept the update.
4. Once the update has been accepted, it will be executed on each node of the cluster. This will
generate one or more changes to the bolt database.
5. The results of the operation (success or failure) are returned to the controller which received
the original REST request.
6. The controller waits until the operation has been applied locally.
7. The result is returned to the REST client.
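
The sketch below illustrates steps 2-6 using the hashicorp/raft library directly. The
`modelUpdate` type, the handler name, and the error-based forwarding are illustrative
assumptions, not OpenZiti's actual code.

```go
// A minimal sketch of applying a model update through raft, assuming a
// hypothetical modelUpdate command type. Not OpenZiti's actual implementation.
package example

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/hashicorp/raft"
)

type modelUpdate struct {
	EntityType string `json:"entityType"`
	Action     string `json:"action"`
	Payload    []byte `json:"payload"`
}

func applyModelUpdate(r *raft.Raft, update modelUpdate) error {
	// Step 2: only the leader may append entries to the raft log. A real
	// controller would forward the request to the leader here; elided.
	if r.State() != raft.Leader {
		addr, id := r.LeaderWithID()
		return fmt.Errorf("not the leader; forward to %v (%v)", addr, id)
	}

	data, err := json.Marshal(update)
	if err != nil {
		return err
	}

	// Steps 3-4: Apply appends the entry to the log; the returned future
	// completes once a quorum has accepted the entry and the local FSM
	// (which writes to the bolt database) has executed it.
	future := r.Apply(data, 5*time.Second)
	if err := future.Error(); err != nil {
		return err
	}

	// Steps 5-6: the FSM's return value (e.g. a validation error raised
	// while updating the bolt database) is surfaced via Response().
	if fsmErr, ok := future.Response().(error); ok {
		return fsmErr
	}
	return nil
}
```

The important property is that `Apply` does not complete until the entry has been committed by a
quorum, which is why the result returned to the REST client in step 7 reflects a durable,
cluster-wide change.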

### Reads

Reads are always done against the local bolt database for performance. The assumption is that if
something like a policy change is delayed, it may temporarily allow a circuit to be created, but as
soon as the policy update is applied, the controller will make changes to existing circuits as
necessary.
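
As a sketch, a local read might look like the following, using bbolt's read-only transactions. The
bucket name and helper function are hypothetical; OpenZiti's actual store layout differs.

```go
// A minimal sketch of a coordination-free local read from a bbolt database.
// Bucket and key layout are hypothetical.
package example

import (
	bolt "go.etcd.io/bbolt"
)

// readService returns the raw stored bytes for a service definition. No
// cluster coordination happens; the data may lag the raft leader slightly.
func readService(db *bolt.DB, id string) ([]byte, error) {
	var result []byte
	err := db.View(func(tx *bolt.Tx) error {
		bucket := tx.Bucket([]byte("services"))
		if bucket == nil {
			return nil // nothing stored yet
		}
		if v := bucket.Get([]byte(id)); v != nil {
			// Copy the value; bbolt byte slices are only valid inside the tx.
			result = append([]byte(nil), v...)
		}
		return nil
	})
	return result, err
}
```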

## Runtime Data

In addition to model data, the controller also manages some amount of runtime data. This data
supports OpenZiti's core function of managing the flow of data across the mesh, along with the
related authentication state. It includes things like:

* Links
* Circuits
* API Sessions
* Sessions
* Posture Data

### Runtime Data Characteristics

Runtime data has different characteristics than the model data does.

* Not necessarily shared across controllers
* Reads **and** writes must be very fast
* Generally involves sdk to controller or controller to router coordination

Because writes must also be fast, Raft is not a good candidate for storing this data. Good
performance is critical for these components, so each is handled individually.

### Links

Each controller currently needs to know about links so that it can make routing decisions. However,
links exist on routers. So, routers are the source of record for links. When a router connects to a
controller, the router will tell the controller about any links that it already has. The controller
will ask the router to establish any missing links, and the router will ensure that it doesn't
create duplicate links when multiple controllers request that the same link be created. If a
requested link already exists, the router will inform the controller of the existing link.

This allows the routers to properly handle link dials from multiple routers and keep controllers up
to date with the currently known links.
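
The sketch below shows the shape of that router-side deduplication; the registry type and method
names are hypothetical, not OpenZiti's actual link code.

```go
// A sketch of router-side link deduplication: if two controllers request the
// same link, only one link is created and the existing one is reported back.
// All names are hypothetical.
package example

import "sync"

type link struct {
	ID       string
	DestAddr string
}

type linkRegistry struct {
	mu    sync.Mutex
	links map[string]*link // keyed by destination address
}

func newLinkRegistry() *linkRegistry {
	return &linkRegistry{links: map[string]*link{}}
}

// dialRequested is called when a controller asks this router to dial a link.
// It returns the link to use and whether a new dial should actually happen.
func (r *linkRegistry) dialRequested(dest, newID string) (*link, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if existing, ok := r.links[dest]; ok {
		return existing, false // duplicate request; report the existing link
	}
	l := &link{ID: newID, DestAddr: dest}
	r.links[dest] = l
	return l, true // new link; proceed with the dial
}
```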

### Circuits

Circuits were, and continue to be, stored in memory for both standalone and HA mode controllers.
Circuits are not distributed. Rather, each controller remains responsible for any circuits that it
created.

When a router needs to initiate circuit creation, it will pick the controller with the lowest
response time and send the circuit creation request to that controller. That controller will
establish a route. Route tables as well as the xgress endpoints now track which controller is
responsible for the associated circuit. This way, when failures or other notifications need to be
sent, the router knows which controller to talk to.
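
A small sketch of that bookkeeping, with hypothetical type and field names:

```go
// A sketch of router-side forwarding state that records which controller
// owns each circuit, so failure notifications go back to the right
// controller. Names are illustrative, not OpenZiti's actual types.
package example

import "fmt"

type routeEntry struct {
	CircuitID    string
	ControllerID string // the controller that created and manages this circuit
	NextHop      string // where traffic for this circuit is forwarded
}

type routeTable struct {
	entries map[string]routeEntry // keyed by circuit ID
}

// failureTarget returns the controller that should be notified when the
// given circuit fails; the actual control-channel send is elided.
func (rt *routeTable) failureTarget(circuitID string) (string, error) {
	entry, ok := rt.entries[circuitID]
	if !ok {
		return "", fmt.Errorf("unknown circuit %v", circuitID)
	}
	return entry.ControllerID, nil
}
```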

This gets routing working with multiple controllers without a major refactor. Future work will
likely delegate more routing control to the routers, so routing should get more robust and
distributed over time.

### API Sessions, Sessions, Posture Data

API Sessions and Sessions are moving to bearer tokens. Posture Data is now handled in the routers.
45 changes: 45 additions & 0 deletions docusaurus/docs/reference/ha/migrating.md
@@ -0,0 +1,45 @@
---
sidebar_label: Migrating
sidebar_position: 30
---

# Migrating Controllers

A controller can be moved from standalone mode to HA mode. It can also be returned
from HA mode back to standalone mode.

## Standalone to HA

### Requirements
First, ensure that the controller's certificates and configuration meet the requirements
in [Bootstrapping](./bootstrapping.md).

### Data Model Migration
The controller's data can be imported in one of two ways:

**Using Config**

Leave the `db: </path/to/ctrl.db/>` setting in the controller config. When the controller
starts up, it will see that it's running in HA mode, but isn't initialized yet. It will
try to use the database in the configuration to initialize its data model.

**Using the Agent**

The agent can also be used to provide the controller database to the controller.

```
ziti agent controller init-from-db path/to/source.db
```

Once the controller is initialized, it should start up as normal and be usable.
The cluster can now be expanded as explained in [Bootstrapping](./bootstrapping.md).

## HA to Standalone

This assumes that you have a database snapshot from an HA cluster. This could either be the
`ctrl-ha.db` file from the `dataDir`, or a snapshot created using the snapshot CLI command.

To revert to standalone mode, remove the `raft` section from the config file and add the `db:`
setting back, pointing it at the snapshot from the HA cluster. When started, the controller should
come up in standalone mode.
62 changes: 62 additions & 0 deletions docusaurus/docs/reference/ha/routers.md
@@ -0,0 +1,62 @@
---
sidebar_label: Routers
sidebar_position: 40
---

# Routers in Controller HA

There are only a few differences in how routers work in an HA cluster.

## Configuration

Instead of specifying a single controller, you can specify multiple controllers
in the router configuration.

```yaml
ctrl:
endpoints:
- tls:192.168.3.100:6262
- tls:192.168.3.101:6262
- tls:192.168.3.102:6262
```
If the controller cluster changes, it will notify routers of the updated
controller endpoints.
By default these will be stored in a file named `endpoints` in the same directory
as the router config file.

However, the file location can be customized using a config file setting.

```yaml
ctrl:
endpoints:
- tls:192.168.3.100:6262
endpointsFile: /var/run/ziti/endpoints.yaml
```

In general, a router should only need one or two controllers to bootstrap itself,
and thereafter should be able to keep the endpoints list up to date with help
from the controller.

## Router Data Model

In order to enable HA functionality, the router now receives a stripped down
version of the controller data model. While required for controller HA, this
also enables other optimizations, so use of the router data model is also enabled
by default when running in standalone mode.

The router data model can be disabled on the controller using a config setting,
but since it is required for HA, that flag will be ignored if the controllers
are running in a cluster.

The data model on the router is periodically snapshotted, so it doesn't need to
be fully restored from a controller on every restart.

The location and frequency of snapshotting can be [configured](../configuration/router#edge).

## Controller Selection

When creating circuits, routers will choose the most responsive controller, based on latency.
When doing model updates, such as managing terminators, they will try to talk directly to
the current cluster leader, since updates have to go through the leader in any case.
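
As a rough sketch, selection could be as simple as tracking a recent latency measurement per
controller and picking the minimum (all names hypothetical):

```go
// A sketch of latency-based controller selection; the router is assumed to
// track a recent round-trip latency per controller. Names are illustrative.
package example

import "time"

// pickController returns the ID of the controller with the lowest observed
// latency, or "" if no controllers are currently connected.
func pickController(latencies map[string]time.Duration) string {
	best := ""
	var bestLatency time.Duration
	for id, latency := range latencies {
		if best == "" || latency < bestLatency {
			best = id
			bestLatency = latency
		}
	}
	return best
}
```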
81 changes: 81 additions & 0 deletions docusaurus/docs/reference/ha/topology.md
@@ -0,0 +1,81 @@
---
sidebar_label: Topology
sidebar_position: 60
---

# Controller Topology

This document discusses considerations for how many controllers a network might
need and how to place them geographically.

## Number of Controllers

### Management

The first consideration is how many controllers the network should be able to lose without losing
functionality. A cluster of size N needs (N/2) + 1 controllers active and connected to be able
to accept model updates, such as provisioning identities, adding/changing services, and updating policies.

Since a two node cluster will lose some functionality if either node becomes unavailable, a minimum
of 3 nodes is recommended.
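
For concreteness, here is how the (N/2) + 1 requirement works out for small cluster sizes:

| Cluster size | Controllers required for updates | Failures tolerated |
|--------------|----------------------------------|--------------------|
| 1            | 1                                | 0                  |
| 2            | 2                                | 0                  |
| 3            | 2                                | 1                  |
| 4            | 3                                | 1                  |
| 5            | 3                                | 2                  |
| 6            | 4                                | 2                  |
| 7            | 4                                | 3                  |

Note that an even-sized cluster tolerates no more failures than the next smaller odd size, which is
another reason odd cluster sizes are generally preferred.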

### Clients

The functionality that controllers provide to clients doesn't require any specific number of controllers.
A network manager will want to scale the number of controllers based on client demand and may want to
place additional controllers geographically close to clusters of clients for better performance.

## Voting vs Non-Voting Members

Because every model update must be approved by a quorum of voting members, adding a large number of voting
members can add a lot of latency to model changes.

If more controllers are desired to scale out to meet client needs, only as many controllers as are
needed to meet availability requirements for management operations should be made voting members.

Additionally, having a quorum of controllers that are geographically close to each other will reduce
latency without necessarily reducing availability.

### Example

**Requirements**

1. The network should be able to withstand the loss of 1 voting member
1. Controllers should exist in the US, EU and Asia, with 2 in each region.

To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.

We should place 2 voting members in the same region, but in different availability zones/data centers.
The third voting member should be in a different region. The rest of the controllers should be non-voting.

**Proposed Layout**

So, using AWS regions, we might have:

* 1 in us-east-1 (voting)
* 1 in us-west-2 (voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

Assuming the leader is in us-east-1 or us-west-2, model updates will only need to be acknowledged by
one other, relatively close node before being committed. All other controllers will receive the
updates as well, but updates won't be gated on communication with all of them.

**Alternate**

For even faster updates, at the cost of an extra controller, two controllers could be placed in the
US east, one in us-east-1 and one in us-east-2. The third voting member could be in the EU. Updates
would then only need to be approved by two controllers that are very close to each other. If one of
them went down, updates would slow down, since they would need to be acknowledged over longer
latencies, but they would still work.

* 1 in us-east-1 (voting)
* 1 in us-east-2 (voting)
* 1 in us-west-2 (non-voting)
* 1 in eu-west-3 (voting)
* 1 in eu-south-1 (non-voting)
* 1 in ap-southeast-4 (non-voting)
* 1 in ap-south-2 (non-voting)

