diff --git a/docusaurus/docs/reference/30-configuration/_category_.yml b/docusaurus/docs/reference/30-configuration/_category_.yml
index 8407d807..09ae9f0e 100644
--- a/docusaurus/docs/reference/30-configuration/_category_.yml
+++ b/docusaurus/docs/reference/30-configuration/_category_.yml
@@ -1,5 +1,5 @@
 label: Configuration
-position: 40
+position: 15
 link:
   type: doc
   id: reference/configuration/conventions
diff --git a/docusaurus/docs/reference/30-configuration/controller.md b/docusaurus/docs/reference/30-configuration/controller.md
index 92f194ca..beb31909 100644
--- a/docusaurus/docs/reference/30-configuration/controller.md
+++ b/docusaurus/docs/reference/30-configuration/controller.md
@@ -477,7 +477,7 @@ profile:
 
 The raft section enables running multiple controllers in a cluster.
 
-- `bootstrapMembers` - (optional) Only used when bootstrapping the cluster. List of initial clusters
+- `initialMembers` - (optional) Only used when bootstrapping the cluster. List of initial cluster
   members. Should only be set on one of the controllers in the cluster.
 - `commandHandler` - (optional)
   - `maxQueueSize` - (optional, 1000) max size of the queue for processing incoming raft log
@@ -510,10 +510,12 @@ The raft section enables running multiple controllers in a cluster.
   be used to bring other nodes up to date that are only slightly behind, without having to send
   the full snapshot. This is a cluster wide value and should be consistent across nodes in the
   cluster. Otherwise the value from the most recently started controller will win.
+- `warnWhenLeaderlessFor` - (optional, 1m) - Emits a warning log message if a controller is part of
+  a cluster with no leader for a duration which exceeds this threshold.
 
 ```text
 raft:
-  bootstrapMembers:
+  initialMembers:
     - tls:127.0.0.1:6262
     - tls:127.0.0.1:6363
     - tls:127.0.0.1:6464
diff --git a/docusaurus/docs/reference/30-configuration/router.md b/docusaurus/docs/reference/30-configuration/router.md
index ffb39f35..cb2b40d2 100644
--- a/docusaurus/docs/reference/30-configuration/router.md
+++ b/docusaurus/docs/reference/30-configuration/router.md
@@ -122,6 +122,8 @@ The `ctrl` section configures how the router will connect to the controller.
   See [heartbeats](./conventions.md#heartbeats).
 - `options` - a set of option which includes the below options and those defined in
   [channel options](conventions.md#channel)
+- `endpointsFile` - (optional, 'config file dir'/endpoints) - File location to save the current
+  known set of controller endpoints when an endpoints update has been received from a controller.
 
 Example:
 
@@ -164,6 +166,9 @@ Each dialer currently supports a number of [shared options](conventions.md#xgres
 The `edge` section contains configuration that pertain to edge functionality. This section must be
 present to enable edge functionality (e.g. listening for edge SDK connections, tunnel binding modes).
 
+- `db` - (optional, `.proto.gzip`) - Configures where the router data model will be snapshotted to
+- `dbSaveIntervalSeconds` - (optional, 30s) - Configures how often the router data model will be snapshotted
+
 Example:
 
 ```text
@@ -210,7 +215,6 @@ routers at least one valid SAN must be provided.
 - `uri` - (optional) - an array of URI SAN entries
 - `email` - (optional) - an array of email SAN entries
-
 
 ### `forwarder`
 
 The `forwarder` section controls options that affect how a router forwards payloads across links to
diff --git a/docusaurus/docs/reference/_category_.yml b/docusaurus/docs/reference/_category_.yml
index 5904cd26..9e7a022f 100644
--- a/docusaurus/docs/reference/_category_.yml
+++ b/docusaurus/docs/reference/_category_.yml
@@ -1,2 +1,2 @@
 label: Reference
-position: 40
+position: 10
diff --git a/docusaurus/docs/reference/config-types/index.md b/docusaurus/docs/reference/config-types/index.md
index 70767b76..1b9c2bea 100644
--- a/docusaurus/docs/reference/config-types/index.md
+++ b/docusaurus/docs/reference/config-types/index.md
@@ -1,6 +1,6 @@
 ---
 title: Builtin Config Types
-sidebar_position: 10
+sidebar_position: 20
 ---
 
 ## Overview
diff --git a/docusaurus/docs/reference/ha/_category_.yml b/docusaurus/docs/reference/ha/_category_.yml
new file mode 100644
index 00000000..01a83d0e
--- /dev/null
+++ b/docusaurus/docs/reference/ha/_category_.yml
@@ -0,0 +1,5 @@
+label: Controller HA
+position: 22
+link:
+  type: doc
+  id: reference/ha/overview
diff --git a/docusaurus/docs/reference/ha/bootstrapping.md b/docusaurus/docs/reference/ha/bootstrapping.md
new file mode 100644
index 00000000..4c163e68
--- /dev/null
+++ b/docusaurus/docs/reference/ha/bootstrapping.md
@@ -0,0 +1,164 @@
+---
+sidebar_label: Bootstrapping
+sidebar_position: 10
+---
+
+# Bootstrapping A Cluster
+
+To bring up a controller cluster, start with a single node.
+
+## Controller Configuration
+
+### Certificates
+
+Each controller requires appropriate certificates. The certificates for clustered controllers
+have more requirements than those for a standalone server. See the [Certificates Reference](./certificates.md)
+for more information.
+
+### Config File
+
+The controller requires a `raft` section.
+
+```yaml
+raft:
+  dataDir: /path/to/data/dir
+```
+
+The `dataDir` will be used to store the following:
+
+* `ctrl-ha.db` - the OpenZiti data model bbolt database
+* `raft.db` - the raft bbolt database
+* `snapshots/` - a directory to store raft snapshots
+
+Controllers use the control channel listener to communicate with each other. Unlike
+routers, they need to know how to reach each other, so an advertise address must
+be configured.
+
+```yaml
+ctrl:
+  listener: tls:0.0.0.0:6262
+  options:
+    advertiseAddress: tls:192.168.3.100:6262
+```
+
+Finally, for sessions to work across controllers, JWTs are used. To enable these,
+an OIDC endpoint should be configured.
+
+```yaml
+web:
+  - name: all-apis-localhost
+    bindPoints:
+      - interface: 127.0.0.1:1280
+        address: 127.0.0.1:1280
+    options:
+      minTLSVersion: TLS1.2
+      maxTLSVersion: TLS1.3
+    apis:
+      - binding: health-checks
+      - binding: fabric
+      - binding: edge-management
+      - binding: edge-client
+      - binding: edge-oidc
+```
+
+## Initializing the Controller
+
+Once properly configured, the controller can be started.
+
+```shell
+ziti controller run ctrl1.yml
+```
+
+Once the controller is up and running, it will see that it is not yet initialized, and will pause
+startup, waiting for initialization. While waiting, it will periodically emit a message:
+
+```
+[ 3.323] WARNING ziti/controller/server.(*Controller).checkEdgeInitialized: the
+Ziti Edge has not been initialized, no default admin exists.
+Add this node to a cluster using 'ziti agent cluster add tls:localhost:6262' against an existing
+cluster member, or if this is the bootstrap node, run 'ziti agent controller init'
+to configure the default admin and bootstrap the cluster
+```
+
+As this is the first node in the cluster, we can't add any nodes to it yet. Instead, run:
+
+```
+ziti agent controller init
+```
+
+This initializes an admin user that can be used to manage the network.
+
+## Managing the Cluster
+
+There are four commands which can be used to manage the cluster.
+
+```bash
+# Adding Members
+ziti agent cluster add
+
+# Listing Members
+ziti agent cluster list
+
+# Removing Members
+ziti agent cluster remove
+
+# Transfer Leadership
+ziti agent cluster transfer-leadership [new leader id]
+```
+
+These are also available via the REST API, and can be invoked through the CLI.
+
+```bash
+$ ziti ops cluster --help
+Controller cluster operations
+
+Usage:
+  ziti ops cluster [flags]
+  ziti ops cluster [command]
+
+Available Commands:
+  add-member          add cluster member
+  list-members        list cluster members and their status
+  remove-member       remove cluster member
+  transfer-leadership transfer cluster leadership to another member
+
+Flags:
+  -h, --help   help for cluster
+
+Use "ziti ops cluster [command] --help" for more information about a command.
+```
+
+## Growing the Cluster
+
+Once a single node is up and running, additional nodes can be added to it. They should be
+configured the same as the initial node, though they will have different addresses.
+
+The first node, as configured above, is running at `192.168.3.100:6262`.
+
+If the second node is running at `192.168.3.101:6262`, then it can be added to the
+cluster in one of two ways.
+
+### From An Existing Node
+
+From a node already in the cluster, in this case our initial node, we can add the
+new node as follows:
+
+```bash
+user@node1$ ziti agent cluster add tls:192.168.3.101
+```
+
+### From A New Node
+
+We can also ask the new node, which is not yet part of the cluster, to reach
+out to an existing cluster node and request to join.
+
+```
+user@node2$ ziti agent cluster add tls:192.168.3.100
+```
+
+## Shrinking the Cluster
+
+From any node in the cluster, nodes can be removed as follows:
+
+```
+user@node1$ ziti agent cluster remove tls:192.168.3.101
+```
diff --git a/docusaurus/docs/reference/ha/certificates.md b/docusaurus/docs/reference/ha/certificates.md
new file mode 100644
index 00000000..96758531
--- /dev/null
+++ b/docusaurus/docs/reference/ha/certificates.md
@@ -0,0 +1,86 @@
+---
+sidebar_label: Certificates
+sidebar_position: 20
+---
+
+# Controller Certificates
+
+For controllers to communicate and trust one another, they need certificates that have
+been generated with the correct attributes and relationships.
+
+## Requirements
+
+1. The certificates must have a shared root of trust
+2. The controller client and server certificates must contain a
+   [SPIFFE ID](https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/#spiffe-id)
+
+## Steps to Certificate Creation
+There are many ways to set up certificates, so this will just cover a recommended configuration.
+
+The primary thing to ensure is that controllers have a shared root of trust.
+A standard way of generating certs would be as follows:
+
+1. Create a self-signed root CA
+1. Create an intermediate signing cert for each controller
+1. Create a server cert using the signing cert for each controller
+1. Create a client cert using the signing cert for each controller
+1. Make sure that the CA bundle for each server includes both the root CA and the intermediate CA
+   for that server
+
+Note that controller server certs must contain a SPIFFE ID of the form
+
+```
+spiffe://<trust domain>/controller/<controller id>
+```
+
+So if your trust domain is `example.com` and your controller id is `ctrl1`, then your SPIFFE ID
+would be:
+
+```
+spiffe://example.com/controller/ctrl1
+```
+
+**SPIFFE ID Notes:**
+
+* This ID must be set as the only URI in the `X509v3 Subject Alternative Name` field in the
+  certificate.
+* These IDs are used to allow the controllers to identify each other during the mTLS negotiation.
+* The OpenZiti CLI supports creating SPIFFE IDs in your certs
+  * Use the `--trust-domain` flag when creating CAs
+  * Use the `--spiffe-id` flag when creating server or client certificates
+
+## Example
+
+Using the OpenZiti PKI tool, certificates for a three node cluster could be created as follows:
+
+```bash
+# Create the trust root, a self-signed CA
+ziti pki create ca --trust-domain ha.test --pki-root ./pki --ca-file ca --ca-name 'HA Example Trust Root'
+
+# Create the controller 1 intermediate/signing cert
+ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl1 --intermediate-name 'Controller One Signing Cert'
+
+# Create the controller 1 server cert
+ziti pki create server --pki-root ./pki --ca-name ctrl1 --dns localhost --ip 192.168.3.100 --server-name ctrl1 --spiffe-id 'controller/ctrl1'
+
+# Create the controller 1 client cert
+ziti pki create client --pki-root ./pki --ca-name ctrl1 --client-name ctrl1 --spiffe-id 'controller/ctrl1'
+
+# Create the controller 2 intermediate/signing cert
+ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl2 --intermediate-name 'Controller Two Signing Cert'
+
+# Create the controller 2 server cert
+ziti pki create server --pki-root ./pki --ca-name ctrl2 --dns localhost --ip 192.168.3.101 --server-name ctrl2 --spiffe-id 'controller/ctrl2'
+
+# Create the controller 2 client cert
+ziti pki create client --pki-root ./pki --ca-name ctrl2 --client-name ctrl2 --spiffe-id 'controller/ctrl2'
+
+# Create the controller 3 intermediate/signing cert
+ziti pki create intermediate --pki-root ./pki --ca-name ca --intermediate-file ctrl3 --intermediate-name 'Controller Three Signing Cert'
+
+# Create the controller 3 server cert
+ziti pki create server --pki-root ./pki --ca-name ctrl3 --dns localhost --ip 192.168.3.102 --server-name ctrl3 --spiffe-id 'controller/ctrl3'
+
+# Create the controller 3 client cert
+ziti pki create client --pki-root ./pki --ca-name ctrl3 --client-name ctrl3 --spiffe-id 'controller/ctrl3'
+```
diff --git a/docusaurus/docs/reference/ha/data-model.md b/docusaurus/docs/reference/ha/data-model.md
new file mode 100644
index 00000000..fa961c60
--- /dev/null
+++ b/docusaurus/docs/reference/ha/data-model.md
@@ -0,0 +1,135 @@
+---
+sidebar_label: Data Model
+sidebar_position: 80
+---
+
+# Controller HA Data Model
+
+:::info
+
+This document is likely most interesting for developers working on OpenZiti,
+those curious about how distributed systems work in general, or those curious
+about how data is distributed in OpenZiti.
+
+:::
+
+## Model Data
+
+### Model Data Characteristics
+
+* All data required on every controller
+* Read characteristics
+  * Reads happen all the time, from every client as well as admins
+  * Speed is very important. Reads affect how every client perceives the system.
+  * Availability is very important.
+    Without the ability to read definitions, clients can't create new connections.
+  * Can be against stale data, if we get consistency within a reasonable timeframe (seconds to
+    minutes)
+* Write characteristics
+  * Writes only happen from administrators
+  * Speed needs to be reasonable, but doesn't need to be blazing fast
+  * Write availability can be interrupted, since it primarily affects management operations
+  * Must be consistent. Write validation can't happen with stale data. Don't want to have to deal
+    with reconciling concurrent, contradictory write operations.
+* Generally involves controller to controller coordination
+
+Of the distribution mechanisms we looked at, RAFT was the best fit.
+
+### Raft Resources
+
+For a more in-depth look at Raft, see
+
+* https://raft.github.io/
+* http://thesecretlivesofdata.com/raft/
+
+### RAFT Characteristics
+
+* Writes
+  * Consistency over availability
+  * Good but not stellar performance
+* Reads
+  * Every node has full state
+  * Local state is always internally consistent, but maybe slightly behind the leader
+  * No coordination required for reads
+  * Fast reads
+  * Reads work even when other nodes are unavailable
+  * If latest data is desired, reads can be forwarded to the current leader
+
+So the OpenZiti controller uses RAFT to distribute the data model. Specifically, it uses the
+[HashiCorp Raft Library](https://github.com/hashicorp/raft/).
+
+### Updates
+
+The basic flow for model updates is as follows:
+
+1. A client requests a model update via the REST API.
+2. The controller checks if it is the raft cluster leader. If it is not, it forwards the request to
+   the leader.
+3. Once the request is on the leader, it applies the model update to the raft log. This involves
+   getting a quorum of the controllers to accept the update.
+4. Once the update has been accepted, it will be executed on each node of the cluster. This will
+   generate one or more changes to the bolt database.
+5. The results of the operation (success or failure) are returned to the controller which received
+   the original REST request.
+6. The controller waits until the operation has been applied locally.
+7. The result is returned to the REST client.
+
+### Reads
+
+Reads are always done against the local bolt database for performance. The assumption is that if
+something like a policy change is delayed, it may temporarily allow a circuit to be created, but as
+soon as the policy update is applied, it will make changes to circuits as necessary.
+
+## Runtime Data
+
+In addition to model data, the controller also manages some amount of runtime data. This data is for
+running OpenZiti's core functions, i.e. managing the flow of data across the mesh, along with
+related authentication data. So this includes things like:
+
+* Links
+* Circuits
+* API Sessions
+* Sessions
+* Posture Data
+
+### Runtime Data Characteristics
+
+Runtime data has different characteristics than the model data does.
+
+* Not necessarily shared across controllers
+* Reads **and** writes must be very fast
+* Generally involves SDK to controller or controller to router coordination
+
+Because writes must also be fast, RAFT is not a good candidate for storing this data. Good
+performance is critical for these components, so they are each evaluated individually.
+
+### Links
+
+Each controller currently needs to know about links so that it can make routing decisions. However,
+links exist on routers. So, routers are the source of record for links.
+When a router connects to a controller, the router will tell the controller about any links that it
+already has. The controller will ask the router to fill in any missing links, and the router will
+ensure that it doesn't create duplicate links if multiple controllers request the same link be
+created. If there are duplicates, the router will inform the controller of the existing link.
+
+This allows the routers to properly handle link dials from multiple routers and keep controllers up
+to date with the current known links.
+
+### Circuits
+
+Circuits were and continue to be stored in memory for both standalone and HA mode controllers.
+Circuits are not distributed. Rather, each controller remains responsible for any circuits that it
+created.
+
+When a router needs to initiate circuit creation, it will pick the controller with the lowest
+response time and send a circuit creation request to that controller. The controller will establish
+a route. Route tables as well as the xgress endpoints now track which controller is responsible for
+the associated circuit. This way, when failures or other notifications need to be sent, the router
+knows which controller to talk to.
+
+This gets routing working with multiple controllers without a major refactor. Future work will
+likely delegate more routing control to the routers, so routing should get more robust and
+distributed over time.
+
+### API Sessions, Sessions, Posture Data
+
+API Sessions and Sessions are moving to bearer tokens. Posture Data is now handled in the routers.
diff --git a/docusaurus/docs/reference/ha/migrating.md b/docusaurus/docs/reference/ha/migrating.md
new file mode 100644
index 00000000..c4190b27
--- /dev/null
+++ b/docusaurus/docs/reference/ha/migrating.md
@@ -0,0 +1,45 @@
+---
+sidebar_label: Migrating
+sidebar_position: 30
+---
+
+# Migrating Controllers
+
+A controller can be moved from standalone mode to HA mode. It can also be returned
+from HA mode back to standalone mode.
+
+## Standalone to HA
+
+### Requirements
+First, ensure that the controller's certificates and configuration meet the requirements
+in [Bootstrapping](./bootstrapping.md).
+
+### Data Model Migration
+The controller's data can be imported in one of two ways:
+
+**Using Config**
+
+Leave the `db:` setting in the controller config. When the controller
+starts up, it will see that it's running in HA mode, but isn't initialized yet. It will
+try to use the database in the configuration to initialize its data model.
+
+**Using the Agent**
+
+The agent can also be used to provide the controller database to the controller.
+
+```
+ziti agent controller init-from-db path/to/source.db
+```
+
+Once the controller is initialized, it should start up as normal and be usable.
+The cluster can now be expanded as explained in [Bootstrapping](./bootstrapping.md).
+
+## HA to Standalone
+
+This assumes that you have a database snapshot from an HA cluster. This could either
+be the ctrl-ha.db from the `dataDir`, or a snapshot created using the snapshot
+CLI command.
+
+To revert to standalone mode, remove the `raft` section from the config file and add the
+`db:` setting back, pointing at the snapshot from the HA cluster. When started, the
+controller should come up in standalone mode. A sketch of a reverted configuration is
+shown below.
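+The exact contents will vary with your deployment; the snapshot path and listener address below
+are illustrative placeholders, not required values. The key change is that the `raft` section is
+removed and a top-level `db:` setting points at the snapshot from the HA cluster.
+
+```yaml
+# The raft section has been removed. The db setting below replaces it.
+db: /path/to/ha-snapshot.db   # illustrative path to the snapshot taken from the HA cluster
+
+ctrl:
+  listener: tls:0.0.0.0:6262
+```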
diff --git a/docusaurus/docs/reference/ha/operations.md b/docusaurus/docs/reference/ha/operations.md
new file mode 100644
index 00000000..86b5c6e8
--- /dev/null
+++ b/docusaurus/docs/reference/ha/operations.md
@@ -0,0 +1,96 @@
+---
+sidebar_label: Operations
+sidebar_position: 50
+---
+
+# Operating a Controller Cluster
+
+## Restoring from Backup
+
+To restore from a database snapshot, use the following CLI command:
+
+```
+ziti agent controller restore-from-db /path/to/backup.db
+```
+
+As this is an agent command, it must be run on the same machine as the controller. The path
+provided will be read by the controller process, not the CLI.
+
+The controller will apply the snapshot and then terminate. All controllers in the cluster will
+terminate and expect to be restarted. This is so that in-memory caches won't be out of sync with
+the database, which has changed.
+
+## Snapshot Application and Restarts
+
+If a controller is out of communication for a while, it may receive a snapshot to apply, rather
+than a stream of events.
+
+If a controller receives a snapshot to apply after starting up, it will apply the snapshot and then
+terminate. This assumes that there is a restart script which will bring the controller back up after
+it terminates.
+
+This should only happen if a controller is connected to the cluster and then gets disconnected for
+long enough that a snapshot is created while it's disconnected. Because applying a snapshot requires
+replacing the underlying controller bolt DB, the easiest way to do that is to restart. That way we
+don't have to worry about replacing the bolt DB underneath a running system.
+
+## Events
+
+All events now contain an `event_src_id` to indicate which controller emitted them.
+
+There are some new events which are specific to clusters. See [Cluster Events](../events#cluster)
+for more detail.
+
+## Metrics
+
+In an HA system, routers will send metrics to all controllers to which they are connected. There is
+a new `doNotPropagate` flag in the metrics message, which will be set to false until the router has
+successfully delivered the metrics message to a controller. The flag will then be set to true. So
+the first controller to get the metrics message is expected to deliver the metrics message to the
+events system for external integrators. The other controllers will have `doNotPropagate` set to true,
+and will only use the metrics message internally, to update routing data.
+
+## Open Ports
+
+Controllers now establish connections with each other, for two purposes.
+
+1. Forwarding model updates to the leader, so they can be applied to the raft cluster
+2. Raft communication
+
+Both kinds of traffic flow over the same connection.
+
+These connections do not require any extra open ports as we are using the control channel listener
+to listen to both router and controller connections. As part of the connection process, the
+connection type is provided and the appropriate authentication and connection setup happens based on
+the connection type. If no connection type is provided, it's assumed to be a router.
+
+## System of Record
+
+In a controller that's not configured for HA, the bolt database is the system of record. In an HA
+setup, the raft journal is the system of record. The raft journal is stored in two places: a
+snapshot directory and a bolt database of raft journal entries.
+
+So a non-HA setup will have:
+
+* ctrl.db
+
+An HA setup will have:
+
+* raft.db - the bolt database containing raft journal entries
+* snapshots/ - a directory containing raft snapshots. Each snapshot is a snapshot of the controller
+  bolt db
+* ctrl.db - the controller bolt db, with the current state of the model
+
+The location of all three is controlled by the raft/dataDir config property.
+
+```yaml
+raft:
+  dataDir: /var/ziti/data/
+```
+
+When an HA controller starts up, it will first apply the newest snapshot, then any newer journal
+entries that aren't yet contained in a snapshot. This means that an HA controller should start with
+a blank DB that can be overwritten by snapshot and/or have journal entries applied to it. So an HA
+controller will delete or rename the existing controller database and start with a fresh bolt db.
diff --git a/docusaurus/docs/reference/ha/overview.md b/docusaurus/docs/reference/ha/overview.md
new file mode 100644
index 00000000..c386d098
--- /dev/null
+++ b/docusaurus/docs/reference/ha/overview.md
@@ -0,0 +1,55 @@
+---
+sidebar_label: Overview
+sidebar_position: 05
+---
+
+# Controller HA
+
+## Overview
+
+OpenZiti controllers can be run in a cluster for high availability and performance scaling.
+
+:::warning
+
+**NOTE: Controller HA is still in Beta**
+
+It's quite functional now, but we are continuing to test and refine before we mark it GA.
+:::
+
+### For SDK Clients/Tunnelers
+
+A controller cluster offers the following advantages:
+
+1. Horizontal scaling of SDK client services such as
+   1. Service lookups
+   1. Session creation
+1. Horizontal scaling of circuit creation
+
+This means that for everything that SDK clients and tunnelers depend on, controllers
+can be scaled up and placed strategically to meet user demand.
+
+The following limitations currently apply:
+
+1. Circuits are owned by a controller. If the controller goes down, the circuit
+   will remain up, but can't be re-routed, whether to improve performance or to
+   route around a failed router.
+2. For a controller to route circuits on a router, that router must be connected
+   to that controller. This means that routers should generally be connected to
+   all controllers.
+
+### For Management Operations
+
+The HA controller cluster uses a distributed journal to keep the data model synchronized across
+controllers. This has the following ramifications:
+
+1. Read operations will work on any controller that is up. If the controller is
+   disconnected from the cluster, the reads may return data that is out of date.
+2. Update operations require that the cluster has a leader and that a quorum of nodes
+   is available. A quorum for a cluster of size N is (N/2)+1. This means that a 3 node
+   cluster can operate with 2 nodes and a 5 node cluster can operate with 3 nodes, and
+   so on.
+3. Updates can be initiated on any controller; they will be forwarded to the leader to
+   be applied.
+4. The cluster may have non-voting members.
+
+See [topology](./topology.md) and [the data model](./data-model.md) for more information.
diff --git a/docusaurus/docs/reference/ha/routers.md b/docusaurus/docs/reference/ha/routers.md
new file mode 100644
index 00000000..6127509f
--- /dev/null
+++ b/docusaurus/docs/reference/ha/routers.md
@@ -0,0 +1,62 @@
+---
+sidebar_label: Routers
+sidebar_position: 40
+---
+
+# Routers in Controller HA
+
+There are only a few differences in how routers work in an HA cluster.
+
+## Configuration
+
+Instead of specifying a single controller, you can specify multiple controllers
+in the router configuration.
+
+```yaml
+ctrl:
+  endpoints:
+    - tls:192.168.3.100:6262
+    - tls:192.168.3.101:6262
+    - tls:192.168.3.102:6262
+```
+
+If the controller cluster changes, the controllers will notify routers of the updated
+controller endpoints.
+
+By default, these will be stored in a file named `endpoints` in the same directory
+as the router config file.
+
+However, the file location can be customized using a config file setting.
+
+```yaml
+ctrl:
+  endpoints:
+    - tls:192.168.3.100:6262
+  endpointsFile: /var/run/ziti/endpoints.yaml
+```
+
+In general, a router should only need one or two controllers to bootstrap itself,
+and thereafter should be able to keep the endpoints list up to date with help
+from the controller.
+
+## Router Data Model
+
+In order to enable HA functionality, the router now receives a stripped down
+version of the controller data model. While required for controller HA, this
+also enables other optimizations, so use of the router data model is also enabled
+by default when running in standalone mode.
+
+The router data model can be disabled on the controller using a config setting,
+but since it is required for HA, that flag will be ignored if the controllers
+are running in a cluster.
+
+The data model on the router is periodically snapshotted, so it doesn't need to
+be fully restored from a controller on every restart.
+
+The location and frequency of snapshotting can be [configured](../configuration/router#edge).
+
+## Controller Selection
+
+When creating circuits, routers will choose the most responsive controller, based on latency.
+When doing model updates, such as managing terminators, they will try to talk directly to
+the current cluster leader, since updates have to go through the leader in any case.
diff --git a/docusaurus/docs/reference/ha/topology.md b/docusaurus/docs/reference/ha/topology.md
new file mode 100644
index 00000000..57934a08
--- /dev/null
+++ b/docusaurus/docs/reference/ha/topology.md
@@ -0,0 +1,81 @@
+---
+sidebar_label: Topology
+sidebar_position: 60
+---
+
+# Controller Topology
+
+This document discusses considerations for how many controllers a network might
+need and how to place them geographically.
+
+## Number of Controllers
+
+### Management
+
+The first consideration is how many controllers the network should be able to lose without losing
+functionality. A cluster of size N needs (N/2) + 1 controllers active and connected to be able
+to take model updates, such as provisioning identities, adding/changing services and updating policies.
+
+Since a two node cluster will lose some functionality if either node becomes unavailable, a minimum
+of 3 nodes is recommended.
+
+### Clients
+
+The functionality that controllers provide to clients doesn't require any specific number of controllers.
+A network manager will want to scale the number of controllers based on client demand and may want to
+place additional controllers geographically close to clusters of clients for better performance.
+
+## Voting vs Non-Voting Members
+
+Because every model update must be approved by a quorum of voting members, adding a large number of voting
+members can add a lot of latency to model changes.
+
+If more controllers are desired to scale out to meet client needs, only as many controllers as are needed
+to meet availability requirements for management needs should be made into voting members.
+
+Additionally, having a quorum of controllers be geographically close will reduce latency without necessarily
+reducing availability.
+
+### Example
+
+**Requirements**
+
+1. The network should be able to withstand the loss of 1 voting member
+1. Controllers should exist in the US, EU and Asia, with 2 in each region.
+
+To be able to lose one voting member, we need 3 voting nodes, with 6 nodes total.
+
+We should place 2 voting members in the same region, but in different availability zones/data centers.
+The third voting member should be in a different region. The rest of the controllers should be non-voting.
+
+**Proposed Layout**
+
+So, using AWS regions, we might have:
+
+* 1 in us-east-1 (voting)
+* 1 in us-west-2 (voting)
+* 1 in eu-west-3 (voting)
+* 1 in eu-south-1 (non-voting)
+* 1 in ap-southeast-4 (non-voting)
+* 1 in ap-south-2 (non-voting)
+
+Assuming the leader is one of us-east-1 or us-west-2, model updates will only need to be acknowledged by
+one relatively close node before being accepted. All other controllers will receive the updates as well,
+but updates won't be gated on communications with all of them.
+
+**Alternate**
+
+For even faster updates at the cost of an extra controller, two controllers could be in us-east, one in us-east-1
+and one in us-east-2. The third voting member could be in the EU. Updates would now only need to be approved by two
+very close controllers. If one of them went down, updates would slow down, since updates would need to be done
+over longer latencies, but they would still work.
+
+* 1 in us-east-1 (voting)
+* 1 in us-east-2 (voting)
+* 1 in us-west-2 (non-voting)
+* 1 in eu-west-3 (voting)
+* 1 in eu-south-1 (non-voting)
+* 1 in ap-southeast-4 (non-voting)
+* 1 in ap-south-2 (non-voting)
diff --git a/docusaurus/docs/reference/tunnelers/_category_.yml b/docusaurus/docs/reference/tunnelers/_category_.yml
index ca40b911..54a44c9e 100644
--- a/docusaurus/docs/reference/tunnelers/_category_.yml
+++ b/docusaurus/docs/reference/tunnelers/_category_.yml
@@ -1,5 +1,5 @@
 label: Tunnelers
-position: 10
+position: 25
 link:
   type: doc
   id: reference/tunnelers/index