Commit f6c1b82: Attempt to provide definition and usage example for `raftIndexLagThreshold` (neo4j#1572)

Co-authored-by: Nick Giles <[email protected]>
NataliaIvakina and nick-giles-neo authored Apr 24, 2024
1 parent d7419e0

Showing 1 changed file with 31 additions and 19 deletions: modules/ROOT/pages/clustering/monitoring/endpoints.adoc
:description: This section describes how to monitor cluster endpoints
:page-aliases: monitoring/causal-cluster/http-endpoints.adoc
[role=enterprise-edition]

= Monitor cluster endpoints for status information
For those situations, consider disabling authentication of the clustering status endpoints.
[[clustering-http-endpoints-unified]]
== Unified endpoints

A unified set of endpoints exists, both on primary and secondary servers, with the following behavior:

* `/db/<databasename>/cluster/writable` -- Used to direct `write` traffic to specific instances.
* `/db/<databasename>/cluster/read-only` -- Used to direct `read` traffic to specific instances.
With no arguments, `curl` does an HTTP `GET` on the URI provided and outputs the body of the response.
If the response code is desired, just add the `-v` flag for verbose output.
Here are some examples:

* Requesting the `writable` endpoint on a primary server that is currently the elected leader, with verbose output:
The exact verbose output varies with the `curl` version and platform; an abridged, representative exchange, assuming a server at `localhost:7474` and a database named `neo4j`, looks like this:

[source, curl]
--------------
#> curl -v localhost:7474/db/neo4j/cluster/writable
* Connected to localhost (127.0.0.1) port 7474 (#0)
> GET /db/neo4j/cluster/writable HTTP/1.1
> Host: localhost:7474
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
<
* Connection #0 to host localhost left intact
true* Closing connection #0
--------------
The status endpoint, available at `/db/<databasename>/cluster/status`, is to be used to assist with rolling upgrades.
For more information, see link:https://neo4j.com/docs/upgrade-migration-guide/current/version-5/upgrade-minor/#_clusters[Upgrade and Migration Guide -> Clusters].

Typically, it is desired to have some guarantee that a primary is safe to shut down for each database before removing it from a cluster.
Counter-intuitively, a primary being safe to shut down means that a majority of the *other* primaries are healthy, caught up, and have recently heard from that database's leader.
The status endpoints provide the following information to help make that assessment.
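
For example, a status response can be fetched with a plain HTTP `GET`.
The following is a minimal sketch in Python, assuming a server at `localhost:7474`, a database named `neo4j`, and that authentication of the status endpoints is disabled (otherwise, add an `Authorization` header):

[source, python]
----
import json
import urllib.request

# Fetch the clustering status for one database from one server.
# Assumes status-endpoint authentication is disabled; otherwise,
# attach an Authorization header to the request.
url = "http://localhost:7474/db/neo4j/cluster/status"
with urllib.request.urlopen(url) as response:
    status = json.load(response)

# A few of the fields described in the table below:
print(status["isHealthy"], status["lastAppliedRaftIndex"], status.get("leader"))
----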

[NOTE]
====
Several of the fields in status endpoint responses refer to details of Raft, the algorithm used in Neo4j clusters to provide highly available transactions.
In a Neo4j cluster each database has its own independent Raft group.
Therefore, details such as `leader` and `raftCommandsPerSecond` are database-specific.
====
[options="header", cols="2,1,1,2,4"]
|===
| Field | Type | Optional | Example | Description
| `core` | boolean | no | `true` | Indicates whether the server is hosting the database in primary (core) or secondary mode.
| `lastAppliedRaftIndex` | number | no | `4321` | Every transaction in a cluster is associated with a Raft index.

Indicates the latest applied Raft log index.
| `participatingInRaftGroup` | boolean | no | `false` | A participating member is able to vote.
A primary is considered participating when it is part of the voter membership and has kept track of the leader.
| `votingMembers` | string[] | no | `[]` | A member is considered a voting member when the leader has been receiving communication from it.

A list of the `memberId` of each member that this primary considers part of the voting set.
| `isHealthy` | boolean | no | `true` | Reflects that the local database of this member has not encountered a critical error preventing it from writing locally.
| `memberId` | string | no | `30edc1c4-519c-4030-8348-7cb7af44f591` | Every member in a cluster has its own unique member ID to identify it.
Use `memberId` to distinguish between primary and secondary servers.
| `leader` | string | yes | `80a7fb7b-c966-4ee7-88a9-35db8b4d68fe` | Follows the same format as `memberId`, but if it is null or missing, then the leader is unknown.
| `millisSinceLastLeaderMessage` | number | yes | `1234` | The number of milliseconds since the last heartbeat-like leader message.
Not relevant to secondaries, and hence is not included.
| `raftCommandsPerSecond` label:deprecated[] | number | yes | `124` | An estimate of the average Raft state machine throughput over a sampling window configurable via the `clustering.status_throughput_window` setting.
`raftCommandsPerSecond` is not an effective way to monitor that servers are not falling behind on updates, and is hence deprecated; it will be removed in the next major release of Neo4j.
It is recommended to use the metric `<prefix>.clustering.core.commit_index` on each server and look for divergence instead.
|===

The following table explains how results can be compared.
[options="header", cols="<1,2,2"]
|===
| Name of check | Method of calculation | Description
| `allServersAreHealthy` | Every primary's status endpoint indicates `isHealthy` == `true`. | Ensures the data across the entire cluster is healthy.
If any primary reports `false`, that indicates a larger problem.
| `allVotingSetsAreEqual` | For any two primaries (A and B), status endpoint A's `votingMembers` == status endpoint B's `votingMembers`. | When the voting sets of all primaries are equal to each other, all members agree on membership.
| `allVotingSetsContainAtLeastTargetCluster` | For all primaries (*S*), excluding primary Z (to be switched off), every member in *S* contains *S* in their voting set.
Membership is determined by using the `memberId` and `votingMembers` from the status endpoint. | Sometimes network conditions are not perfect and it may make sense to switch off a different primary than the one originally intended.
If this check is run for all primaries, the ones that match this condition can be switched off (provided the other conditions are also met).
| `hasOneLeader` | For any two primaries (A and B), `A.leader == B.leader && leader != null`. | If the leaders are different, there may be a partition (alternatively, this could also occur due to bad timing).
If the leader is unknown, that means the leader messages have actually timed out.
| `noMembersLagging` | For primary A with `lastAppliedRaftIndex` = `min` and primary B with `lastAppliedRaftIndex` = `max`, `B.lastAppliedRaftIndex - A.lastAppliedRaftIndex < raftIndexLagThreshold`. | If there is a large difference in the applied indexes between primaries, it could be dangerous to switch off a primary.
|===
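
These checks can be automated by fetching the status from every primary and comparing the responses.
The following is a minimal sketch, assuming the responses have already been collected into a dictionary keyed by each primary's `memberId`, and using an illustrative `raft_index_lag_threshold` of `1000`:

[source, python]
----
# `statuses` maps each primary's memberId to its parsed
# /db/<databasename>/cluster/status response.

def all_servers_are_healthy(statuses):
    return all(s["isHealthy"] for s in statuses.values())

def all_voting_sets_are_equal(statuses):
    voting_sets = {frozenset(s["votingMembers"]) for s in statuses.values()}
    return len(voting_sets) == 1

def voting_sets_contain_target_cluster(statuses, excluded_member_id):
    # Remaining primaries, excluding the one to be switched off (Z).
    target = {m for m in statuses if m != excluded_member_id}
    return all(target <= set(statuses[m]["votingMembers"]) for m in target)

def has_one_leader(statuses):
    leaders = {s.get("leader") for s in statuses.values()}
    return len(leaders) == 1 and None not in leaders

def no_members_lagging(statuses, raft_index_lag_threshold=1000):
    indexes = [s["lastAppliedRaftIndex"] for s in statuses.values()]
    return max(indexes) - min(indexes) < raft_index_lag_threshold
----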

[NOTE]
====
`raftIndexLagThreshold` helps you monitor the lag in applying Raft log entries across a cluster and set appropriate thresholds.
Pick a `raftIndexLagThreshold` appropriate to your particular cluster and workload.
A good way to select a value is to measure the reported lag under normal circumstances and choose a threshold slightly above that.
For example, suppose you observe the metric (the difference between the maximum and minimum `lastAppliedRaftIndex`) during all phases of your workload and see that it stays around 100 or less during working hours, but spikes to 5,000 for a few hours on Saturdays.
Then, depending on your monitoring needs or capabilities, you either set a weekday threshold of 120 and a weekend threshold of 6,000, or just an overall threshold of 6,000.
These thresholds can help in identifying performance issues.
====
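
One way to measure the reported lag over time is to poll the status endpoints and record the spread in `lastAppliedRaftIndex`.
A minimal sketch, assuming a hypothetical `fetch_statuses()` helper that returns the parsed status responses from all primaries:

[source, python]
----
import time

def sample_raft_index_lag(fetch_statuses, interval_seconds=60):
    # Periodically print the spread between the most and least
    # caught-up primaries; pick a raftIndexLagThreshold slightly
    # above the values observed during normal operation.
    while True:
        statuses = fetch_statuses()
        indexes = [s["lastAppliedRaftIndex"] for s in statuses]
        lag = max(indexes) - min(indexes)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), lag)
        time.sleep(interval_seconds)
----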


[[combined-status-endpoints]]
=== Combined status endpoints

When using the status endpoints to support a rolling upgrade, it is required to assess whether a primary is safe to shut down for *all* databases.
To avoid having to issue a separate request to each `/db/<databasename>/cluster/status` endpoint, use the `/dbms/cluster/status` instead.

This endpoint returns a JSON array, the elements of which contain the same fields as the <<clustering-http-endpoints-status-example, single database version>>, along with fields for `databaseName` and `databaseUuid`.
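
For example, the combined response can be grouped per database before applying the per-database checks.
A minimal sketch, again assuming a server at `localhost:7474` with status-endpoint authentication disabled:

[source, python]
----
import json
import urllib.request
from collections import defaultdict

# One request returns the status of every database hosted on this server.
url = "http://localhost:7474/dbms/cluster/status"
with urllib.request.urlopen(url) as response:
    entries = json.load(response)

# Group by database so the per-database checks can be applied to each.
by_database = defaultdict(list)
for entry in entries:
    by_database[entry["databaseName"]].append(entry)

for name, statuses in by_database.items():
    print(name, [s["memberId"] for s in statuses])
----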
