title | summary |
---|---|
tiup cluster check | TiUP Cluster provides a `check` command to ensure that hardware and software environments meet production requirements. It checks the OS version, CPU support, time synchronization, system limits, and more. Options include automatic repair and optional checks for the number of CPU cores, memory size, and disk performance. Use the `tiup cluster check <topology.yml \| cluster-name> [flags]` command to perform checks. Use `--apply` to attempt automatic repair. Use `-N, --node` and `-R, --role` to specify the nodes and roles to check. Use `--enable-cpu`, `--enable-disk`, and `--enable-mem` to enable specific checks. |
Before a production environment goes live, you need to perform a series of checks to ensure that the cluster can deliver its best performance. To simplify the manual check steps, TiUP Cluster provides the `check` command, which checks whether the hardware and software environments of the target machines of a specified cluster meet the requirements to work normally.
Check the operating system distribution and version of the deployed machines. Currently, only CentOS 7 is supported for deployment. More system versions may be supported in later releases for compatibility improvement.
Check whether the CPU of the target machine supports EPOLLEXCLUSIVE.
Check whether `numactl` is installed on the target machine. If CPU core binding is configured on the target machine, you must install `numactl`.
Check whether the system time of the target machine is synchronized. Compare the system time of the target machine with that of the central control machine, and report an error if the deviation exceeds a certain threshold (500 ms).
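The comparison described above can be sketched as a small shell function. This is a minimal illustration, not tiup-cluster's actual implementation; the function name `time_in_sync` and its millisecond-epoch arguments are assumptions made for the example.

```shell
# Sketch only: succeeds when two epoch timestamps (in milliseconds) differ by
# at most threshold_ms. The default of 500 is the threshold stated above.
time_in_sync() {
  local control_ms="$1" target_ms="$2" threshold_ms="${3:-500}"
  local diff=$(( control_ms - target_ms ))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then diff=$(( 0 - diff )); fi
  [ "$diff" -le "$threshold_ms" ]
}
```

For example, `time_in_sync 1700000000000 1700000000400` succeeds because the two timestamps differ by 400 ms. With GNU `date`, `date +%s%3N` prints the current time in this format.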
Check whether the system time zone of the target machines is synchronized. Compare the time zone configuration of these machines and report an error if the time zone is inconsistent.
Check whether the time synchronization service is configured on the target machine. That is, check whether `ntpd` is running.
Check whether swap partitioning is enabled on the target machine. It is recommended to disable swap partitioning.
Check the values of the following kernel parameters:
- `net.ipv4.tcp_tw_recycle`: 0
- `net.ipv4.tcp_syncookies`: 0
- `net.core.somaxconn`: 32768
- `vm.swappiness`: 0
- `vm.overcommit_memory`: 0 or 1
- `fs.file-max`: 1000000
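To compare a machine against this list by hand, the following sketch reads `sysctl`-style output and reports any parameter that is missing or different. The function name `check_kernel_params` is hypothetical, and the sketch compares for exact equality, which may be stricter than what tiup-cluster itself does.

```shell
# Sketch only: reads "key = value" lines (the format printed by sysctl) from
# stdin and prints one line per parameter that is missing or differs from the
# values listed above. Prints nothing when every parameter matches.
check_kernel_params() {
  awk -F' *= *' '
    BEGIN {
      want["net.ipv4.tcp_tw_recycle"] = "0"
      want["net.ipv4.tcp_syncookies"] = "0"
      want["net.core.somaxconn"]      = "32768"
      want["vm.swappiness"]           = "0"
      want["vm.overcommit_memory"]    = "0|1"   # either value is accepted
      want["fs.file-max"]             = "1000000"
    }
    ($1 in want) { got[$1] = $2 }
    END {
      for (k in want) {
        if (!(k in got)) { print k ": not reported" }
        else if (got[k] !~ ("^(" want[k] ")$")) { print k ": got " got[k] ", want " want[k] }
      }
    }
  ' | sort
}
```

A typical invocation is `sysctl -a 2>/dev/null | check_kernel_params`. Parameters that the running kernel does not expose are listed as `not reported`.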
Check whether Transparent Huge Pages (THP) is enabled on the target machine. It is recommended to disable THP.
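On Linux, the current THP mode can be read from `/sys/kernel/mm/transparent_hugepage/enabled`, where the kernel marks the active mode with square brackets. The helper below is a minimal sketch for extracting that mode; the function name `thp_active_mode` is an assumption of this example, not part of TiUP.

```shell
# Sketch only: prints the active THP mode from the contents of
# /sys/kernel/mm/transparent_hugepage/enabled, for example
# "always madvise [never]" yields "never".
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}
```

Running `thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"` on a target machine prints `never` when THP is disabled.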
Check the limit values in the `/etc/security/limits.conf` file:
<deploy-user> soft nofile 1000000
<deploy-user> hard nofile 1000000
<deploy-user> soft stack 10240
`<deploy-user>` is the user who deploys and runs the TiDB cluster, and the last column is the minimum value required for the system.
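The following sketch shows one way to verify these minimums against the contents of a `limits.conf` file. The function name `check_limits` and the user name `tidb` in the usage note are hypothetical, and the sketch does not read drop-in files under `/etc/security/limits.d/`.

```shell
# Sketch only: reads limits.conf-style lines from stdin and prints every
# required limit that is missing or below the minimum for the given user.
# Prints nothing when all three limits are satisfied.
check_limits() {
  local user="$1"
  awk -v user="$user" '
    BEGIN {
      need["soft nofile"] = 1000000
      need["hard nofile"] = 1000000
      need["soft stack"]  = 10240
    }
    /^[ \t]*#/ { next }                      # skip comment lines
    $1 == user { have[$2 " " $3] = $4 }      # remember "<type> <item>" -> value
    END {
      for (k in need) {
        if (!(k in have)) { print k ": missing" }
        else if (have[k] != "unlimited" && have[k] + 0 < need[k]) {
          print k ": " have[k] " is below " need[k]
        }
      }
    }
  ' | sort
}
```

For a deploy user named `tidb`, run `check_limits tidb < /etc/security/limits.conf`.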
Check whether SELinux is enabled. It is recommended to disable SELinux.
Check whether the FirewallD service is enabled. It is recommended to either disable the FirewallD service or add permission rules for each service in the TiDB cluster.
Check whether the irqbalance service is enabled. It is recommended to enable the irqbalance service.
Check the mount options for ext4 partitions. Make sure that the mount options include both `nodelalloc` and `noatime`.
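As an illustration of this check, the sketch below scans mount entries in the `/proc/mounts` format and lists ext4 mount points that lack either option. The function name `check_ext4_mounts` is hypothetical, and unlike the real check it inspects every ext4 mount it is given rather than only the ones a cluster uses.

```shell
# Sketch only: reads lines in /proc/mounts format
# (device mountpoint fstype options dump pass) from stdin and reports
# ext4 mount points whose options lack nodelalloc or noatime.
check_ext4_mounts() {
  awk '
    $3 == "ext4" {
      opts = "," $4 ","                      # pad so whole options are matched
      if (index(opts, ",nodelalloc,") == 0) { print $2 ": missing nodelalloc" }
      if (index(opts, ",noatime,") == 0)    { print $2 ": missing noatime" }
    }
  '
}
```

On a target machine, `check_ext4_mounts < /proc/mounts` prints nothing when every ext4 mount carries both options.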
Check whether the ports defined in the topology (including default ports that are filled in automatically) are already used by processes on the target machine.
Note:
The port usage check assumes that a cluster is not started yet. If a cluster is already deployed and started, the port usage check on the cluster fails because the ports must be in use in this case.
Check the CPU information of the target machine. For a production cluster, it is recommended that the number of logical CPU cores is greater than or equal to 16.
Note:
The number of CPU cores is not checked by default. To enable the check, you need to add the `--enable-cpu` option to the command.
Check the memory size of the target machine. For a production cluster, it is recommended that the total memory capacity is greater than or equal to 32 GB.
Note:
Memory size is not checked by default. To enable the check, you need to add the `--enable-mem` option to the command.
Use flexible I/O tester (fio) to test the performance of the disk where `data_dir` is located, including the following three test items:
- fio_randread_write_latency
- fio_randread_write
- fio_randread
Note:
The fio disk performance test is not performed by default. To perform the test, you need to add the `--enable-disk` option to the command.
tiup cluster check <topology.yml | cluster-name> [flags]
- If a cluster is not deployed yet, you need to pass the `topology.yml` file that is used to deploy the cluster. According to the content in this file, tiup-cluster connects to the corresponding machines to perform the check.
- If a cluster is already deployed, you can use the `<cluster-name>` as the check object.
- If you want to check the scale-out YAML file for an existing cluster, you can use both `<scale-out.yml>` and `<cluster-name>` as the check objects.
Note:
If `<cluster-name>` is used for the check, you need to add the `--cluster` option in the command.
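The three forms of the check object can be wrapped as follows. This is a sketch under stated assumptions: the function names and the `TIUP` override variable are conventions of this example only, while the `tiup cluster check` arguments follow the syntax described above.

```shell
# TIUP can be overridden (for example TIUP=echo) to print the arguments
# instead of running tiup. This variable is a convention of this sketch only.
TIUP="${TIUP:-tiup}"

# Cluster not deployed yet: pass the topology file.
check_topology() { "$TIUP" cluster check "$1"; }

# Cluster already deployed: pass the cluster name together with --cluster.
check_deployed() { "$TIUP" cluster check "$1" --cluster; }

# Scale-out file for an existing cluster: pass the cluster name and the
# scale-out file, together with --cluster.
check_scale_out() { "$TIUP" cluster check "$1" "$2" --cluster; }
```

Setting `TIUP=echo` before calling one of these functions prints the arguments that would be passed to `tiup`, which serves as a dry run.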
`--apply`
- Attempts to automatically repair the failed check items. Currently, tiup-cluster only attempts to repair the following check items:
  - SELinux
  - Firewall
  - irqbalance
  - Kernel parameters
  - System limits
  - THP (Transparent Huge Pages)
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
Note:
`tiup cluster check` also supports repairing the `scale-out.yml` file for an existing cluster with the following command format: `tiup cluster check <cluster-name> scale-out.yml --cluster --apply --user root [-p] [-i /home/root/.ssh/gcp_rsa]`
`--cluster`
- Indicates that the check is for a cluster that has been deployed.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
- Command format: `tiup cluster check <topology.yml | cluster-name> --cluster [flags]`
Note:
- If the `tiup cluster check <cluster-name>` command is used, you must add the `--cluster` option: `tiup cluster check <cluster-name> --cluster`.
- `tiup cluster check` also supports checking the `scale-out.yml` file for an existing cluster with the following command format: `tiup cluster check <cluster-name> scale-out.yml --cluster --user root [-p] [-i /home/root/.ssh/gcp_rsa]`
`-N, --node`
- Specifies the nodes to be checked. The value of this option is a comma-separated list of node IDs. You can get the node IDs from the first column of the cluster status table returned by the `tiup cluster display` command.
- Data type: `STRINGS`
- If this option is not specified in the command, all nodes are checked by default.
Note:
If the `-R, --role` option is specified at the same time, only the service nodes that match both the specifications of `-N, --node` and `-R, --role` are checked.
`-R, --role`
- Specifies the roles to be checked. The value of this option is a comma-separated list of node roles. You can get the roles of nodes from the second column of the cluster status table returned by the `tiup cluster display` command.
- Data type: `STRINGS`
- If this option is not specified in the command, all roles are checked by default.
Note:
If the `-N, --node` option is specified at the same time, only the service nodes that match both the specifications of `-N, --node` and `-R, --role` are checked.
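The way the two filters combine can be sketched as an intersection over the ID and role columns. The function below is an illustration of that rule only, not tiup-cluster's code; its name `select_nodes`, its two-column input format, and the addresses in the usage note are assumptions made for the example.

```shell
# Sketch only: reads "ID ROLE" lines from stdin and prints the IDs selected by
# the given comma-separated node list ($1) and role list ($2). An empty list
# means "no restriction", and a node must satisfy both lists to be selected.
select_nodes() {
  awk -v nodes="$1" -v roles="$2" '
    BEGIN {
      n = split(nodes, a, ","); for (i = 1; i <= n; i++) want_node[a[i]] = 1
      r = split(roles, b, ","); for (i = 1; i <= r; i++) want_role[b[i]] = 1
    }
    (n == 0 || ($1 in want_node)) && (r == 0 || ($2 in want_role)) { print $1 }
  '
}
```

For example, given nodes `10.0.1.2:4000` (role `tidb`) and `10.0.1.3:20160` (role `tikv`), calling `select_nodes '10.0.1.2:4000,10.0.1.3:20160' 'tikv'` prints only `10.0.1.3:20160`, because that is the only node matching both lists.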
`--enable-cpu`
- Enables the check of the number of CPU cores.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
`--enable-disk`
- Enables the fio disk performance test.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
`--enable-mem`
- Enables the memory size check.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
`-u, --user`
- Specifies the user name to connect to the target machine. The specified user must be able to run `sudo` as root without a password on the target machine.
- Data type: `STRING`
- If this option is not specified in the command, the user who executes the command is used as the default value.
Note:
This option is valid only if the `--cluster` option is false. Otherwise, the value of this option is fixed to the username specified in the topology file for the cluster deployment.
`-i, --identity_file`
- Specifies the key file to connect to the target machine.
- Data type: `STRING`
- If this option is not specified in the command, `~/.ssh/id_rsa` is used by default.
Note:
This option is valid only if the `--cluster` option is false. Otherwise, the value of this option is fixed to `${TIUP_HOME}/storage/cluster/clusters/<cluster-name>/ssh/id_rsa`.
`-p, --password`
- Logs in with a password when connecting to the target machine.
  - If the `--cluster` option is added for a cluster, the password is the password of the user specified in the topology file when the cluster was deployed.
  - If the `--cluster` option is not added for a cluster, the password is the password of the user specified in the `-u/--user` option.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
`-h, --help`
- Prints the help information of the related commands.
- Data type: `BOOLEAN`
- This option is disabled by default with the `false` value. To enable this option, add this option to the command, and either pass the `true` value or do not pass any value.
The command outputs a table containing the following fields:
- `Node`: the target node
- `Check`: the check item
- `Result`: the check result (Pass, Warn, or Fail)
- `Message`: the result description
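When the output is captured in a script, the failed items can be extracted with a short filter. The sketch below assumes that the table is printed as whitespace-separated columns in the order Node, Check, Result, Message, with no spaces inside the first three columns; that layout, the function name `failed_checks`, and the file name in the usage note are assumptions of this example.

```shell
# Sketch only: reads the result table from stdin and prints "node check" for
# every row whose Result column (assumed to be the third field) is Fail.
failed_checks() {
  awk '$3 == "Fail" { print $1, $2 }'
}
```

For example, `tiup cluster check topology.yml | failed_checks` lists only the node and check item of each failed row, where `topology.yml` stands for the topology file of the cluster being checked.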