Rolling Restart should consider starrocks cluster status #416

milletnis · 2024-01-23T10:53:22Z

Describe the current behavior

Currently, when we do a "rolling restart" of the cluster, the operator restarts the pods regardless of whether the StarRocks cluster is in a clean state.
As a result, we see WRITE errors caused by "under-replicated" tablets during rolling restarts, because the cluster is still syncing tablets while the operator is already removing the next BE pod.

As a workaround, we currently delete pods manually instead of using the rolling restart, and watch the "pending tablets" count on the cluster. We only move on to the next pod once pending_tablets = 0. See the example below:

PROD > SHOW PROC '/cluster_balance';
+-------------------+--------+
| Item              | Number |
+-------------------+--------+
| cluster_load_stat | 1      |
| working_slots     | 6      |
| sched_stat        | 1      |
| priority_repair   | 0      |
| pending_tablets   | 185    |
| running_tablets   | 32     |
| history_tablets   | 1000   |
| all_tablets       | 217    |
+-------------------+--------+
8 rows in set (0.06 sec)
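
The manual check above could be scripted roughly as follows. This is only a sketch of the workaround, not operator code: `balance_stats` and `safe_to_restart_next` are hypothetical helper names, and the rows are assumed to arrive as (Item, Number) pairs from whatever MySQL client runs `SHOW PROC '/cluster_balance'`.

```python
from typing import Dict, Iterable, Mapping, Tuple


def balance_stats(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Convert (Item, Number) rows from SHOW PROC '/cluster_balance' into a dict of ints."""
    return {item: int(number) for item, number in rows}


def safe_to_restart_next(stats: Mapping[str, int]) -> bool:
    """Apply the manual rule from this issue: continue only when pending_tablets is 0.

    A missing pending_tablets entry is treated as "not safe" (an assumption,
    chosen so that an unexpected result never allows a restart).
    """
    return stats.get("pending_tablets", 1) == 0


# Rows copied from the example output above.
example_rows = [
    ("cluster_load_stat", "1"),
    ("working_slots", "6"),
    ("sched_stat", "1"),
    ("priority_repair", "0"),
    ("pending_tablets", "185"),
    ("running_tablets", "32"),
    ("history_tablets", "1000"),
    ("all_tablets", "217"),
]

if __name__ == "__main__":
    stats = balance_stats(example_rows)
    print(stats["pending_tablets"], safe_to_restart_next(stats))
```

With the output shown above, 185 tablets are still pending, so the check says to wait before deleting the next pod.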

Describe the enhancement

The operator should take the "health/balance" state of the cluster into account and only continue removing pods once the cluster is in sync.
I'm not sure whether "pending_tablets" is the best indicator, but the operator should definitely avoid leaving tablets unwritable during restarts.
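
To illustrate the requested behavior, here is a minimal sketch of a gated rolling restart. It is not how the operator works today. `gated_rolling_restart`, `restart_pod` and `fetch_stats` are hypothetical names: `restart_pod` stands for whatever deletes or restarts one BE pod, and `fetch_stats` for a function returning the `SHOW PROC '/cluster_balance'` values as a dict. Using `pending_tablets` as the gate is the same assumption as in the manual workaround.

```python
import time
from typing import Callable, Iterable, Mapping


def gated_rolling_restart(
    pods: Iterable[str],
    restart_pod: Callable[[str], None],
    fetch_stats: Callable[[], Mapping[str, int]],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Restart pods one at a time, waiting for pending_tablets == 0 before each one."""
    for pod in pods:
        deadline = clock() + timeout_seconds
        # Block until the cluster reports no pending tablets; a missing key counts as not ready.
        while fetch_stats().get("pending_tablets", 1) != 0:
            if clock() >= deadline:
                raise TimeoutError(f"cluster not in sync before restarting {pod}")
            sleep(poll_seconds)
        restart_pod(pod)
```

A real implementation would probably also need to wait until the restarted pod is Ready and the cluster has noticed the restart, since `pending_tablets` may still read 0 right after a pod is removed.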

milletnis added the enhancement New feature or request label Jan 23, 2024
