A restarted node will sync the schema and other information from its peers on boot. Before this process completes, the node won't be fully started and functional.
A stopping node picks an online cluster member (only disc nodes are considered) to sync with after restart. Upon restart, the node tries to contact that peer 10 times by default, with a 30-second response timeout per attempt.
If the peer becomes available within that time interval, the node starts successfully, syncs what it needs from the peer, and keeps going.
If the peer does not become available, the restarted node gives up and voluntarily stops. This condition can be identified by the timeout (timeout_waiting_for_tables) warning messages in the logs, which eventually lead to node startup failure.
This window of time can be adjusted using two configuration settings:
# wait for 60 seconds instead of 30
mnesia_table_loading_retry_timeout = 60000
# retry 15 times instead of 10
mnesia_table_loading_retry_limit = 15
By adjusting these settings, and with them the time window in which a known peer has to come back, it is possible to account for cluster-wide redeployment scenarios that take longer than 5 minutes to complete (the default window is 10 attempts of 30 seconds each).
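To make the arithmetic behind these two settings concrete, here is a minimal sketch that estimates the total time a restarted node waits for its peer. It assumes the window is simply the per-attempt timeout multiplied by the retry limit, with no extra delay between attempts. The function name `peer_wait_window_seconds` is hypothetical and is not part of RabbitMQ.

```python
def peer_wait_window_seconds(retry_timeout_ms: int = 30_000,
                             retry_limit: int = 10) -> float:
    """Estimate how long a restarted node waits for its peer, in seconds.

    Assumption (not confirmed by the quoted docs): the window is just
    retry_timeout_ms * retry_limit, with no pause between attempts.
    The defaults mirror the documented ones: 10 attempts, 30 seconds each.
    """
    if retry_timeout_ms <= 0 or retry_limit <= 0:
        raise ValueError("timeout and retry limit must be positive")
    return retry_timeout_ms * retry_limit / 1000


if __name__ == "__main__":
    # Defaults: 10 attempts x 30 s = 300 s, i.e. the 5 minutes mentioned above.
    print(peer_wait_window_seconds())
    # Values from the config example: 15 attempts x 60 s = 900 s (15 minutes).
    print(peer_wait_window_seconds(retry_timeout_ms=60_000, retry_limit=15))
```

If a cluster-wide redeployment is expected to take, say, 12 minutes, the example values (15 attempts of 60 seconds) would cover it under this estimate, while the defaults would not.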
alphamonkey79 changed the title from "Add 'mnesia_table_loading' parameter(s) support" to "Add 'mnesia_table_loading_retry_timeout' and 'mnesia_table_loading_retry_limit' parameter(s) support" on Jan 22, 2025
refs:
rabbit.schema#L1552-L1563
Schema Syncing from Online Peers: