glusterd may try to start bricks twice #4080
Comments
xhernandez added a commit to xhernandez/glusterfs that referenced this issue on Mar 29, 2023:
There was a race in the glusterd code that could cause two threads to start the same brick at the same time. One of the brick processes will fail because it detects the other one running. Depending on which one fails, glusterd will report a start failure and mark the brick as stopped even though it is running.

The problem is caused by an attempt to connect to a brick that is being started by another thread. If the brick is not fully initialized, it refuses all connection attempts. When this happens, glusterd receives a disconnection notification, which forcibly marks the brick as stopped.

Now, if another attempt to start the same brick happens, glusterd believes that the brick is stopped and starts it again. If this happens very soon after the first start attempt, the checks done to see whether the brick is already running will still fail, triggering the start of the brick process again. One of the brick processes will fail to initialize and report an error. If glusterd processes the failed one second, the brick will be marked as stopped, even though the process is actually there and working.

Fixes: gluster#4080
Signed-off-by: Xavi Hernandez <[email protected]>
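The sequence of events described in this commit message can be replayed deterministically with a toy model. This is only an illustrative sketch in Python, not glusterd code (glusterd is written in C). The `Brick` class, its fields, and every function name here are hypothetical, and the real "is it already running" checks are reduced to a single flag.

```python
from dataclasses import dataclass, field


@dataclass
class Brick:
    """Toy model of glusterd's view of one brick (all names hypothetical)."""
    marked_started: bool = False  # what glusterd believes
    processes: int = 0            # brick processes actually alive
    initialized: bool = False     # does the running process accept connections?
    log: list = field(default_factory=list)


def start_brick(brick):
    # Spawn a process only if glusterd believes the brick is stopped.
    if brick.marked_started:
        brick.log.append("start skipped: already marked started")
        return
    brick.processes += 1
    brick.marked_started = True
    brick.log.append(f"spawned process #{brick.processes}")


def connect(brick):
    # A brick that is not fully initialized refuses the connection; the
    # resulting disconnect notification marks the brick as stopped.
    if not brick.initialized:
        brick.marked_started = False
        brick.log.append("connect refused: marked stopped")
        return False
    return True


def duplicate_process_failed(brick):
    # The duplicate process detects the running one and exits with an error;
    # handling that failure marks the brick as stopped.
    brick.processes -= 1
    brick.marked_started = False
    brick.log.append("duplicate exited: marked stopped")


def simulate_race():
    brick = Brick()
    start_brick(brick)               # thread A starts the brick
    connect(brick)                   # thread B connects before it is ready
    start_brick(brick)               # thread B starts the brick a second time
    duplicate_process_failed(brick)  # the second process gives up
    return brick
```

In this model, `simulate_race()` ends with one process still alive while `marked_started` is false, which is the inconsistent state the issue reports.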
xhernandez added a commit to xhernandez/glusterfs that referenced this issue on May 10, 2023:
When a brick is started asynchronously, an immediate connection attempt is likely to fail. In this case, just skip the connection. The connection will be created the next time the brick is needed.

Fixes: gluster#4080
Signed-off-by: Xavi Hernandez <[email protected]>
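The idea in this commit message, skipping the immediate connection after an asynchronous start and connecting lazily later, might look roughly like the following. This is a minimal sketch with hypothetical names (`Brick`, `start_brick`, `use_brick`, the `wait` parameter), not the actual glusterd change, which is in C.

```python
class Brick:
    """Toy brick state (all names hypothetical)."""

    def __init__(self):
        self.marked_started = False  # glusterd's belief about the brick
        self.connected = False       # is there a live connection to it?
        self.ready = False           # does the process accept connections yet?


def connect(brick):
    # A refused connection produces a disconnect notification,
    # which marks the brick as stopped.
    if not brick.ready:
        brick.marked_started = False
        return False
    brick.connected = True
    return True


def start_brick(brick, wait):
    brick.marked_started = True
    if wait:
        # Synchronous start: assume initialization has completed on return,
        # so connecting right away is safe.
        brick.ready = True
        connect(brick)
    # Asynchronous start: do not connect now; the brick is probably not
    # ready, and a refused connection would wrongly mark it as stopped.


def use_brick(brick):
    # Connect lazily, the first time the brick is actually needed.
    if not brick.connected:
        return connect(brick)
    return True
```

With `wait=False`, the brick stays marked as started and unconnected until `use_brick` is called, by which time the process has had a chance to finish initializing.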
There is a race that can cause glusterd to start the same brick twice at the same time. One of the brick processes will detect that another brick process is already running and will stop, which is correct. However, depending on the order in which this happens, glusterd may conclude that the brick has not started when it is actually running.