fix: local commit detection not thread safe #120
Conversation
Hey @tracemeyers, thank you so much for your pull request, and sorry for leaving this open and unresponded to for so long (I didn't get a notification about it, which is my fault for not setting them up on GitHub). I appreciate you tackling a change in this intricate part of the code. To enable us to tackle the issue, can you provide a test case that fails before and passes after the change? We already have some similar test cases in the existing test suite that you could use as a starting point. It would also help if you rebase your changes against the current default branch.
On the service broker side, all git interactions should go through a single entry point.
It's been a while now, but I believe the sequence of events that resulted in the race condition was something like this:

1. Thread A runs synchronizeWithRemoteRepository and completes the merge/push.
2. Before thread A sets hasLocalCommits to false, thread B (another OSB operation) creates a new commit and sets the flag to true.
3. Thread A then sets hasLocalCommits to false, so the repo now contains an unpushed commit that the broker no longer knows about.

At this point the local repo is out of sync with the remote and nothing will push the stray commit. It is a relatively rare race-condition outcome, but one I encountered often enough that I had to resolve it to get the occasional (de)provision flowing without manual intervention. The risk of data loss is also pretty serious.
That's fair. It might take me a while to work on this, but I'm still keen on getting this merged so I don't have to maintain a fork 😬
synchronizeWithRemoteRepository did not handle the scenario where other threads may modify the local repository between the merge operation and when hasLocalCommits is set to false. This can cause the local view of the repo to become out of sync with the remote. At best it is resolved when another OSB operation, e.g. a provision request, creates a new commit with the side effect that it signals there are local changes. At worst the commits are lost if the repo storage is ephemeral and is destroyed before the commits are pushed.
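To make the failure mode concrete, here is a minimal, self-contained Kotlin sketch of the pattern described above. Only the names hasLocalCommits and synchronizeWithRemoteRepository come from the description; the in-memory commit list and everything else is hypothetical and only illustrates the ordering problem, not the actual unipipe code.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified model of the broker's repo state, used only to show the ordering problem.
object RepoModel {
    val hasLocalCommits = AtomicBoolean(false)
    val unpushedCommits = mutableListOf<String>()

    // Thread A: simplified stand-in for synchronizeWithRemoteRepository
    fun synchronizeWithRemoteRepository() {
        pushAll()                        // fetch/merge/push of everything known so far (elided)
        // ...a commit from another thread can land right here...
        hasLocalCommits.set(false)       // and its "there are local commits" signal is wiped out
    }

    // Thread B: another OSB operation (e.g. a provision request) creating a commit
    fun commit(message: String) {
        synchronized(unpushedCommits) { unpushedCommits.add(message) }
        hasLocalCommits.set(true)
    }

    private fun pushAll() = synchronized(unpushedCommits) { unpushedCommits.clear() }
}

fun main() {
    val a = Thread { RepoModel.synchronizeWithRemoteRepository() }
    val b = Thread { RepoModel.commit("provision instance-123") }
    a.start(); b.start(); a.join(); b.join()
    // Possible (not guaranteed) interleaving: unpushedCommits is non-empty while
    // hasLocalCommits == false, so a later sync believes there is nothing left to push.
    println("unpushed=${RepoModel.unpushedCommits} hasLocalCommits=${RepoModel.hasLocalCommits.get()}")
}
```

Whether the window is closed by holding a lock across the whole sync or by re-checking the real repository state, the key point is that the flag must not be cleared based on stale information.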
Branch updated from 46d676d to a75dd65.
@JohannesRudolph added a test. I had to rework the implementation a bit in order to make it testable. I have zero experience with Kotlin, so don't be shy with suggestions to improve things.
@tracemeyers now I understand what you mean, thanks a lot for the test case! I see that scenario happening clearly under one condition: there must be two threads active at the same time performing (write) operations on the git repo. HTTP requests can come in from different threads concurrently, so that fits the scenario. However, we have the following:

/**
* Design: Provides a simple means to synchronize/serialize all git repository interactions.
* Only one [GitOperationContext] can be active at a single time.
*
* Typical git interactions should be fairly short-lived and therefore queuing them should be fine.
* Of course this limits the theoretical throughput of the system. However a typical unipipe deployment does not have to
* handle a lot of requests per second so this should not be much of a problem. If it becomes a problem, we can optimize
* this further at the cost of some additional complexity (e.g. separate read/write paths).
*/
@Service
class GitOperationContextFactory(

This invariant also ensures that even under concurrency we will have one clean git commit per broker operation, which would otherwise be hard to maintain. Of course I don't want to rule out the possibility that there's a bug in there somewhere! I'll need to take a second look, especially around deprovisioning, as you say.
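The doc comment above states the invariant but not how it is enforced; as a rough mental model (not the actual factory implementation, whose constructor and collaborators are elided above), funneling every git interaction through a single fair lock would look roughly like this:

```kotlin
import java.util.concurrent.locks.ReentrantLock

// Rough sketch of the documented invariant: at most one GitOperationContext active at a time.
// The real GitOperationContextFactory may enforce this differently; this only models the queueing.
class GitOperationContext(private val onClose: () -> Unit) : AutoCloseable {
    fun <T> doGitWork(block: () -> T): T = block()   // placeholder for fetch/commit/push etc.
    override fun close() = onClose()
}

class SerializedGitOperationContextFactory {
    private val lock = ReentrantLock(true)           // fair: queued callers proceed in arrival order

    fun acquireContext(): GitOperationContext {
        lock.lock()                                  // blocks until no other context is active
        return GitOperationContext { lock.unlock() }
    }
}

fun main() {
    val factory = SerializedGitOperationContextFactory()
    // Typical usage: one short-lived, serialized git interaction per broker operation.
    factory.acquireContext().use { ctx ->
        ctx.doGitWork { println("one clean git commit per broker operation") }
    }
}
```

With a fair lock, a second request thread simply queues until the first context is closed, which matches the "queuing them should be fine" note in the doc comment.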
The invariant seems reasonable and I can't find any fault with it from a cursory glance at the code base. Ignoring the invariant, though: isn't it better to check the actual state of the repo than to check a flag that has the possibility of being inaccurate? Obviously the cost is higher, but I'd rather have assurance of correct behavior without a risk of data loss than save a few ms.

I can only speak from experience, where we have frequently encountered an OSB action that is not pushed to the remote while the logs clearly show unipipe does not think there are local changes. The simple workaround is to start another OSB action (e.g. a (de)provision) that creates a new commit, which ends up pushing both the existing commit for the missing OSB action and the commit for the new OSB action. However, there's still the possibility that, if the container is torn down before another OSB action occurs, the missing OSB action is lost.

I doubt I'll have the time to try to identify the root cause of the above. Feel free to close the PR if my argument isn't convincing enough 😄
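For what "check the actual state of the repo" could look like in practice, here is a small JGit-based sketch; the library choice, the repo path, and the default branch name are assumptions for illustration, not necessarily what unipipe-service-broker actually uses:

```kotlin
import org.eclipse.jgit.api.Git
import org.eclipse.jgit.lib.BranchTrackingStatus
import java.io.File

// Sketch: derive "are there local commits?" from the repository itself instead of a cached flag.
// Assumes a clone whose local branch tracks a remote branch; names below are illustrative.
fun hasUnpushedCommits(repoDir: File, branch: String = "master"): Boolean =
    Git.open(repoDir).use { git ->
        // BranchTrackingStatus.of returns null when the branch has no remote-tracking branch.
        val status = BranchTrackingStatus.of(git.repository, branch)
        status != null && status.aheadCount > 0   // local commits not yet on the remote-tracking ref
    }

fun main() {
    val dirty = hasUnpushedCommits(File("/tmp/unipipe-osb-repo"))
    println(if (dirty) "local commits pending push" else "in sync with remote")
}
```

This costs an extra walk of the commit graph per sync, which is the "few ms" trade-off mentioned above, but unlike a boolean flag it cannot go stale.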
@tracemeyers I've had a better look at this, and I think you have a very reasonable argument that your changes will make the implementation more resilient. So I will go ahead and merge in your changes, thanks so much for your contribution! I've found a few potential issues where the locking code could have been bypassed, though from just reviewing the code I still could not figure out a scenario where that was the case. I will close those loopholes as well.