Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race on precommit when cluster is changed #554

Merged
merged 3 commits into from
Dec 3, 2024

Conversation

antonio2368
Copy link
Contributor

On config changes, the precommit index will be updated instantly to index of config change.
After it, request_append_entries is called which instantly commits up to precommit index if a node was removed from 2-node cluster.
If logs were being applied at the same time it could lead to commit thread trying to commit logs that were still not precommitted.
I couldn't find another explanation for such scenario.
If my conclusions are wrong feel free to correct me. 🙏
I also couldn't find a better way to fix it quickly.

@greensky00
Copy link
Contributor

greensky00 commented Nov 24, 2024

Hi @antonio2368

Can you please help me understand the problem? From what I read your description,

  1. From a 2-node cluster, one member was removed, now there is only one member.
  2. Let's say there is a config change -- adding a new member.
  3. It will directly call store_log_entry, which internally updates pre-commit index.
  4. And then it will call request_append_entries, which immediately commits the log, as there is only one member.
  5. Background commit thread tries to execute the commit.

And your problem is, during 5, the log is not there yet at the moment of commit? But it is supposed to be in log store as log_store_->flush() is called in store_log_entry.

Can you please elaborate more on it? I wonder if I'm understanding it correctly.

@antonio2368
Copy link
Contributor Author

Unless I missed something, handle_cli_req can run in parallel with config changes.
So if we have a batch of requests, config can be stored in the middle of the batch.
When config is stored it will update the precommit index.
Storing new client request is done in 2 steps:

  • storing the request
  • running precommit for it

Config can trigger commit after we stored the request but didn't precommit it yet so this leads to inconsistency (at least with our implementation of it).

@greensky00
Copy link
Contributor

Thanks for explanation, now I get it. Acquiring the lock inside store_log_entry looks good to me, but let me add some comments in there for better understanding for future readers.

@greensky00 greensky00 merged commit b602172 into eBay:master Dec 3, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants