Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consistent query bug #483

Closed
kjnilsson opened this issue Dec 5, 2024 · 0 comments · Fixed by #491
Closed

consistent query bug #483

kjnilsson opened this issue Dec 5, 2024 · 0 comments · Fixed by #491
Labels
Milestone

Comments

@kjnilsson
Copy link
Contributor

kjnilsson commented Dec 5, 2024

Describe the bug

https://github.com/rabbitmq/ra-kv-store/actions/runs/12121409210

In. one test (in a long history of non failing runs) one leader in a minority partition decided it was ok to serve consistent reads.
After a fair bit of time investigating it wasn't clear exactly what caused it but I suspect it was due to the leader somehow ending up with an incorrectly high query_index for one of it's peers.

The consistent query logic can be refactored to avoid relying on this piece of state entirely as all it does is perform a leader liveness check then waits for the commit index to be appled then execute the query.

This check can be done without each member keeping a query_index and instead just returning the {term, index} tuple provided if the term is equal or higher.

Reproduction steps

https://github.com/rabbitmq/ra-kv-store/actions/runs/12121409210

Expected behavior

.

Additional context

handle_follower(#heartbeat_rpc{query_index = RpcQueryIndex, term = Term,
                               leader_id = LeaderId},
                #{current_term := CurTerm,
                  cfg := #cfg{id = Id}} = State0)
  when Term >= CurTerm ->
    State1 = update_term(Term, State0),
    #{query_index := QueryIndex} = State1,
    NewQueryIndex = max(RpcQueryIndex, QueryIndex),
    State2 = update_query_index(State1#{leader_id => LeaderId}, NewQueryIndex),
    Reply = heartbeat_reply(State2),
    {follower, State2, [cast_reply(Id, LeaderId, Reply)]};

In this code, if the follower somehow ended up with a local_query index that was higher than what the leader has (each term resets the query index) then that could cause a leader to locally store a higher query index which if this follower was later partition would still contribute to the quorum calculation. If we can remove this state we can remove this possibility.

@kjnilsson kjnilsson added the bug label Dec 5, 2024
@kjnilsson kjnilsson added this to the 2.16.0 milestone Dec 5, 2024
@kjnilsson kjnilsson assigned dumbbell and unassigned dumbbell Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants