verifier: tolerate data loss #52
This fails `go test`, which is introduced in a subsequent commit.
LGTM! Just nits for me.
Also, just want to clarify the behavior when we see data loss:
- validator is running with `-continuous` and `-tolerate-data-loss`
- validator consumes up to 100, and is continuing to poll for 101+
- data is "rolled back" to 90
- PollFetches is doing some kind of periodic poll of the highest offset, and under the hood will detect it moving backwards...? Is that right?
- validator detects this, adjusts the ranges, and continues with respective workloads
Correct. When offsets are "rolled back" we necessarily increment the raft term / leader epoch. When Redpanda receives a fetch request with an "outdated" epoch, it responds with … With this, we (franz-go) fetch the last offset for that epoch using a …
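The scenario discussed above (consume up to 100, then data is rolled back to 90) can be sketched as a small piece of bookkeeping. This is only an illustration under stated assumptions, not the verifier's actual code: `lossTracker`, `consumed`, and `truncated` are hypothetical names, and "rolled back to 90" is assumed to mean that 90 is the new log end offset, i.e. the next offset to be written.

```go
package main

import "fmt"

// lossTracker is a hypothetical helper, not a type from kgo-verifier.
type lossTracker struct {
	nextOffset  int64 // next offset the consumer expects to read
	lostOffsets int64 // running total of offsets considered lost
}

// consumed records that the given offset was read successfully.
func (t *lossTracker) consumed(offset int64) {
	t.nextOffset = offset + 1
}

// truncated handles the log end offset moving back to newEnd.
// Assumption: newEnd is the next offset to be written after the rollback,
// so offsets in [newEnd, nextOffset) were read once but no longer exist.
// It returns how many offsets this event counted as lost.
func (t *lossTracker) truncated(newEnd int64) int64 {
	if newEnd >= t.nextOffset {
		return 0 // nothing already consumed was removed
	}
	lost := t.nextOffset - newEnd
	t.lostOffsets += lost
	t.nextOffset = newEnd // resume consuming from the new end of the log
	return lost
}

func main() {
	t := &lossTracker{}
	t.consumed(100)
	lost := t.truncated(90)
	fmt.Println("lost:", lost, "resume at:", t.nextOffset)
}
```

Under a different reading, where 90 is the last surviving offset rather than the log end, the count would be one lower; the sketch makes that choice explicit in the comment on `truncated`.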
This allows restarting the consumer and having it resume from the last committed offset.
Will make state management easier in a subsequent PR.
Will be used for refactoring in subsequent PR.
Never hang /last_pass, /shutdown requests in case of retries. Quality of life change.
In this mode the verifier waits for new data to be produced instead of exiting on EOF. It stops after a /last_pass request.
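A rough sketch of the control flow this commit message describes, under simplifying assumptions: `pollFunc`, `consume`, `step`, and `scripted` are hypothetical names, a poll is modelled as returning a record count plus an EOF flag, and the /last_pass request is modelled as a callback. The real consumer blocks inside the client's poll rather than spinning.

```go
package main

import "fmt"

// pollFunc is a hypothetical stand-in for one poll of the broker: it returns
// the number of records read and whether the end of the log was reached.
type pollFunc func() (records int, eof bool)

// consume returns the total number of records read.
// Outside continuous mode it exits at the first EOF. In continuous mode it
// keeps polling past EOF and exits at the first EOF seen once lastPass
// reports true (assumed to be set by a /last_pass request).
func consume(poll pollFunc, continuous bool, lastPass func() bool) int {
	total := 0
	for {
		n, eof := poll()
		total += n
		if !eof {
			continue
		}
		if !continuous || lastPass() {
			return total
		}
		// A real consumer would block in the poll until new data arrives.
	}
}

// step is one scripted poll result used to drive the example.
type step struct {
	records int
	eof     bool
}

// scripted replays steps in order and exposes how many polls were made.
// Once the script is exhausted it keeps reporting EOF.
func scripted(steps []step) (pollFunc, *int) {
	calls := 0
	return func() (int, bool) {
		if calls >= len(steps) {
			return 0, true
		}
		s := steps[calls]
		calls++
		return s.records, s.eof
	}, &calls
}

func main() {
	steps := []step{{5, false}, {0, true}, {3, false}, {0, true}}

	once, _ := scripted(steps)
	fmt.Println("default:", consume(once, false, func() bool { return false }))

	cont, calls := scripted(steps)
	// Pretend /last_pass arrives after the third poll.
	fmt.Println("continuous:", consume(cont, true, func() bool { return *calls >= 3 }))
}
```

In the scripted run, the default mode stops at the first EOF, while the continuous mode also picks up the records produced after it.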
This mode allows verifying Redpanda when write caching is enabled. In addition to tolerating data loss, we also record and export to the /status endpoint the number of offsets/records that are considered lost from the point of view of the verifier.
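As an illustration of exporting such a counter, here is a minimal sketch using only the Go standard library. The `lost_offsets` field name, the `status` type, and the handler wiring are assumptions made for the example and may not match what the verifier actually serves at /status.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync/atomic"
)

// lostOffsets is a hypothetical process-wide counter of offsets the
// verifier considers lost; consumers would bump it when they detect loss.
var lostOffsets atomic.Int64

// status is a hypothetical payload; the field name is assumed.
type status struct {
	LostOffsets int64 `json:"lost_offsets"`
}

// renderStatus serializes the current counter value as JSON.
func renderStatus() ([]byte, error) {
	return json.Marshal(status{LostOffsets: lostOffsets.Load()})
}

// statusHandler serves the JSON document over HTTP.
func statusHandler(w http.ResponseWriter, _ *http.Request) {
	body, err := renderStatus()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(body)
}

func main() {
	// Register the handler; a real service would also call http.ListenAndServe.
	http.HandleFunc("/status", statusHandler)

	lostOffsets.Add(11)
	body, _ := renderStatus()
	fmt.Println(string(body))
}
```

An atomic counter keeps the sketch safe if several consumer goroutines report loss concurrently while the endpoint is being read.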
- `-continuous` flag, which instructs kgo-verifier to continue polling for new data rather than exiting or looping.
- `-tolerate-data-loss` flag, which facilitates testing the write caching feature, where data loss is tolerable in specific circumstances.