Feature/verify tool sample #722
base: master
Conversation
{seek_offset, SeekOffset},
{reply, io_lib:format("~p", [Reply])},
{is_recorded_unpacked, io_lib:format("~p", [UnpackedReply])}]),
case RequestOrigin of
log_chunk_error should already skip logging for http and tx_data origins - is there an origin that I missed? I can add it to the matching clause.
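For reference, a minimal sketch of the kind of origin-matching clauses being discussed (the clause shapes and the use of logger:error are illustrative assumptions, not the actual log_chunk_error implementation in the PR):

%% Hypothetical sketch: skip logging for origins where a missing or
%% unreadable chunk is expected and not actionable.
log_chunk_error(http, _Event, _ExtraLogData) ->
    ok;
log_chunk_error(tx_data, _Event, _ExtraLogData) ->
    ok;
log_chunk_error(_RequestOrigin, Event, ExtraLogData) ->
    logger:error("chunk error: ~p ~p", [Event, ExtraLogData]).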
%% such that if an offset is sampled, no other offsets are selected from the | ||
%% open interval (Offset - ?DATA_CHUNK_SIZE, Offset + ?DATA_CHUNK_SIZE). | ||
generate_sample_offsets(Start, End, Count) when is_integer(Start), is_integer(End) -> | ||
Candidates = lists:seq(Start + 1, End, ?DATA_CHUNK_SIZE), |
For a full 3.6TB partition, I think Candidates will be a list of ~14.4 million entries - or have I misread?
Could that become a memory issue (e.g. if running verify on multiple storage modules concurrently)?
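For scale, a rough back-of-envelope estimate of that list's footprint (a hypothetical helper, not part of the PR; assumes a 64-bit VM where each list cell is two words, offsets fit in immediate integers, and ?DATA_CHUNK_SIZE is 256 KiB):

%% Hypothetical helper: approximate heap size of the Candidates list
%% for one storage module.
candidate_list_bytes(PartitionSize, ChunkSize) ->
    NumCandidates = PartitionSize div ChunkSize,
    %% ~2 words (16 bytes) per list cell on a 64-bit VM.
    NumCandidates * 16.

%% candidate_list_bytes(3600000000000, 262144) gives roughly 220 MB,
%% multiplied by however many storage modules run verify concurrently.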
%% Uses generate_sample_offsets/3 to obtain offsets (with exclusion)
%% and then queries ar_data_sync:get_chunk/2 with options to trigger unpacking.
sample_random_chunks(Count, Packing, Start, End, StoreID) ->
    Offsets = generate_sample_offsets(Start, End, Count),
Rather than precalculating the list of offsets, what about drawing one offset at a time, sampled randomly from the range, and if that chunk exists on disk, adding that offset to a set so we never try it again? Maybe that would avoid the 14.4M-offset list?
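A minimal sketch of that approach, reusing the PR's ?DATA_CHUNK_SIZE macro; the function name and arguments are illustrative, and the on-disk existence check would live in the caller:

%% Hypothetical sketch: draw chunk-aligned offsets one at a time,
%% tracking already-drawn offsets in a set instead of materialising
%% all candidates up front.
%% Assumes End - Start >= ?DATA_CHUNK_SIZE and Count =< NumSlots.
sample_offsets_incrementally(_Start, _End, 0, _Tried, Acc) ->
    Acc;
sample_offsets_incrementally(Start, End, Count, Tried, Acc) ->
    NumSlots = (End - Start) div ?DATA_CHUNK_SIZE,
    Offset = Start + rand:uniform(NumSlots) * ?DATA_CHUNK_SIZE,
    case sets:is_element(Offset, Tried) of
        true ->
            %% Already drawn this offset, try another one.
            sample_offsets_incrementally(Start, End, Count, Tried, Acc);
        false ->
            sample_offsets_incrementally(Start, End, Count - 1,
                sets:add_element(Offset, Tried), [Offset | Acc])
    end.

The set only ever holds the offsets actually drawn, so memory scales with Count rather than with the size of the sampled range.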