Support enable-per-pid-reader and min-sequential-read-size-mb to accelerate mmap reads #2829

Status: Open. wlhee wants to merge 14 commits into base: master.


@wlhee commented Dec 22, 2024

Description

Support enable-per-pid-reader and min-sequential-read-size-mb to accelerate mmap reads.

  1. For each file handle, create a per-pid GCS reader stream and per-pid stats (start, limit, bytes read), so that each pid can seek forward and backward independently without interfering with the others.
  2. Allow configuring a minimum sequential read size (min-sequential-read-size-mb) so that each GCS stream is opened for at least that many bytes.
  3. As groundwork, this PR first refactors random_reader, moving all GCS reading logic from randomReader to objectRangeReader so that each pid can have its own objectRangeReader and use it independently.
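The per-pid bookkeeping in items 1 and 3 can be sketched roughly as below. This is a minimal, hypothetical Go sketch, not the PR's code: `fileHandle`, `rangeReader`, `read`, and the `opens` counter are illustrative names, and no real GCS stream is opened. Each pid's reader is recreated only when that same pid seeks away from its own current position.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeReader is a hypothetical stand-in for one pid's GCS range stream.
// It only tracks the stats named in the PR description; a real reader
// would also hold an open object stream.
type rangeReader struct {
	start     int64 // offset at which the stream was opened
	limit     int64 // exclusive end offset of the stream
	bytesRead int64 // bytes consumed from this stream so far
}

// pos is the next offset this stream can serve without reopening.
func (r *rangeReader) pos() int64 { return r.start + r.bytesRead }

// fileHandle keeps one rangeReader per pid, so a seek by one process
// does not reset the stream another process is reading sequentially.
type fileHandle struct {
	mu      sync.Mutex
	size    int64                // object size in bytes
	readers map[int]*rangeReader // one stream per pid
	opens   int                  // streams opened so far (for illustration)
}

func newFileHandle(size int64) *fileHandle {
	return &fileHandle{size: size, readers: make(map[int]*rangeReader)}
}

// read serves n bytes at offset for pid, reopening only that pid's stream
// when the pid is new, seeks, or reads past its stream's limit.
func (fh *fileHandle) read(pid int, offset, n int64) {
	fh.mu.Lock()
	defer fh.mu.Unlock()
	r, ok := fh.readers[pid]
	if !ok || offset != r.pos() || offset+n > r.limit {
		r = &rangeReader{start: offset, limit: fh.size}
		fh.readers[pid] = r
		fh.opens++
	}
	r.bytesRead += n
}

func main() {
	fh := newFileHandle(1 << 30)
	const page = 4096
	for i := int64(0); i < 3; i++ {
		fh.read(1, i*page, page)         // pid 1 reads from the start
		fh.read(2, (1<<20)+i*page, page) // pid 2 reads from 1 MiB
	}
	fmt.Println("streams opened:", fh.opens) // streams opened: 2
}
```

Here two pids interleaving sequential page-sized reads at different offsets open only two streams. With a single shared reader, each switch between the pids would look like a seek and would typically force the stream to be reopened.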

Initial manual results show that setting perPidReader: true and minSequentialReadSizeMb: 200 reduced loading time from 13 min to 1 min.
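One way to read the minSequentialReadSizeMb: 200 setting is as a floor on the range each new GCS stream requests, capped at the object size. The Go sketch below illustrates that arithmetic under stated assumptions: `streamEnd` is a hypothetical helper, not a gcsfuse function, and "MB" is assumed to mean MiB (1<<20 bytes).

```go
package main

import "fmt"

// bytesPerMB assumes "MB" in minSequentialReadSizeMb means MiB.
const bytesPerMB = 1 << 20

// streamEnd (hypothetical) returns the exclusive end offset for a new
// range stream opened to serve a read of n bytes at offset. The stream
// covers at least minSequentialReadSizeMb but never extends past the
// end of the object.
func streamEnd(offset, n, objectSize, minSequentialReadSizeMb int64) int64 {
	end := offset + n
	if minEnd := offset + minSequentialReadSizeMb*bytesPerMB; minEnd > end {
		end = minEnd
	}
	if end > objectSize {
		end = objectSize
	}
	return end
}

func main() {
	const objectSize = 1 << 30 // a 1 GiB object
	// A 4 KiB read at offset 0 opens a 200 MiB stream.
	fmt.Println(streamEnd(0, 4096, objectSize, 200)) // 209715200
	// A read near the end of the object is capped at the object size.
	fmt.Println(streamEnd(objectSize-4096, 4096, objectSize, 200)) // 1073741824
}
```

Because mmap access tends to arrive as many small reads, a floor like this lets one stream serve a long run of them instead of opening a small range request per read.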

Before:
Screenshot 2024-12-22 at 11 54 19 AM

After:
Screenshot 2024-12-22 at 11 54 46 AM

Link to the issue in case of a bug fix.

#2828

Testing details

  1. Manual - NA
  2. Unit tests - TODO
  3. Integration tests - TODO

@wlhee requested a review from a team as a code owner on December 22, 2024 at 19:55.
@wlhee (author) commented Dec 22, 2024

Note: I haven't added flags or tests yet. I'm sending out this PR first to get early feedback.
