
ORC-262: [C++] Support async io prefetch for orc c++ lib #2048

Draft: wants to merge 36 commits into main


@taiyang-li (Contributor) commented Oct 9, 2024

What changes were proposed in this pull request?

Support async IO prefetch for the ORC C++ library. Closes https://issues.apache.org/jira/browse/ORC-262

Changes:

  • Added a new interface, InputStream::readAsync (unimplemented by default), which reads the specified range asynchronously.
  • Added an IO cache implementation, ReadRangeCache, to cache the results of async IO. This borrows from a similar design in the Parquet reader of https://github.com/apache/arrow
  • Added the interface Reader::preBuffer to trigger IO prefetch. ReaderImpl::preBuffer computes the IO ranges for the selected stripes and columns, sorts and merges them, and then calls ReadRangeCache::cache to start asynchronous IO in the background, so the data is ready when the upper layer reads it.
  • Added the interface Reader::releaseBuffer, which releases all cached IO ranges before a given offset.
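As a rough illustration of the "unimplemented by default" readAsync idea, here is a minimal sketch. InputStreamSketch, MemoryStream and SyncOnlyStream are hypothetical stand-ins, and the readAsync(void*, uint64_t, uint64_t) signature returning std::future<void> is an assumption, not necessarily the signature this PR ends up with. The base class throws, and a stream opts in to async IO by overriding it, here by running the synchronous read on another thread with std::async.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for orc::InputStream: only the "default unimplemented"
// readAsync pattern is shown; the signature is an assumption.
class InputStreamSketch {
 public:
  virtual ~InputStreamSketch() = default;

  // Synchronous positional read that every stream must support.
  virtual void read(void* buf, uint64_t length, uint64_t offset) = 0;

  // Asynchronous read; streams that do not override it reject async IO.
  virtual std::future<void> readAsync(void* /*buf*/, uint64_t /*length*/,
                                      uint64_t /*offset*/) {
    throw std::logic_error("readAsync is not implemented for this stream");
  }
};

// In-memory stream that opts in to async IO.
class MemoryStream : public InputStreamSketch {
 public:
  explicit MemoryStream(std::string data) : data_(std::move(data)) {}

  void read(void* buf, uint64_t length, uint64_t offset) override {
    if (offset > data_.size() || length > data_.size() - offset) {
      throw std::out_of_range("read past the end of the stream");
    }
    std::memcpy(buf, data_.data() + offset, length);
  }

  // Runs the synchronous read on another thread; `this` must outlive the future.
  std::future<void> readAsync(void* buf, uint64_t length, uint64_t offset) override {
    return std::async(std::launch::async,
                      [this, buf, length, offset] { read(buf, length, offset); });
  }

 private:
  std::string data_;
};

// Stream that keeps the default readAsync and so only supports sync reads.
class SyncOnlyStream : public InputStreamSketch {
 public:
  void read(void* buf, uint64_t length, uint64_t /*offset*/) override {
    std::memset(buf, 0, length);  // placeholder content
  }
};
```

Because the lambda captures `this`, the stream must outlive the returned future. A production stream would more likely use a native async API or a thread pool than one thread per read.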

Why are the changes needed?

Async IO prefetch can hide IO latency when reading ORC files, which improves the performance of scan operators in ClickHouse.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

github-actions bot added the CPP label Oct 9, 2024
@taiyang-li changed the title from "Support async io prefetch for orc c++ lib" to "ORC-262: [C++] Support async io prefetch for orc c++ lib" Oct 9, 2024
c++/src/io/Cache.hh (outdated, resolved)
c++/src/MemoryPool.cc (outdated, resolved)
c++/src/Reader.cc (resolved)
c++/src/io/Cache.hh (resolved)
c++/src/io/Cache.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
@ffacs (Contributor) commented Oct 11, 2024

Reader::preBuffer prefetches whole stripes as a unit, which might be too large. Users who don't want to prefetch the entire file in one shot have to know the structure of the file. Do you think it is a good idea to make prefetching transparent to users and let the ORC reader prefetch data when appropriate (e.g. 1MB per column at a time)?
What's more, we could make async IO an option and expose a cache interface so that users can implement their own eviction policy.

@taiyang-li (Contributor, Author) commented Oct 11, 2024


It is entirely up to users whether to prefetch the whole ORC file, one or more columns in a single stripe, or a single column across one or more stripes. Reader::preBuffer already supports all of these options.

It is better to let users invoke Reader::preBuffer explicitly, because only the user knows which stripes/columns will be read and can therefore pick the best moment to prefetch and hide IO latency. For example, the ORC prefetch implementation in ClickHouse built on this PR, ClickHouse/ClickHouse#70534, gives a 1.47x speedup. Besides, the Parquet reader in Apache Arrow has a similar design.

@taiyang-li (Contributor, Author)

@ffacs @wgtmac any more comments? Thanks!

@wgtmac (Member) commented Oct 16, 2024

Sorry, I've been a little overwhelmed these days. I will take a look when I get the chance.

BTW, @luffy-zh is working on exposing RowIndex positions: #2005. Perhaps there is an opportunity to prefetch IO together with predicate pushdown as well.

@taiyang-li (Contributor, Author)

@wgtmac That's great work. We could further improve IO latency hiding after it is merged.

@wgtmac (Member) left a comment

I have just finished the initial review. Thanks @taiyang-li! Please see my inline comments. My main concern is usability: the user has to call preBuffer explicitly instead of the reader automatically prefetching the required data.

c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/src/MemoryPool.cc (outdated, resolved)
c++/src/Reader.hh (outdated, resolved)
c++/src/StripeStream.hh (outdated, resolved)
c++/src/StripeStream.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
@dongjoon-hyun (Member)

Could you resolve the conflicts, @taiyang-li?

@taiyang-li (Contributor, Author)


Done.

c++/src/Reader.cc (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/src/Reader.cc (outdated, resolved)
@wgtmac (Member) left a comment

I still have some concerns about the public API. Please see my inline comments.

BTW, I believe void preBuffer(const std::vector<int>& stripes, const std::list<uint64_t>& includeTypes) is a bit coarse-grained. We need to think about how it will work together with selective reads (e.g. when predicate pushdown can filter out most rows).

c++/include/orc/Reader.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
@wgtmac (Member) left a comment

Thanks for the update and the benchmark! I think we should not add a readAsync function that returns a DataBuffer. Please see my inline comments.

c++/src/io/Cache.cc (resolved)
c++/src/io/Cache.cc (resolved)
c++/src/Reader.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/src/io/Cache.hh (resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
@taiyang-li (Contributor, Author) commented Nov 21, 2024

@wgtmac Thanks for your advice. I have finished the requested changes. Do you think the PR is ready to be merged?

@wgtmac (Member) left a comment

Thanks for the quick update! The public interfaces look good now.

I haven't carefully reviewed Cache.hh/cc yet. Hopefully I can get to it by the end of this week.

cc @ffacs @luffy-zh

c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/OrcFile.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
c++/include/orc/Reader.hh (outdated, resolved)
@@ -624,6 +647,21 @@ namespace orc {
*/
virtual std::map<uint32_t, RowGroupIndex> getRowGroupIndex(
uint32_t stripeIndex, const std::set<uint32_t>& included = {}) const = 0;

/**
* Trigger IO prefetch and cache the prefetched contents asynchronously.
Review comment (Member):

Please describe the expected behavior when it is called multiple times, with or without overlapping ranges. It would also be good to mention that it is thread-safe.
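One possible contract for repeated calls, sketched under assumptions: cache() takes a mutex, skips any range already fully covered by an existing entry, launches async reads for the rest, and keeps entries sorted by offset; lookups copy the entry under the lock and wait on its future outside the lock. ThreadSafeRangeCache, ReadRangeSketch and EntrySketch are hypothetical names, the "file" is an in-memory string, and partially overlapping ranges are simply fetched again rather than trimmed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

struct ReadRangeSketch {
  uint64_t offset;
  uint64_t length;
  bool contains(const ReadRangeSketch& other) const {
    return offset <= other.offset && other.offset + other.length <= offset + length;
  }
};

struct EntrySketch {
  ReadRangeSketch range;
  std::shared_ptr<std::string> buffer;
  std::shared_future<void> future;
};

// Hypothetical thread-safe cache. cache() may be called repeatedly and from
// several threads; a range already fully covered by an entry is not fetched
// again, everything else is fetched asynchronously and kept sorted by offset.
class ThreadSafeRangeCache {
 public:
  explicit ThreadSafeRangeCache(std::shared_ptr<const std::string> file)
      : file_(std::move(file)) {}

  void cache(const std::vector<ReadRangeSketch>& ranges) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const ReadRangeSketch& range : ranges) {
      bool covered = std::any_of(entries_.begin(), entries_.end(), [&](const EntrySketch& e) {
        return e.range.contains(range);
      });
      if (covered) continue;  // repeated or nested request: reuse the existing entry
      auto buffer = std::make_shared<std::string>(range.length, '\0');
      auto file = file_;
      std::shared_future<void> future =
          std::async(std::launch::async, [file, buffer, range] {
            file->copy(&(*buffer)[0], range.length, range.offset);
          }).share();
      entries_.push_back({range, buffer, future});
    }
    std::sort(entries_.begin(), entries_.end(), [](const EntrySketch& a, const EntrySketch& b) {
      return a.range.offset < b.range.offset;
    });
  }

  // Returns the bytes of `range`, or an empty string on a cache miss.
  std::string get(const ReadRangeSketch& range) const {
    EntrySketch entry;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = std::find_if(entries_.begin(), entries_.end(),
                             [&](const EntrySketch& e) { return e.range.contains(range); });
      if (it == entries_.end()) return {};
      entry = *it;
    }
    entry.future.get();  // wait outside the lock so other callers are not blocked
    return entry.buffer->substr(range.offset - entry.range.offset, range.length);
  }

  size_t numEntries() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }

 private:
  std::shared_ptr<const std::string> file_;
  mutable std::mutex mutex_;
  std::vector<EntrySketch> entries_;
};
```

Under this contract, calling cache() again with an already-cached range is a cheap no-op, while a lookup spanning two entries is reported as a miss and would need a direct read.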

@@ -19,6 +19,7 @@
#include "StripeStream.hh"
#include "RLE.hh"
#include "Reader.hh"
#include "io/Cache.hh"
Review comment (Member):

Please sort the includes alphabetically.

@@ -37,7 +38,8 @@ namespace orc {
stripeStart_(stripeStart),
input_(input),
writerTimezone_(writerTimezone),
readerTimezone_(readerTimezone) {
readerTimezone_(readerTimezone),
readCache_(reader.getReadCache()) {
Review comment (Member):

You could call RowReaderImpl.getFileContents() directly to get readCache, as suggested above.

c++/src/io/Cache.hh (outdated, resolved)
c++/src/io/Cache.hh (outdated, resolved)
for (size_t i = 0; i < num_stripes; ++i) {
stripes.push_back(i);
}
reader->preBuffer(stripes, {0});
Review comment (Member):

Please add a test case where preBuffer is called multiple times with different stripes/columns, etc.

@taiyang-li marked this pull request as draft November 21, 2024 09:39
bool hit_cache = false;
if (it != entries_.end() && it->range.contains(range)) {
hit_cache = it->future.valid();
it->future.get();
Review comment (Contributor):

If it->future.valid() returns false, we might hit an exception here.
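For reference, the C++ standard makes calling get() on a std::shared_future whose valid() is false undefined behavior; implementations are encouraged to throw std::future_error with std::future_errc::no_state. Below is a minimal sketch of a guarded lookup that treats an invalid future as a cache miss. CachedRange and tryReadFromCache are hypothetical, simplified stand-ins for the PR's cache entry and lookup code.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <string>

// Simplified, hypothetical cache entry: the bytes of [offset, offset + length)
// land in *buffer once `future` becomes ready.
struct CachedRange {
  uint64_t offset;
  uint64_t length;
  std::shared_ptr<std::string> buffer;
  std::shared_future<void> future;
};

// Returns true and fills *out on a cache hit. An entry whose future has no
// shared state is treated as a miss instead of calling get() on it.
bool tryReadFromCache(const CachedRange& entry, uint64_t offset, uint64_t length,
                      std::string* out) {
  bool contains = entry.offset <= offset && offset + length <= entry.offset + entry.length;
  if (!contains || !entry.future.valid()) {
    return false;  // the caller falls back to a direct read
  }
  entry.future.get();  // waits for the async read and rethrows its exception, if any
  *out = entry.buffer->substr(offset - entry.offset, length);
  return true;
}
```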

std::vector<ReadRange> coalesce(std::vector<ReadRange> ranges) const;
};

std::vector<ReadRange> coalesceReadRanges(std::vector<ReadRange> ranges, uint64_t holeSizeLimit,
Review comment (Member):

What about moving coalesceReadRanges into struct ReadRangeCombiner as a static function? Actually I think a separate coalesceReadRanges function is redundant.

std::vector<ReadRange> coalesceReadRanges(std::vector<ReadRange> ranges, uint64_t holeSizeLimit,
uint64_t rangeSizeLimit);
struct RangeCacheEntry {
using BufferPtr = InputStream::BufferPtr;
Review comment (Member):

We can use std::shared_ptr directly now.

BufferPtr buffer;
std::shared_future<void> future;

RangeCacheEntry() = default;
Review comment (Member):

Should it be deleted?

RangeCacheEntry(const ReadRange& range, BufferPtr buffer, std::future<void> future)
: range(range), buffer(std::move(buffer)), future(std::move(future).share()) {}

friend bool operator<(const RangeCacheEntry& left, const RangeCacheEntry& right) {
Review comment (Member):

Why friend?
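A sketch of how the entry could look with the suggestions from these three comments applied: std::shared_ptr used directly for the buffer, the default constructor deleted, and operator< as a const member function. ReadRange and RangeCacheEntry here are simplified stand-ins (the std::vector<char> buffer type is an assumption), not the PR's actual definitions. A hidden friend operator< is also a common idiom; the member form is simply the plainer option when no implicit conversions on the left operand are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Simplified stand-ins, not the PR's actual definitions.
struct ReadRange {
  uint64_t offset;
  uint64_t length;
};

struct RangeCacheEntry {
  ReadRange range;
  std::shared_ptr<std::vector<char>> buffer;  // std::shared_ptr used directly
  std::shared_future<void> future;

  // No default constructor: every entry must carry a range, buffer and future.
  RangeCacheEntry() = delete;

  RangeCacheEntry(const ReadRange& range, std::shared_ptr<std::vector<char>> buffer,
                  std::future<void> future)
      : range(range), buffer(std::move(buffer)), future(std::move(future).share()) {}

  // Plain const member instead of a friend: entries are ordered by start offset.
  bool operator<(const RangeCacheEntry& other) const {
    return range.offset < other.range.offset;
  }
};

static_assert(!std::is_default_constructible<RangeCacheEntry>::value,
              "RangeCacheEntry must always be fully initialized");
```

Deleting the default constructor does not suppress the implicit move operations, so entries can still be stored in a std::vector and sorted.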

using Buffer = InputStream::Buffer;
using BufferPtr = InputStream::BufferPtr;

struct BufferSlice {
Review comment (Member):

For consistency, struct BufferSlice should not be a nested class either.

private:
std::vector<RangeCacheEntry> makeCacheEntries(const std::vector<ReadRange>& ranges);

InputStream* stream_;
Review comment (Member):

Please either remove the blank lines or add a blank line between every pair of member variables, for consistency.

entries_.erase(entries_.begin(), it);
}

std::vector<RangeCacheEntry> ReadRangeCache::makeCacheEntries(
Review comment (Member):

nit: make it const or static


auto itr = ranges.begin();
// Ensure ranges is not empty.
assert(itr <= ranges.end());
Review comment (Member):

This assert is unnecessary.

uint64_t coalescedStart = itr->offset;
uint64_t coalescedEnd = coalescedStart + itr->length;

for (++itr; itr < ranges.end(); ++itr) {
Review comment (Member):

We iterate over the ranges three times: line 31, line 41 and here. This can be done in a single pass after sorting.
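A minimal single-pass sketch, written as a static member of ReadRangeCombiner as suggested in an earlier comment. After sorting, each range either extends the current coalesced range or starts a new one; empty ranges are skipped in the same loop, and an empty input simply yields no ranges, so the non-empty assert is not needed. The merge rules are an assumption modeled on the coalesceReadRanges(ranges, holeSizeLimit, rangeSizeLimit) signature above: overlapping or touching ranges are always merged (so the output is disjoint), and ranges separated by a hole of at most holeSizeLimit bytes are merged only while the result stays within rangeSizeLimit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ReadRange {
  uint64_t offset;
  uint64_t length;
};

struct ReadRangeCombiner {
  // Sorts once, then coalesces in a single pass. Overlapping or touching ranges
  // are always merged; ranges separated by a hole of at most holeSizeLimit bytes
  // are merged only if the result stays within rangeSizeLimit bytes.
  static std::vector<ReadRange> coalesce(std::vector<ReadRange> ranges, uint64_t holeSizeLimit,
                                         uint64_t rangeSizeLimit) {
    std::sort(ranges.begin(), ranges.end(),
              [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });

    std::vector<ReadRange> result;
    bool open = false;  // whether [start, end) holds a pending coalesced range
    uint64_t start = 0;
    uint64_t end = 0;
    for (const ReadRange& r : ranges) {
      if (r.length == 0) continue;  // empty ranges need no IO
      const uint64_t rEnd = r.offset + r.length;
      if (!open) {
        start = r.offset;
        end = rEnd;
        open = true;
        continue;
      }
      const bool overlapsOrTouches = r.offset <= end;
      const bool smallHole = !overlapsOrTouches && r.offset - end <= holeSizeLimit &&
                             std::max(end, rEnd) - start <= rangeSizeLimit;
      if (overlapsOrTouches || smallHole) {
        end = std::max(end, rEnd);
      } else {
        result.push_back({start, end - start});
        start = r.offset;
        end = rEnd;
      }
    }
    if (open) result.push_back({start, end - start});  // empty input yields no ranges
    return result;
  }
};
```

For example, {0, 10} and {12, 10} with holeSizeLimit = 4 become one 22-byte read when rangeSizeLimit = 64, but stay as two reads when rangeSizeLimit = 16.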

bool hit_cache = false;
if (it != entries_.end() && it->range.contains(range)) {
hit_cache = it->future.valid();
it->future.get();
Review comment (Member):

Should we catch the exception and rethrow it as an orc::Exception?
Should we use a timeout here and fall back to a direct read?
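Both suggestions could be combined roughly as follows. This is a sketch under assumptions: OrcExceptionStandIn is a hypothetical stand-in for orc::Exception, the future carries the bytes directly (std::shared_future<std::string>) instead of a separate buffer, and the direct read is passed in as a callable. If the prefetch is missing or not ready within the timeout, the sketch falls back to the direct read; an exception from the async read is rethrown as the library's exception type.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for orc::Exception.
class OrcExceptionStandIn : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Waits at most `timeout` for a prefetched range. Without a usable prefetch,
// or if it is not ready in time, `directRead` is used instead. Errors from the
// async read are rethrown as the library's own exception type.
template <typename DirectRead>
std::string readWithFallback(const std::shared_future<std::string>& prefetch,
                             std::chrono::milliseconds timeout, DirectRead directRead) {
  if (!prefetch.valid() || prefetch.wait_for(timeout) != std::future_status::ready) {
    return directRead();
  }
  try {
    return prefetch.get();
  } catch (const OrcExceptionStandIn&) {
    throw;  // already the library's exception type
  } catch (const std::exception& e) {
    throw OrcExceptionStandIn(std::string("async read failed: ") + e.what());
  }
}
```

One trade-off: after a timeout the async read is still in flight, so the same bytes may be read twice. Also, wait_for on a deferred future returns std::future_status::deferred, which this sketch treats like a timeout.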
