[VL] Add support to write parquet files to GCS #3978
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/oap-project/gluten/issues Then could you also rename the commit message and the pull request title in the following format?
     std::shared_ptr<arrow::Schema> schema)
     : Datasource(filePath, schema),
       filePath_(filePath),
       schema_(schema),
       pool_(std::move(veloxPool)),
-      s3SinkPool_(std::move(s3SinkPool)) {}
+      s3SinkPool_(std::move(s3SinkPool)),
+      gcsSinkPool_(std::move(gcsSinkPool)) {}
Let's extend the class into VeloxParquetDatasourceS3, VeloxParquetDatasourceGCS, and VeloxParquetDatasourceABFS, instead of putting all of them in the same file.
I agree that the class needs a refactor, but that is out of scope for this change.
I propose doing the refactor in a follow-up change.
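For reference, a minimal sketch of the subclass split the reviewer suggests, assuming hypothetical, heavily simplified types: ParquetDatasourceBase, S3ParquetDatasource, GcsParquetDatasource and createSinkName() are illustrative names, not Gluten's API, and the Velox memory pools and arrow::Schema the real constructor takes are omitted. The point is that the shared write logic stays in the base class and each storage backend only overrides how its sink is created.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class standing in for the shared Parquet write logic.
// The real class also receives Velox memory pools and an arrow::Schema;
// they are left out to keep this sketch self-contained.
class ParquetDatasourceBase {
 public:
  explicit ParquetDatasourceBase(std::string filePath) : filePath_(std::move(filePath)) {
    assert(!filePath_.empty() && "file path must not be empty");
  }
  virtual ~ParquetDatasourceBase() = default;

  // Shared initialization: only sink creation differs per file system.
  void init() { sinkName_ = createSinkName(); }

  const std::string& sinkName() const { return sinkName_; }

 protected:
  // Each backend overrides this instead of adding another branch to init().
  virtual std::string createSinkName() const = 0;

  std::string filePath_;

 private:
  std::string sinkName_;
};

// Hypothetical S3 subclass; in the proposed layout it would live in its own file.
class S3ParquetDatasource : public ParquetDatasourceBase {
 public:
  using ParquetDatasourceBase::ParquetDatasourceBase;

 protected:
  std::string createSinkName() const override { return "s3-sink:" + filePath_; }
};

// Hypothetical GCS subclass, added without touching the S3 code.
class GcsParquetDatasource : public ParquetDatasourceBase {
 public:
  using ParquetDatasourceBase::ParquetDatasourceBase;

 protected:
  std::string createSinkName() const override { return "gcs-sink:" + filePath_; }
};
```

Callers would hold a std::unique_ptr<ParquetDatasourceBase> and call init() without knowing which backend they got; adding ABFS later means adding one more subclass rather than another constructor parameter such as gcsSinkPool.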
@@ -56,6 +56,16 @@ void VeloxParquetDatasource::init(const std::unordered_map<std::string, std::str
 #else
   throw std::runtime_error(
       "The write path is S3 path but the S3 haven't been enabled when writing parquet data in velox runtime!");
 #endif
Same here: let's move this logic into an independent file and compile that file depending on the build flag, rather than using #ifdef ENABLE_S3/GCS etc.
I agree, but please read my previous answer.
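A minimal sketch of the reviewer's alternative to #ifdef blocks in init(), assuming a hypothetical registry: SinkFactory, sinkRegistry(), registerSinkFactory() and createSink() are illustrative names, and the URI prefixes are examples only. Each backend source file, compiled only when its build option is on, registers a factory for its path prefix; the shared code looks the prefix up and throws std::runtime_error when no backend was compiled in, which plays the role of the existing "S3 haven't been enabled" error.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical factory type: builds a sink (represented here by a string) for a path.
using SinkFactory = std::function<std::string(const std::string& path)>;

// Function-local static avoids static-initialization-order problems.
std::map<std::string, SinkFactory>& sinkRegistry() {
  static std::map<std::string, SinkFactory> registry;
  return registry;
}

// Called from each backend's own source file (e.g. an S3 or GCS file that the
// build only compiles when that backend is enabled).
void registerSinkFactory(const std::string& prefix, SinkFactory factory) {
  assert(factory && "factory must be callable");
  sinkRegistry()[prefix] = std::move(factory);
}

// Shared init path: no #ifdef, just a lookup by path prefix.
std::string createSink(const std::string& path) {
  for (const auto& [prefix, factory] : sinkRegistry()) {
    if (path.rfind(prefix, 0) == 0) {  // path starts with prefix
      return factory(path);
    }
  }
  throw std::runtime_error("No sink registered for write path: " + path);
}
```

One caveat with this design: if the backend files are linked from a static library, the linker may drop registration objects that nothing references, so an explicit registration call made during runtime initialization is often more robust than self-registering statics.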
36d5b1f to cf3f3a5
This change adds support for writing Parquet files to GCS. It builds on the existing support for writing to S3. Fixes apache#3976
cf3f3a5 to 957a022
👍
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====