Parquet file reading in VeloxIn10Minutes code #10042
-
Hi everyone, I have created a simplified TPC-H Q5 query that just joins the lineitem and orders tables. Is it possible to read pre-generated Parquet files directly, rather than generating the data in place via the TPC-H connector?
Replies: 5 comments 3 replies
-
Hi @Manoj-red-hat, thanks for your question and for trying Velox. The TPC-H connector generates Velox vectors in memory using the dbgen program. One option is to write this generated data to Parquet files and then read those files in the TPC-H query plan. In fact, we have implemented this approach in the ParquetTpchTest tests; please take a look at them for an example. Hope this helps.
-
Hi @aditi-pandit, I was wondering whether this is possible. I think table scan supports reading Parquet, but somehow it is not working for me.
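When a Parquet scan fails, one cheap thing to rule out first is a malformed input file. Parquet files begin and end with the 4-byte magic `PAR1`, so a quick check outside Velox can confirm that the generated file at least has the right framing. The snippet below is a minimal sketch using only the C++ standard library; `looksLikeParquet` is a hypothetical helper name, not a Velox API, and it does not validate the footer metadata or the column data.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper (not part of Velox): returns true if the file at `path`
// both starts and ends with the 4-byte Parquet magic "PAR1".
bool looksLikeParquet(const std::string& path) {
  static constexpr char kMagic[4] = {'P', 'A', 'R', '1'};

  // Open in binary mode, positioned at the end so tellg() yields the size.
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) {
    return false;  // Missing or unreadable file.
  }

  // Anything shorter than 12 bytes cannot hold the leading magic,
  // the 4-byte footer length, and the trailing magic.
  const std::streamoff size = in.tellg();
  if (size < 12) {
    return false;
  }

  std::array<char, 4> head{};
  std::array<char, 4> tail{};

  in.seekg(0, std::ios::beg);
  in.read(head.data(), 4);
  in.seekg(-4, std::ios::end);
  in.read(tail.data(), 4);
  if (!in) {
    return false;  // A read or seek failed.
  }

  return std::memcmp(head.data(), kMagic, 4) == 0 &&
      std::memcmp(tail.data(), kMagic, 4) == 0;
}
```

If this returns false for a file, the problem is likely in how the file was written rather than in the table scan. A true result only says the framing looks right, so the connector registration and the split still need to be checked separately.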
-
I want to focus on the Velox operators, profiling and analyzing how they perform, with zero dependency on DuckDB.
-
Something like this
-
@aditi-pandit, is this correct?
Add the required includes:
Register the Hive connector:
Create a function to generate a HiveConnectorSplit for a Parquet file:
Use the split in a query plan: