[VL] Align the timezone of timestamp partition value with Velox #29703
Annotations
11 errors
VeloxIcebergSuite.iceberg partition type - timestamp:
org/apache/gluten/execution/VeloxIcebergSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [part_by_timestamp], [], false
== Analyzed Logical Plan ==
p: timestamp
Project [p#25736]
+- SubqueryAlias spark_catalog.default.part_by_timestamp
   +- RelationV2[p#25736] spark_catalog.default.part_by_timestamp
== Optimized Logical Plan ==
RelationV2[p#25736] spark_catalog.default.part_by_timestamp
== Physical Plan ==
VeloxColumnarToRow
+- ^(1204) IcebergIcebergScanTransformer[p#25736] spark_catalog.default.part_by_timestamp [filters=] RuntimeFilters: []
== Results ==
== Results ==
!== Correct Answer - 1 == == Gluten Answer - 1 ==
struct<> struct<>
![2022-01-01 00:01:20.0] [2022-01-01 08:01:20.0]
|
VeloxIcebergSuite.iceberg partition type - timestamp:
org/apache/gluten/execution/VeloxIcebergSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [part_by_timestamp], [], false
== Analyzed Logical Plan ==
p: timestamp
Project [p#26224]
+- SubqueryAlias spark_catalog.default.part_by_timestamp
   +- RelationV2[p#26224] spark_catalog.default.part_by_timestamp
== Optimized Logical Plan ==
RelationV2[p#26224] spark_catalog.default.part_by_timestamp
== Physical Plan ==
VeloxColumnarToRow
+- ^(1241) IcebergIcebergScanTransformer[p#26224] spark_catalog.default.part_by_timestamp (branch=null) [filters=, groupedBy=] RuntimeFilters: []
== Results ==
== Results ==
!== Correct Answer - 1 == == Gluten Answer - 1 ==
struct<> struct<>
![2022-01-01 00:01:20.0] [2022-01-01 08:01:20.0]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): read partial/all metadata struct fields:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['name, 'age, 'info, '_metadata.file_name, '_metadata.file_path, '_metadata.file_size, '_metadata.file_block_start, '_metadata.file_block_length, '_metadata.file_modification_time]
+- Relation [name#602751,age#602752,info#602753] parquet
== Analyzed Logical Plan ==
name: string, age: int, info: struct<id:bigint,university:string>, file_name: string, file_path: string, file_size: bigint, file_block_start: bigint, file_block_length: bigint, file_modification_time: timestamp
Project [name#602751, age#602752, info#602753, _metadata#602757.file_name AS file_name#602758, _metadata#602757.file_path AS file_path#602759, _metadata#602757.file_size AS file_size#602760L, _metadata#602757.file_block_start AS file_block_start#602761L, _metadata#602757.file_block_length AS file_block_length#602762L, _metadata#602757.file_modification_time AS file_modification_time#602763]
+- Relation [name#602751,age#602752,info#602753,_metadata#602757] parquet
== Optimized Logical Plan ==
Project [name#602751, age#602752, info#602753, _metadata#602757.file_name AS file_name#602758, _metadata#602757.file_path AS file_path#602759, _metadata#602757.file_size AS file_size#602760L, _metadata#602757.file_block_start AS file_block_start#602761L, _metadata#602757.file_block_length AS file_block_length#602762L, _metadata#602757.file_modification_time AS file_modification_time#602763]
+- Relation [name#602751,age#602752,info#602753,_metadata#602757] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36662) ProjectExecTransformer [name#602751, age#602752, info#602753, _metadata#602757.file_name AS file_name#602758, _metadata#602757.file_path AS file_path#602759, _metadata#602757.file_size AS file_size#602760L, _metadata#602757.file_block_start AS file_block_start#602761L, _metadata#602757.file_block_length AS file_block_length#602762L, _metadata#602757.file_modification_time AS file_modification_time#602763]
   +- ^(36662) ProjectExecTransformer [name#602751, age#602752, info#602753, knownnotnull(named_struct(file_path, file_path#602778, file_name, file_name#602779, file_size, file_size#602780L, file_block_start, file_block_start#602781L, file_block_length, file_block_length#602782L, file_modification_time, file_modification_time#602783)) AS _metadata#602757]
      +- ^(36662) FileScanTransformer parquet [name#602751,age#602752,info#602753,file_path#602778,file_name#602779,file_size#602780L,file_block_start#602781L,file_block_length#602782L,file_modification_time#602783] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-d1e0a394-47f9-4794-8d0d-57ce30697eae/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,info:struct<id:bigint,university:string>>
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<name:string,age:int,info:struct<id:bigint,university:string>,file_name:string,file_path:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp>
![jack,24,[12345,uom],part-00000-7664776a-cc9d-4b06-b747-e4e19da6dcc4-c000.snappy.parquet,file:/tmp/spark-d1e0a394-47f9-4794-8d0d-57ce30697eae/data/f0/part-00000-7664776a-cc9d-4b06-b747-e4e19da6dcc4-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:05.561] [jack,24,[12345,uom],part-00000-7664776a-cc9d-4b06-b747-e4e19da6dcc4-c000.snappy.parquet,file:/tmp/spark-d1e0a394-47f9-4794-8d0d-57ce30697eae/data/f0/part-00000-7664776a-cc9d-4b06-b747-e4e19da6dcc4-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:05.561]
![lily,31,[54321,ucb],part-00000-128f0b81-368a-41af-9ac2-76cf8754a36c-c000.snappy.parquet,file:/tmp/spark-d1e0a394-47f9-4794-8d0d-57ce30697eae/data/f1/part-00000-128f0b81-368a-41af-9ac2-76cf8754a36c-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:05.629] [lily,31,[54321,ucb],part-00000-128f0b81-368a-41af-9ac2-76cf8754a36c-c000.snappy.parquet,file:/tmp/spark-d1e0a394-47f9-4794-8d0d-57ce30697eae/data/f1/part-00000-128f0b81-368a-41af-9ac2-76cf8754a36c-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:05.629]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): select only metadata:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['_metadata.file_name, '_metadata.file_path, '_metadata.file_size, '_metadata.file_block_start, '_metadata.file_block_length, '_metadata.file_modification_time]
+- Relation [name#603272,age#603273,info#603274] parquet
== Analyzed Logical Plan ==
file_name: string, file_path: string, file_size: bigint, file_block_start: bigint, file_block_length: bigint, file_modification_time: timestamp
Project [_metadata#603278.file_name AS file_name#603279, _metadata#603278.file_path AS file_path#603280, _metadata#603278.file_size AS file_size#603281L, _metadata#603278.file_block_start AS file_block_start#603282L, _metadata#603278.file_block_length AS file_block_length#603283L, _metadata#603278.file_modification_time AS file_modification_time#603284]
+- Relation [name#603272,age#603273,info#603274,_metadata#603278] parquet
== Optimized Logical Plan ==
Project [_metadata#603278.file_name AS file_name#603279, _metadata#603278.file_path AS file_path#603280, _metadata#603278.file_size AS file_size#603281L, _metadata#603278.file_block_start AS file_block_start#603282L, _metadata#603278.file_block_length AS file_block_length#603283L, _metadata#603278.file_modification_time AS file_modification_time#603284]
+- Relation [name#603272,age#603273,info#603274,_metadata#603278] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36682) ProjectExecTransformer [_metadata#603278.file_name AS file_name#603279, _metadata#603278.file_path AS file_path#603280, _metadata#603278.file_size AS file_size#603281L, _metadata#603278.file_block_start AS file_block_start#603282L, _metadata#603278.file_block_length AS file_block_length#603283L, _metadata#603278.file_modification_time AS file_modification_time#603284]
   +- ^(36682) ProjectExecTransformer [knownnotnull(named_struct(file_path, file_path#603295, file_name, file_name#603296, file_size, file_size#603297L, file_block_start, file_block_start#603298L, file_block_length, file_block_length#603299L, file_modification_time, file_modification_time#603300)) AS _metadata#603278]
      +- ^(36682) FileScanTransformer parquet [file_path#603295,file_name#603296,file_size#603297L,file_block_start#603298L,file_block_length#603299L,file_modification_time#603300] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-1b4ec7da-b117-45b1-bc33-195615ab89eb/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<>
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<file_name:string,file_path:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp>
![part-00000-48fe8e34-a8bc-4844-9201-536ba67db381-c000.snappy.parquet,file:/tmp/spark-1b4ec7da-b117-45b1-bc33-195615ab89eb/data/f0/part-00000-48fe8e34-a8bc-4844-9201-536ba67db381-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:08.017] [part-00000-48fe8e34-a8bc-4844-9201-536ba67db381-c000.snappy.parquet,file:/tmp/spark-1b4ec7da-b117-45b1-bc33-195615ab89eb/data/f0/part-00000-48fe8e34-a8bc-4844-9201-536ba67db381-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:08.017]
![part-00000-763819fe-fd15-4b39-833e-6c600098f01a-c000.snappy.parquet,file:/tmp/spark-1b4ec7da-b117-45b1-bc33-195615ab89eb/data/f1/part-00000-763819fe-fd15-4b39-833e-6c600098f01a-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:08.065] [part-00000-763819fe-fd15-4b39-833e-6c600098f01a-c000.snappy.parquet,file:/tmp/spark-1b4ec7da-b117-45b1-bc33-195615ab89eb/data/f1/part-00000-763819fe-fd15-4b39-833e-6c600098f01a-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:08.065]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): filter on metadata and user data:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Filter ('age = 31)
+- Project [name#603806, age#603807, info#603808, file_name#603813, file_path#603814, file_size#603815L, file_block_start#603816L, file_block_length#603817L, file_modification_time#603818]
   +- Filter (_metadata#603812.file_path = file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)
      +- Project [name#603806, age#603807, info#603808, file_name#603813, file_path#603814, file_size#603815L, file_block_start#603816L, file_block_length#603817L, file_modification_time#603818, _metadata#603812]
         +- Filter ((_metadata#603812.file_name = part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet) AND (name#603806 = lily))
            +- Project [name#603806, age#603807, info#603808, _metadata#603812.file_name AS file_name#603813, _metadata#603812.file_path AS file_path#603814, _metadata#603812.file_size AS file_size#603815L, _metadata#603812.file_block_start AS file_block_start#603816L, _metadata#603812.file_block_length AS file_block_length#603817L, _metadata#603812.file_modification_time AS file_modification_time#603818, _metadata#603812]
               +- Relation [name#603806,age#603807,info#603808,_metadata#603812] parquet
== Analyzed Logical Plan ==
name: string, age: int, info: struct<id:bigint,university:string>, file_name: string, file_path: string, file_size: bigint, file_block_start: bigint, file_block_length: bigint, file_modification_time: timestamp
Filter (age#603807 = 31)
+- Project [name#603806, age#603807, info#603808, file_name#603813, file_path#603814, file_size#603815L, file_block_start#603816L, file_block_length#603817L, file_modification_time#603818]
   +- Filter (_metadata#603812.file_path = file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)
      +- Project [name#603806, age#603807, info#603808, file_name#603813, file_path#603814, file_size#603815L, file_block_start#603816L, file_block_length#603817L, file_modification_time#603818, _metadata#603812]
         +- Filter ((_metadata#603812.file_name = part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet) AND (name#603806 = lily))
            +- Project [name#603806, age#603807, info#603808, _metadata#603812.file_name AS file_name#603813, _metadata#603812.file_path AS file_path#603814, _metadata#603812.file_size AS file_size#603815L, _metadata#603812.file_block_start AS file_block_start#603816L, _metadata#603812.file_block_length AS file_block_length#603817L, _metadata#603812.file_modification_time AS file_modification_time#603818, _metadata#603812]
               +- Relation [name#603806,age#603807,info#603808,_metadata#603812] parquet
== Optimized Logical Plan ==
Project [name#603806, age#603807, info#603808, _metadata#603812.file_name AS file_name#603813, _metadata#603812.file_path AS file_path#603814, _metadata#603812.file_size AS file_size#603815L, _metadata#603812.file_block_start AS file_block_start#603816L, _metadata#603812.file_block_length AS file_block_length#603817L, _metadata#603812.file_modification_time AS file_modification_time#603818]
+- Filter (((((isnotnull(name#603806) AND isnotnull(age#603807)) AND (_metadata#603812.file_name = part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)) AND (name#603806 = lily)) AND (_metadata#603812.file_path = file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)) AND (age#603807 = 31))
   +- Relation [name#603806,age#603807,info#603808,_metadata#603812] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36706) ProjectExecTransformer [name#603806, age#603807, info#603808, _metadata#603812.file_name AS file_name#603813, _metadata#603812.file_path AS file_path#603814, _metadata#603812.file_size AS file_size#603815L, _metadata#603812.file_block_start AS file_block_start#603816L, _metadata#603812.file_block_length AS file_block_length#603817L, _metadata#603812.file_modification_time AS file_modification_time#603818]
   +- ^(36706) FilterExecTransformer (((((isnotnull(name#603806) AND isnotnull(age#603807)) AND (_metadata#603812.file_name = part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)) AND (name#603806 = lily)) AND (_metadata#603812.file_path = file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet)) AND (age#603807 = 31))
      +- ^(36706) ProjectExecTransformer [name#603806, age#603807, info#603808, knownnotnull(named_struct(file_path, file_path#603835, file_name, file_name#603836, file_size, file_size#603837L, file_block_start, file_block_start#603838L, file_block_length, file_block_length#603839L, file_modification_time, file_modification_time#603840)) AS _metadata#603812]
         +- ^(36706) FileScanTransformer parquet [name#603806,age#603807,info#603808,file_path#603835,file_name#603836,file_size#603837L,file_block_start#603838L,file_block_length#603839L,file_modification_time#603840] Batched: true, DataFilters: [isnotnull(name#603806), isnotnull(age#603807), (file_name#603836 = part-00000-1aafa27d-ddd8-47c7..., Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [IsNotNull(name), IsNotNull(age), EqualTo(name,lily), EqualTo(age,31)], ReadSchema: struct<name:string,age:int,info:struct<id:bigint,university:string>>
== Results ==
== Results ==
!== Correct Answer - 1 == == Spark Answer - 1 ==
!struct<> struct<name:string,age:int,info:struct<id:bigint,university:string>,file_name:string,file_path:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp>
![lily,31,[54321,ucb],part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet,file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:10.173] [lily,31,[54321,ucb],part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet,file:/tmp/spark-25849b1f-70e9-4404-8300-30a153ab6b66/data/f1/part-00000-1aafa27d-ddd8-47c7-a4bc-c89ed7786044-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:10.173]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): upper/lower case when case sensitive is true:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['name, 'age, '_METADATA, '_metadata]
+- Relation [name#603934,age#603935,_METADATA#603936] parquet
== Analyzed Logical Plan ==
name: string, age: int, _METADATA: struct<id:bigint,university:string>, _metadata: struct<file_path:string,file_name:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp,row_index:bigint>
Project [name#603934, age#603935, _METADATA#603936, _metadata#603940]
+- Relation [name#603934,age#603935,_METADATA#603936,_metadata#603940] parquet
== Optimized Logical Plan ==
Relation [name#603934,age#603935,_METADATA#603936,_metadata#603940] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36710) ProjectExecTransformer [name#603934, age#603935, _METADATA#603936, knownnotnull(named_struct(file_path, file_path#603946, file_name, file_name#603947, file_size, file_size#603948L, file_block_start, file_block_start#603949L, file_block_length, file_block_length#603950L, file_modification_time, file_modification_time#603951, row_index, row_index#603952L)) AS _metadata#603940]
   +- ^(36710) FileScanTransformer parquet [name#603934,age#603935,_METADATA#603936,_tmp_metadata_row_index#603952L,file_path#603946,file_name#603947,file_size#603948L,file_block_start#603949L,file_block_length#603950L,file_modification_time#603951] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-bdc25467-35cf-498a-9c9c-618ea620f619/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,_METADATA:struct<id:bigint,university:string>,_tmp_metadata_row_index:...
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<name:string,age:int,_METADATA:struct<id:bigint,university:string>,_metadata:struct<file_path:string,file_name:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp,row_index:bigint>>
![jack,24,[12345,uom],[file:/tmp/spark-bdc25467-35cf-498a-9c9c-618ea620f619/data/f0/part-00000-a9863a4f-953f-48cd-b1d1-9ff77c39d0cd-c000.snappy.parquet,part-00000-a9863a4f-953f-48cd-b1d1-9ff77c39d0cd-c000.snappy.parquet,1302,0,1302,2024-11-21 02:06:10.649,0]] [jack,24,[12345,uom],[file:/tmp/spark-bdc25467-35cf-498a-9c9c-618ea620f619/data/f0/part-00000-a9863a4f-953f-48cd-b1d1-9ff77c39d0cd-c000.snappy.parquet,part-00000-a9863a4f-953f-48cd-b1d1-9ff77c39d0cd-c000.snappy.parquet,1302,0,1302,2024-11-21 10:06:10.649,0]]
![lily,31,[54321,ucb],[file:/tmp/spark-bdc25467-35cf-498a-9c9c-618ea620f619/data/f1/part-00000-809e2c86-f6ab-40be-8e06-0c9a0400f03c-c000.snappy.parquet,part-00000-809e2c86-f6ab-40be-8e06-0c9a0400f03c-c000.snappy.parquet,1302,0,1302,2024-11-21 02:06:10.701,0]] [lily,31,[54321,ucb],[file:/tmp/spark-bdc25467-35cf-498a-9c9c-618ea620f619/data/f1/part-00000-809e2c86-f6ab-40be-8e06-0c9a0400f03c-c000.snappy.parquet,part-00000-809e2c86-f6ab-40be-8e06-0c9a0400f03c-c000.snappy.parquet,1302,0,1302,2024-11-21 10:06:10.701,0]]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): read metadata with offheap set to true:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['name, 'age, 'info, '_metadata.file_name, '_metadata.file_path, '_metadata.file_size, '_metadata.file_block_start, '_metadata.file_block_length, '_metadata.file_modification_time]
+- Relation [name#604172,age#604173,info#604174] parquet
== Analyzed Logical Plan ==
name: string, age: int, info: struct<id:bigint,university:string>, file_name: string, file_path: string, file_size: bigint, file_block_start: bigint, file_block_length: bigint, file_modification_time: timestamp
Project [name#604172, age#604173, info#604174, _metadata#604178.file_name AS file_name#604179, _metadata#604178.file_path AS file_path#604180, _metadata#604178.file_size AS file_size#604181L, _metadata#604178.file_block_start AS file_block_start#604182L, _metadata#604178.file_block_length AS file_block_length#604183L, _metadata#604178.file_modification_time AS file_modification_time#604184]
+- Relation [name#604172,age#604173,info#604174,_metadata#604178] parquet
== Optimized Logical Plan ==
Project [name#604172, age#604173, info#604174, _metadata#604178.file_name AS file_name#604179, _metadata#604178.file_path AS file_path#604180, _metadata#604178.file_size AS file_size#604181L, _metadata#604178.file_block_start AS file_block_start#604182L, _metadata#604178.file_block_length AS file_block_length#604183L, _metadata#604178.file_modification_time AS file_modification_time#604184]
+- Relation [name#604172,age#604173,info#604174,_metadata#604178] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36718) ProjectExecTransformer [name#604172, age#604173, info#604174, _metadata#604178.file_name AS file_name#604179, _metadata#604178.file_path AS file_path#604180, _metadata#604178.file_size AS file_size#604181L, _metadata#604178.file_block_start AS file_block_start#604182L, _metadata#604178.file_block_length AS file_block_length#604183L, _metadata#604178.file_modification_time AS file_modification_time#604184]
   +- ^(36718) ProjectExecTransformer [name#604172, age#604173, info#604174, knownnotnull(named_struct(file_path, file_path#604199, file_name, file_name#604200, file_size, file_size#604201L, file_block_start, file_block_start#604202L, file_block_length, file_block_length#604203L, file_modification_time, file_modification_time#604204)) AS _metadata#604178]
      +- ^(36718) FileScanTransformer parquet [name#604172,age#604173,info#604174,file_path#604199,file_name#604200,file_size#604201L,file_block_start#604202L,file_block_length#604203L,file_modification_time#604204] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-8a3970b3-c38e-4e40-924f-b39a28bf72de/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,info:struct<id:bigint,university:string>>
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<name:string,age:int,info:struct<id:bigint,university:string>,file_name:string,file_path:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp>
![jack,24,[12345,uom],part-00000-12fc1559-3f13-45dd-9e83-b829e144a5ff-c000.snappy.parquet,file:/tmp/spark-8a3970b3-c38e-4e40-924f-b39a28bf72de/data/f0/part-00000-12fc1559-3f13-45dd-9e83-b829e144a5ff-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:11.669] [jack,24,[12345,uom],part-00000-12fc1559-3f13-45dd-9e83-b829e144a5ff-c000.snappy.parquet,file:/tmp/spark-8a3970b3-c38e-4e40-924f-b39a28bf72de/data/f0/part-00000-12fc1559-3f13-45dd-9e83-b829e144a5ff-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:11.669]
![lily,31,[54321,ucb],part-00000-62303a82-574a-4832-955c-407fee4fa2ed-c000.snappy.parquet,file:/tmp/spark-8a3970b3-c38e-4e40-924f-b39a28bf72de/data/f1/part-00000-62303a82-574a-4832-955c-407fee4fa2ed-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:11.733] [lily,31,[54321,ucb],part-00000-62303a82-574a-4832-955c-407fee4fa2ed-c000.snappy.parquet,file:/tmp/spark-8a3970b3-c38e-4e40-924f-b39a28bf72de/data/f1/part-00000-62303a82-574a-4832-955c-407fee4fa2ed-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:11.733]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): read metadata with offheap set to false:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project ['name, 'age, 'info, '_metadata.file_name, '_metadata.file_path, '_metadata.file_size, '_metadata.file_block_start, '_metadata.file_block_length, '_metadata.file_modification_time]
+- Relation [name#604344,age#604345,info#604346] parquet
== Analyzed Logical Plan ==
name: string, age: int, info: struct<id:bigint,university:string>, file_name: string, file_path: string, file_size: bigint, file_block_start: bigint, file_block_length: bigint, file_modification_time: timestamp
Project [name#604344, age#604345, info#604346, _metadata#604350.file_name AS file_name#604351, _metadata#604350.file_path AS file_path#604352, _metadata#604350.file_size AS file_size#604353L, _metadata#604350.file_block_start AS file_block_start#604354L, _metadata#604350.file_block_length AS file_block_length#604355L, _metadata#604350.file_modification_time AS file_modification_time#604356]
+- Relation [name#604344,age#604345,info#604346,_metadata#604350] parquet
== Optimized Logical Plan ==
Project [name#604344, age#604345, info#604346, _metadata#604350.file_name AS file_name#604351, _metadata#604350.file_path AS file_path#604352, _metadata#604350.file_size AS file_size#604353L, _metadata#604350.file_block_start AS file_block_start#604354L, _metadata#604350.file_block_length AS file_block_length#604355L, _metadata#604350.file_modification_time AS file_modification_time#604356]
+- Relation [name#604344,age#604345,info#604346,_metadata#604350] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36724) ProjectExecTransformer [name#604344, age#604345, info#604346, _metadata#604350.file_name AS file_name#604351, _metadata#604350.file_path AS file_path#604352, _metadata#604350.file_size AS file_size#604353L, _metadata#604350.file_block_start AS file_block_start#604354L, _metadata#604350.file_block_length AS file_block_length#604355L, _metadata#604350.file_modification_time AS file_modification_time#604356]
   +- ^(36724) ProjectExecTransformer [name#604344, age#604345, info#604346, knownnotnull(named_struct(file_path, file_path#604371, file_name, file_name#604372, file_size, file_size#604373L, file_block_start, file_block_start#604374L, file_block_length, file_block_length#604375L, file_modification_time, file_modification_time#604376)) AS _metadata#604350]
      +- ^(36724) FileScanTransformer parquet [name#604344,age#604345,info#604346,file_path#604371,file_name#604372,file_size#604373L,file_block_start#604374L,file_block_length#604375L,file_modification_time#604376] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(2 paths)[file:/tmp/spark-6aceeab4-e7fb-40bc-8f5e-91bfd07119e7/data/f1, file:/tm..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,info:struct<id:bigint,university:string>>
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<name:string,age:int,info:struct<id:bigint,university:string>,file_name:string,file_path:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp>
![jack,24,[12345,uom],part-00000-6990195d-208b-4ca5-a7bc-295fa719f364-c000.snappy.parquet,file:/tmp/spark-6aceeab4-e7fb-40bc-8f5e-91bfd07119e7/data/f0/part-00000-6990195d-208b-4ca5-a7bc-295fa719f364-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:12.261] [jack,24,[12345,uom],part-00000-6990195d-208b-4ca5-a7bc-295fa719f364-c000.snappy.parquet,file:/tmp/spark-6aceeab4-e7fb-40bc-8f5e-91bfd07119e7/data/f0/part-00000-6990195d-208b-4ca5-a7bc-295fa719f364-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:12.261]
![lily,31,[54321,ucb],part-00000-ef8a6a58-bc6f-440d-aab2-7e991df6b858-c000.snappy.parquet,file:/tmp/spark-6aceeab4-e7fb-40bc-8f5e-91bfd07119e7/data/f1/part-00000-ef8a6a58-bc6f-440d-aab2-7e991df6b858-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:12.321] [lily,31,[54321,ucb],part-00000-ef8a6a58-bc6f-440d-aab2-7e991df6b858-c000.snappy.parquet,file:/tmp/spark-6aceeab4-e7fb-40bc-8f5e-91bfd07119e7/data/f1/part-00000-ef8a6a58-bc6f-440d-aab2-7e991df6b858-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:12.321]
|
GlutenFileMetadataStructSuite.metadata struct (parquet): write _metadata in parquet and read back:
org/apache/spark/sql/execution/datasources/GlutenFileMetadataStructSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [*]
+- Relation [name#605025,age#605026,info#605027,_metadata#605028] parquet
== Analyzed Logical Plan ==
name: string, age: int, info: struct<id:bigint,university:string>, _metadata: struct<file_path:string,file_name:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp,row_index:bigint>
Project [name#605025, age#605026, info#605027, _metadata#605028]
+- Relation [name#605025,age#605026,info#605027,_metadata#605028] parquet
== Optimized Logical Plan ==
Relation [name#605025,age#605026,info#605027,_metadata#605028] parquet
== Physical Plan ==
VeloxColumnarToRow
+- ^(36752) FileScanTransformer parquet [name#605025,age#605026,info#605027,_metadata#605028] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/spark-10ef5b42-4086-4acd-be75-f5e6dc1251eb/new-data], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<name:string,age:int,info:struct<id:bigint,university:string>,_metadata:struct<file_path:st...
== Results ==
== Results ==
!== Correct Answer - 2 == == Spark Answer - 2 ==
!struct<> struct<name:string,age:int,info:struct<id:bigint,university:string>,_metadata:struct<file_path:string,file_name:string,file_size:bigint,file_block_start:bigint,file_block_length:bigint,file_modification_time:timestamp,row_index:bigint>>
![jack,24,[12345,uom],[file:/tmp/spark-b4946f50-0c12-4780-98fb-08177c43738c/data/f0/part-00000-d1889ae8-8cb0-4ec1-9fc6-a9e5e49b6fab-c000.snappy.parquet,part-00000-d1889ae8-8cb0-4ec1-9fc6-a9e5e49b6fab-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:15.077,0]] [jack,24,[12345,uom],[file:/tmp/spark-b4946f50-0c12-4780-98fb-08177c43738c/data/f0/part-00000-d1889ae8-8cb0-4ec1-9fc6-a9e5e49b6fab-c000.snappy.parquet,part-00000-d1889ae8-8cb0-4ec1-9fc6-a9e5e49b6fab-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:15.077,0]]
![lily,31,[54321,ucb],[file:/tmp/spark-b4946f50-0c12-4780-98fb-08177c43738c/data/f1/part-00000-b2197a68-5155-4e30-b440-569394439ddc-c000.snappy.parquet,part-00000-b2197a68-5155-4e30-b440-569394439ddc-c000.snappy.parquet,1282,0,1282,2024-11-21 02:06:15.121,0]] [lily,31,[54321,ucb],[file:/tmp/spark-b4946f50-0c12-4780-98fb-08177c43738c/data/f1/part-00000-b2197a68-5155-4e30-b440-569394439ddc-c000.snappy.parquet,part-00000-b2197a68-5155-4e30-b440-569394439ddc-c000.snappy.parquet,1282,0,1282,2024-11-21 10:06:15.121,0]]
|
VeloxIcebergSuite.iceberg partition type - timestamp:
org/apache/gluten/execution/VeloxIcebergSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [part_by_timestamp], [], false
== Analyzed Logical Plan ==
p: timestamp
Project [p#36183]
+- SubqueryAlias spark_catalog.default.part_by_timestamp
+- RelationV2[p#36183] spark_catalog.default.part_by_timestamp spark_catalog.default.part_by_timestamp
== Optimized Logical Plan ==
RelationV2[p#36183] spark_catalog.default.part_by_timestamp
== Physical Plan ==
VeloxColumnarToRow
+- ^(1594) IcebergBatchScanTransformer spark_catalog.default.part_by_timestamp[p#36183] spark_catalog.default.part_by_timestamp (branch=null) [filters=, groupedBy=] RuntimeFilters: []
== Results ==
== Results ==
!== Correct Answer - 1 == == Gluten Answer - 1 ==
struct<> struct<>
![2022-01-01 00:01:20.0] [2022-01-01 08:01:20.0]
|
VeloxIcebergSuite.iceberg partition type - timestamp:
org/apache/gluten/execution/VeloxIcebergSuite#L1
Results do not match for query:
Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
Timezone Env:
== Parsed Logical Plan ==
'Project [*]
+- 'UnresolvedRelation [part_by_timestamp], [], false
== Analyzed Logical Plan ==
p: timestamp
Project [p#37746]
+- SubqueryAlias spark_catalog.default.part_by_timestamp
+- RelationV2[p#37746] spark_catalog.default.part_by_timestamp spark_catalog.default.part_by_timestamp
== Optimized Logical Plan ==
RelationV2[p#37746] spark_catalog.default.part_by_timestamp
== Physical Plan ==
VeloxColumnarToRow
+- ^(1775) IcebergBatchScanTransformer spark_catalog.default.part_by_timestamp[p#37746] spark_catalog.default.part_by_timestamp (branch=null) [filters=, groupedBy=] RuntimeFilters: []
== Results ==
== Results ==
!== Correct Answer - 1 == == Gluten Answer - 1 ==
struct<> struct<>
![2022-01-01 00:01:20.0] [2022-01-01 08:01:20.0]
|