Add parquet read TableDefinition support #4831

Merged
Merge remote-tracking branch 'upstream/main' into nightly/explicit-parquet-definitions
devinrsmith committed Nov 15, 2023
commit debb9a384d0066e0d09d3ac46bb58693ae1f04ec
@@ -934,7 +934,7 @@ public void partitionedParquetWithDotFilesTest() throws IOException {
         writeTable(someTable, secondDataFile);
 
         Table partitionedTable = readKeyValuePartitionedTable(parentDir, EMPTY).select();
-        final Set<?> columnsSet = partitionedTable.getColumnSourceMap().keySet();
+        final Set<String> columnsSet = partitionedTable.getDefinition().getColumnNameSet();
         assertTrue(columnsSet.size() == 2 && columnsSet.contains("A") && columnsSet.contains("X"));
 
         // Add an empty dot file and dot directory (with valid parquet files) in one of the partitions
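
The change above swaps a lookup through the column-source map for a read of the table's definition metadata. A minimal sketch of the two calls, assuming a Deephaven Table named table is in scope inside a test method (the import paths and variable names are my assumption, not part of the change):

import io.deephaven.engine.table.Table;
import java.util.Set;

// Removed pattern: derive the column names from the column-source map (untyped Set<?>).
final Set<?> namesFromSources = table.getColumnSourceMap().keySet();

// Added pattern: read the names directly from the TableDefinition as a Set<String>.
final Set<String> namesFromDefinition = table.getDefinition().getColumnNameSet();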
py/server/tests/test_parquet.py (4 changes: 3 additions & 1 deletion)
@@ -383,7 +383,9 @@ def time_test_helper(pa_table, new_schema, dest):
             # Write the provided pyarrow table type-casted to the new schema
             pyarrow.parquet.write_table(pa_table.cast(new_schema), dest)
             from_disk = read(dest, type=ParquetType.SINGLE)
-            df_from_disk = to_pandas(from_disk)
+
+            # TODO dtype_backend=None is a workaround until https://github.com/deephaven/deephaven-core/issues/4823 is fixed
+            df_from_disk = to_pandas(from_disk, dtype_backend=None)
             original_df = pa_table.to_pandas()
             # Compare the dataframes as strings
             self.assertTrue((df_from_disk.astype(str) == original_df.astype(str)).all().values.all())
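
The added TODO records a temporary workaround: to_pandas is called with dtype_backend=None so the string-level comparison is unaffected by the default dtype backend until https://github.com/deephaven/deephaven-core/issues/4823 is fixed. A minimal sketch of the round-trip check, reusing the read, ParquetType, to_pandas, pa_table, and new_schema names already in scope in this test helper; the destination path is hypothetical:

import pyarrow.parquet

# Hypothetical destination path, for illustration only.
dest = "/tmp/roundtrip.parquet"

# Write the pyarrow table cast to the new schema, then read it back as a Deephaven table.
pyarrow.parquet.write_table(pa_table.cast(new_schema), dest)
from_disk = read(dest, type=ParquetType.SINGLE)

# Workaround until https://github.com/deephaven/deephaven-core/issues/4823 is fixed:
# pass dtype_backend=None before converting to pandas for the string comparison.
df_from_disk = to_pandas(from_disk, dtype_backend=None)
original_df = pa_table.to_pandas()
assert (df_from_disk.astype(str) == original_df.astype(str)).all().values.all()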