Scatter struct nulls when deserializing Presto wire format (facebookincubator#8526)

Summary:
In the spill serialization, struct nulls are written before the struct columns, so reading can proceed in a single pass.

In this case, nulls from enclosing structs are passed down when reading. These are combined with the nulls of the contained column so that the contained column also has a null for rows where the enclosing struct is null.
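
The nulls-first combination can be sketched as follows. This is a minimal illustration, not the Velox implementation: `combineNulls` is a hypothetical helper, null flags are modeled as `std::vector<bool>` with `true` meaning null, and it assumes the contained column carries one null flag per non-null struct row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Velox. true means "row is null".
// parentNulls: one flag per row of the enclosing struct.
// childNulls: assumed to hold one flag per non-null struct row, in row order.
// Returns one flag per struct row: a row is null in the child if the
// enclosing struct is null there or if the child itself is null.
std::vector<bool> combineNulls(
    const std::vector<bool>& parentNulls,
    const std::vector<bool>& childNulls) {
  std::vector<bool> combined(parentNulls.size());
  std::size_t next = 0; // Next unread position in the child's own flags.
  for (std::size_t row = 0; row < parentNulls.size(); ++row) {
    if (parentNulls[row]) {
      // The struct is null here, so the child is null as well.
      combined[row] = true;
    } else {
      assert(next < childNulls.size());
      combined[row] = childNulls[next++];
    }
  }
  // Every child flag must have been consumed.
  assert(next == childNulls.size());
  return combined;
}
```

Because the parent flags are known before the child is read, the combined flags can be produced while the child is decoded, with no second pass.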

When reading Presto Pages, struct nulls come after the child columns. A separate pass scatters the child column values so as to create a null gap for the rows where the containing struct is null.
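
The scatter pass can be sketched as below. This is an illustration under stated assumptions rather than the code in this commit: `scatterInPlace` is a hypothetical helper, the child column is modeled as a dense `std::vector<int64_t>` holding one value per non-null struct row, and gap rows are filled with 0 as a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Velox. true means "struct row is null".
// On entry, values holds one entry per non-null struct row. On return it
// holds one entry per struct row, with a placeholder 0 in each gap row.
void scatterInPlace(
    std::vector<int64_t>& values,
    const std::vector<bool>& structNulls) {
  assert(values.size() <= structNulls.size());
  std::size_t source = values.size(); // One past the last unmoved value.
  values.resize(structNulls.size());
  // Walk from the back so that no value is overwritten before it is moved.
  for (std::size_t row = structNulls.size(); row-- > 0;) {
    if (structNulls[row]) {
      values[row] = 0; // Gap for a row where the struct is null.
    } else {
      assert(source > 0);
      values[row] = values[--source];
    }
  }
  // Every dense value must have found a destination row.
  assert(source == 0);
}
```

The child's null flags would also need to be set for the gap rows. The back-to-front order is what allows the move to happen in the same buffer, since a value's destination index is never smaller than its source index.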

Adds a test for encoding-preserving round trips. Adds a test for concatenating different encodings in a message, e.g. constant, dictionary, flat in all combinations of same/different encoding/value domain. This functionality only applies to nulls-first representations. It will apply to Presto pages once the struct nulls are read before constructing the struct. See PR 8152 for the end state.

Pull Request resolved: facebookincubator#8526

Reviewed By: bikramSingh91

Differential Revision: D53056966

Pulled By: oerling

fbshipit-source-id: a1cdaab64895324fdf5b17b434307427612992dd
Orri Erling authored and facebook-github-bot committed Feb 15, 2024
1 parent ec6741c commit 2037957
Showing 4 changed files with 607 additions and 268 deletions.
7 changes: 5 additions & 2 deletions velox/exec/SpillFile.cpp
@@ -176,7 +176,7 @@ uint64_t SpillWriter::write(
     MicrosecondTimer timer(&timeUs);
     if (batch_ == nullptr) {
       serializer::presto::PrestoVectorSerde::PrestoOptions options = {
-          kDefaultUseLosslessTimestamp, compressionKind_};
+          kDefaultUseLosslessTimestamp, compressionKind_, true /*nullsFirst*/};
       batch_ = std::make_unique<VectorStreamGroup>(pool_);
       batch_->createStreamTree(
           std::static_pointer_cast<const RowType>(rows->type()),
@@ -292,7 +292,10 @@ SpillReadFile::SpillReadFile(
       numSortKeys_(numSortKeys),
       sortCompareFlags_(sortCompareFlags),
       compressionKind_(compressionKind),
-      readOptions_{kDefaultUseLosslessTimestamp, compressionKind_},
+      readOptions_{
+          kDefaultUseLosslessTimestamp,
+          compressionKind_,
+          true /*nullsFirst*/},
       pool_(pool) {
   constexpr uint64_t kMaxReadBufferSize =
       (1 << 20) - AlignedBuffer::kPaddedSize; // 1MB - padding.