Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Barrage Refactor Read/Write Chunk Factories #6065

Open
wants to merge 78 commits into
base: main
Choose a base branch
from

Conversation

nbauernfeind
Copy link
Member

@nbauernfeind nbauernfeind commented Sep 13, 2024

Fixes #58 (custom type serialization / deserialization).
Fixes #936 (ColumnConversionModes is being replaced with easy to integrate custom serialization).
Fixes #2984 (refactoring has good interface documentation).
Fixes #3403 (by supporting a variety of mappings, these now must match client wiring).
Fixes #5258 (snapshot/subscribe methods with default w2w options).
Fixes #5453 (support other Timestamp arrow wire encodings).
Fixes #5864 (support for uint64_t).
Fixes #6114 (supports ObjectVector properly).

This PR adds support to (some to be implemented as this is a WIP):

  • enables users to target wire types and configurations
  • enables users to target deephaven types, including custom codecs
  • improves snapshot/subscription APIs for default options

nightlies: https://github.com/nbauernfeind/deephaven-core/actions/runs/12172832822

@nbauernfeind nbauernfeind added this to the 0.37.0 milestone Sep 13, 2024
@nbauernfeind nbauernfeind self-assigned this Sep 13, 2024
@nbauernfeind nbauernfeind marked this pull request as draft September 13, 2024 15:08
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little nervous about the scope, and the willingness to truncate inputs or infer units.

engine/chunk/src/main/java/io/deephaven/chunk/Chunk.java Outdated Show resolved Hide resolved
@@ -18,7 +20,56 @@
/**
* Consumes Flight/Barrage streams and transforms them into WritableChunks.
*/
public interface ChunkReader {
public interface ChunkReader<ReadChunkType extends WritableChunk<Values>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wondered if you could make the bound looser, e.g. PoolableChunk or SafeCloseable. Looks like maybe not? Should PoolableChunk be a Chunk (it's not right now)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need these to return the Chunks to support usages such as LongChunkReader#transformTo. Since ownership is also being passed, it is best to advertise that they're WritableChunks.

}
}

private static long factorForTimeUnit(final TimeUnit unit) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ryan's placeholder.

jcferretti added a commit that referenced this pull request Dec 3, 2024
…protobuf repeated fields (#6434) (#6461)

The fix is taken from code in
#6065. That branch has
been in WIP state for several months and we need this fix for DHE.
fixes #6433
@nbauernfeind nbauernfeind modified the milestones: 0.37.0, 0.37.1 Dec 5, 2024
@nbauernfeind nbauernfeind marked this pull request as ready for review December 5, 2024 03:55
@nbauernfeind nbauernfeind changed the title feat: [WIP] Barrage Refactor Read/Write Chunk Factories and Default Type Mappings feat: Barrage Refactor Read/Write Chunk Factories and Default Type Mappings Dec 5, 2024
@nbauernfeind nbauernfeind changed the title feat: Barrage Refactor Read/Write Chunk Factories and Default Type Mappings feat: Barrage Refactor Read/Write Chunk Factories Dec 5, 2024
Copy link
Contributor

@cpwright cpwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest thing I would like to see is unit tests that demonstrate coverage of the various different modes and cases that can happen for the types. There is a lot of code, and I think the testing you've done has mostly been using our clients or flight sql to show that the data makes it back and forth; but there is a lot of room in the spec and a lot of cases that we should make sure get covered for each of these types. We really do have a matrix that is pretty darn big with all of the various input types and output types in the DefaultChunkReaderFactory and DefaultChunkWriterFactory methods.

@@ -56,6 +56,8 @@ test_that("is_static returns the correct value", {
})

test_that("nrow returns the correct number of rows", {
skip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just temporary until you work with Corey to fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These few test skip()s are going to be committed with this PR to enable independent PRs between Corey and I.

The issue is that the cpp client cannot handle the data type explicitly uploaded by the r client in this test (as now DH assume you want to round-trip schema).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any skips should also have TODO comments so that they are easier to find later.

@@ -275,7 +275,8 @@ public <T extends WritableChunk<Values>> ChunkReader<T> newReader(
final ExpansionKernel<?> kernel =
ArrayExpansionKernel.makeExpansionKernel(chunkType, componentTypeInfo.type());
final ChunkReader<?> componentReader = newReader(componentTypeInfo, options);
return (ChunkReader<T>) new ListChunkReader<>(ListChunkReader.Mode.DENSE, 0, kernel, componentReader);
return (ChunkReader<T>) new ListChunkReader<>(ListChunkReader.Mode.VARIABLE, 0, kernel,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why hardcoded here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primary reason is/was that the UI client (at this time) can only speak with a DH server. So, we know that the Schema field is going to be a List. I've added ListView and FixedSizeList to this case and differentiated the modes.

We don't have a way to currently test the code paths for ListView nor FixedSizeList for the web reader .. but I see that it might be useful if someone finds a way to get their paws on our reader in js or typescript and want to read from a non-deephaven arrow host.

It's also easy for me to add what looks to be correct.

@niloc132
Copy link
Member

I think this should fix #6201 - at one point I believe you had nbauernfeind@b18d660 in this branch, can you bring it back in so we can verify the fix, remove the TODO, and mark that issue as fixed in the description?

@rcaudy rcaudy modified the milestones: 0.37.1, 0.38.0 Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants