CSV writer improvements #5604

rcaudy · 2024-06-11T20:49:06Z

Change CSV writing code to use chunk-based reading code and stop allocating boxed primitives.
Correct some trivial warnings.
Fix column header separator-escaping bug.
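
For readers unfamiliar with the header bug class mentioned above, here is a minimal sketch of separator-aware escaping for column names. It assumes RFC 4180-style quoting; `HeaderEscapeSketch`, `escapeField`, and `headerLine` are hypothetical names for illustration and are not the methods in CsvTools.

```java
// Illustration only: RFC 4180-style quoting, not the actual CsvTools code.
final class HeaderEscapeSketch {

    // Quote a field if it contains the separator, a double quote, or a line break.
    static String escapeField(String field, char separator) {
        boolean needsQuoting = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuoting) {
            return field;
        }
        // Embedded double quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Build the header line, escaping against the separator actually in use.
    static String headerLine(String[] columnNames, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columnNames.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(escapeField(columnNames[i], separator));
        }
        return sb.toString();
    }
}
```

The point of passing `separator` through is that a header such as `a|b` only needs quoting when `|` is the delimiter; with a different delimiter it can be written unchanged.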

rcaudy · 2024-06-12T16:19:51Z

Unit tests have been updated to provide full coverage for the new/changed code.
Some basic testing suggests significant speedup in cases where chunk filling is more performant than serial get.
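
A rough plain-Java sketch of the chunk-filling pattern, for context. It is not the Deephaven chunk API: the `int[]` source, `CHUNK_SIZE`, and `writeColumn` are stand-ins, and `NULL_INT` is assumed to be a sentinel equal to `Integer.MIN_VALUE`. Values are copied in bulk into a reusable primitive buffer and appended without creating boxed `Integer` objects.

```java
// Illustration only: plain arrays stand in for column sources and chunks.
final class ChunkedWriteSketch {
    static final int NULL_INT = Integer.MIN_VALUE; // assumed null sentinel
    static final int CHUNK_SIZE = 4;               // tiny on purpose, for the example

    // Write one int column, one value per line, nulls as empty fields.
    static void writeColumn(int[] column, StringBuilder out, String lineSeparator) {
        int[] chunk = new int[CHUNK_SIZE]; // reused for every fill
        for (int start = 0; start < column.length; start += CHUNK_SIZE) {
            int len = Math.min(CHUNK_SIZE, column.length - start);
            // Bulk fill instead of one get call per row.
            System.arraycopy(column, start, chunk, 0, len);
            for (int i = 0; i < len; i++) {
                int value = chunk[i];
                if (value != NULL_INT) {
                    out.append(value); // StringBuilder.append(int), no boxing
                }
                // Every row, including the last, ends with a line separator.
                out.append(lineSeparator);
            }
        }
    }
}
```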

devinrsmith

I see that using Table.columnIterators would be more expensive because they don't use a shared context and have individual keys; but that makes me wish we had an iterator-like construction that could be as efficient as this.

ColumnsIterator it = table.columnsIterator();
OfByte bytes = it.byteColumn("Foo");
OfDouble doubles = it.doubleColumn("Bar");
while (it.hasNext()) {
  byte myByte = bytes.getByte();
  double myDouble = doubles.getDouble();
  it.advance();
}
it.close();

Having something like this would make it easier IMO to splay out to these sorts of row-oriented formats without having to worry as much about low-level chunking details.

Happy to approve PR otherwise, but had a few Qs.

extensions/csv/src/main/java/io/deephaven/csv/CsvTools.java

rcaudy · 2024-06-12T18:05:10Z

That's a clever idea. It might be a lot easier to use for many use cases. It obviously wouldn't be a Java Iterator, though, which makes it less suitable for other use cases.
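
To make the shape concrete, here is a small self-contained sketch of such a cursor over plain arrays. Everything in it is hypothetical (`RowCursorSketch`, `OfByte`, `OfDouble`, `toCsv`); none of it exists in Deephaven, and a real version would need to fill chunks under a shared context rather than index arrays. It shows one shared row position, typed accessors that read at that position, and `AutoCloseable` in place of `java.util.Iterator`.

```java
// Illustration only: a shared-position cursor with typed, non-boxing accessors.
final class RowCursorSketch implements AutoCloseable {
    interface OfByte { byte getByte(); }
    interface OfDouble { double getDouble(); }

    private final int size;
    private int row = 0;
    private boolean closed = false;

    RowCursorSketch(int size) {
        this.size = size;
    }

    // Each accessor reads at the cursor's current row; all accessors move together.
    OfByte byteColumn(byte[] data) { return () -> data[row]; }
    OfDouble doubleColumn(double[] data) { return () -> data[row]; }

    // True while the current row is valid (named hasNext to match the comment above).
    boolean hasNext() { return !closed && row < size; }

    void advance() { row++; }

    @Override
    public void close() { closed = true; }

    // Usage: splay two columns out to a row-oriented format.
    static String toCsv(byte[] foo, double[] bar) {
        StringBuilder sb = new StringBuilder();
        try (RowCursorSketch it = new RowCursorSketch(foo.length)) {
            OfByte bytes = it.byteColumn(foo);
            OfDouble doubles = it.doubleColumn(bar);
            while (it.hasNext()) {
                sb.append(bytes.getByte()).append(',').append(doubles.getDouble()).append('\n');
                it.advance();
            }
        }
        return sb.toString();
    }
}
```

Because values come from the typed accessors and `advance()` returns nothing, there is no `next()` that yields an element, which is why this cannot implement `java.util.Iterator` directly.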

Change CSV writing code to use chunk-based reading code and stop allocating boxed primitives. Correct some trivial warnings. Fix column header separator-escaping bug.

5101684

rcaudy added the core, NoDocumentationNeeded, csv, and ReleaseNotesNeeded labels Jun 11, 2024

rcaudy added this to the June 2024 milestone Jun 11, 2024

rcaudy requested a review from lbooker42 June 11, 2024 20:49

rcaudy self-assigned this Jun 11, 2024

rcaudy requested review from devinrsmith and removed request for lbooker42 June 12, 2024 16:20

Use QueryConstants static import everywhere I can

52e8727

devinrsmith reviewed Jun 12, 2024

TestCsvTools.testWriteCsv: add explicit Boolean test, and '0' as a delimiter.

b9aa9a6

devinrsmith previously approved these changes Jun 12, 2024

TestCsvTools.testWriteCsv: add tests for nullAsEmpty

e2a1b9d

rcaudy dismissed devinrsmith’s stale review via e2a1b9d June 12, 2024 19:26

devinrsmith previously approved these changes Jun 12, 2024

Insist on terminating our CSV files with a line separator.

a5bf8fa

rcaudy dismissed devinrsmith’s stale review via a5bf8fa June 12, 2024 19:38

rcaudy enabled auto-merge (squash) June 12, 2024 19:41

devinrsmith approved these changes Jun 12, 2024

rcaudy merged commit bf2fdec into deephaven:main Jun 12, 2024
15 checks passed

rcaudy deleted the rwc-csvwriter branch June 12, 2024 20:44

github-actions bot locked and limited conversation to collaborators Jun 12, 2024
