Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retire DataColumn and replace usages #5553

Merged
merged 4 commits into from
Jun 6, 2024

Conversation

rcaudy
Copy link
Member

@rcaudy rcaudy commented May 31, 2024

No description provided.

@rcaudy rcaudy added this to the 3. May 2024 milestone May 31, 2024
@rcaudy rcaudy requested a review from lbooker42 May 31, 2024 23:52
@rcaudy rcaudy self-assigned this May 31, 2024
jmao-denver
jmao-denver previously approved these changes Jun 1, 2024
Copy link
Contributor

@jmao-denver jmao-denver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python changes LGTM. Two questions:

  1. what's the performance implication of this change?
  2. w/o reading the Java changes, can ColumnVectors be sliced (without flattening)?

… tests, and enable a pattern for "long-lived" chunk allocations that should not be tracked in the ChunkPoolReleaseTracking, applied to RowSet chunk views.
@rcaudy
Copy link
Member Author

rcaudy commented Jun 4, 2024

The Python changes LGTM. Two questions:

  1. what's the performance implication of this change?
  2. w/o reading the Java changes, can ColumnVectors be sliced (without flattening)?
  1. The Python code I changed will allocate less and be slightly faster when handling very small data sets. It should be nearly identical for large data sets.
  2. Vectors are positional; they never expose row keys as part of their interface. They can be sliced positionally without needing to flatten the source table, assuming I understood the question correctly.

@jmao-denver
Copy link
Contributor

The Python changes LGTM. Two questions:

  1. what's the performance implication of this change?
  2. w/o reading the Java changes, can ColumnVectors be sliced (without flattening)?
  1. The Python code I changed will allocate less and be slightly faster when handling very small data sets. It should be nearly identical for large data sets.
  2. Vectors are positional; they never expose row keys as part of their interface. They can be sliced positionally without needing to flatten the source table, assuming I understood the question correctly.

Yes,and I was thinking of using it as a way to give Python users the ability of reading the column data (single cell or range/slice) without needing to go through to_numpy which requires snapshotting the table first. Should we do that?

@rcaudy
Copy link
Member Author

rcaudy commented Jun 4, 2024

The Python changes LGTM. Two questions:

  1. what's the performance implication of this change?
  2. w/o reading the Java changes, can ColumnVectors be sliced (without flattening)?
  1. The Python code I changed will allocate less and be slightly faster when handling very small data sets. It should be nearly identical for large data sets.
  2. Vectors are positional; they never expose row keys as part of their interface. They can be sliced positionally without needing to flatten the source table, assuming I understood the question correctly.

Yes,and I was thinking of using it as a way to give Python users the ability of reading the column data (single cell or range/slice) without needing to go through to_numpy which requires snapshotting the table first. Should we do that?

I'm unsure. You're still dealing with a JNI access per cell if you just use get. Maybe you want a to_numpy on a slice that is aware of when it needs to use snapshots?

@jmao-denver
Copy link
Contributor

This is what I had in mind very roughly:

class Table:
    def read_column(col: str, start: int,  end: Optional[int] = None) -> Union[int, float, bool, str, np.ndarray, jpy.JType]:
        ...

chipkent
chipkent previously approved these changes Jun 4, 2024
Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plotting and py LGTM. Let me know if I need to review more.

Copy link
Contributor

@lbooker42 lbooker42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped the python files, trusting @jmao-denver. Looks good, couple of comments for clarification.

out.write(separatorStr);
} else {
out.write("\n");
}
final Object o = cols[j].get(i);
final Object o = cols[ci].get(rowKey);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned we might make this work chunky.

Copy link
Member

@chipkent chipkent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python and plotting LGTM

@rcaudy rcaudy merged commit fc24a1f into deephaven:main Jun 6, 2024
15 checks passed
@rcaudy rcaudy deleted the rwc-delete-data-column branch June 6, 2024 15:18
@github-actions github-actions bot locked and limited conversation to collaborators Jun 6, 2024
@deephaven-internal
Copy link
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

Community: deephaven/deephaven-docs-community#227

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants