feat: use LazyTableProvider by default for write_to_deltalake for memory efficiency #3196

Draft: wants to merge 1 commit into base: main

rtyler (Member) commented Feb 8, 2025

This defaults write_to_deltalake in Python to attempt to use the LazyTableProvider for more stream-like execution. It currently opts out for schema evolution, since that's not supported by default.

Some improvements to schema mismatch detection inside the operations::write module are required as well.

@github-actions github-actions bot added the binding/python (Issues for the Python package) and binding/rust (Issues for the Rust crate) labels Feb 8, 2025

github-actions bot commented Feb 8, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

codecov bot commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 3.17460% with 61 lines in your changes missing coverage. Please review.

Project coverage is 72.13%. Comparing base (4ef9fb3) to head (ba2845b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| python/src/write.rs | 0.00% | 36 Missing ⚠️ |
| python/src/lib.rs | 0.00% | 16 Missing ⚠️ |
| crates/core/src/operations/write.rs | 20.00% | 6 Missing and 2 partials ⚠️ |
| crates/core/src/delta_datafusion/mod.rs | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3196      +/-   ##
==========================================
- Coverage   72.19%   72.13%   -0.06%     
==========================================
  Files         138      139       +1     
  Lines       45292    45311      +19     
  Branches    45292    45311      +19     
==========================================
- Hits        32697    32685      -12     
- Misses      10532    10552      +20     
- Partials     2063     2074      +11     


@rtyler rtyler changed the title feat: Use LazyTableProvider by default for write_to_deltalake for memory efficiency feat: use LazyTableProvider by default for write_to_deltalake for memory efficiency Feb 8, 2025
@rtyler rtyler force-pushed the feature/lazy-stream-write-2968 branch from 6c2ff51 to 8f85a64 Compare February 8, 2025 18:52
let table_schema = snapshot.input_schema()?;
let plan_schema = plan.schema().as_arrow();

if table_schema.fields.len() != plan_schema.fields.len() {

Collaborator

I think a not_eq comparison on the StructTypes might be better here?
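
To make the reviewer's point concrete, here is a minimal sketch of why a field-count check can miss a mismatch that a structural comparison catches. The `DataType`, `Field`, and `StructType` definitions below are hypothetical stand-ins written for illustration only; they are not the delta-rs or Arrow types, and the two helper functions are assumptions rather than code from this PR.

```rust
// Hypothetical stand-in types for illustration; NOT the delta-rs or Arrow definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructType {
    pub fields: Vec<Field>,
}

/// Convenience constructor (hypothetical) for a nullable field.
pub fn field(name: &str, data_type: DataType) -> Field {
    Field {
        name: name.to_string(),
        data_type,
        nullable: true,
    }
}

/// Length-only check, mirroring the shape of the condition under review:
/// it reports a mismatch only when the number of fields differs.
pub fn mismatch_by_len(table: &StructType, plan: &StructType) -> bool {
    table.fields.len() != plan.fields.len()
}

/// Structural check: reports a mismatch when any field name, type,
/// nullability, or position differs, using the derived `PartialEq`.
pub fn mismatch_by_eq(table: &StructType, plan: &StructType) -> bool {
    table != plan
}
```

With a table schema of `id: Int64, name: Utf8` and a plan schema of `id: Utf8, name: Utf8`, `mismatch_by_len` returns `false` while `mismatch_by_eq` returns `true`. One caveat of the derived equality in this sketch is that it is sensitive to field order and nullability, which may or may not be the desired behavior for a write path.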

@rtyler rtyler force-pushed the feature/lazy-stream-write-2968 branch from 8f85a64 to dbaa4e1 Compare February 8, 2025 19:10
…ory efficiency

This defaults write_to_deltalake in Python to attempt to use the
LazyTableProvider for more stream-like execution. It currently opts
out for schema evolution, since that's not supported by default.

Some improvements to schema mismatch detection inside the
operations::write module are required as well.

Signed-off-by: R. Tyler Croy <[email protected]>
@rtyler rtyler force-pushed the feature/lazy-stream-write-2968 branch from dbaa4e1 to ba2845b Compare February 8, 2025 19:27
2 participants