
Rename chunks kwarg to partition_chunks into open_datatree method #37

Merged
merged 6 commits into from
Oct 15, 2024


sjperkins
Member

@sjperkins sjperkins commented Oct 15, 2024

The chunks keyword argument in open_dataset and open_datatree is reserved by xarray. In the case of open_datatree, it is simply passed through to the open_dataset method.
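This pass-through can be modelled in a few lines of plain Python. It is a minimal sketch under stated assumptions: `open_dataset_stub` and `open_datatree_stub` are hypothetical stand-ins, not xarray's real functions. They only show that one `chunks` object given to a datatree opener reaches every per-group `open_dataset` call unchanged, so every backend sees the whole value.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-ins: NOT xarray's real API. They only model how a
# datatree opener forwards a single shared ``chunks`` argument.
Calls = List[Dict[str, Any]]


def open_dataset_stub(group: str, chunks: Optional[Any], calls: Calls) -> str:
    # Record exactly what each per-group open_dataset call receives.
    calls.append({"group": group, "chunks": chunks})
    return f"dataset<{group}>"


def open_datatree_stub(groups: List[str], chunks: Optional[Any] = None) -> Dict[str, Any]:
    calls: Calls = []
    # The same ``chunks`` object is handed, unchanged, to every group.
    tree = {g: open_dataset_stub(g, chunks, calls) for g in groups}
    return {"tree": tree, "calls": calls}


result = open_datatree_stub(["/p0", "/p1"], chunks={"time": 2})
for call in result["calls"]:
    print(call["group"], call["chunks"])
```

Because nothing is filtered on the way down, a per-partition schema placed in `chunks` would be handed in full to any backend's `open_dataset`, whether or not that backend understands it.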

Previously, xarray-ms allowed a custom per-partition chunking schema to be passed via the chunks kwarg in MSv2PartitionEntryPoint.open_datatree. For example:

dt = open_datatree("measurementset.ms", chunks={
  (("DATA_DESC_ID", 0), ("FIELD_ID", 0)): {"time": 2, "baseline": 2},
  (("DATA_DESC_ID", 0), ("FIELD_ID", 1)): {"time": 3, "baseline": 4},
})
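In this schema each key identifies a partition by its (column, value) pairs and maps it to a dimension-to-chunk-size dictionary. The sketch below shows one way such a lookup could work. `lookup_partition_chunks` and `normalise_key` are hypothetical helpers, and treating keys as order-insensitive is an assumption for illustration, not documented xarray-ms behaviour.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

PartitionKey = Tuple[Tuple[str, int], ...]
ChunkSpec = Dict[str, int]


def normalise_key(key: Iterable[Tuple[str, int]]) -> PartitionKey:
    # Assumption: sort the (column, value) pairs so that
    # (("FIELD_ID", 0), ("DATA_DESC_ID", 0)) and
    # (("DATA_DESC_ID", 0), ("FIELD_ID", 0)) name the same partition.
    return tuple(sorted(key))


def lookup_partition_chunks(
    schema: Mapping[PartitionKey, ChunkSpec],
    partition: Iterable[Tuple[str, int]],
) -> Optional[ChunkSpec]:
    # Hypothetical helper: return the chunk spec for one partition,
    # or None if the schema does not mention it.
    normalised = {normalise_key(k): v for k, v in schema.items()}
    return normalised.get(normalise_key(partition))


schema = {
    (("DATA_DESC_ID", 0), ("FIELD_ID", 0)): {"time": 2, "baseline": 2},
    (("DATA_DESC_ID", 0), ("FIELD_ID", 1)): {"time": 3, "baseline": 4},
}
print(lookup_partition_chunks(schema, [("FIELD_ID", 1), ("DATA_DESC_ID", 0)]))
```

A generic backend expects `chunks` to be keyed by dimension name, so a mapping keyed by tuples like these has no meaning outside the MSv2 entry point.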

While MSv2PartitionEntryPoint.open_datatree understands this chunking schema, other storage backends do not, so there is potential for undefined behaviour.

This PR moves this functionality from the chunks kwarg to the new partition_chunks kwarg, which is only understood by MSv2PartitionEntryPoint.open_datatree. This separates xarray-ms's custom chunking behaviour from xarray's default chunking behaviour.
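The separation can be pictured as a kwarg-routing step: the MSv2 entry point consumes `partition_chunks` itself and forwards everything else, including xarray's reserved `chunks`, unchanged. This is a minimal sketch; `split_open_kwargs` is a hypothetical helper and does not describe how xarray-ms actually implements the change.

```python
from typing import Any, Dict, Optional, Tuple


def split_open_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Optional[Dict[Any, Any]], Dict[str, Any]]:
    """Separate the MSv2-only ``partition_chunks`` from xarray's kwargs.

    Hypothetical helper: ``partition_chunks`` is consumed here and never
    forwarded, while ``chunks`` and any other kwargs pass through untouched.
    """
    forwarded = dict(kwargs)  # copy so the caller's dict is not mutated
    partition_chunks = forwarded.pop("partition_chunks", None)
    return partition_chunks, forwarded


pc, fwd = split_open_kwargs({
    "chunks": {"time": 10},
    "partition_chunks": {(("FIELD_ID", 0),): {"time": 2}},
})
print(pc)   # consumed by the MSv2 entry point only
print(fwd)  # what a generic open_dataset would see
```

Keeping the two arguments apart means a backend that knows nothing about partitions only ever receives the standard `chunks` value.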

  • Test Cases covering your PR.
  • Documentation.
  • A Changelog entry in doc/source/changelog.rst.

📚 Documentation preview 📚: https://xarray-ms--37.org.readthedocs.build/en/37/

@sjperkins
Member Author

@JSKenyon @landmanbester @o-smirnov Would you mind reviewing the general logic behind this change? The reasoning is explained further in


@JSKenyon JSKenyon left a comment


This doesn't look controversial to me, and it is better to change these things sooner rather than later.


@landmanbester landmanbester left a comment


Agree, seems uncontroversial and good to do this now rather than later

@sjperkins sjperkins changed the title Introduce a partition_chunking argument into open_datatree method Rename chunks kwarg to partition_chunks into open_datatree method Oct 15, 2024
@sjperkins sjperkins merged commit bce6370 into main Oct 15, 2024
1 check passed
@sjperkins sjperkins deleted the partition-chunks branch October 15, 2024 14:24
@sjperkins
Member Author

Thanks for the review.

@sjperkins sjperkins mentioned this pull request Oct 28, 2024