Plan Comet 0.1.0 Release #369

Closed
10 of 13 tasks
andygrove opened this issue May 1, 2024 · 18 comments

Comments

@andygrove
Member

andygrove commented May 1, 2024

What is the problem the feature request solves?

During the Comet public meeting this morning, there were questions about when the first official release would be. We do not really have an answer to that yet, but we can use this issue to discuss it.

Here are some ideas for milestones that we may want to achieve before creating an official source release (note that we do not necessarily need to create binary releases right away).

Nice to have:

Post-release items:

Describe the potential solution

No response

Additional context

No response

@andygrove andygrove added the enhancement New feature or request label May 1, 2024
@parthchandra
Contributor

  • Ensure that currently supported operators and expressions are fully compatible with all supported Spark versions

I think this is a good requirement for the first release.

  • Achieve 100% coverage for TPC-H and/or TPC-DS benchmark with a clear performance advantage

TPC-H for sure. TPC-DS may be a stretch goal.

@andygrove
Member Author

One benefit of creating a release is that it is a good opportunity to write a blog post to announce the release and provide an update on the status of the project, and try and encourage more people to contribute. It can also demonstrate the momentum that the project has. I suppose I am now making an argument against waiting until we have great benchmark results before making the first release. I would be interested to hear opinions on this.

@andygrove andygrove self-assigned this May 22, 2024
@andygrove andygrove changed the title Plan first release Plan Comet 0.1.0 Release May 22, 2024
@andygrove
Member Author

I propose that we create the 0.1.0 source release as soon as we have updated the project to use the upcoming DataFusion 39.0.0 release, which should be available around June 10.

@viirya
Member

viirya commented May 31, 2024

+1. But we may need the new arrow-rs 52.0.0 release too. Will it be released before DataFusion 39.0.0?

@andygrove
Member Author

Here is the tracking issue for arrow-rs 52: apache/arrow-rs#5688

It should be available next week. I updated the DataFusion 39 release issue to add this to the prerequisites for the release: apache/datafusion#10517

@andygrove
Member Author

I created a Google doc for the community to collaborate on a blog post announcing the release:

https://docs.google.com/document/d/1rnxnbi66oFr5B-OTUxtpi9pnifOxNkmvOVIRTN0BfhY/edit?usp=sharing

@advancedxy
Contributor

I propose that we create the 0.1.0 source release as soon as we have updated the project to use the upcoming DataFusion 39.0.0 release, which should be available around June 10.

I think this is great news. Some blocking issues are already listed in the issue content.

One issue that comes to mind is binary releases, especially publishing the Comet jar to Maven Central.
I think it's crucial to have a published jar so that downstream projects such as Iceberg can depend on it and leverage Comet's vectorized reader. Releasing binaries might require a lot of extra work, so we can skip it for Comet 0.1.0, but it should definitely be planned, and hopefully we can release it in the next version.

@viirya
Member

viirya commented Jun 4, 2024

We plan to do a binary release, although it might not be ready in time for the 0.1.0 source release. Publishing to a Maven repository requires more work. Comet involves native code, so it becomes more complicated than pure Java/Scala projects. We need to include pre-built binaries for different platforms in the published jar.

@parthchandra
Contributor

I assume the source release will tag the repo with a release-0.1.0 tag. Even though a maven artifact would not be published, it does allow projects to build their own, or even add comet as a git submodule, based on a relatively 'stable' version.

@andygrove
Member Author

I assume the source release will tag the repo with a release-0.1.0 tag. Even though a maven artifact would not be published, it does allow projects to build their own, or even add comet as a git submodule, based on a relatively 'stable' version.

Yes, absolutely. We (or anyone) can choose to release binary artifacts from the source release or the tag in the repo. ASF does not have any special involvement in that.

@andygrove
Member Author

I created a milestone where we can track the priority issues for the 0.1.0 release:

https://github.com/apache/datafusion-comet/milestone/1

@advancedxy
Contributor

We plan to do a binary release, although it might not be ready in time for the 0.1.0 source release.

I think we are on the same page.

Comet involves native code, so it becomes more complicated than pure Java/Scala projects. We need to include pre-built binaries for different platforms in the published jar.

Yes, it might be a bit complicated. But I think the Rust toolchain does an excellent job of cross-compiling. If I'm not wrong, the Makefile in this repo already has a release-linux target, which builds both Linux and macOS libs (for both Intel and Arm CPUs). It should be a good starting point.

Even though a maven artifact would not be published, it does allow projects to build their own, or even add comet as a git submodule, based on a relatively 'stable' version.

Of course, that could be an option. However, a Maven artifact would be the convenient, easy way for people in the JVM ecosystem.

@andygrove andygrove added this to the 0.1.0 milestone Jun 7, 2024
@andygrove
Member Author

I think we are getting close to being able to release 0.1.0 now that we are using an official DataFusion release again (or will be in a few days when DF 40 is released to crates.io).

There are a few remaining issues in the 0.1.0 milestone, the most important ones (IMO) being:

@parthchandra @viirya is there anything else that you think we need to address for the first source release?

@viirya
Member

viirya commented Jul 10, 2024

is there anything else that you think we need to address for the first source release?

No. I think the above two issues are most notable at the moment.

@parthchandra
Contributor

is there anything else that you think we need to address for the first source release?

No. I think the above two issues are most notable at the moment.

I think this is good.

@andygrove
Member Author

@viirya @parthchandra I no longer think that it is critical to fix #387 before we release, because users can already enable shuffle, so this is just a config change from the user's point of view.

If there are no objections, I will plan on creating 0.1.0-rc1 next week, after reviewing the documentation to make sure all known issues are documented.

@viirya
Member

viirya commented Jul 19, 2024

+1

@parthchandra
Contributor

I no longer think that it is critical to fix #387 before we release, because users can already enable shuffle, so this is just a config change from the user's point of view.

Agreed

@andygrove andygrove modified the milestones: 0.1.0, 0.2.0 Jul 24, 2024