From 56528624cb8b45f7df48bbd31a038c86a3f06d85 Mon Sep 17 00:00:00 2001
From: Andy Grove
Date: Mon, 7 Oct 2024 16:18:26 -0600
Subject: [PATCH] Blog post for Comet 0.3.0 (#28)

* Blog post for Comet 0.3.0

* Fix documentation section
---
 _posts/2024-09-27-datafusion-comet-0.3.0.md | 97 +++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 _posts/2024-09-27-datafusion-comet-0.3.0.md

diff --git a/_posts/2024-09-27-datafusion-comet-0.3.0.md b/_posts/2024-09-27-datafusion-comet-0.3.0.md
new file mode 100644
index 0000000..2633d55
--- /dev/null
+++ b/_posts/2024-09-27-datafusion-comet-0.3.0.md
@@ -0,0 +1,97 @@
+---
+layout: post
+title: "Apache DataFusion Comet 0.3.0 Release"
+date: "2024-09-27 00:00:00"
+author: pmc
+categories: [subprojects]
+---
+
+The Apache DataFusion PMC is pleased to announce version 0.3.0 of the [Comet](https://datafusion.apache.org/comet/) subproject.
+
+Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.
+
+Comet runs on commodity hardware and aims to provide 100% compatibility with Apache Spark. Any operators or
+expressions that are not fully compatible will fall back to Spark unless explicitly enabled by the user. Refer
+to the [compatibility guide] for more information.
+
+[compatibility guide]: https://datafusion.apache.org/comet/user-guide/compatibility.html
+
+This release covers approximately four weeks of development work and is the result of merging 57 PRs from 12
+contributors. See the [change log] for more information.
+
+[change log]: https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.3.0.md
+
+## Release Highlights
+
+### Binary Releases
+
+Comet jar files are now published to Maven Central for amd64 and arm64 architectures (Linux only).
+
+The published jar files can be found at https://central.sonatype.com/search?q=org.apache.datafusion
+
+- Spark versions 3.3, 3.4, and 3.5 are supported.
+- Scala versions 2.12 and 2.13 are supported.
+
+### New Features
+
+The following expressions are now supported natively:
+
+- `DateAdd`
+- `DateSub`
+- `ElementAt`
+- `GetArrayElement`
+- `ToJson`
+
+### Performance & Stability
+
+- Upgraded to DataFusion 42.0.0
+- Reduced memory overhead by fixing several memory leaks
+- Comet now falls back to Spark for queries that use Dynamic Partition Pruning (DPP) to avoid performance
+  regressions, because Comet does not yet support DPP natively
+- Improved performance when converting Spark columnar data to Arrow format
+- Faster decimal sum and avg functions
+
+### Documentation Updates
+
+- Improved documentation for deploying Comet with Kubernetes and Helm in the [Comet Kubernetes Guide]
+- A more detailed architectural overview of Comet scan and execution in the [Comet Plugin Overview] section of the
+  contributor guide
+
+[Comet Kubernetes Guide]: https://datafusion.apache.org/comet/user-guide/kubernetes.html
+[Comet Plugin Overview]: https://datafusion.apache.org/comet/contributor-guide/plugin_overview.html
+
+## Getting Involved
+
+The Comet project welcomes new contributors. We use the same [Slack and Discord] channels as the main DataFusion
+project.
+
+[Slack and Discord]: https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord
+
+The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or
+performance regressions that you find. See the [Getting Started] guide for instructions on downloading and installing
+Comet.
+
+[Getting Started]: https://datafusion.apache.org/comet/user-guide/installation.html
+
+There are also many [good first issues] waiting for contributions.
+
+[good first issues]: https://github.com/apache/datafusion-comet/contribute