From 555427d8130f1446ed93887ec49d7d4190f305dd Mon Sep 17 00:00:00 2001 From: Suraj Aralihalli Date: Tue, 23 Apr 2024 14:20:50 -0700 Subject: [PATCH] remove _config.yml, archives Signed-off-by: Suraj Aralihalli --- docs/_config.yml | 33 - docs/archives/CHANGELOG_0.1_to_0.5.md | 1325 -------------- docs/archives/CHANGELOG_21.06_to_21.12.md | 1237 ------------- docs/archives/CHANGELOG_22.02_to_22.12.md | 1906 --------------------- docs/archives/CHANGELOG_23.02_to_23.12.md | 1566 ----------------- 5 files changed, 6067 deletions(-) delete mode 100644 docs/_config.yml delete mode 100644 docs/archives/CHANGELOG_0.1_to_0.5.md delete mode 100644 docs/archives/CHANGELOG_21.06_to_21.12.md delete mode 100644 docs/archives/CHANGELOG_22.02_to_22.12.md delete mode 100644 docs/archives/CHANGELOG_23.02_to_23.12.md diff --git a/docs/_config.yml b/docs/_config.yml deleted file mode 100644 index e5bcfdfdc71..00000000000 --- a/docs/_config.yml +++ /dev/null @@ -1,33 +0,0 @@ -# -# Copyright (c) 2020, NVIDIA CORPORATION. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
-# - - -remote_theme: pmarsceill/just-the-docs - -aux_links: - "RAPIDS Accelerator for Apache Spark User Guide": - - "//docs.nvidia.com/spark-rapids/user-guide/latest/index.html" - "RAPIDS Accelerator for Apache Spark Plugin on Github": - - "//github.com/nvidia/spark-rapids" - -plugins: - - jekyll-optional-front-matter # GitHub Pages - - jekyll-default-layout # GitHub Pages - - jekyll-titles-from-headings # GitHub Pages - - jekyll-readme-index # GitHub Pages - - jekyll-relative-links # GitHub Pages - -ga_tracking: G-4F2R1TSDHH diff --git a/docs/archives/CHANGELOG_0.1_to_0.5.md b/docs/archives/CHANGELOG_0.1_to_0.5.md deleted file mode 100644 index fa5412f7d8a..00000000000 --- a/docs/archives/CHANGELOG_0.1_to_0.5.md +++ /dev/null @@ -1,1325 +0,0 @@ -# Change log -Generated on 2022-01-28 - -## Release 0.5 - -### Features -||| -|:---|:---| -|[#938](https://github.com/NVIDIA/spark-rapids/issues/938)|[FEA] Have hashed shuffle match spark| -|[#1604](https://github.com/NVIDIA/spark-rapids/issues/1604)|[FEA] Support casting structs to strings | -|[#1920](https://github.com/NVIDIA/spark-rapids/issues/1920)|[FEA] Support murmur3 hashing of structs| -|[#2018](https://github.com/NVIDIA/spark-rapids/issues/2018)|[FEA] A way for user to find out the plugin version and cudf version in REPL| -|[#77](https://github.com/NVIDIA/spark-rapids/issues/77)|[FEA] Support ArrayContains| -|[#1721](https://github.com/NVIDIA/spark-rapids/issues/1721)|[FEA] build cudf jars with NVTX enabled| -|[#1782](https://github.com/NVIDIA/spark-rapids/issues/1782)|[FEA] Shim layers to support spark versions| -|[#1625](https://github.com/NVIDIA/spark-rapids/issues/1625)|[FEA] Support Decimal Casts to String and String to Decimal| -|[#166](https://github.com/NVIDIA/spark-rapids/issues/166)|[FEA] Support get_json_object| -|[#1698](https://github.com/NVIDIA/spark-rapids/issues/1698)|[FEA] Support casting structs to string| -|[#1912](https://github.com/NVIDIA/spark-rapids/issues/1912)|[FEA] Let `Scalar Pandas 
UDF ` support array of struct type.| -|[#1136](https://github.com/NVIDIA/spark-rapids/issues/1136)|[FEA] Audit: Script to list commits between different Spark versions/tags| -|[#1921](https://github.com/NVIDIA/spark-rapids/issues/1921)|[FEA] cudf version check should be lenient on later patch version| -|[#19](https://github.com/NVIDIA/spark-rapids/issues/19)|[FEA] Out of core sorts| - -### Performance -||| -|:---|:---| -|[#2090](https://github.com/NVIDIA/spark-rapids/issues/2090)|[FEA] Make row count estimates available to the cost-based optimizer| -|[#1341](https://github.com/NVIDIA/spark-rapids/issues/1341)|Optimize unnecessary columnar->row->columnar transitions with AQE| -|[#1558](https://github.com/NVIDIA/spark-rapids/issues/1558)|[FEA] Initialize UCX early| -|[#1633](https://github.com/NVIDIA/spark-rapids/issues/1633)|[FEA] Implement a cost-based optimizer| -|[#1727](https://github.com/NVIDIA/spark-rapids/issues/1727)|[FEA] Put RangePartitioner data path on the GPU| - -### Bugs Fixed -||| -|:---|:---| -|[#2279](https://github.com/NVIDIA/spark-rapids/issues/2279)|[BUG] Hash Partitioning can fail for very small batches| -|[#2314](https://github.com/NVIDIA/spark-rapids/issues/2314)|[BUG] v0.5.0 pre-release pytests join_test.py::test_hash_join_array FAILED on SPARK-EGX Yarn Cluster| -|[#2317](https://github.com/NVIDIA/spark-rapids/issues/2317)|[BUG] GpuColumnarToRowIterator can stop after receiving an empty batch| -|[#2244](https://github.com/NVIDIA/spark-rapids/issues/2244)|[BUG] Executors hanging when running NDS benchmarks| -|[#2278](https://github.com/NVIDIA/spark-rapids/issues/2278)|[BUG] FullOuter join can produce too many results| -|[#2220](https://github.com/NVIDIA/spark-rapids/issues/2220)|[BUG] csv_test.py::test_csv_fallback FAILED on the EMR Cluster| -|[#2225](https://github.com/NVIDIA/spark-rapids/issues/2225)|[BUG] GpuSort fails on tables containing arrays.| -|[#2232](https://github.com/NVIDIA/spark-rapids/issues/2232)|[BUG] 
hash_aggregate_test.py::test_hash_grpby_pivot FAILED on the Databricks Cluster| -|[#2231](https://github.com/NVIDIA/spark-rapids/issues/2231)|[BUG]string_test.py::test_re_replace FAILED on the Dataproc Cluster| -|[#2042](https://github.com/NVIDIA/spark-rapids/issues/2042)|[BUG] NDS q14a fails with "GpuColumnarToRow does not implement doExecuteBroadcast"| -|[#2203](https://github.com/NVIDIA/spark-rapids/issues/2203)|[BUG] Spark nightly cache tests fail with -- master flag| -|[#2230](https://github.com/NVIDIA/spark-rapids/issues/2230)|[BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster| -|[#1711](https://github.com/NVIDIA/spark-rapids/issues/1711)|[BUG] find a way to stop allocating from RMM on the shuffle-client thread| -|[#2109](https://github.com/NVIDIA/spark-rapids/issues/2109)|[BUG] Fix high priority violations detected by code analysis tools| -|[#2217](https://github.com/NVIDIA/spark-rapids/issues/2217)|[BUG] qa_nightly_select_test failure in test_select | -|[#2127](https://github.com/NVIDIA/spark-rapids/issues/2127)|[BUG] Parsing with two-digit year should fall back to CPU| -|[#2078](https://github.com/NVIDIA/spark-rapids/issues/2078)|[BUG] java.lang.ArithmeticException: divide by zero when spark.sql.ansi.enabled=true| -|[#2048](https://github.com/NVIDIA/spark-rapids/issues/2048)|[BUG] split function+ repartition result in "ai.rapids.cudf.CudaException: device-side assert triggered"| -|[#2036](https://github.com/NVIDIA/spark-rapids/issues/2036)|[BUG] Stackoverflow when writing wide parquet files.| -|[#1973](https://github.com/NVIDIA/spark-rapids/issues/1973)|[BUG] generate_expr_test FAILED on Dataproc Cluster| -|[#2079](https://github.com/NVIDIA/spark-rapids/issues/2079)|[BUG] koalas.sql fails with java.lang.ArrayIndexOutOfBoundsException| -|[#217](https://github.com/NVIDIA/spark-rapids/issues/217)|[BUG] CudaUtil should be removed| -|[#1550](https://github.com/NVIDIA/spark-rapids/issues/1550)|[BUG] The ORC output data of a query is not 
readable| -|[#2074](https://github.com/NVIDIA/spark-rapids/issues/2074)|[BUG] Intermittent NPE in RapidsBufferCatalog when running test suite| -|[#2027](https://github.com/NVIDIA/spark-rapids/issues/2027)|[BUG] udf_cudf_test.py integration tests fail | -|[#1899](https://github.com/NVIDIA/spark-rapids/issues/1899)|[BUG] Some queries fail when cost-based optimizations are enabled| -|[#1914](https://github.com/NVIDIA/spark-rapids/issues/1914)|[BUG] Add in float, double, timestamp, and date support to murmur3| -|[#2014](https://github.com/NVIDIA/spark-rapids/issues/2014)|[BUG] earlyStart option added in 0.5 can cause errors when starting UCX| -|[#1984](https://github.com/NVIDIA/spark-rapids/issues/1984)|[BUG] NDS q58 Decimal scale (59) cannot be greater than precision (38).| -|[#2001](https://github.com/NVIDIA/spark-rapids/issues/2001)|[BUG] RapidsShuffleManager didn't pass `dirs` to `getBlockData` from a wrapped `ShuffleBlockResolver`| -|[#1797](https://github.com/NVIDIA/spark-rapids/issues/1797)|[BUG] occasional crashes in CI| -|[#1861](https://github.com/NVIDIA/spark-rapids/issues/1861)|Encountered column data outside the range of input buffer| -|[#1905](https://github.com/NVIDIA/spark-rapids/issues/1905)|[BUG] Large concat task time in GpuShuffleCoalesce with pinned memory pool| -|[#1638](https://github.com/NVIDIA/spark-rapids/issues/1638)|[BUG] Tests `test_window_aggs_for_rows_collect_list` fails when there are null values in columns.| -|[#1864](https://github.com/NVIDIA/spark-rapids/issues/1864)|[BUG]HostColumnarToGPU inefficient when only doing count()| -|[#1862](https://github.com/NVIDIA/spark-rapids/issues/1862)|[BUG] spark 3.2.0-snapshot integration test failed due to conf change| -|[#1844](https://github.com/NVIDIA/spark-rapids/issues/1844)|[BUG] branch-0.5 nightly IT FAILED on the The mortgage ETL test "Could not read footer for file: file:/xxx/xxx.snappy.parquet"| -|[#1627](https://github.com/NVIDIA/spark-rapids/issues/1627)|[BUG] GDS exception when 
restoring spilled buffer| -|[#1802](https://github.com/NVIDIA/spark-rapids/issues/1802)|[BUG] Many decimal integration test failures for 0.5| - -### PRs -||| -|:---|:---| -|[#2326](https://github.com/NVIDIA/spark-rapids/pull/2326)|Update changelog for 0.5.0 release| -|[#2316](https://github.com/NVIDIA/spark-rapids/pull/2316)|Update doc to note that single quoted json strings are not ok| -|[#2319](https://github.com/NVIDIA/spark-rapids/pull/2319)|Disable hash partitioning on arrays| -|[#2318](https://github.com/NVIDIA/spark-rapids/pull/2318)|Fix ColumnarToRowIterator handling of empty batches| -|[#2304](https://github.com/NVIDIA/spark-rapids/pull/2304)|Update CHANGELOG.md| -|[#2301](https://github.com/NVIDIA/spark-rapids/pull/2301)|Update doc to reflect nanosleep problem with 460.32.03| -|[#2298](https://github.com/NVIDIA/spark-rapids/pull/2298)|Update changelog for v0.5.0 release [skip ci]| -|[#2293](https://github.com/NVIDIA/spark-rapids/pull/2293)|update cudf version to 0.19.2| -|[#2289](https://github.com/NVIDIA/spark-rapids/pull/2289)|Update docs to warn against 450.80.02 driver with 10.x toolkit| -|[#2285](https://github.com/NVIDIA/spark-rapids/pull/2285)|Require single batch for full outer join streaming| -|[#2281](https://github.com/NVIDIA/spark-rapids/pull/2281)|Remove download section for unreleased 0.4.2| -|[#2264](https://github.com/NVIDIA/spark-rapids/pull/2264)|Add spark312 and spark320 versions of cache serializer| -|[#2254](https://github.com/NVIDIA/spark-rapids/pull/2254)|updated gcp docs with custom dataproc image instructions| -|[#2247](https://github.com/NVIDIA/spark-rapids/pull/2247)|Allow specifying a superclass for non-GPU execs| -|[#2235](https://github.com/NVIDIA/spark-rapids/pull/2235)|Fix distributed cache to read requested schema | -|[#2261](https://github.com/NVIDIA/spark-rapids/pull/2261)|Make CBO row count test more robust| -|[#2237](https://github.com/NVIDIA/spark-rapids/pull/2237)|update cudf version to 0.19.1| 
-|[#2240](https://github.com/NVIDIA/spark-rapids/pull/2240)|Get the correct 'PIPESTATUS' in bash [skip ci]| -|[#2242](https://github.com/NVIDIA/spark-rapids/pull/2242)|Add shuffle doc section on the periodicGC configuration| -|[#2251](https://github.com/NVIDIA/spark-rapids/pull/2251)|Fix issue when out of core sorting nested data types| -|[#2204](https://github.com/NVIDIA/spark-rapids/pull/2204)|Run nightly tests for ParquetCachedBatchSerializer| -|[#2245](https://github.com/NVIDIA/spark-rapids/pull/2245)|Fix pivot bug for decimalType| -|[#2093](https://github.com/NVIDIA/spark-rapids/pull/2093)|Initial implementation of row count estimates in cost-based optimizer| -|[#2188](https://github.com/NVIDIA/spark-rapids/pull/2188)|Support GPU broadcast exchange reuse to feed CPU BHJ when AQE is enabled| -|[#2227](https://github.com/NVIDIA/spark-rapids/pull/2227)|ParquetCachedBatchSerializer broadcast AllConfs instead of SQLConf to fix distributed mode| -|[#2223](https://github.com/NVIDIA/spark-rapids/pull/2223)|Adds subquery aggregate tests from SPARK-31620| -|[#2222](https://github.com/NVIDIA/spark-rapids/pull/2222)|Remove groupId already specified in parent pom| -|[#2209](https://github.com/NVIDIA/spark-rapids/pull/2209)|Fixed a few issues with out of core sort| -|[#2218](https://github.com/NVIDIA/spark-rapids/pull/2218)|Fix incorrect RegExpReplace children handling on Spark 3.1+| -|[#2207](https://github.com/NVIDIA/spark-rapids/pull/2207)|fix batch size default values in the tuning guide| -|[#2208](https://github.com/NVIDIA/spark-rapids/pull/2208)|Revert "add nightly cache tests (#2083)"| -|[#2206](https://github.com/NVIDIA/spark-rapids/pull/2206)|Fix shim301db build| -|[#2192](https://github.com/NVIDIA/spark-rapids/pull/2192)|Fix index-based access to the head elements| -|[#2210](https://github.com/NVIDIA/spark-rapids/pull/2210)|Avoid redundant collection conversions| -|[#2190](https://github.com/NVIDIA/spark-rapids/pull/2190)|JNI fixes for StringWordCount native UDF 
example| -|[#2086](https://github.com/NVIDIA/spark-rapids/pull/2086)|Updating documentation for data format support| -|[#2172](https://github.com/NVIDIA/spark-rapids/pull/2172)|Remove easy unused symbols| -|[#2089](https://github.com/NVIDIA/spark-rapids/pull/2089)|Update PandasUDF doc| -|[#2195](https://github.com/NVIDIA/spark-rapids/pull/2195)|fix cudf 0.19.0 download link [skip ci]| -|[#2175](https://github.com/NVIDIA/spark-rapids/pull/2175)|Branch 0.5 doc update| -|[#2168](https://github.com/NVIDIA/spark-rapids/pull/2168)|Simplify GpuExpressions w/ withResourceIfAllowed| -|[#2055](https://github.com/NVIDIA/spark-rapids/pull/2055)|Support PivotFirst| -|[#2183](https://github.com/NVIDIA/spark-rapids/pull/2183)|GpuParquetScan#readBufferToTable remove dead code| -|[#2129](https://github.com/NVIDIA/spark-rapids/pull/2129)|Fall back to CPU when parsing two-digit years| -|[#2083](https://github.com/NVIDIA/spark-rapids/pull/2083)|add nightly cache tests| -|[#2151](https://github.com/NVIDIA/spark-rapids/pull/2151)|add corresponding close call for HostMemoryOutputStream| -|[#2169](https://github.com/NVIDIA/spark-rapids/pull/2169)|Work around bug in Spark for integration test| -|[#2130](https://github.com/NVIDIA/spark-rapids/pull/2130)|Fix divide-by-zero in GpuAverage with ansi mode| -|[#2149](https://github.com/NVIDIA/spark-rapids/pull/2149)|Auto generate the supported types for the file formats| -|[#2072](https://github.com/NVIDIA/spark-rapids/pull/2072)|Disable CSV parsing by default and update tests to better show what is left| -|[#2157](https://github.com/NVIDIA/spark-rapids/pull/2157)|fix merge conflict for 0.4.2 [skip ci]| -|[#2144](https://github.com/NVIDIA/spark-rapids/pull/2144)|Allow array and struct types to pass thru when doing join| -|[#2145](https://github.com/NVIDIA/spark-rapids/pull/2145)|Avoid GPU shuffle for round-robin of unsortable types| -|[#2021](https://github.com/NVIDIA/spark-rapids/pull/2021)|Add in support for murmur3 hashing of structs| 
-|[#2128](https://github.com/NVIDIA/spark-rapids/pull/2128)|Add in Partition type check support| -|[#2116](https://github.com/NVIDIA/spark-rapids/pull/2116)|Add dynamic Spark configuration for Databricks| -|[#2132](https://github.com/NVIDIA/spark-rapids/pull/2132)|Log plugin and cudf versions on startup| -|[#2135](https://github.com/NVIDIA/spark-rapids/pull/2135)|Disable Spark 3.2 shim by default| -|[#2125](https://github.com/NVIDIA/spark-rapids/pull/2125)|enable auto-merge from 0.5 to 0.6 [skip ci]| -|[#2120](https://github.com/NVIDIA/spark-rapids/pull/2120)|Materialize Stream before serialization| -|[#2119](https://github.com/NVIDIA/spark-rapids/pull/2119)|Add more comprehensive documentation on supported date formats| -|[#1717](https://github.com/NVIDIA/spark-rapids/pull/1717)|Decimal32 support| -|[#2114](https://github.com/NVIDIA/spark-rapids/pull/2114)|Modified the Download page for 0.4.1 and updated doc to point to K8s guide| -|[#2106](https://github.com/NVIDIA/spark-rapids/pull/2106)|Fix some buffer leaks| -|[#2097](https://github.com/NVIDIA/spark-rapids/pull/2097)|fix the bound row project empty issue in row frame| -|[#2099](https://github.com/NVIDIA/spark-rapids/pull/2099)|Remove verbose log prints to make the build/test log clean| -|[#2105](https://github.com/NVIDIA/spark-rapids/pull/2105)|Cleanup prior Spark sessions in tests consistently| -|[#2104](https://github.com/NVIDIA/spark-rapids/pull/2104)| Clone apache spark source code to parse the git commit IDs| -|[#2095](https://github.com/NVIDIA/spark-rapids/pull/2095)|fix refcount when materializing device buffer from GDS| -|[#2100](https://github.com/NVIDIA/spark-rapids/pull/2100)|[BUG] add wget for fetching conda [skip ci]| -|[#2096](https://github.com/NVIDIA/spark-rapids/pull/2096)|Adjust images for integration tests| -|[#2094](https://github.com/NVIDIA/spark-rapids/pull/2094)|Changed name of parquet files for Mortgage ETL Integration test| 
-|[#2035](https://github.com/NVIDIA/spark-rapids/pull/2035)|Accelerate data transfer for map Pandas UDF plan| -|[#2050](https://github.com/NVIDIA/spark-rapids/pull/2050)|stream shuffle buffers from GDS to UCX| -|[#2084](https://github.com/NVIDIA/spark-rapids/pull/2084)|Enable ORC write by default| -|[#2088](https://github.com/NVIDIA/spark-rapids/pull/2088)|Upgrade ScalaTest plugin to respect JAVA_HOME| -|[#1932](https://github.com/NVIDIA/spark-rapids/pull/1932)|Create a getting started on K8s page| -|[#2080](https://github.com/NVIDIA/spark-rapids/pull/2080)|Improve error message after failed RMM shutdown| -|[#2064](https://github.com/NVIDIA/spark-rapids/pull/2064)|Optimize unnecessary columnar->row->columnar transitions with AQE| -|[#2025](https://github.com/NVIDIA/spark-rapids/pull/2025)|Update the doc for pandas udf on databricks| -|[#2059](https://github.com/NVIDIA/spark-rapids/pull/2059)|Add the flag 'TEST_TYPE' to avoid integration tests silently skipping some test cases| -|[#2075](https://github.com/NVIDIA/spark-rapids/pull/2075)|Remove debug println from CBO test| -|[#2046](https://github.com/NVIDIA/spark-rapids/pull/2046)|support casting Decimal to String| -|[#1812](https://github.com/NVIDIA/spark-rapids/pull/1812)|allow spilled buffers to be unspilled| -|[#2061](https://github.com/NVIDIA/spark-rapids/pull/2061)|Run the pandas udf using cudf on Databricks| -|[#1893](https://github.com/NVIDIA/spark-rapids/pull/1893)|Plug-in support for get_json_object| -|[#2044](https://github.com/NVIDIA/spark-rapids/pull/2044)|Use partition for GPU hash partitioning| -|[#1954](https://github.com/NVIDIA/spark-rapids/pull/1954)|Fix CBO bug where incompatible plans were produced with AQE on| -|[#2049](https://github.com/NVIDIA/spark-rapids/pull/2049)|Remove incompatable int overflow checking| -|[#2056](https://github.com/NVIDIA/spark-rapids/pull/2056)|Remove Spark 3.2 from premerge and nightly CI run| -|[#1814](https://github.com/NVIDIA/spark-rapids/pull/1814)|Struct to string 
casting functionality| -|[#2037](https://github.com/NVIDIA/spark-rapids/pull/2037)|Fix warnings from use of deprecated cudf methods| -|[#2033](https://github.com/NVIDIA/spark-rapids/pull/2033)|Bump up pre-merge OS from ubuntu 16 to ubuntu 18 [skip ci]| -|[#1883](https://github.com/NVIDIA/spark-rapids/pull/1883)|Enable sort for single-level nesting struct columns on GPU| -|[#2016](https://github.com/NVIDIA/spark-rapids/pull/2016)|Refactor logic for parallel testing| -|[#2022](https://github.com/NVIDIA/spark-rapids/pull/2022)|Update order by to not load native libraries when sorting| -|[#2017](https://github.com/NVIDIA/spark-rapids/pull/2017)|Add in murmur3 support for float, double, date and timestamp| -|[#1981](https://github.com/NVIDIA/spark-rapids/pull/1981)|Fix GpuSize| -|[#1999](https://github.com/NVIDIA/spark-rapids/pull/1999)|support casting string to decimal| -|[#2006](https://github.com/NVIDIA/spark-rapids/pull/2006)|Enable windowed `collect_list` by default| -|[#2000](https://github.com/NVIDIA/spark-rapids/pull/2000)|Use Spark's HybridRowQueue to avoid MemoryConsumer API shim| -|[#2015](https://github.com/NVIDIA/spark-rapids/pull/2015)|Fix bug where rkey buffer is getting advanced after the first handshake| -|[#2007](https://github.com/NVIDIA/spark-rapids/pull/2007)|Fix unknown column name error when filtering ORC file with no names| -|[#2005](https://github.com/NVIDIA/spark-rapids/pull/2005)|Update to new is_before_spark_311 function name| -|[#1944](https://github.com/NVIDIA/spark-rapids/pull/1944)|Support running scalar pandas UDF with array type.| -|[#1991](https://github.com/NVIDIA/spark-rapids/pull/1991)|Fixes creation of invalid DecimalType in GpuDivide.tagExprForGpu| -|[#1958](https://github.com/NVIDIA/spark-rapids/pull/1958)|Support legacy behavior of parameterless count | -|[#1919](https://github.com/NVIDIA/spark-rapids/pull/1919)|Add support for Structs for UnionExec| -|[#2002](https://github.com/NVIDIA/spark-rapids/pull/2002)|Pass dirs to 
getBlockData for a wrapped shuffle resolver| -|[#1983](https://github.com/NVIDIA/spark-rapids/pull/1983)|document building against different CUDA Toolkit versions| -|[#1994](https://github.com/NVIDIA/spark-rapids/pull/1994)|Merge 0.4 to 0.5 [skip ci]| -|[#1982](https://github.com/NVIDIA/spark-rapids/pull/1982)|Update ORC pushdown filter building to latest Spark logic| -|[#1978](https://github.com/NVIDIA/spark-rapids/pull/1978)|Add audit script to list commits from Spark| -|[#1976](https://github.com/NVIDIA/spark-rapids/pull/1976)|Temp fix for parquet write changes| -|[#1970](https://github.com/NVIDIA/spark-rapids/pull/1970)|add maven profiles for supported CUDA versions| -|[#1951](https://github.com/NVIDIA/spark-rapids/pull/1951)|Branch 0.5 doc remove numpartitions| -|[#1967](https://github.com/NVIDIA/spark-rapids/pull/1967)|Update FAQ for Dataset API and format supported versions| -|[#1972](https://github.com/NVIDIA/spark-rapids/pull/1972)|support GpuSize| -|[#1966](https://github.com/NVIDIA/spark-rapids/pull/1966)|add xml report for codecov| -|[#1955](https://github.com/NVIDIA/spark-rapids/pull/1955)|Fix typo in Arrow optimization config| -|[#1956](https://github.com/NVIDIA/spark-rapids/pull/1956)|Fix NPE in plugin shutdown| -|[#1930](https://github.com/NVIDIA/spark-rapids/pull/1930)|Relax cudf version check for patch-level versions| -|[#1787](https://github.com/NVIDIA/spark-rapids/pull/1787)|support distributed file path in cloud environment| -|[#1961](https://github.com/NVIDIA/spark-rapids/pull/1961)|change premege GPU_TYPE from secret to global env [skip ci]| -|[#1957](https://github.com/NVIDIA/spark-rapids/pull/1957)|Update Spark 3.1.2 shim for float upcast behavior| -|[#1889](https://github.com/NVIDIA/spark-rapids/pull/1889)|Decimal DIV changes | -|[#1947](https://github.com/NVIDIA/spark-rapids/pull/1947)|Move doc of Pandas UDF to additional-functionality| -|[#1938](https://github.com/NVIDIA/spark-rapids/pull/1938)|Add spark.executor.resource.gpu.amount=1 to 
YARN and K8s docs| -|[#1937](https://github.com/NVIDIA/spark-rapids/pull/1937)|Fix merge conflict with branch-0.4| -|[#1878](https://github.com/NVIDIA/spark-rapids/pull/1878)|spillable cache for GpuCartesianRDD| -|[#1843](https://github.com/NVIDIA/spark-rapids/pull/1843)|Refactor GpuGenerateExec and Explode| -|[#1933](https://github.com/NVIDIA/spark-rapids/pull/1933)|Split DB scripts to make them common for the build and IT pipeline| -|[#1935](https://github.com/NVIDIA/spark-rapids/pull/1935)|Update Alias SQL quoting and float-to-timestamp casting to match Spark 3.2| -|[#1926](https://github.com/NVIDIA/spark-rapids/pull/1926)|Consolidate RAT settings in parent pom| -|[#1918](https://github.com/NVIDIA/spark-rapids/pull/1918)|Minor code cleanup in dateTImeExpressions| -|[#1906](https://github.com/NVIDIA/spark-rapids/pull/1906)|Remove get call on timeZoneId| -|[#1908](https://github.com/NVIDIA/spark-rapids/pull/1908)|Remove the Scala version of Mortgage ETL tests from nightly test| -|[#1894](https://github.com/NVIDIA/spark-rapids/pull/1894)|Modified Download Page to re-order the items and change the format of download links| -|[#1909](https://github.com/NVIDIA/spark-rapids/pull/1909)|Avoid pinned memory for shuffle host buffers| -|[#1891](https://github.com/NVIDIA/spark-rapids/pull/1891)|Connect UCX endpoints early during app startup| -|[#1877](https://github.com/NVIDIA/spark-rapids/pull/1877)|remove docker build in pre-merge [skip ci]| -|[#1830](https://github.com/NVIDIA/spark-rapids/pull/1830)|Enable the tests for collect over window.| -|[#1882](https://github.com/NVIDIA/spark-rapids/pull/1882)|GpuArrowColumnarBatchBuilder retains the references of ArrowBuf until HostToGpuCoalesceIterator put them into device| -|[#1868](https://github.com/NVIDIA/spark-rapids/pull/1868)|Increase row limit when doing count() for HostColumnarToGpu | -|[#1855](https://github.com/NVIDIA/spark-rapids/pull/1855)|Expose row count statistics in GpuShuffleExchangeExec| 
-|[#1875](https://github.com/NVIDIA/spark-rapids/pull/1875)|Fix merge conflict with branch-0.4| -|[#1841](https://github.com/NVIDIA/spark-rapids/pull/1841)|Add in support for DateAddInterval| -|[#1869](https://github.com/NVIDIA/spark-rapids/pull/1869)|Fix tests for Spark 3.2.0 shim| -|[#1858](https://github.com/NVIDIA/spark-rapids/pull/1858)|fix shuffle manager doc on ucx library path| -|[#1836](https://github.com/NVIDIA/spark-rapids/pull/1836)|Add shim for Spark 3.1.2| -|[#1852](https://github.com/NVIDIA/spark-rapids/pull/1852)|Fix Part Suite Tests| -|[#1616](https://github.com/NVIDIA/spark-rapids/pull/1616)|Cost-based optimizer| -|[#1834](https://github.com/NVIDIA/spark-rapids/pull/1834)|Add shim for Spark 3.0.3| -|[#1839](https://github.com/NVIDIA/spark-rapids/pull/1839)|Refactor join code to reduce duplicated code| -|[#1848](https://github.com/NVIDIA/spark-rapids/pull/1848)|Fix merge conflict with branch-0.4| -|[#1796](https://github.com/NVIDIA/spark-rapids/pull/1796)|Have most of range partitioning run on the GPU| -|[#1845](https://github.com/NVIDIA/spark-rapids/pull/1845)|Fix fails on the mortgage ETL test| -|[#1829](https://github.com/NVIDIA/spark-rapids/pull/1829)|Cleanup unused Jenkins files and scripts| -|[#1704](https://github.com/NVIDIA/spark-rapids/pull/1704)|Create a shim for Spark 3.2.0 development| -|[#1838](https://github.com/NVIDIA/spark-rapids/pull/1838)|Make databricks build.sh more convenient for dev| -|[#1835](https://github.com/NVIDIA/spark-rapids/pull/1835)|Fix merge conflict with branch-0.4| -|[#1808](https://github.com/NVIDIA/spark-rapids/pull/1808)|Update mortgage tests to support reading multiple dataset formats| -|[#1822](https://github.com/NVIDIA/spark-rapids/pull/1822)|Fix conflict 0.4 to 0.5| -|[#1807](https://github.com/NVIDIA/spark-rapids/pull/1807)|Fix merge conflict between branch-0.4 and branch-0.5| -|[#1788](https://github.com/NVIDIA/spark-rapids/pull/1788)|Spill metrics everywhere| 
-|[#1719](https://github.com/NVIDIA/spark-rapids/pull/1719)|Add in out of core sort| -|[#1728](https://github.com/NVIDIA/spark-rapids/pull/1728)|Skip RAPIDS accelerated Java UDF tests if UDF fails to load| -|[#1689](https://github.com/NVIDIA/spark-rapids/pull/1689)|Update docs for plugin 0.5.0-SNAPSHOT and cudf 0.19-SNAPSHOT| -|[#1682](https://github.com/NVIDIA/spark-rapids/pull/1682)|init CI/CD dependencies branch-0.5| - -## Release 0.4.1 - -### Bugs Fixed -||| -|:---|:---| -|[#1985](https://github.com/NVIDIA/spark-rapids/issues/1985)|[BUG] broadcast exchange can fail on 0.4| - -### PRs -||| -|:---|:---| -|[#1995](https://github.com/NVIDIA/spark-rapids/pull/1995)|update changelog 0.4.1 [skip ci]| -|[#1990](https://github.com/NVIDIA/spark-rapids/pull/1990)|Prepare for v0.4.1 release| -|[#1988](https://github.com/NVIDIA/spark-rapids/pull/1988)|broadcast exchange can fail when job group set| - -## Release 0.4 - -### Features -||| -|:---|:---| -|[#1773](https://github.com/NVIDIA/spark-rapids/issues/1773)|[FEA] Spark 3.0.2 release support| -|[#80](https://github.com/NVIDIA/spark-rapids/issues/80)|[FEA] Support the struct SQL function| -|[#76](https://github.com/NVIDIA/spark-rapids/issues/76)|[FEA] Support CreateArray| -|[#1635](https://github.com/NVIDIA/spark-rapids/issues/1635)|[FEA] RAPIDS accelerated Java UDF| -|[#1333](https://github.com/NVIDIA/spark-rapids/issues/1333)|[FEA] Support window operations on Decimal| -|[#1419](https://github.com/NVIDIA/spark-rapids/issues/1419)|[FEA] Support GPU accelerated UDF alternative for higher order function "aggregate" over window| -|[#1580](https://github.com/NVIDIA/spark-rapids/issues/1580)|[FEA] Support Decimal for ParquetCachedBatchSerializer| -|[#1600](https://github.com/NVIDIA/spark-rapids/issues/1600)|[FEA] Support ScalarSubquery| -|[#1072](https://github.com/NVIDIA/spark-rapids/issues/1072)|[FEA] Support for a custom DataSource V2 which supplies Arrow data| 
-|[#906](https://github.com/NVIDIA/spark-rapids/issues/906)|[FEA] Clarify query explanation to directly state what will run on GPU| -|[#1335](https://github.com/NVIDIA/spark-rapids/issues/1335)|[FEA] Support CollectLimitExec for decimal| -|[#1485](https://github.com/NVIDIA/spark-rapids/issues/1485)|[FEA] Decimal Support for Parquet Write| -|[#1329](https://github.com/NVIDIA/spark-rapids/issues/1329)|[FEA] Decimal support for multiply int div, add, subtract and null safe equals| -|[#1351](https://github.com/NVIDIA/spark-rapids/issues/1351)|[FEA] Execute UDFs that provide a RAPIDS execution path| -|[#1330](https://github.com/NVIDIA/spark-rapids/issues/1330)|[FEA] Support Decimal Casts| -|[#1353](https://github.com/NVIDIA/spark-rapids/issues/1353)|[FEA] Example of RAPIDS UDF using custom GPU code| -|[#1487](https://github.com/NVIDIA/spark-rapids/issues/1487)|[FEA] Change spark 3.1.0 to 3.1.1| -|[#1334](https://github.com/NVIDIA/spark-rapids/issues/1334)|[FEA] Add support for count aggregate on decimal| -|[#1325](https://github.com/NVIDIA/spark-rapids/issues/1325)|[FEA] Add in join support for decimal| -|[#1326](https://github.com/NVIDIA/spark-rapids/issues/1326)|[FEA] Add in Broadcast support for decimal values| -|[#37](https://github.com/NVIDIA/spark-rapids/issues/37)|[FEA] round and bround SQL functions| -|[#78](https://github.com/NVIDIA/spark-rapids/issues/78)|[FEA] Support CreateNamedStruct function| -|[#1331](https://github.com/NVIDIA/spark-rapids/issues/1331)|[FEA] UnionExec and ExpandExec support for decimal| -|[#1332](https://github.com/NVIDIA/spark-rapids/issues/1332)|[FEA] Support CaseWhen, Coalesce and IfElse for decimal| -|[#937](https://github.com/NVIDIA/spark-rapids/issues/937)|[FEA] have murmur3 hash function that matches exactly with spark| -|[#1324](https://github.com/NVIDIA/spark-rapids/issues/1324)|[FEA] Support Parquet Read of Decimal FIXED_LENGTH_BYTE_ARRAY| -|[#1428](https://github.com/NVIDIA/spark-rapids/issues/1428)|[FEA] Add support for unary 
decimal operations abs, floor, ceil, unary - and unary +| -|[#1375](https://github.com/NVIDIA/spark-rapids/issues/1375)|[FEA] Add log statement for what the concurrentGpuTasks tasks is set to on executor startup| -|[#1352](https://github.com/NVIDIA/spark-rapids/issues/1352)|[FEA] Example of RAPIDS UDF using cudf Java APIs| -|[#1328](https://github.com/NVIDIA/spark-rapids/issues/1328)|[FEA] Support sorting and shuffle of decimal| -|[#1316](https://github.com/NVIDIA/spark-rapids/issues/1316)|[FEA] Support simple DECIMAL aggregates| - -### Performance -||| -|:---|:---| -|[#1435](https://github.com/NVIDIA/spark-rapids/issues/1435)|[FEA]Improve the file reading by using local file caching| -|[#1738](https://github.com/NVIDIA/spark-rapids/issues/1738)|[FEA] Reduce regex usage in CAST string to date/timestamp| -|[#987](https://github.com/NVIDIA/spark-rapids/issues/987)|[FEA] Optimize CAST from string to temporal types by using cuDF is_timestamp function| -|[#1594](https://github.com/NVIDIA/spark-rapids/issues/1594)|[FEA] RAPIDS accelerated ScalaUDF| -|[#103](https://github.com/NVIDIA/spark-rapids/issues/103)|[FEA] GPU version of TakeOrderedAndProject| -|[#1024](https://github.com/NVIDIA/spark-rapids/issues/1024)|Cleanup RAPIDS transport calls to `receive`| -|[#1366](https://github.com/NVIDIA/spark-rapids/issues/1366)|Seeing performance differences of multi-threaded/coalesce/perfile Parquet reader type for a single file| -|[#1200](https://github.com/NVIDIA/spark-rapids/issues/1200)|[FEA] Accelerate the scan speed for coalescing parquet reader when reading files from multiple partitioned folders| - -### Bugs Fixed -||| -|:---|:---| -|[#1885](https://github.com/NVIDIA/spark-rapids/issues/1885)|[BUG] natural join on string key results in a data frame with spurious NULLs| -|[#1785](https://github.com/NVIDIA/spark-rapids/issues/1785)|[BUG] Rapids pytest integration tests FAILED on Yarn cluster with unrecognized arguments: `--std_input_path=src/test/resources/`| 
-|[#999](https://github.com/NVIDIA/spark-rapids/issues/999)|[BUG] test_multi_types_window_aggs_for_rows_lead_lag fails against Spark 3.1.0| -|[#1818](https://github.com/NVIDIA/spark-rapids/issues/1818)|[BUG] unmoored doc comment warnings in GpuCast| -|[#1817](https://github.com/NVIDIA/spark-rapids/issues/1817)|[BUG] Developer build with local modifications fails during verify phase| -|[#1644](https://github.com/NVIDIA/spark-rapids/issues/1644)|[BUG] test_window_aggregate_udf_array_from_python fails on databricks| -|[#1771](https://github.com/NVIDIA/spark-rapids/issues/1771)|[BUG] Databricks AWS CI/CD failing to create cluster| -|[#1157](https://github.com/NVIDIA/spark-rapids/issues/1157)|[BUG] Fix regression supporting to_date on GPU with Spark 3.1.0| -|[#716](https://github.com/NVIDIA/spark-rapids/issues/716)|[BUG] Cast String to TimeStamp issues| -|[#1117](https://github.com/NVIDIA/spark-rapids/issues/1117)|[BUG] CAST string to date returns wrong values for dates with out-of-range values| -|[#1670](https://github.com/NVIDIA/spark-rapids/issues/1670)|[BUG] Some TPC-DS queries fail with AQE when decimal types enabled| -|[#1730](https://github.com/NVIDIA/spark-rapids/issues/1730)|[BUG] Range Partitioning can crash when processing is in the order-by| -|[#1726](https://github.com/NVIDIA/spark-rapids/issues/1726)|[BUG] java url decode test failing on databricks, emr, and dataproc| -|[#1651](https://github.com/NVIDIA/spark-rapids/issues/1651)|[BUG] GDS exception when writing shuffle file| -|[#1702](https://github.com/NVIDIA/spark-rapids/issues/1702)|[BUG] check all tests marked xfail for Spark 3.1.1| -|[#575](https://github.com/NVIDIA/spark-rapids/issues/575)|[BUG] Spark 3.1 FAILED join_test.py::test_broadcast_join_mixed[FullOuter][IGNORE_ORDER] failed| -|[#577](https://github.com/NVIDIA/spark-rapids/issues/577)|[BUG] Spark 3.1 log arithmetic functions fail| -|[#1541](https://github.com/NVIDIA/spark-rapids/issues/1541)|[BUG] Tests fail in integration in distributed mode 
after allowing nested types through in sort and shuffle| -|[#1626](https://github.com/NVIDIA/spark-rapids/issues/1626)|[BUG] TPC-DS-like query 77 at scale=3TB fails with maxResultSize exceeded error| -|[#1576](https://github.com/NVIDIA/spark-rapids/issues/1576)|[BUG] loading SPARK-32639 example parquet file triggers a JVM crash | -|[#1643](https://github.com/NVIDIA/spark-rapids/issues/1643)|[BUG] TPC-DS-Like q10, q35, and q69 - slow or hanging at leftSemiJoin| -|[#1650](https://github.com/NVIDIA/spark-rapids/issues/1650)|[BUG] BenchmarkRunner does not include query name in JSON summary filename when running multiple queries| -|[#1654](https://github.com/NVIDIA/spark-rapids/issues/1654)|[BUG] TPC-DS-like query 59 at scale=3TB with AQE fails with join mismatch| -|[#1274](https://github.com/NVIDIA/spark-rapids/issues/1274)|[BUG] OutOfMemoryError - Maximum pool size exceeded while running 24 day criteo ETL Transform stage| -|[#1497](https://github.com/NVIDIA/spark-rapids/issues/1497)|[BUG] Spark-rapids v0.3.0 pytest integration tests with UCX on FAILED on Yarn cluster| -|[#1534](https://github.com/NVIDIA/spark-rapids/issues/1534)|[BUG] Spark 3.1.1 test failure in writing due to removal of InMemoryFileIndex.shouldFilterOut| -|[#1155](https://github.com/NVIDIA/spark-rapids/issues/1155)|[BUG] on shutdown don't print `Socket closed` exception when shutting down UCX.scala| -|[#1510](https://github.com/NVIDIA/spark-rapids/issues/1510)|[BUG] IllegalArgumentException during shuffle| -|[#1513](https://github.com/NVIDIA/spark-rapids/issues/1513)|[BUG] executor not fully initialized may get calls from Spark, in the process setting the `catalog` incorrectly| -|[#1466](https://github.com/NVIDIA/spark-rapids/issues/1466)|[BUG] Databricks build must run before the rapids nightly| -|[#1456](https://github.com/NVIDIA/spark-rapids/issues/1456)|[BUG] Databricks 0.4 parquet integration tests fail| -|[#1400](https://github.com/NVIDIA/spark-rapids/issues/1400)|[BUG] Regressions in 
spark-shell usage of benchmark utilities| -|[#1119](https://github.com/NVIDIA/spark-rapids/issues/1119)|[BUG] inner join fails with Column size cannot be negative| -|[#1079](https://github.com/NVIDIA/spark-rapids/issues/1079)|[BUG]The Scala UDF function cannot invoke the UDF compiler when it's passed to "explode"| -|[#1298](https://github.com/NVIDIA/spark-rapids/issues/1298)|TPCxBB query16 failed at UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary| -|[#1271](https://github.com/NVIDIA/spark-rapids/issues/1271)|[BUG] CastOpSuite and AnsiCastOpSuite failing with ArithmeticException on Spark 3.1| -|[#84](https://github.com/NVIDIA/spark-rapids/issues/84)|[BUG] sort does not match spark for -0.0 and 0.0| -|[#578](https://github.com/NVIDIA/spark-rapids/issues/578)|[BUG] Spark 3.1 qa_nightly_select_test.py Full join test failures| -|[#586](https://github.com/NVIDIA/spark-rapids/issues/586)|[BUG] Spark3.1 tpch failures| -|[#837](https://github.com/NVIDIA/spark-rapids/issues/837)|[BUG] Distinct count of floating point values differs with regular spark| -|[#953](https://github.com/NVIDIA/spark-rapids/issues/953)|[BUG] 3.1.0 pos_explode tests are failing| -|[#127](https://github.com/NVIDIA/spark-rapids/issues/127)|[BUG] String CSV parsing does not respect nullValues| -|[#1203](https://github.com/NVIDIA/spark-rapids/issues/1203)|[BUG] tpcds query 51 fails with join error on Spark 3.1.0| -|[#750](https://github.com/NVIDIA/spark-rapids/issues/750)|[BUG] udf_cudf_test::test_with_column fails with IPC error | -|[#1348](https://github.com/NVIDIA/spark-rapids/issues/1348)|[BUG] Host columnar decimal conversions are failing| -|[#1270](https://github.com/NVIDIA/spark-rapids/issues/1270)|[BUG] Benchmark runner fails to produce report if benchmark fails due to an invalid query plan| -|[#1179](https://github.com/NVIDIA/spark-rapids/issues/1179)|[BUG] SerializeConcatHostBuffersDeserializeBatch may have thread issues| 
-|[#1115](https://github.com/NVIDIA/spark-rapids/issues/1115)|[BUG] Unchecked type warning in SparkQueryCompareTestSuite| - -### PRs -||| -|:---|:---| -|[#1963](https://github.com/NVIDIA/spark-rapids/pull/1963)|Update changelog 0.4 [skip ci]| -|[#1960](https://github.com/NVIDIA/spark-rapids/pull/1960)|Replace sonatype staging link with maven central link| -|[#1945](https://github.com/NVIDIA/spark-rapids/pull/1945)|Update changelog 0.4 [skip ci]| -|[#1910](https://github.com/NVIDIA/spark-rapids/pull/1910)|Make hash partitioning match CPU| -|[#1927](https://github.com/NVIDIA/spark-rapids/pull/1927)|Change cuDF dependency to 0.18.1| -|[#1934](https://github.com/NVIDIA/spark-rapids/pull/1934)|Update documentation to use cudf version 0.18.1| -|[#1871](https://github.com/NVIDIA/spark-rapids/pull/1871)|Disable coalesce batch spilling to avoid cudf contiguous_split bug| -|[#1849](https://github.com/NVIDIA/spark-rapids/pull/1849)|Update changelog for 0.4| -|[#1744](https://github.com/NVIDIA/spark-rapids/pull/1744)|Fix NullPointerException on null partition insert| -|[#1842](https://github.com/NVIDIA/spark-rapids/pull/1842)|Update to note support for 3.0.2| -|[#1832](https://github.com/NVIDIA/spark-rapids/pull/1832)|Spark 3.1.1 shim no longer a snapshot shim| -|[#1831](https://github.com/NVIDIA/spark-rapids/pull/1831)|Spark 3.0.2 shim no longer a snapshot shim| -|[#1826](https://github.com/NVIDIA/spark-rapids/pull/1826)|Remove benchmarks| -|[#1828](https://github.com/NVIDIA/spark-rapids/pull/1828)|Update cudf dependency to 0.18| -|[#1813](https://github.com/NVIDIA/spark-rapids/pull/1813)|Fix LEAD/LAG failures in Spark 3.1.1| -|[#1819](https://github.com/NVIDIA/spark-rapids/pull/1819)|Fix scaladoc warning in GpuCast| -|[#1820](https://github.com/NVIDIA/spark-rapids/pull/1820)|[BUG] make modified check pre-merge only| -|[#1780](https://github.com/NVIDIA/spark-rapids/pull/1780)|Remove SNAPSHOT from test and integration_test READMEs| 
-|[#1809](https://github.com/NVIDIA/spark-rapids/pull/1809)|check if modified files after update_config/supported| -|[#1804](https://github.com/NVIDIA/spark-rapids/pull/1804)|Update UCX documentation for RX_QUEUE_LEN and Docker| -|[#1810](https://github.com/NVIDIA/spark-rapids/pull/1810)|Pandas UDF: Sort the data before computing the sum.| -|[#1751](https://github.com/NVIDIA/spark-rapids/pull/1751)|Exclude foldable expressions from GPU if constant folding is disabled| -|[#1798](https://github.com/NVIDIA/spark-rapids/pull/1798)|Add documentation about explain not on GPU when AQE is on| -|[#1766](https://github.com/NVIDIA/spark-rapids/pull/1766)|Branch 0.4 release docs| -|[#1794](https://github.com/NVIDIA/spark-rapids/pull/1794)|Build python output schema from udf expressions| -|[#1783](https://github.com/NVIDIA/spark-rapids/pull/1783)|Fix the collect_list over window tests failures on db| -|[#1781](https://github.com/NVIDIA/spark-rapids/pull/1781)|Better float/double cases for casting tests| -|[#1790](https://github.com/NVIDIA/spark-rapids/pull/1790)|Record row counts in benchmark runs that call collect| -|[#1779](https://github.com/NVIDIA/spark-rapids/pull/1779)|Add support of DateType and TimestampType for GetTimestamp expression| -|[#1768](https://github.com/NVIDIA/spark-rapids/pull/1768)|Updating getting started Databricks docs| -|[#1742](https://github.com/NVIDIA/spark-rapids/pull/1742)|Fix regression supporting to_date with Spark-3.1| -|[#1775](https://github.com/NVIDIA/spark-rapids/pull/1775)|Fix ambiguous ordering for some tests| -|[#1760](https://github.com/NVIDIA/spark-rapids/pull/1760)|Update GpuDataSourceScanExec and GpuBroadcastExchangeExec to fix audit issues| -|[#1750](https://github.com/NVIDIA/spark-rapids/pull/1750)|Detect task failures in benchmarks| -|[#1767](https://github.com/NVIDIA/spark-rapids/pull/1767)|Consistent Spark version for test and production| -|[#1741](https://github.com/NVIDIA/spark-rapids/pull/1741)|Reduce regex use in CAST| 
-|[#1756](https://github.com/NVIDIA/spark-rapids/pull/1756)|Skip RAPIDS accelerated Java UDF tests if UDF fails to load| -|[#1716](https://github.com/NVIDIA/spark-rapids/pull/1716)|Update RapidsShuffleManager documentation for branch 0.4| -|[#1740](https://github.com/NVIDIA/spark-rapids/pull/1740)|Disable ORC writes until bug can be fixed| -|[#1747](https://github.com/NVIDIA/spark-rapids/pull/1747)|Fix resource leaks in unit tests| -|[#1725](https://github.com/NVIDIA/spark-rapids/pull/1725)|Branch 0.4 FAQ reorg| -|[#1718](https://github.com/NVIDIA/spark-rapids/pull/1718)|CAST string to temporal type now calls isTimestamp| -|[#1734](https://github.com/NVIDIA/spark-rapids/pull/1734)|Disable range partitioning if computation is needed| -|[#1723](https://github.com/NVIDIA/spark-rapids/pull/1723)|Removed StructTypes support for ParquetCachedBatchSerializer as cudf doesn't support it yet| -|[#1714](https://github.com/NVIDIA/spark-rapids/pull/1714)|Add support for RAPIDS accelerated Java UDFs| -|[#1713](https://github.com/NVIDIA/spark-rapids/pull/1713)|Call GpuDeviceManager.shutdown when the executor plugin is shutting down| -|[#1596](https://github.com/NVIDIA/spark-rapids/pull/1596)|Added in Decimal support to ParquetCachedBatchSerializer| -|[#1706](https://github.com/NVIDIA/spark-rapids/pull/1706)|cleanup unused is_before_spark_310| -|[#1685](https://github.com/NVIDIA/spark-rapids/pull/1685)|Fix CustomShuffleReader replacement when decimal types enabled| -|[#1699](https://github.com/NVIDIA/spark-rapids/pull/1699)|Add docs about Spark 3.1 in standalone modes not needing extra class path| -|[#1701](https://github.com/NVIDIA/spark-rapids/pull/1701)|remove xfail for orc test_input_meta for spark 3.1.0| -|[#1703](https://github.com/NVIDIA/spark-rapids/pull/1703)|Remove xfail for spark 3.1.0 test_broadcast_join_mixed FullOuter| -|[#1676](https://github.com/NVIDIA/spark-rapids/pull/1676)|BenchmarkRunner option to generate query plan diagrams in DOT format| 
-|[#1695](https://github.com/NVIDIA/spark-rapids/pull/1695)|support alternate jar paths| -|[#1694](https://github.com/NVIDIA/spark-rapids/pull/1694)|increase mem and limit parallelism for pre-merge| -|[#1691](https://github.com/NVIDIA/spark-rapids/pull/1691)|add validate_execs_in_gpu_plan to pytest.ini| -|[#1692](https://github.com/NVIDIA/spark-rapids/pull/1692)|Add the integration test resources to the test tarball| -|[#1677](https://github.com/NVIDIA/spark-rapids/pull/1677)|When PTDS is enabled, print warning if the allocator is not ARENA| -|[#1683](https://github.com/NVIDIA/spark-rapids/pull/1683)|update changelog to verify autotmerge 0.5 setup [skip ci]| -|[#1673](https://github.com/NVIDIA/spark-rapids/pull/1673)|support auto-merge for branch 0.5 [skip ci]| -|[#1681](https://github.com/NVIDIA/spark-rapids/pull/1681)|Xfail the collect_list tests for databricks| -|[#1678](https://github.com/NVIDIA/spark-rapids/pull/1678)|Fix array/struct checks in Sort and HashAggregate and sorting tests in distributed mode| -|[#1671](https://github.com/NVIDIA/spark-rapids/pull/1671)|Allow metrics to be configurable by level| -|[#1675](https://github.com/NVIDIA/spark-rapids/pull/1675)|add run_pyspark_from_build.sh to the pytest distribution tarball| -|[#1548](https://github.com/NVIDIA/spark-rapids/pull/1548)|Support executing collect_list on GPU with windowing.| -|[#1593](https://github.com/NVIDIA/spark-rapids/pull/1593)|Avoid unnecessary Table instances after contiguous split| -|[#1592](https://github.com/NVIDIA/spark-rapids/pull/1592)|Add in support for Decimal divide| -|[#1668](https://github.com/NVIDIA/spark-rapids/pull/1668)|Implement way for python integration tests to validate Exec is in GPU plan| -|[#1669](https://github.com/NVIDIA/spark-rapids/pull/1669)|Add FAQ entries for executor-per-GPU questions| -|[#1661](https://github.com/NVIDIA/spark-rapids/pull/1661)|Enable Parquet test for file containing map struct key| 
-|[#1664](https://github.com/NVIDIA/spark-rapids/pull/1664)|Filter nulls for left semi and left anti join to work around cudf| -|[#1665](https://github.com/NVIDIA/spark-rapids/pull/1665)|Add better automated tests for Arrow columnar copy in HostColumnarToGpu| -|[#1614](https://github.com/NVIDIA/spark-rapids/pull/1614)|add alluxio getting start document| -|[#1639](https://github.com/NVIDIA/spark-rapids/pull/1639)|support GpuScalarSubquery| -|[#1656](https://github.com/NVIDIA/spark-rapids/pull/1656)|Move UDF to Catalyst Expressions to its own document| -|[#1663](https://github.com/NVIDIA/spark-rapids/pull/1663)|BenchmarkRunner - Include query name in JSON summary filename| -|[#1655](https://github.com/NVIDIA/spark-rapids/pull/1655)|Fix extraneous shuffles added by AQE| -|[#1652](https://github.com/NVIDIA/spark-rapids/pull/1652)|Fix typo in arrow optimized config name - spark.rapids.arrowCopyOptimizationEnabled| -|[#1645](https://github.com/NVIDIA/spark-rapids/pull/1645)|Run Databricks IT with python-xdist parallel, includes test fixes and xfail| -|[#1649](https://github.com/NVIDIA/spark-rapids/pull/1649)|Move building from source docs to contributing guide| -|[#1637](https://github.com/NVIDIA/spark-rapids/pull/1637)|Fail DivModLike on zero divisor in ANSI mode| -|[#1646](https://github.com/NVIDIA/spark-rapids/pull/1646)|Update links in rapids-udfs.md after moving to subfolder| -|[#1641](https://github.com/NVIDIA/spark-rapids/pull/1641)|Xfail struct and array order by tests on Dataproc| -|[#1565](https://github.com/NVIDIA/spark-rapids/pull/1565)|Add GPU accelerated array_contains operator| -|[#1617](https://github.com/NVIDIA/spark-rapids/pull/1617)|Enable nightly test checks for Apache Spark| -|[#1636](https://github.com/NVIDIA/spark-rapids/pull/1636)|RAPIDS accelerated Spark Scala UDF support| -|[#1634](https://github.com/NVIDIA/spark-rapids/pull/1634)|Fix databricks build since Arrow code added| -|[#1599](https://github.com/NVIDIA/spark-rapids/pull/1599)|Add 
division by zero tests for Spark 3.1 behavior| -|[#1619](https://github.com/NVIDIA/spark-rapids/pull/1619)|Update GpuFileSourceScanExec to be in sync with DataSourceScanExec| -|[#1631](https://github.com/NVIDIA/spark-rapids/pull/1631)|Explicitly add maven-jar-plugin version to improve incremental build time.| -|[#1624](https://github.com/NVIDIA/spark-rapids/pull/1624)|Update explain format to show what will and will not run on the GPU| -|[#1622](https://github.com/NVIDIA/spark-rapids/pull/1622)|Support faster copy for a custom DataSource V2 which supplies Arrow data| -|[#1621](https://github.com/NVIDIA/spark-rapids/pull/1621)|Additional functionality docs| -|[#1618](https://github.com/NVIDIA/spark-rapids/pull/1618)|update blossom-ci for security updates [skip ci]| -|[#1562](https://github.com/NVIDIA/spark-rapids/pull/1562)|add alluxio support| -|[#1597](https://github.com/NVIDIA/spark-rapids/pull/1597)|Documentation for Parquet serializer| -|[#1611](https://github.com/NVIDIA/spark-rapids/pull/1611)|Add in flag for integration tests to not skip required tests| -|[#1609](https://github.com/NVIDIA/spark-rapids/pull/1609)|Disable float round/bround by default| -|[#1615](https://github.com/NVIDIA/spark-rapids/pull/1615)|Add in window support for average| -|[#1610](https://github.com/NVIDIA/spark-rapids/pull/1610)|Limit length of spark app name in BenchmarkRunner| -|[#1579](https://github.com/NVIDIA/spark-rapids/pull/1579)|Support TakeOrderedAndProject| -|[#1581](https://github.com/NVIDIA/spark-rapids/pull/1581)|Support Decimal type for CollectLimitExec| -|[#1591](https://github.com/NVIDIA/spark-rapids/pull/1591)|Add support for running multiple queries in BenchmarkRunner| -|[#1595](https://github.com/NVIDIA/spark-rapids/pull/1595)|Fix Github documentation issue template| -|[#1577](https://github.com/NVIDIA/spark-rapids/pull/1577)|rename directory from spark310 to spark311| -|[#1578](https://github.com/NVIDIA/spark-rapids/pull/1578)|Test to track RAPIDS-side issues re 
SPARK-32639| -|[#1583](https://github.com/NVIDIA/spark-rapids/pull/1583)|fix request-action issue [skip ci]| -|[#1555](https://github.com/NVIDIA/spark-rapids/pull/1555)|Enable ANSI mode for CAST string to timestamp| -|[#1531](https://github.com/NVIDIA/spark-rapids/pull/1531)|Decimal Support for writing Parquet| -|[#1545](https://github.com/NVIDIA/spark-rapids/pull/1545)|Support comparing ORC data| -|[#1570](https://github.com/NVIDIA/spark-rapids/pull/1570)|Branch 0.4 doc cleanup| -|[#1569](https://github.com/NVIDIA/spark-rapids/pull/1569)|Add shim method shouldIgnorePath| -|[#1564](https://github.com/NVIDIA/spark-rapids/pull/1564)|Add in support for Decimal Multiply and DIV| -|[#1561](https://github.com/NVIDIA/spark-rapids/pull/1561)|Decimal support for add and subtract| -|[#1560](https://github.com/NVIDIA/spark-rapids/pull/1560)|support sum in window aggregation for decimal| -|[#1546](https://github.com/NVIDIA/spark-rapids/pull/1546)|Cleanup shutdown logging for UCX shuffle| -|[#1551](https://github.com/NVIDIA/spark-rapids/pull/1551)|RAPIDS-accelerated Hive UDFs support all types| -|[#1543](https://github.com/NVIDIA/spark-rapids/pull/1543)|Shuffle/transport enabled by default| -|[#1552](https://github.com/NVIDIA/spark-rapids/pull/1552)|Disable blackduck signature check| -|[#1540](https://github.com/NVIDIA/spark-rapids/pull/1540)|Handle ShuffleManager api calls when plugin is not fully initialized| -|[#1547](https://github.com/NVIDIA/spark-rapids/pull/1547)|Cleanup shuffle transport receive calls| -|[#1512](https://github.com/NVIDIA/spark-rapids/pull/1512)|Support window operations on Decimal| -|[#1532](https://github.com/NVIDIA/spark-rapids/pull/1532)|Support casting from decimal to decimal| -|[#1542](https://github.com/NVIDIA/spark-rapids/pull/1542)|Change the number of partitions to zero when a range is empty| -|[#1506](https://github.com/NVIDIA/spark-rapids/pull/1506)|Add --use-decimals flag to TPC-DS ConvertFiles| 
-|[#1511](https://github.com/NVIDIA/spark-rapids/pull/1511)|Remove unused Jenkinsfiles [skip ci]| -|[#1505](https://github.com/NVIDIA/spark-rapids/pull/1505)|Add least, greatest and eqNullSafe support for DecimalType| -|[#1484](https://github.com/NVIDIA/spark-rapids/pull/1484)|add doc for nsight systems bundled with cuda toolkit| -|[#1478](https://github.com/NVIDIA/spark-rapids/pull/1478)|Documentation for RAPIDS-accelerated Hive UDFs| -|[#1477](https://github.com/NVIDIA/spark-rapids/pull/1477)|Allow structs and arrays to pass through for Shuffle and Sort | -|[#1489](https://github.com/NVIDIA/spark-rapids/pull/1489)|Adds in some support for the array sql function| -|[#1438](https://github.com/NVIDIA/spark-rapids/pull/1438)|Cast from numeric types to decimal type| -|[#1493](https://github.com/NVIDIA/spark-rapids/pull/1493)|Moved ParquetRecordMaterializer to the shim package to follow convention| -|[#1495](https://github.com/NVIDIA/spark-rapids/pull/1495)|Fix merge conflict, merge branch 0.3 to branch 0.4 [skip ci]| -|[#1472](https://github.com/NVIDIA/spark-rapids/pull/1472)|Add an example RAPIDS-accelerated Hive UDF using native code| -|[#1488](https://github.com/NVIDIA/spark-rapids/pull/1488)|Rename Spark 3.1.0 shim to Spark 3.1.1 to match community| -|[#1474](https://github.com/NVIDIA/spark-rapids/pull/1474)|Fix link| -|[#1476](https://github.com/NVIDIA/spark-rapids/pull/1476)|DecimalType support for Aggregate Count| -|[#1475](https://github.com/NVIDIA/spark-rapids/pull/1475)| Join support for DecimalType| -|[#1244](https://github.com/NVIDIA/spark-rapids/pull/1244)|Support round and bround SQL functions | -|[#1458](https://github.com/NVIDIA/spark-rapids/pull/1458)|Add in support for struct and named_struct| -|[#1465](https://github.com/NVIDIA/spark-rapids/pull/1465)|DecimalType support for UnionExec and ExpandExec| -|[#1450](https://github.com/NVIDIA/spark-rapids/pull/1450)|Add dynamic configs for the spark-rapids IT pipelines| 
-|[#1207](https://github.com/NVIDIA/spark-rapids/pull/1207)|Spark SQL hash function using murmur3| -|[#1457](https://github.com/NVIDIA/spark-rapids/pull/1457)|Support reading decimal columns from parquet files on Databricks| -|[#1455](https://github.com/NVIDIA/spark-rapids/pull/1455)|Upgrade Scala Maven Plugin to 4.3.0| -|[#1453](https://github.com/NVIDIA/spark-rapids/pull/1453)|DecimalType support for IfElse and Coalesce| -|[#1452](https://github.com/NVIDIA/spark-rapids/pull/1452)|Support DecimalType for CaseWhen| -|[#1444](https://github.com/NVIDIA/spark-rapids/pull/1444)|Improve UX when running benchmarks from Spark shell| -|[#1294](https://github.com/NVIDIA/spark-rapids/pull/1294)|Support reading decimal columns from parquet files| -|[#1153](https://github.com/NVIDIA/spark-rapids/pull/1153)|Scala UDF will compile children expressions in Project| -|[#1416](https://github.com/NVIDIA/spark-rapids/pull/1416)|Optimize mvn dependency download scripts| -|[#1430](https://github.com/NVIDIA/spark-rapids/pull/1430)|Add project for testing code that requires Spark 3.1.0 or later| -|[#1425](https://github.com/NVIDIA/spark-rapids/pull/1425)|Add in Decimal support for abs, floor, ceil, unary - and unary +| -|[#1427](https://github.com/NVIDIA/spark-rapids/pull/1427)|Revert "Make the multi-threaded parquet reader the default"| -|[#1420](https://github.com/NVIDIA/spark-rapids/pull/1420)|Add udf jar to nightly integration tests| -|[#1422](https://github.com/NVIDIA/spark-rapids/pull/1422)|Log the number of concurrent gpu tasks allowed on Executor startup| -|[#1401](https://github.com/NVIDIA/spark-rapids/pull/1401)|Accelerate the coalescing parquet reader when reading files from multiple partitioned folders| -|[#1413](https://github.com/NVIDIA/spark-rapids/pull/1413)|Add config for cast float to integral types| -|[#1313](https://github.com/NVIDIA/spark-rapids/pull/1313)|Support spilling to disk directly via cuFile/GDS| -|[#1411](https://github.com/NVIDIA/spark-rapids/pull/1411)|Add 
udf-examples jar to databricks build| -|[#1412](https://github.com/NVIDIA/spark-rapids/pull/1412)|Fix a lot of tests marked with xfail for Spark 3.1.0 that no longer fail| -|[#1414](https://github.com/NVIDIA/spark-rapids/pull/1414)|Build merged code of HEAD and BASE branch for pre-merge [skip ci]| -|[#1409](https://github.com/NVIDIA/spark-rapids/pull/1409)|Add option to use decimals in tpc-ds csv to parquet conversion| -|[#1410](https://github.com/NVIDIA/spark-rapids/pull/1410)|Add Decimal support for In, InSet, AtLeastNNonNulls, GetArrayItem, GetStructField, and GenerateExec| -|[#1408](https://github.com/NVIDIA/spark-rapids/pull/1408)|Support RAPIDS-accelerated HiveGenericUDF| -|[#1407](https://github.com/NVIDIA/spark-rapids/pull/1407)|Update docs and tests for null CSV support| -|[#1393](https://github.com/NVIDIA/spark-rapids/pull/1393)|Support RAPIDS-accelerated HiveSimpleUDF| -|[#1392](https://github.com/NVIDIA/spark-rapids/pull/1392)|Turn on hash partitioning for decimal support| -|[#1402](https://github.com/NVIDIA/spark-rapids/pull/1402)|Better GPU Cast type checks| -|[#1404](https://github.com/NVIDIA/spark-rapids/pull/1404)|Fix branch 0.4 merge conflict| -|[#1323](https://github.com/NVIDIA/spark-rapids/pull/1323)|More advanced type checking and documentation| -|[#1391](https://github.com/NVIDIA/spark-rapids/pull/1391)|Remove extra null join filtering because cudf is fast for this now.| -|[#1395](https://github.com/NVIDIA/spark-rapids/pull/1395)|Fix branch-0.3 -> branch-0.4 automerge| -|[#1382](https://github.com/NVIDIA/spark-rapids/pull/1382)|Handle "MM[/-]dd" and "dd[/-]MM" datetime formats in UnixTimeExprMeta| -|[#1390](https://github.com/NVIDIA/spark-rapids/pull/1390)|Accelerated columnar to row/row to columnar for decimal| -|[#1380](https://github.com/NVIDIA/spark-rapids/pull/1380)|Adds in basic support for decimal sort, sum, and some shuffle| -|[#1367](https://github.com/NVIDIA/spark-rapids/pull/1367)|Reuse gpu expression conversion rules when checking 
sort order| -|[#1349](https://github.com/NVIDIA/spark-rapids/pull/1349)|Add canonicalization tests| -|[#1368](https://github.com/NVIDIA/spark-rapids/pull/1368)|Move to cudf 0.18-SNAPSHOT| -|[#1361](https://github.com/NVIDIA/spark-rapids/pull/1361)|Use the correct precision when reading spark columnar data.| -|[#1273](https://github.com/NVIDIA/spark-rapids/pull/1273)|Update docs and scripts to 0.4.0-SNAPSHOT| -|[#1321](https://github.com/NVIDIA/spark-rapids/pull/1321)|Refactor to stop inheriting from HashJoin| -|[#1311](https://github.com/NVIDIA/spark-rapids/pull/1311)|ParquetCachedBatchSerializer code cleanup| -|[#1303](https://github.com/NVIDIA/spark-rapids/pull/1303)|Add explicit outputOrdering for BHJ and SHJ in spark310 shim| -|[#1299](https://github.com/NVIDIA/spark-rapids/pull/1299)|Benchmark runner improved error handling| - -## Release 0.3 - -### Features -||| -|:---|:---| -|[#1002](https://github.com/NVIDIA/spark-rapids/issues/1002)|[FEA] RapidsHostColumnVectorCore should verify cudf data with respect to the expected spark type | -|[#444](https://github.com/NVIDIA/spark-rapids/issues/444)|[FEA] Plugable Cache| -|[#1158](https://github.com/NVIDIA/spark-rapids/issues/1158)|[FEA] Better documentation on type support| -|[#57](https://github.com/NVIDIA/spark-rapids/issues/57)|[FEA] Support INT96 for parquet reads and writes| -|[#1003](https://github.com/NVIDIA/spark-rapids/issues/1003)|[FEA] Reduce overlap between RapidsHostColumnVector and RapidsHostColumnVectorCore| -|[#913](https://github.com/NVIDIA/spark-rapids/issues/913)|[FEA] In Pluggable Cache Support CalendarInterval while creating CachedBatches| -|[#1092](https://github.com/NVIDIA/spark-rapids/issues/1092)|[FEA] In Pluggable Cache handle nested types having CalendarIntervalType and NullType| -|[#670](https://github.com/NVIDIA/spark-rapids/issues/670)|[FEA] Support NullType| -|[#50](https://github.com/NVIDIA/spark-rapids/issues/50)|[FEA] support `spark.sql.legacy.timeParserPolicy`| 
-|[#1144](https://github.com/NVIDIA/spark-rapids/issues/1144)|[FEA] Remove Databricks 3.0.0 shim layer| -|[#1096](https://github.com/NVIDIA/spark-rapids/issues/1096)|[FEA] Implement parquet CreateDataSourceTableAsSelectCommand| -|[#688](https://github.com/NVIDIA/spark-rapids/issues/688)|[FEA] udf compiler should be auto-appended to `spark.sql.extensions`| -|[#502](https://github.com/NVIDIA/spark-rapids/issues/502)|[FEA] Support Databricks 7.3 LTS Runtime| -|[#764](https://github.com/NVIDIA/spark-rapids/issues/764)|[FEA] Sanity checks for cudf jar mismatch| -|[#1018](https://github.com/NVIDIA/spark-rapids/issues/1018)|[FEA] Log details related to GPU memory fragmentation on GPU OOM| -|[#619](https://github.com/NVIDIA/spark-rapids/issues/619)|[FEA] log whether libcudf and libcudfjni were built for PTDS| -|[#905](https://github.com/NVIDIA/spark-rapids/issues/905)|[FEA] create AWS EMR 3.0.1 shim| -|[#838](https://github.com/NVIDIA/spark-rapids/issues/838)|[FEA] Support window count for a column| -|[#864](https://github.com/NVIDIA/spark-rapids/issues/864)|[FEA] config option to enable RMM arena memory resource| -|[#430](https://github.com/NVIDIA/spark-rapids/issues/430)|[FEA] Audit: Parquet Writer support for TIMESTAMP_MILLIS| -|[#818](https://github.com/NVIDIA/spark-rapids/issues/818)|[FEA] Create shim layer for AWS EMR | -|[#608](https://github.com/NVIDIA/spark-rapids/issues/608)|[FEA] Parquet small file optimization improve handle merge schema| - -### Performance -||| -|:---|:---| -|[#446](https://github.com/NVIDIA/spark-rapids/issues/446)|[FEA] Test jucx in 1.9.x branch| -|[#1038](https://github.com/NVIDIA/spark-rapids/issues/1038)|[FEA] Accelerate the data transfer for plan `WindowInPandasExec`| -|[#533](https://github.com/NVIDIA/spark-rapids/issues/533)|[FEA] Improve PTDS performance| -|[#849](https://github.com/NVIDIA/spark-rapids/issues/849)|[FEA] Have GpuColumnarBatchSerializer return GpuColumnVectorFromBuffer instances| 
-|[#784](https://github.com/NVIDIA/spark-rapids/issues/784)|[FEA] Allow Host Spilling to be more dynamic| -|[#627](https://github.com/NVIDIA/spark-rapids/issues/627)|[FEA] Further parquet reading small file improvements| -|[#5](https://github.com/NVIDIA/spark-rapids/issues/5)|[FEA] Support Adaptive Execution| - -### Bugs Fixed -||| -|:---|:---| -|[#1423](https://github.com/NVIDIA/spark-rapids/issues/1423)|[BUG] Mortgage ETL sample failed with spark.sql.adaptive enabled on AWS EMR 6.2 | -|[#1369](https://github.com/NVIDIA/spark-rapids/issues/1369)|[BUG] TPC-DS Query Failing on EMR 6.2 with AQE| -|[#1344](https://github.com/NVIDIA/spark-rapids/issues/1344)|[BUG] Spark-rapids Pytests failed on On Databricks cluster spark standalone mode| -|[#1279](https://github.com/NVIDIA/spark-rapids/issues/1279)|[BUG] TPC-DS query 2 failing with NPE| -|[#1280](https://github.com/NVIDIA/spark-rapids/issues/1280)|[BUG] TPC-DS query 93 failing with UnsupportedOperationException| -|[#1308](https://github.com/NVIDIA/spark-rapids/issues/1308)|[BUG] TPC-DS query 14a runs much slower on 0.3| -|[#1284](https://github.com/NVIDIA/spark-rapids/issues/1284)|[BUG] TPC-DS query 77 at scale=1TB fails with maxResultSize exceeded error| -|[#1061](https://github.com/NVIDIA/spark-rapids/issues/1061)|[BUG] orc_test.py is failing| -|[#1197](https://github.com/NVIDIA/spark-rapids/issues/1197)|[BUG] java.lang.NullPointerException when exporting delta table| -|[#685](https://github.com/NVIDIA/spark-rapids/issues/685)|[BUG] In ParqueCachedBatchSerializer, serializing parquet buffers might blow up in certain cases| -|[#1269](https://github.com/NVIDIA/spark-rapids/issues/1269)|[BUG] GpuSubstring is not expected to be a part of a SortOrder| -|[#1246](https://github.com/NVIDIA/spark-rapids/issues/1246)|[BUG] Many TPC-DS benchmarks fail when writing to Parquet| -|[#961](https://github.com/NVIDIA/spark-rapids/issues/961)|[BUG] ORC predicate pushdown should work with case-insensitive analysis| 
-|[#962](https://github.com/NVIDIA/spark-rapids/issues/962)|[BUG] Loading columns from an ORC file without column names returns no data| -|[#1245](https://github.com/NVIDIA/spark-rapids/issues/1245)|[BUG] Code adding buffers to the spillable store should synchronize| -|[#570](https://github.com/NVIDIA/spark-rapids/issues/570)|[BUG] Continue debugging OOM after ensuring device store is empty| -|[#972](https://github.com/NVIDIA/spark-rapids/issues/972)|[BUG] total time metric is redundant with scan time| -|[#1039](https://github.com/NVIDIA/spark-rapids/issues/1039)|[BUG] UNBOUNDED window ranges on null timestamp columns produces incorrect results.| -|[#1195](https://github.com/NVIDIA/spark-rapids/issues/1195)|[BUG] AcceleratedColumnarToRowIterator queue empty| -|[#1177](https://github.com/NVIDIA/spark-rapids/issues/1177)|[BUG] leaks possible in the rapids shuffle if batches are received after the task completes| -|[#1216](https://github.com/NVIDIA/spark-rapids/issues/1216)|[BUG] Failure to recognize ORC file format when loaded via Hive| -|[#898](https://github.com/NVIDIA/spark-rapids/issues/898)|[BUG] count reductions are failing on databricks because lack for Complete support| -|[#1184](https://github.com/NVIDIA/spark-rapids/issues/1184)|[BUG] test_window_aggregate_udf_array_from_python fails on databricks 3.0.1| -|[#1151](https://github.com/NVIDIA/spark-rapids/issues/1151)|[BUG]Add databricks 3.0.1 shim layer for GpuWindowInPandasExec.| -|[#1199](https://github.com/NVIDIA/spark-rapids/issues/1199)|[BUG] No data size in Input column in Stages page from Spark UI when using Parquet as file source| -|[#1031](https://github.com/NVIDIA/spark-rapids/issues/1031)|[BUG] dependency info properties file contains error messages| -|[#1149](https://github.com/NVIDIA/spark-rapids/issues/1149)|[BUG] Scaladoc warnings in GpuDataSource| -|[#1185](https://github.com/NVIDIA/spark-rapids/issues/1185)|[BUG] test_hash_multiple_mode_query failing| 
-|[#724](https://github.com/NVIDIA/spark-rapids/issues/724)|[BUG] PySpark test_broadcast_nested_loop_join_special_case intermittent failure| -|[#1164](https://github.com/NVIDIA/spark-rapids/issues/1164)|[BUG] ansi_cast tests are failing in 3.1.0| -|[#1110](https://github.com/NVIDIA/spark-rapids/issues/1110)|[BUG] Special date "now" has wrong value on GPU| -|[#1139](https://github.com/NVIDIA/spark-rapids/issues/1139)|[BUG] Host columnar to GPU can be very slow| -|[#1094](https://github.com/NVIDIA/spark-rapids/issues/1094)|[BUG] unix_timestamp on GPU returns invalid data for special dates| -|[#1098](https://github.com/NVIDIA/spark-rapids/issues/1098)|[BUG] unix_timestamp on GPU returns invalid data for bad input| -|[#1082](https://github.com/NVIDIA/spark-rapids/issues/1082)|[BUG] string to timestamp conversion fails with split| -|[#1140](https://github.com/NVIDIA/spark-rapids/issues/1140)|[BUG] ConcurrentModificationException error after scala test suite completes| -|[#1073](https://github.com/NVIDIA/spark-rapids/issues/1073)|[BUG] java.lang.RuntimeException: BinaryExpressions must override either eval or nullSafeEval| -|[#975](https://github.com/NVIDIA/spark-rapids/issues/975)|[BUG] BroadcastExchangeExec fails to fall back to CPU on driver node on GCP Dataproc| -|[#773](https://github.com/NVIDIA/spark-rapids/issues/773)|[BUG] Investigate high task deserialization| -|[#1035](https://github.com/NVIDIA/spark-rapids/issues/1035)|[BUG] TPC-DS query 90 with AQE enabled fails with doExecuteBroadcast exception| -|[#825](https://github.com/NVIDIA/spark-rapids/issues/825)|[BUG] test_window_aggs_for_ranges intermittently fails| -|[#1008](https://github.com/NVIDIA/spark-rapids/issues/1008)|[BUG] limit function is producing inconsistent result when type is Byte, Long, Boolean and Timestamp| -|[#996](https://github.com/NVIDIA/spark-rapids/issues/996)|[BUG] TPC-DS benchmark via spark-submit does not provide option to disable appending .dat to path| 
-|[#1006](https://github.com/NVIDIA/spark-rapids/issues/1006)|[BUG] Spark3.1.0 changed BasicWriteTaskStats breaks BasicColumnarWriteTaskStatsTracker| -|[#985](https://github.com/NVIDIA/spark-rapids/issues/985)|[BUG] missing metric `dataSize`| -|[#881](https://github.com/NVIDIA/spark-rapids/issues/881)|[BUG] cannot disable Sort by itself| -|[#812](https://github.com/NVIDIA/spark-rapids/issues/812)|[BUG] Test failures for 0.2 when run with multiple executors| -|[#925](https://github.com/NVIDIA/spark-rapids/issues/925)|[BUG]Range window-functions with non-timestamp order-by expressions not falling back to CPU| -|[#852](https://github.com/NVIDIA/spark-rapids/issues/852)|[BUG] BenchUtils.compareResults cannot compare partitioned files when ignoreOrdering=false| -|[#868](https://github.com/NVIDIA/spark-rapids/issues/868)|[BUG] Rounding error when casting timestamp to string for timestamps before 1970| -|[#880](https://github.com/NVIDIA/spark-rapids/issues/880)|[BUG] doing a window operation with an orderby for a single constant crashes| -|[#776](https://github.com/NVIDIA/spark-rapids/issues/776)|[BUG] Integration test fails on spark 3.1.0-SNAPSHOT| -|[#874](https://github.com/NVIDIA/spark-rapids/issues/874)|[BUG] `RapidsConf.scala` has some un-consistency for `spark.rapids.sql.format.parquet.multiThreadedRead`| -|[#860](https://github.com/NVIDIA/spark-rapids/issues/860)|[BUG] we need to mark columns from received shuffle buffers as `GpuColumnVectorFromBuffer`| -|[#122](https://github.com/NVIDIA/spark-rapids/issues/122)|[BUG] CSV Timestamp parseing is broken for TS < 1902 and TS > 2038| -|[#810](https://github.com/NVIDIA/spark-rapids/issues/810)|[BUG] UDF Integration tests fail if pandas is not installed| -|[#746](https://github.com/NVIDIA/spark-rapids/issues/746)|[BUG] cudf_udf_test.py is flakey| -|[#811](https://github.com/NVIDIA/spark-rapids/issues/811)|[BUG] 0.3 nightly is timing out | -|[#574](https://github.com/NVIDIA/spark-rapids/issues/574)|[BUG] Fix GpuTimeSub 
for Spark 3.1.0| - -### PRs -||| -|:---|:---| -|[#1496](https://github.com/NVIDIA/spark-rapids/pull/1496)|Update changelog for v0.3.0 release [skip ci]| -|[#1473](https://github.com/NVIDIA/spark-rapids/pull/1473)|Update documentation for 0.3 release| -|[#1371](https://github.com/NVIDIA/spark-rapids/pull/1371)|Start Guide for RAPIDS on AWS EMR 6.2| -|[#1446](https://github.com/NVIDIA/spark-rapids/pull/1446)|Update changelog for 0.3.0 release [skip ci]| -|[#1439](https://github.com/NVIDIA/spark-rapids/pull/1439)|when AQE enabled we fail to fix up exchanges properly and EMR| -|[#1433](https://github.com/NVIDIA/spark-rapids/pull/1433)|fix pandas 1.2 compatible issue| -|[#1424](https://github.com/NVIDIA/spark-rapids/pull/1424)|Make the multi-threaded parquet reader the default since coalescing doesn't handle partitioned files well| -|[#1389](https://github.com/NVIDIA/spark-rapids/pull/1389)|Update project version to 0.3.0| -|[#1387](https://github.com/NVIDIA/spark-rapids/pull/1387)|Update cudf version to 0.17| -|[#1370](https://github.com/NVIDIA/spark-rapids/pull/1370)|[REVIEW] init changelog 0.3 [skip ci]| -|[#1376](https://github.com/NVIDIA/spark-rapids/pull/1376)|MetaUtils.getBatchFromMeta should return batches with GpuColumnVectorFromBuffer| -|[#1358](https://github.com/NVIDIA/spark-rapids/pull/1358)|auto-merge: instant merge after creation [skip ci]| -|[#1359](https://github.com/NVIDIA/spark-rapids/pull/1359)|Use SortOrder from shims.| -|[#1343](https://github.com/NVIDIA/spark-rapids/pull/1343)|Do not run UDFs when the partition is empty.| -|[#1342](https://github.com/NVIDIA/spark-rapids/pull/1342)|Fix and edit docs for standalone mode| -|[#1350](https://github.com/NVIDIA/spark-rapids/pull/1350)|fix GpuRangePartitioning canonicalization| -|[#1281](https://github.com/NVIDIA/spark-rapids/pull/1281)|Documentation added for testing| -|[#1336](https://github.com/NVIDIA/spark-rapids/pull/1336)|Fix missing post-shuffle coalesce with AQE| 
-|[#1318](https://github.com/NVIDIA/spark-rapids/pull/1318)|Fix copying GpuFileSourceScanExec node| -|[#1337](https://github.com/NVIDIA/spark-rapids/pull/1337)|Use UTC instead of GMT| -|[#1307](https://github.com/NVIDIA/spark-rapids/pull/1307)|Fallback to cpu when reading Delta log files for stats| -|[#1310](https://github.com/NVIDIA/spark-rapids/pull/1310)|Fix canonicalization of GpuFileSourceScanExec, GpuShuffleCoalesceExec| -|[#1302](https://github.com/NVIDIA/spark-rapids/pull/1302)|Add GpuSubstring handling to SortOrder canonicalization| -|[#1265](https://github.com/NVIDIA/spark-rapids/pull/1265)|Chunking input before writing a ParquetCachedBatch| -|[#1278](https://github.com/NVIDIA/spark-rapids/pull/1278)|Add a config to disable decimal types by default| -|[#1272](https://github.com/NVIDIA/spark-rapids/pull/1272)|Add Alias to shims| -|[#1268](https://github.com/NVIDIA/spark-rapids/pull/1268)|Adds in support docs for 0.3 release| -|[#1235](https://github.com/NVIDIA/spark-rapids/pull/1235)|Trigger reading and handling control data.| -|[#1266](https://github.com/NVIDIA/spark-rapids/pull/1266)|Updating Databricks getting started for 0.3 release| -|[#1291](https://github.com/NVIDIA/spark-rapids/pull/1291)|Increase pre-merge resource requests [skip ci]| -|[#1275](https://github.com/NVIDIA/spark-rapids/pull/1275)|Temporarily disable more CAST tests for Spark 3.1.0| -|[#1264](https://github.com/NVIDIA/spark-rapids/pull/1264)|Fix race condition in batch creation| -|[#1260](https://github.com/NVIDIA/spark-rapids/pull/1260)|Update UCX license info in NOTIFY-binary for 1.9 and RAPIDS plugin copyright dates| -|[#1247](https://github.com/NVIDIA/spark-rapids/pull/1247)|Ensure column names are valid when writing benchmark query results to file| -|[#1240](https://github.com/NVIDIA/spark-rapids/pull/1240)|Fix loading from ORC file with no column names| -|[#1242](https://github.com/NVIDIA/spark-rapids/pull/1242)|Remove compatibility documentation about unsupported INT96| 
-|[#1192](https://github.com/NVIDIA/spark-rapids/pull/1192)|[REVIEW] Support GpuFilter and GpuCoalesceBatches for decimal data| -|[#1170](https://github.com/NVIDIA/spark-rapids/pull/1170)|Add nested type support to MetaUtils| -|[#1194](https://github.com/NVIDIA/spark-rapids/pull/1194)|Drop redundant total time metric from scan| -|[#1248](https://github.com/NVIDIA/spark-rapids/pull/1248)|At BatchedTableCompressor.finish synchronize to allow for "right-size…| -|[#1169](https://github.com/NVIDIA/spark-rapids/pull/1169)|Use CUDF's "UNBOUNDED" window boundaries for time-range queries.| -|[#1204](https://github.com/NVIDIA/spark-rapids/pull/1204)|Avoid empty batches on columnar to row conversion| -|[#1133](https://github.com/NVIDIA/spark-rapids/pull/1133)|Refactor batch coalesce to be based solely on batch data size| -|[#1237](https://github.com/NVIDIA/spark-rapids/pull/1237)|In transport, limit pending transfer requests to fit within a bounce| -|[#1232](https://github.com/NVIDIA/spark-rapids/pull/1232)|Move SortOrder creation to shims| -|[#1068](https://github.com/NVIDIA/spark-rapids/pull/1068)|Write int96 to parquet| -|[#1193](https://github.com/NVIDIA/spark-rapids/pull/1193)|Verify shuffle of decimal columns| -|[#1180](https://github.com/NVIDIA/spark-rapids/pull/1180)|Remove batches if they are received after the iterator detects that t…| -|[#1173](https://github.com/NVIDIA/spark-rapids/pull/1173)|Support relational operators for decimal type| -|[#1220](https://github.com/NVIDIA/spark-rapids/pull/1220)|Support replacing ORC format when Hive is configured| -|[#1219](https://github.com/NVIDIA/spark-rapids/pull/1219)|Upgrade to jucx 1.9.0| -|[#1081](https://github.com/NVIDIA/spark-rapids/pull/1081)|Add option to upload benchmark summary JSON file| -|[#1217](https://github.com/NVIDIA/spark-rapids/pull/1217)|Aggregate reductions in Complete mode should use updateExpressions| -|[#1218](https://github.com/NVIDIA/spark-rapids/pull/1218)|Remove obsolete HiveStringType usage| 
-|[#1214](https://github.com/NVIDIA/spark-rapids/pull/1214)|changelog update 2020-11-30. Trigger automerge check [skip ci]| -|[#1210](https://github.com/NVIDIA/spark-rapids/pull/1210)|Support auto-merge for branch-0.4 [skip ci]| -|[#1202](https://github.com/NVIDIA/spark-rapids/pull/1202)|Fix a bug with the support for java.lang.StringBuilder.append.| -|[#1213](https://github.com/NVIDIA/spark-rapids/pull/1213)|Skip casting StringType to TimestampType for Spark 310| -|[#1201](https://github.com/NVIDIA/spark-rapids/pull/1201)|Replace only window expressions on databricks.| -|[#1208](https://github.com/NVIDIA/spark-rapids/pull/1208)|[BUG] Fix GHSL2020-239 [skip ci]| -|[#1205](https://github.com/NVIDIA/spark-rapids/pull/1205)|Fix missing input bytes read metric for Parquet| -|[#1206](https://github.com/NVIDIA/spark-rapids/pull/1206)|Update Spark 3.1 shim for ShuffleOrigin shuffle parameter| -|[#1196](https://github.com/NVIDIA/spark-rapids/pull/1196)|Rename ShuffleCoalesceExec to GpuShuffleCoalesceExec| -|[#1191](https://github.com/NVIDIA/spark-rapids/pull/1191)|Skip window array tests for databricks.| -|[#1183](https://github.com/NVIDIA/spark-rapids/pull/1183)|Support for CalendarIntervalType and NullType| -|[#1150](https://github.com/NVIDIA/spark-rapids/pull/1150)|udf spec| -|[#1188](https://github.com/NVIDIA/spark-rapids/pull/1188)|Add in tests for parquet nested pruning support| -|[#1189](https://github.com/NVIDIA/spark-rapids/pull/1189)|Enable NullType for First and Last in 3.0.1+| -|[#1181](https://github.com/NVIDIA/spark-rapids/pull/1181)|Fix resource leaks in unit tests| -|[#1186](https://github.com/NVIDIA/spark-rapids/pull/1186)|Fix compilation and scaladoc warnings| -|[#1187](https://github.com/NVIDIA/spark-rapids/pull/1187)|Updated documentation for distinct count compatibility| -|[#1182](https://github.com/NVIDIA/spark-rapids/pull/1182)|Close buffer catalog on device manager shutdown| -|[#1137](https://github.com/NVIDIA/spark-rapids/pull/1137)|Let 
GpuWindowInPandas declare ArrayType supported.| -|[#1176](https://github.com/NVIDIA/spark-rapids/pull/1176)|Add in support for null type| -|[#1174](https://github.com/NVIDIA/spark-rapids/pull/1174)|Fix race condition in SerializeConcatHostBuffersDeserializeBatch| -|[#1175](https://github.com/NVIDIA/spark-rapids/pull/1175)|Fix leaks seen in shuffle tests| -|[#1138](https://github.com/NVIDIA/spark-rapids/pull/1138)|[REVIEW] Support decimal type for GpuProjectExec| -|[#1162](https://github.com/NVIDIA/spark-rapids/pull/1162)|Set job descriptions in benchmark runner| -|[#1172](https://github.com/NVIDIA/spark-rapids/pull/1172)|Revert "Fix race condition (#1165)"| -|[#1060](https://github.com/NVIDIA/spark-rapids/pull/1060)|Show partition metrics for custom shuffler reader| -|[#1152](https://github.com/NVIDIA/spark-rapids/pull/1152)|Add spark301db shim layer for WindowInPandas.| -|[#1167](https://github.com/NVIDIA/spark-rapids/pull/1167)|Nulls out the dataframe if --gc-between-runs is set| -|[#1165](https://github.com/NVIDIA/spark-rapids/pull/1165)|Fix race condition in SerializeConcatHostBuffersDeserializeBatch| -|[#1163](https://github.com/NVIDIA/spark-rapids/pull/1163)|Add in support for GetStructField| -|[#1166](https://github.com/NVIDIA/spark-rapids/pull/1166)|Fix the cast tests for 3.1.0+| -|[#1159](https://github.com/NVIDIA/spark-rapids/pull/1159)|fix bug where 'now' had same value as 'today' for timestamps| -|[#1161](https://github.com/NVIDIA/spark-rapids/pull/1161)|Fix nightly build pipeline failure.| -|[#1160](https://github.com/NVIDIA/spark-rapids/pull/1160)|Fix some performance problems with columnar to columnar conversion| -|[#1105](https://github.com/NVIDIA/spark-rapids/pull/1105)|[REVIEW] Change ColumnViewAccess usage to work with ColumnView| -|[#1148](https://github.com/NVIDIA/spark-rapids/pull/1148)|Add in tests for Maps and extend map support where possible| -|[#1154](https://github.com/NVIDIA/spark-rapids/pull/1154)|Mark test as xfail until we can get a 
fix in| -|[#1113](https://github.com/NVIDIA/spark-rapids/pull/1113)|Support unix_timestamp on GPU for subset of formats| -|[#1156](https://github.com/NVIDIA/spark-rapids/pull/1156)|Fix warning introduced in iterator suite| -|[#1095](https://github.com/NVIDIA/spark-rapids/pull/1095)|Dependency info| -|[#1145](https://github.com/NVIDIA/spark-rapids/pull/1145)|Remove support for databricks 7.0 runtime - shim spark300db| -|[#1147](https://github.com/NVIDIA/spark-rapids/pull/1147)|Change the assert to require for handling TIMESTAMP_MILLIS in isDateTimeRebaseNeeded | -|[#1132](https://github.com/NVIDIA/spark-rapids/pull/1132)|Add in basic support to read structs from parquet| -|[#1121](https://github.com/NVIDIA/spark-rapids/pull/1121)|Shuffle/better error handling| -|[#1134](https://github.com/NVIDIA/spark-rapids/pull/1134)|Support saveAsTable for writing orc and parquet| -|[#1124](https://github.com/NVIDIA/spark-rapids/pull/1124)|Add shim layers for GpuWindowInPandasExec.| -|[#1131](https://github.com/NVIDIA/spark-rapids/pull/1131)|Add in some basic support for Structs| -|[#1127](https://github.com/NVIDIA/spark-rapids/pull/1127)|Add in basic support for reading lists from parquet| -|[#1129](https://github.com/NVIDIA/spark-rapids/pull/1129)|Fix resource leaks with new shuffle optimization| -|[#1116](https://github.com/NVIDIA/spark-rapids/pull/1116)|Optimize normal shuffle by coalescing smaller batches on host| -|[#1102](https://github.com/NVIDIA/spark-rapids/pull/1102)|Auto-register UDF extention when main plugin is set| -|[#1108](https://github.com/NVIDIA/spark-rapids/pull/1108)|Remove integration test pipelines on NGCC| -|[#1123](https://github.com/NVIDIA/spark-rapids/pull/1123)|Mark Pandas udf over window tests as xfail on databricks until they can be fixed| -|[#1120](https://github.com/NVIDIA/spark-rapids/pull/1120)|Add in support for filtering ArrayType| -|[#1080](https://github.com/NVIDIA/spark-rapids/pull/1080)|Support for CalendarIntervalType and NullType for 
ParquetCachedSerializer| -|[#994](https://github.com/NVIDIA/spark-rapids/pull/994)|Packs bounce buffers for highly partitioned shuffles| -|[#1112](https://github.com/NVIDIA/spark-rapids/pull/1112)|Remove bad config from pytest setup| -|[#1107](https://github.com/NVIDIA/spark-rapids/pull/1107)|closeOnExcept -> withResources in MetaUtils| -|[#1104](https://github.com/NVIDIA/spark-rapids/pull/1104)|Support lists to/from the GPU| -|[#1106](https://github.com/NVIDIA/spark-rapids/pull/1106)|Improve mechanism for expected exceptions in tests| -|[#1069](https://github.com/NVIDIA/spark-rapids/pull/1069)|Accelerate the data transfer between JVM and Python for the plan 'GpuWindowInPandasExec'| -|[#1099](https://github.com/NVIDIA/spark-rapids/pull/1099)|Update how we deal with type checking| -|[#1077](https://github.com/NVIDIA/spark-rapids/pull/1077)|Improve AQE transitions for shuffle and coalesce batches| -|[#1097](https://github.com/NVIDIA/spark-rapids/pull/1097)|Cleanup some instances of excess closure serialization| -|[#1090](https://github.com/NVIDIA/spark-rapids/pull/1090)|Fix the integration build| -|[#1086](https://github.com/NVIDIA/spark-rapids/pull/1086)|Speed up test performance using pytest-xdist| -|[#1084](https://github.com/NVIDIA/spark-rapids/pull/1084)|Avoid issues where more scalars that expected show up in an expression| -|[#1076](https://github.com/NVIDIA/spark-rapids/pull/1076)|[FEA] Support Databricks 7.3 LTS Runtime| -|[#1083](https://github.com/NVIDIA/spark-rapids/pull/1083)|Revert "Get cudf/spark dependency from the correct .m2 dir"| -|[#1062](https://github.com/NVIDIA/spark-rapids/pull/1062)|Get cudf/spark dependency from the correct .m2 dir| -|[#1078](https://github.com/NVIDIA/spark-rapids/pull/1078)|Another round of fixes for mapping of DataType to DType| -|[#1066](https://github.com/NVIDIA/spark-rapids/pull/1066)|More fixes for conversion to ColumnarBatch| -|[#1029](https://github.com/NVIDIA/spark-rapids/pull/1029)|BenchmarkRunner should produce 
JSON summary file even when queries fail| -|[#1055](https://github.com/NVIDIA/spark-rapids/pull/1055)|Fix build warnings| -|[#1064](https://github.com/NVIDIA/spark-rapids/pull/1064)|Use array instead of List for from(Table, DataType)| -|[#1057](https://github.com/NVIDIA/spark-rapids/pull/1057)|Fix empty table broadcast requiring a GPU on driver node| -|[#1047](https://github.com/NVIDIA/spark-rapids/pull/1047)|Sanity checks for cudf jar mismatch| -|[#1044](https://github.com/NVIDIA/spark-rapids/pull/1044)|Accelerated row to columnar and columnar to row transitions| -|[#1056](https://github.com/NVIDIA/spark-rapids/pull/1056)|Add query number to Spark app name when running benchmarks| -|[#1054](https://github.com/NVIDIA/spark-rapids/pull/1054)|Log total RMM allocated on GPU OOM| -|[#1053](https://github.com/NVIDIA/spark-rapids/pull/1053)|Remove isGpuBroadcastNestedLoopJoin from shims| -|[#1052](https://github.com/NVIDIA/spark-rapids/pull/1052)|Allow for GPUCoalesceBatch to deal with Map| -|[#1051](https://github.com/NVIDIA/spark-rapids/pull/1051)|Add simple retry for URM dependencies [skip ci]| -|[#1046](https://github.com/NVIDIA/spark-rapids/pull/1046)|Fix broken links| -|[#1017](https://github.com/NVIDIA/spark-rapids/pull/1017)|Log whether PTDS is enabled| -|[#1040](https://github.com/NVIDIA/spark-rapids/pull/1040)|Update to cudf 0.17-SNAPSHOT and fix tests| -|[#1042](https://github.com/NVIDIA/spark-rapids/pull/1042)|Fix inconsistencies in AQE support for broadcast joins| -|[#1037](https://github.com/NVIDIA/spark-rapids/pull/1037)|Add in support for the SQL functions Least and Greatest| -|[#1036](https://github.com/NVIDIA/spark-rapids/pull/1036)|Increase number of retries when waiting for databricks cluster| -|[#1034](https://github.com/NVIDIA/spark-rapids/pull/1034)|[BUG] To honor spark.rapids.memory.gpu.pool=NONE| -|[#854](https://github.com/NVIDIA/spark-rapids/pull/854)|Arbitrary function call in UDF| 
-|[#1028](https://github.com/NVIDIA/spark-rapids/pull/1028)|Update to cudf-0.16| -|[#1023](https://github.com/NVIDIA/spark-rapids/pull/1023)|Add --gc-between-run flag for TPC* benchmarks.| -|[#1001](https://github.com/NVIDIA/spark-rapids/pull/1001)|ColumnarBatch to CachedBatch and back| -|[#990](https://github.com/NVIDIA/spark-rapids/pull/990)|Parquet coalesce file reader for local filesystems| -|[#1014](https://github.com/NVIDIA/spark-rapids/pull/1014)|Add --append-dat flag for TPC-DS benchmark| -|[#991](https://github.com/NVIDIA/spark-rapids/pull/991)|Updated GCP Dataproc Mortgage-ETL-GPU.ipynb| -|[#886](https://github.com/NVIDIA/spark-rapids/pull/886)|Spark BinaryType and cast to BinaryType| -|[#1016](https://github.com/NVIDIA/spark-rapids/pull/1016)|Change Hash Aggregate to allow pass-through on MapType| -|[#984](https://github.com/NVIDIA/spark-rapids/pull/984)|Add support for MapType in selected operators | -|[#1012](https://github.com/NVIDIA/spark-rapids/pull/1012)|Update for new position parameter in Spark 3.1.0 RegExpReplace| -|[#995](https://github.com/NVIDIA/spark-rapids/pull/995)|Add shim for EMR 3.0.1 and EMR 3.0.1-SNAPSHOT| -|[#998](https://github.com/NVIDIA/spark-rapids/pull/998)|Update benchmark automation script| -|[#1000](https://github.com/NVIDIA/spark-rapids/pull/1000)|Always use RAPIDS shuffle when running TPCH and Mortgage tests| -|[#981](https://github.com/NVIDIA/spark-rapids/pull/981)|Change databricks build to dynamically create a cluster| -|[#986](https://github.com/NVIDIA/spark-rapids/pull/986)|Fix missing dataSize metric when using RAPIDS shuffle| -|[#914](https://github.com/NVIDIA/spark-rapids/pull/914)|Write InternalRow to CachedBatch| -|[#934](https://github.com/NVIDIA/spark-rapids/pull/934)|Iterator to make it easier to work with a window of blocks in the RAPIDS shuffle| -|[#992](https://github.com/NVIDIA/spark-rapids/pull/992)|Skip post-clean if aborted before the image build stage in pre-merge [skip ci]| 
-|[#988](https://github.com/NVIDIA/spark-rapids/pull/988)|Change in Spark caused the 3.1.0 CI to fail| -|[#983](https://github.com/NVIDIA/spark-rapids/pull/983)|clean jenkins file for premerge on NGCC| -|[#964](https://github.com/NVIDIA/spark-rapids/pull/964)|Refactor TPC benchmarks to reduce duplicate code| -|[#978](https://github.com/NVIDIA/spark-rapids/pull/978)|Enable scalastyle checks for udf-compiler module| -|[#949](https://github.com/NVIDIA/spark-rapids/pull/949)|Fix GpuWindowExec to work with a CPU SortExec| -|[#973](https://github.com/NVIDIA/spark-rapids/pull/973)|Stop reporting totalTime metric for GpuShuffleExchangeExec| -|[#968](https://github.com/NVIDIA/spark-rapids/pull/968)|XFail pos_explode tests until final fix can be put in| -|[#970](https://github.com/NVIDIA/spark-rapids/pull/970)|Add legacy config to clear active Spark 3.1.0 session in tests| -|[#918](https://github.com/NVIDIA/spark-rapids/pull/918)|Benchmark runner script| -|[#915](https://github.com/NVIDIA/spark-rapids/pull/915)|Add option to control number of partitions when converting from CSV to Parquet| -|[#944](https://github.com/NVIDIA/spark-rapids/pull/944)|Fix some issues with non-determinism| -|[#935](https://github.com/NVIDIA/spark-rapids/pull/935)|Add in support/tests for a window count on a column| -|[#940](https://github.com/NVIDIA/spark-rapids/pull/940)|Fix closeOnExcept suppressed exception handling| -|[#942](https://github.com/NVIDIA/spark-rapids/pull/942)|fix github action env setup [skip ci]| -|[#933](https://github.com/NVIDIA/spark-rapids/pull/933)|Update first/last tests to avoid non-determinisim and ordering differences| -|[#931](https://github.com/NVIDIA/spark-rapids/pull/931)|Fix checking for nullable columns in window range query| -|[#924](https://github.com/NVIDIA/spark-rapids/pull/924)|Benchmark guide update for command-line interface / spark-submit| -|[#926](https://github.com/NVIDIA/spark-rapids/pull/926)|Move pandas_udf functions into the tests functions| 
-|[#929](https://github.com/NVIDIA/spark-rapids/pull/929)|Pick a default tableId to use that is non 0 so that flatbuffers allow…| -|[#928](https://github.com/NVIDIA/spark-rapids/pull/928)|Fix RapidsBufferStore NPE when no spillable buffers are available| -|[#820](https://github.com/NVIDIA/spark-rapids/pull/820)|Benchmarking guide| -|[#859](https://github.com/NVIDIA/spark-rapids/pull/859)|Compare partitioned files in order| -|[#916](https://github.com/NVIDIA/spark-rapids/pull/916)|create new sparkContext explicitly in CPU notebook| -|[#917](https://github.com/NVIDIA/spark-rapids/pull/917)|create new SparkContext in GPU notebook explicitly.| -|[#919](https://github.com/NVIDIA/spark-rapids/pull/919)|Add label benchmark to performance subsection in changelog| -|[#850](https://github.com/NVIDIA/spark-rapids/pull/850)| Add in basic support for lead/lag| -|[#843](https://github.com/NVIDIA/spark-rapids/pull/843)|[REVIEW] Cache plugin to handle reading CachedBatch to an InternalRow| -|[#904](https://github.com/NVIDIA/spark-rapids/pull/904)|Add command-line argument for benchmark result filename| -|[#909](https://github.com/NVIDIA/spark-rapids/pull/909)|GCP preview version image name update| -|[#903](https://github.com/NVIDIA/spark-rapids/pull/903)|update getting-started-gcp.md with new component list| -|[#900](https://github.com/NVIDIA/spark-rapids/pull/900)|Turn off CollectLimitExec replacement by default| -|[#907](https://github.com/NVIDIA/spark-rapids/pull/907)|remove configs from databricks that shouldn't be used by default| -|[#893](https://github.com/NVIDIA/spark-rapids/pull/893)|Fix rounding error when casting timestamp to string for timestamps before 1970| -|[#899](https://github.com/NVIDIA/spark-rapids/pull/899)|Mark reduction corner case tests as xfail on databricks until they can be fixed| -|[#894](https://github.com/NVIDIA/spark-rapids/pull/894)|Replace whole-buffer slicing with direct refcounting| -|[#891](https://github.com/NVIDIA/spark-rapids/pull/891)|Add 
config to dump heap on GPU OOM| -|[#890](https://github.com/NVIDIA/spark-rapids/pull/890)|Clean up CoalesceBatch to use withResource| -|[#892](https://github.com/NVIDIA/spark-rapids/pull/892)|Only manifest the current batch in cached block shuffle read iterator| -|[#871](https://github.com/NVIDIA/spark-rapids/pull/871)|Add support for using the arena allocator| -|[#889](https://github.com/NVIDIA/spark-rapids/pull/889)|Fix crash on scalar only orderby| -|[#879](https://github.com/NVIDIA/spark-rapids/pull/879)|Update SpillableColumnarBatch to remove buffer from catalog on close| -|[#888](https://github.com/NVIDIA/spark-rapids/pull/888)|Shrink detect scope to compile only [skip ci]| -|[#885](https://github.com/NVIDIA/spark-rapids/pull/885)|[BUG] fix IT dockerfile arguments [skip ci]| -|[#883](https://github.com/NVIDIA/spark-rapids/pull/883)|[BUG] fix IT dockerfile args ordering [skip ci]| -|[#875](https://github.com/NVIDIA/spark-rapids/pull/875)|fix the non-consistency for `spark.rapids.sql.format.parquet.multiThreadedRead` in RapidsConf.scala| -|[#862](https://github.com/NVIDIA/spark-rapids/pull/862)|Migrate nightly&integration pipelines to blossom [skip ci]| -|[#872](https://github.com/NVIDIA/spark-rapids/pull/872)|Ensure that receive-side batches use GpuColumnVectorFromBuffer to avoid| -|[#833](https://github.com/NVIDIA/spark-rapids/pull/833)|Add nvcomp LZ4 codec support| -|[#870](https://github.com/NVIDIA/spark-rapids/pull/870)|Cleaned up tests and documentation for csv timestamp parsing| -|[#823](https://github.com/NVIDIA/spark-rapids/pull/823)|Add command-line interface for TPC-* for use with spark-submit| -|[#856](https://github.com/NVIDIA/spark-rapids/pull/856)|Move GpuWindowInPandasExec in shims layers| -|[#756](https://github.com/NVIDIA/spark-rapids/pull/756)|Add stream-time metric| -|[#832](https://github.com/NVIDIA/spark-rapids/pull/832)|Skip pandas tests if pandas cannot be found| -|[#841](https://github.com/NVIDIA/spark-rapids/pull/841)|Fix a hanging 
issue when processing empty data.| -|[#840](https://github.com/NVIDIA/spark-rapids/pull/840)|[REVIEW] Fixed failing cache tests| -|[#848](https://github.com/NVIDIA/spark-rapids/pull/848)|Update task memory and disk spill metrics when buffer store spills| -|[#851](https://github.com/NVIDIA/spark-rapids/pull/851)|Use contiguous table when deserializing columnar batch| -|[#857](https://github.com/NVIDIA/spark-rapids/pull/857)|fix pvc scheduling issue| -|[#853](https://github.com/NVIDIA/spark-rapids/pull/853)|Remove nodeAffinity from premerge pipeline| -|[#796](https://github.com/NVIDIA/spark-rapids/pull/796)|Record spark plan SQL metrics to JSON when running benchmarks| -|[#781](https://github.com/NVIDIA/spark-rapids/pull/781)|Add AQE unit tests| -|[#824](https://github.com/NVIDIA/spark-rapids/pull/824)|Skip cudf_udf test by default| -|[#839](https://github.com/NVIDIA/spark-rapids/pull/839)|First/Last reduction and cleanup of agg APIs| -|[#827](https://github.com/NVIDIA/spark-rapids/pull/827)|Add Spark 3.0 EMR Shim layer | -|[#816](https://github.com/NVIDIA/spark-rapids/pull/816)|[BUG] fix nightly is timing out| -|[#782](https://github.com/NVIDIA/spark-rapids/pull/782)|Benchmark utility to perform diff of output from benchmark runs, allowing for precision differences| -|[#813](https://github.com/NVIDIA/spark-rapids/pull/813)|Revert "Enable tests in udf_cudf_test.py"| -|[#788](https://github.com/NVIDIA/spark-rapids/pull/788)|[FEA] Persist workspace data on PVC for premerge| -|[#805](https://github.com/NVIDIA/spark-rapids/pull/805)|[FEA] nightly build trigger both IT on spark 300 and 301| -|[#797](https://github.com/NVIDIA/spark-rapids/pull/797)|Allow host spill store to fit a buffer larger than configured max size| -|[#807](https://github.com/NVIDIA/spark-rapids/pull/807)|Deploy integration-tests javadoc and sources| -|[#777](https://github.com/NVIDIA/spark-rapids/pull/777)|Enable tests in udf_cudf_test.py| -|[#790](https://github.com/NVIDIA/spark-rapids/pull/790)|CI: 
Update cudf python to 0.16 nightly| -|[#772](https://github.com/NVIDIA/spark-rapids/pull/772)|Add support for empty array construction.| -|[#783](https://github.com/NVIDIA/spark-rapids/pull/783)|Improved GpuArrowEvalPythonExec| -|[#771](https://github.com/NVIDIA/spark-rapids/pull/771)|Various improvements to benchmarks| -|[#763](https://github.com/NVIDIA/spark-rapids/pull/763)|[REVIEW] Allow CoalesceBatch to spill data that is not in active use| -|[#727](https://github.com/NVIDIA/spark-rapids/pull/727)|Update cudf dependency to 0.16-SNAPSHOT| -|[#726](https://github.com/NVIDIA/spark-rapids/pull/726)|parquet writer support for TIMESTAMP_MILLIS| -|[#674](https://github.com/NVIDIA/spark-rapids/pull/674)|Unit test for GPU exchange re-use with AQE| -|[#723](https://github.com/NVIDIA/spark-rapids/pull/723)|Update code coverage to find source files in new places| -|[#766](https://github.com/NVIDIA/spark-rapids/pull/766)|Update the integration Dockerfile to reduce the image size| -|[#762](https://github.com/NVIDIA/spark-rapids/pull/762)|Fixing conflicts in branch-0.3| -|[#738](https://github.com/NVIDIA/spark-rapids/pull/738)|[auto-merge] branch-0.2 to branch-0.3 - resolve conflict| -|[#722](https://github.com/NVIDIA/spark-rapids/pull/722)|Initial code changes to support spilling outside of shuffle| -|[#693](https://github.com/NVIDIA/spark-rapids/pull/693)|Update jenkins files for 0.3| -|[#692](https://github.com/NVIDIA/spark-rapids/pull/692)|Merge shims dependency to spark-3.0.1 into branch-0.3| -|[#690](https://github.com/NVIDIA/spark-rapids/pull/690)|Update the version to 0.3.0-SNAPSHOT| - -## Release 0.2 - -### Features -||| -|:---|:---| -|[#696](https://github.com/NVIDIA/spark-rapids/issues/696)|[FEA] run integration tests against SPARK-3.0.1| -|[#455](https://github.com/NVIDIA/spark-rapids/issues/455)|[FEA] Support UCX shuffle with optimized AQE| -|[#510](https://github.com/NVIDIA/spark-rapids/issues/510)|[FEA] Investigate libcudf features needed to support struct 
schema pruning during loads| -|[#541](https://github.com/NVIDIA/spark-rapids/issues/541)|[FEA] Scala UDF:Support for null Value operands| -|[#542](https://github.com/NVIDIA/spark-rapids/issues/542)|[FEA] Scala UDF: Support for Date and Time | -|[#499](https://github.com/NVIDIA/spark-rapids/issues/499)|[FEA] disable any kind of warnings about ExecutedCommandExec not being on the GPU| -|[#540](https://github.com/NVIDIA/spark-rapids/issues/540)|[FEA] Scala UDF: Support for String replaceFirst()| -|[#340](https://github.com/NVIDIA/spark-rapids/issues/340)|[FEA] widen the rendered Jekyll pages| -|[#602](https://github.com/NVIDIA/spark-rapids/issues/602)|[FEA] don't release with any -SNAPSHOT dependencies| -|[#579](https://github.com/NVIDIA/spark-rapids/issues/579)|[FEA] Auto-merge between branches| -|[#515](https://github.com/NVIDIA/spark-rapids/issues/515)|[FEA] Write tests for AQE skewed join optimization| -|[#452](https://github.com/NVIDIA/spark-rapids/issues/452)|[FEA] Update HashSortOptimizerSuite to work with AQE| -|[#454](https://github.com/NVIDIA/spark-rapids/issues/454)|[FEA] Update GpuCoalesceBatchesSuite to work with AQE enabled| -|[#354](https://github.com/NVIDIA/spark-rapids/issues/354)|[FEA]Spark 3.1 FileSourceScanExec adds parameter optionalNumCoalescedBuckets| -|[#566](https://github.com/NVIDIA/spark-rapids/issues/566)|[FEA] Add support for StringSplit with an array index.| -|[#524](https://github.com/NVIDIA/spark-rapids/issues/524)|[FEA] Add GPU specific metrics to GpuFileSourceScanExec| -|[#494](https://github.com/NVIDIA/spark-rapids/issues/494)|[FEA] Add some AQE-specific tests to the PySpark test suite| -|[#146](https://github.com/NVIDIA/spark-rapids/issues/146)|[FEA] Python tests should support running with Adaptive Query Execution enabled| -|[#465](https://github.com/NVIDIA/spark-rapids/issues/465)|[FEA] Audit: Update script to audit multiple versions of Spark | -|[#488](https://github.com/NVIDIA/spark-rapids/issues/488)|[FEA] Ability to limit 
total GPU memory used| -|[#70](https://github.com/NVIDIA/spark-rapids/issues/70)|[FEA] Support StringSplit| -|[#403](https://github.com/NVIDIA/spark-rapids/issues/403)|[FEA] Add in support for GetArrayItem| -|[#493](https://github.com/NVIDIA/spark-rapids/issues/493)|[FEA] Implement shuffle optimization when AQE is enabled| -|[#500](https://github.com/NVIDIA/spark-rapids/issues/500)|[FEA] Add maven profiles for testing with AQE on or off| -|[#471](https://github.com/NVIDIA/spark-rapids/issues/471)|[FEA] create a formal process for updating the github-pages branch| -|[#233](https://github.com/NVIDIA/spark-rapids/issues/233)|[FEA] Audit DataWritingCommandExec | -|[#240](https://github.com/NVIDIA/spark-rapids/issues/240)|[FEA] Audit Api validation script follow on - Optimize StringToTypeTag | -|[#388](https://github.com/NVIDIA/spark-rapids/issues/388)|[FEA] Audit WindowExec| -|[#425](https://github.com/NVIDIA/spark-rapids/issues/425)|[FEA] Add tests for configs in BatchScan Readers| -|[#453](https://github.com/NVIDIA/spark-rapids/issues/453)|[FEA] Update HashAggregatesSuite to work with AQE| -|[#184](https://github.com/NVIDIA/spark-rapids/issues/184)|[FEA] Enable NoScalaDoc scalastyle rule| -|[#438](https://github.com/NVIDIA/spark-rapids/issues/438)|[FEA] Enable StringLPad| -|[#232](https://github.com/NVIDIA/spark-rapids/issues/232)|[FEA] Audit SortExec | -|[#236](https://github.com/NVIDIA/spark-rapids/issues/236)|[FEA] Audit ShuffleExchangeExec | -|[#355](https://github.com/NVIDIA/spark-rapids/issues/355)|[FEA] Support Multiple Spark versions in the same jar| -|[#385](https://github.com/NVIDIA/spark-rapids/issues/385)|[FEA] Support RangeExec on the GPU| -|[#317](https://github.com/NVIDIA/spark-rapids/issues/317)|[FEA] Write test wrapper to run SQL queries via pyspark| -|[#235](https://github.com/NVIDIA/spark-rapids/issues/235)|[FEA] Audit BroadcastExchangeExec| -|[#234](https://github.com/NVIDIA/spark-rapids/issues/234)|[FEA] Audit BatchScanExec| 
-|[#238](https://github.com/NVIDIA/spark-rapids/issues/238)|[FEA] Audit ShuffledHashJoinExec | -|[#237](https://github.com/NVIDIA/spark-rapids/issues/237)|[FEA] Audit BroadcastHashJoinExec | -|[#316](https://github.com/NVIDIA/spark-rapids/issues/316)|[FEA] Add some basic Dataframe tests for CoalesceExec| -|[#145](https://github.com/NVIDIA/spark-rapids/issues/145)|[FEA] Scala tests should support running with Adaptive Query Execution enabled| -|[#231](https://github.com/NVIDIA/spark-rapids/issues/231)|[FEA] Audit ProjectExec | -|[#229](https://github.com/NVIDIA/spark-rapids/issues/229)|[FEA] Audit FileSourceScanExec | - -### Performance -||| -|:---|:---| -|[#326](https://github.com/NVIDIA/spark-rapids/issues/326)|[DISCUSS] Shuffle read-side error handling| -|[#601](https://github.com/NVIDIA/spark-rapids/issues/601)|[FEA] Optimize unnecessary sorts when replacing SortAggregate| -|[#333](https://github.com/NVIDIA/spark-rapids/issues/333)|[FEA] Better handling of reading lots of small Parquet files| -|[#511](https://github.com/NVIDIA/spark-rapids/issues/511)|[FEA] Connect shuffle table compression to shuffle exec metrics| -|[#15](https://github.com/NVIDIA/spark-rapids/issues/15)|[FEA] Multiple threads sharing the same GPU| -|[#272](https://github.com/NVIDIA/spark-rapids/issues/272)|[DOC] Getting started guide for UCX shuffle| - -### Bugs Fixed -||| -|:---|:---| -|[#780](https://github.com/NVIDIA/spark-rapids/issues/780)|[BUG] Inner Join dropping data with bucketed Table input| -|[#569](https://github.com/NVIDIA/spark-rapids/issues/569)|[BUG] left_semi_join operation is abnormal and serious time-consuming| -|[#744](https://github.com/NVIDIA/spark-rapids/issues/744)|[BUG] TPC-DS query 6 now produces incorrect results.| -|[#718](https://github.com/NVIDIA/spark-rapids/issues/718)|[BUG] GpuBroadcastHashJoinExec ArrayIndexOutOfBoundsException| -|[#698](https://github.com/NVIDIA/spark-rapids/issues/698)|[BUG] batch coalesce can fail to appear between columnar shuffle and 
subsequent columnar operation| -|[#658](https://github.com/NVIDIA/spark-rapids/issues/658)|[BUG] GpuCoalesceBatches collectTime metric can be underreported| -|[#59](https://github.com/NVIDIA/spark-rapids/issues/59)|[BUG] enable tests for string literals in a select| -|[#486](https://github.com/NVIDIA/spark-rapids/issues/486)|[BUG] GpuWindowExec does not implement requiredChildOrdering| -|[#631](https://github.com/NVIDIA/spark-rapids/issues/631)|[BUG] Rows are dropped when AQE is enabled in some cases| -|[#671](https://github.com/NVIDIA/spark-rapids/issues/671)|[BUG] Databricks hash_aggregate_test fails trying to canonicalize a WrappedAggFunction| -|[#218](https://github.com/NVIDIA/spark-rapids/issues/218)|[BUG] Window function COUNT(x) includes null-values, when it shouldn't| -|[#153](https://github.com/NVIDIA/spark-rapids/issues/153)|[BUG] Incorrect output from partial-only hash aggregates with multiple distincts and non-distinct functions| -|[#656](https://github.com/NVIDIA/spark-rapids/issues/656)|[BUG] integration tests produce hive metadata files| -|[#607](https://github.com/NVIDIA/spark-rapids/issues/607)|[BUG] Fix misleading "cannot run on GPU" warnings when AQE is enabled| -|[#630](https://github.com/NVIDIA/spark-rapids/issues/630)|[BUG] GpuCustomShuffleReader metrics always show zero rows/batches output| -|[#643](https://github.com/NVIDIA/spark-rapids/issues/643)|[BUG] race condition while registering a buffer and spilling at the same time| -|[#606](https://github.com/NVIDIA/spark-rapids/issues/606)|[BUG] Multiple scans for same data source with TPC-DS query59 with delta format| -|[#626](https://github.com/NVIDIA/spark-rapids/issues/626)|[BUG] parquet_test showing leaked memory buffer| -|[#155](https://github.com/NVIDIA/spark-rapids/issues/155)|[BUG] Incorrect output from averages with filters in partial only mode| -|[#277](https://github.com/NVIDIA/spark-rapids/issues/277)|[BUG] HashAggregateSuite failure when AQE is enabled| 
-|[#276](https://github.com/NVIDIA/spark-rapids/issues/276)|[BUG] GpuCoalesceBatchSuite failure when AQE is enabled| -|[#598](https://github.com/NVIDIA/spark-rapids/issues/598)|[BUG] Non-deterministic output from MapOutputTracker.getStatistics() with AQE on GPU| -|[#192](https://github.com/NVIDIA/spark-rapids/issues/192)|[BUG] test_read_merge_schema fails on Databricks| -|[#341](https://github.com/NVIDIA/spark-rapids/issues/341)|[BUG] Document compression formats for readers/writers| -|[#587](https://github.com/NVIDIA/spark-rapids/issues/587)|[BUG] Spark3.1 changed FileScan which means or GpuScans need to be added to shim layer| -|[#362](https://github.com/NVIDIA/spark-rapids/issues/362)|[BUG] Implement getReaderForRange in the RapidsShuffleManager| -|[#528](https://github.com/NVIDIA/spark-rapids/issues/528)|[BUG] HashAggregateSuite "Avg Distinct with filter" no longer valid when testing against Spark 3.1.0| -|[#416](https://github.com/NVIDIA/spark-rapids/issues/416)|[BUG] Fix Spark 3.1.0 integration tests| -|[#556](https://github.com/NVIDIA/spark-rapids/issues/556)|[BUG] NPE when removing shuffle| -|[#553](https://github.com/NVIDIA/spark-rapids/issues/553)|[BUG] GpuColumnVector build warnings from raw type access| -|[#492](https://github.com/NVIDIA/spark-rapids/issues/492)|[BUG] Re-enable AQE integration tests| -|[#275](https://github.com/NVIDIA/spark-rapids/issues/275)|[BUG] TpchLike query 2 fails when AQE is enabled| -|[#508](https://github.com/NVIDIA/spark-rapids/issues/508)|[BUG] GpuUnion publishes metrics on the UI that are all 0| -|[#269](https://github.com/NVIDIA/spark-rapids/issues/269)|Needed to add `--conf spark.driver.extraClassPath=` | -|[#473](https://github.com/NVIDIA/spark-rapids/issues/473)|[BUG] PartMerge:countDistinct:sum fails sporadically| -|[#531](https://github.com/NVIDIA/spark-rapids/issues/531)|[BUG] Temporary RMM workaround needs to be removed| -|[#532](https://github.com/NVIDIA/spark-rapids/issues/532)|[BUG] NPE when enabling shuffle 
manager| -|[#525](https://github.com/NVIDIA/spark-rapids/issues/525)|[BUG] GpuFilterExec reports incorrect nullability of output in some cases| -|[#483](https://github.com/NVIDIA/spark-rapids/issues/483)|[BUG] Multiple scans for the same parquet data source| -|[#382](https://github.com/NVIDIA/spark-rapids/issues/382)|[BUG] Spark3.1 StringFallbackSuite regexp_replace null cpu fall back test fails.| -|[#489](https://github.com/NVIDIA/spark-rapids/issues/489)|[FEA] Fix Spark 3.1 GpuHashJoin since it now requires CodegenSupport| -|[#441](https://github.com/NVIDIA/spark-rapids/issues/441)|[BUG] test_broadcast_nested_loop_join_special_case fails on databricks| -|[#347](https://github.com/NVIDIA/spark-rapids/issues/347)|[BUG] Failed to read Parquet file generated by GPU-enabled Spark.| -|[#433](https://github.com/NVIDIA/spark-rapids/issues/433)|`InSet` operator produces an error for Strings| -|[#144](https://github.com/NVIDIA/spark-rapids/issues/144)|[BUG] spark.sql.legacy.parquet.datetimeRebaseModeInWrite is ignored| -|[#323](https://github.com/NVIDIA/spark-rapids/issues/323)|[BUG] GpuBroadcastNestedLoopJoinExec can fail if there are no columns| -|[#356](https://github.com/NVIDIA/spark-rapids/issues/356)|[BUG] Integration cache test for BroadcastNestedLoopJoin failure| -|[#280](https://github.com/NVIDIA/spark-rapids/issues/280)|[BUG] Full Outer Join does not work on nullable keys| -|[#149](https://github.com/NVIDIA/spark-rapids/issues/149)|[BUG] Spark driver fails to load native libs when running on node without CUDA| - -### PRs -||| -|:---|:---| -|[#826](https://github.com/NVIDIA/spark-rapids/pull/826)|Fix link to cudf-0.15-cuda11.jar| -|[#815](https://github.com/NVIDIA/spark-rapids/pull/815)|Update documentation for Scala UDFs in 0.2 since you need two things| -|[#802](https://github.com/NVIDIA/spark-rapids/pull/802)|Update 0.2 CHANGELOG| -|[#793](https://github.com/NVIDIA/spark-rapids/pull/793)|Update Jenkins scripts for release| 
-|[#798](https://github.com/NVIDIA/spark-rapids/pull/798)|Fix shims provider override config not being seen by executors| -|[#785](https://github.com/NVIDIA/spark-rapids/pull/785)|Make shuffle run on CPU if we do a join where we read from bucketed table| -|[#765](https://github.com/NVIDIA/spark-rapids/pull/765)|Add config to override shims provider class| -|[#759](https://github.com/NVIDIA/spark-rapids/pull/759)|Add CHANGELOG for release 0.2| -|[#758](https://github.com/NVIDIA/spark-rapids/pull/758)|Skip the udf test fails periodically.| -|[#752](https://github.com/NVIDIA/spark-rapids/pull/752)|Fix snapshot plugin jar version in docs| -|[#751](https://github.com/NVIDIA/spark-rapids/pull/751)|Correct the channel for cudf installation| -|[#754](https://github.com/NVIDIA/spark-rapids/pull/754)|Filter nulls from joins where possible to improve performance| -|[#732](https://github.com/NVIDIA/spark-rapids/pull/732)|Add a timeout for RapidsShuffleIterator to prevent jobs to hang infin…| -|[#637](https://github.com/NVIDIA/spark-rapids/pull/637)|Documentation changes for 0.2 release | -|[#747](https://github.com/NVIDIA/spark-rapids/pull/747)|Disable udf tests that fail periodically| -|[#745](https://github.com/NVIDIA/spark-rapids/pull/745)|Revert Null Join Filter| -|[#741](https://github.com/NVIDIA/spark-rapids/pull/741)|Fix issue with parquet partitioned reads| -|[#733](https://github.com/NVIDIA/spark-rapids/pull/733)|Remove GPU Types from github| -|[#720](https://github.com/NVIDIA/spark-rapids/pull/720)|Stop removing GpuCoalesceBatches from non-AQE queries when AQE is enabled| -|[#729](https://github.com/NVIDIA/spark-rapids/pull/729)|Fix collect time metric in CoalesceBatches| -|[#640](https://github.com/NVIDIA/spark-rapids/pull/640)|Support running Pandas UDFs on GPUs in Python processes.| -|[#721](https://github.com/NVIDIA/spark-rapids/pull/721)|Add some more checks to databricks build scripts| -|[#714](https://github.com/NVIDIA/spark-rapids/pull/714)|Move spark 
3.0.1-shims out of snapshot-shims| -|[#711](https://github.com/NVIDIA/spark-rapids/pull/711)|fix blossom checkout repo| -|[#709](https://github.com/NVIDIA/spark-rapids/pull/709)|[BUG] fix unexpected indentation issue in blossom yml| -|[#642](https://github.com/NVIDIA/spark-rapids/pull/642)|Init workflow for blossom-ci| -|[#705](https://github.com/NVIDIA/spark-rapids/pull/705)|Enable configuration check for cast string to timestamp| -|[#702](https://github.com/NVIDIA/spark-rapids/pull/702)|Update slack channel for Jenkins builds| -|[#701](https://github.com/NVIDIA/spark-rapids/pull/701)|fix checkout-ref for automerge| -|[#695](https://github.com/NVIDIA/spark-rapids/pull/695)|Fix spark-3.0.1 shim to be released| -|[#668](https://github.com/NVIDIA/spark-rapids/pull/668)|refactor automerge to support merge for protected branch| -|[#687](https://github.com/NVIDIA/spark-rapids/pull/687)|Include the UDF compiler in the dist jar| -|[#689](https://github.com/NVIDIA/spark-rapids/pull/689)|Change shims dependency to spark-3.0.1| -|[#677](https://github.com/NVIDIA/spark-rapids/pull/677)|Use multi-threaded parquet read with small files| -|[#638](https://github.com/NVIDIA/spark-rapids/pull/638)|Add Parquet-based cache serializer| -|[#613](https://github.com/NVIDIA/spark-rapids/pull/613)|Enable UCX + AQE| -|[#684](https://github.com/NVIDIA/spark-rapids/pull/684)|Enable test for literal string values in a select| -|[#686](https://github.com/NVIDIA/spark-rapids/pull/686)|Remove sorts when replacing sort aggregate if possible| -|[#675](https://github.com/NVIDIA/spark-rapids/pull/675)|Added TimeAdd| -|[#645](https://github.com/NVIDIA/spark-rapids/pull/645)|[window] Add GpuWindowExec requiredChildOrdering| -|[#676](https://github.com/NVIDIA/spark-rapids/pull/676)|fixUpJoinConsistency rule now works when AQE is enabled| -|[#683](https://github.com/NVIDIA/spark-rapids/pull/683)|Fix issues with cannonicalization of WrappedAggFunction| 
-|[#682](https://github.com/NVIDIA/spark-rapids/pull/682)|Fix path to start-slave.sh script in docs| -|[#673](https://github.com/NVIDIA/spark-rapids/pull/673)|Increase build timeouts on nightly and premerge builds| -|[#648](https://github.com/NVIDIA/spark-rapids/pull/648)|add signoff-check use github actions| -|[#593](https://github.com/NVIDIA/spark-rapids/pull/593)|Add support for isNaN and datetime related instructions in UDF compiler| -|[#666](https://github.com/NVIDIA/spark-rapids/pull/666)|[window] Disable GPU for COUNT(exp) queries| -|[#655](https://github.com/NVIDIA/spark-rapids/pull/655)|Implement AQE unit test for InsertAdaptiveSparkPlan| -|[#614](https://github.com/NVIDIA/spark-rapids/pull/614)|Fix for aggregation with multiple distinct and non distinct functions| -|[#657](https://github.com/NVIDIA/spark-rapids/pull/657)|Fix verify build after integration tests are run| -|[#660](https://github.com/NVIDIA/spark-rapids/pull/660)|Add in neverReplaceExec and several rules for it| -|[#639](https://github.com/NVIDIA/spark-rapids/pull/639)|BooleanType test shouldn't xfail| -|[#652](https://github.com/NVIDIA/spark-rapids/pull/652)|Mark UVM config as internal until supported| -|[#653](https://github.com/NVIDIA/spark-rapids/pull/653)|Move to the cudf-0.15 release| -|[#647](https://github.com/NVIDIA/spark-rapids/pull/647)|Improve warnings about AQE nodes not supported on GPU| -|[#646](https://github.com/NVIDIA/spark-rapids/pull/646)|Stop reporting zero metrics for GpuCustomShuffleReader| -|[#644](https://github.com/NVIDIA/spark-rapids/pull/644)|Small fix for race in catalog where a buffer could get spilled while …| -|[#623](https://github.com/NVIDIA/spark-rapids/pull/623)|Fix issues with canonicalization| -|[#599](https://github.com/NVIDIA/spark-rapids/pull/599)|[FEA] changelog generator| -|[#563](https://github.com/NVIDIA/spark-rapids/pull/563)|cudf and spark version info in artifacts| -|[#633](https://github.com/NVIDIA/spark-rapids/pull/633)|Fix leak if 
RebaseHelper throws during Parquet read| -|[#632](https://github.com/NVIDIA/spark-rapids/pull/632)|Copy function isSearchableType from Spark because signature changed in 3.0.1| -|[#583](https://github.com/NVIDIA/spark-rapids/pull/583)|Add udf compiler unit tests| -|[#617](https://github.com/NVIDIA/spark-rapids/pull/617)|Documentation updates for branch 0.2| -|[#616](https://github.com/NVIDIA/spark-rapids/pull/616)|Add config to reserve GPU memory| -|[#612](https://github.com/NVIDIA/spark-rapids/pull/612)|[REVIEW] Fix incorrect output from averages with filters in partial only mode| -|[#609](https://github.com/NVIDIA/spark-rapids/pull/609)|fix minor issues with instructions for building ucx| -|[#611](https://github.com/NVIDIA/spark-rapids/pull/611)|Added in profile to enable shims for SNAPSHOT releases| -|[#595](https://github.com/NVIDIA/spark-rapids/pull/595)|Parquet small file reading optimization| -|[#582](https://github.com/NVIDIA/spark-rapids/pull/582)|fix #579 Auto-merge between branches| -|[#536](https://github.com/NVIDIA/spark-rapids/pull/536)|Add test for skewed join optimization when AQE is enabled| -|[#603](https://github.com/NVIDIA/spark-rapids/pull/603)|Fix data size metric always 0 when using RAPIDS shuffle| -|[#600](https://github.com/NVIDIA/spark-rapids/pull/600)|Fix calculation of string data for compressed batches| -|[#597](https://github.com/NVIDIA/spark-rapids/pull/597)|Remove the xfail for parquet test_read_merge_schema on Databricks| -|[#591](https://github.com/NVIDIA/spark-rapids/pull/591)|Add ucx license in NOTICE-binary| -|[#596](https://github.com/NVIDIA/spark-rapids/pull/596)|Add Spark 3.0.2 to Shim layer| -|[#594](https://github.com/NVIDIA/spark-rapids/pull/594)|Filter nulls from joins where possible to improve performance.| -|[#590](https://github.com/NVIDIA/spark-rapids/pull/590)|Move GpuParquetScan/GpuOrcScan into Shim| -|[#588](https://github.com/NVIDIA/spark-rapids/pull/588)|xfail the tpch spark 3.1.0 tests that fail| 
-|[#572](https://github.com/NVIDIA/spark-rapids/pull/572)|Update buffer store to return compressed batches directly, add compression NVTX ranges| -|[#558](https://github.com/NVIDIA/spark-rapids/pull/558)|Fix unit tests when AQE is enabled| -|[#580](https://github.com/NVIDIA/spark-rapids/pull/580)|xfail the Spark 3.1.0 integration tests that fail | -|[#565](https://github.com/NVIDIA/spark-rapids/pull/565)|Minor improvements to TPC-DS benchmarking code| -|[#567](https://github.com/NVIDIA/spark-rapids/pull/567)|Explicitly disable AQE in one test| -|[#571](https://github.com/NVIDIA/spark-rapids/pull/571)|Fix Databricks shim layer for GpuFileSourceScanExec and GpuBroadcastExchangeExec| -|[#564](https://github.com/NVIDIA/spark-rapids/pull/564)|Add GPU decode time metric to scans| -|[#562](https://github.com/NVIDIA/spark-rapids/pull/562)|getCatalog can be called from the driver, and can return null| -|[#555](https://github.com/NVIDIA/spark-rapids/pull/555)|Fix build warnings for ColumnViewAccess| -|[#560](https://github.com/NVIDIA/spark-rapids/pull/560)|Fix databricks build for AQE support| -|[#557](https://github.com/NVIDIA/spark-rapids/pull/557)|Fix tests failing on Spark 3.1| -|[#547](https://github.com/NVIDIA/spark-rapids/pull/547)|Add GPU metrics to GpuFileSourceScanExec| -|[#462](https://github.com/NVIDIA/spark-rapids/pull/462)|Implement optimized AQE support so that exchanges run on GPU where possible| -|[#550](https://github.com/NVIDIA/spark-rapids/pull/550)|Document Parquet and ORC compression support| -|[#539](https://github.com/NVIDIA/spark-rapids/pull/539)|Update script to audit multiple Spark versions| -|[#543](https://github.com/NVIDIA/spark-rapids/pull/543)|Add metrics to GpuUnion operator| -|[#549](https://github.com/NVIDIA/spark-rapids/pull/549)|Move spark shim properties to top level pom| -|[#497](https://github.com/NVIDIA/spark-rapids/pull/497)|Add UDF compiler implementations| -|[#487](https://github.com/NVIDIA/spark-rapids/pull/487)|Add framework for 
batch compression of shuffle partitions| -|[#544](https://github.com/NVIDIA/spark-rapids/pull/544)|Add in driverExtraClassPath for standalone mode docs| -|[#546](https://github.com/NVIDIA/spark-rapids/pull/546)|Fix Spark 3.1.0 shim build error in GpuHashJoin| -|[#537](https://github.com/NVIDIA/spark-rapids/pull/537)|Use fresh SparkSession when capturing to avoid late capture of previous query| -|[#538](https://github.com/NVIDIA/spark-rapids/pull/538)|Revert "Temporary workaround for RMM initial pool size bug (#530)"| -|[#517](https://github.com/NVIDIA/spark-rapids/pull/517)|Add config to limit maximum RMM pool size| -|[#527](https://github.com/NVIDIA/spark-rapids/pull/527)|Add support for split and getArrayIndex| -|[#534](https://github.com/NVIDIA/spark-rapids/pull/534)|Fixes bugs around GpuShuffleEnv initialization| -|[#529](https://github.com/NVIDIA/spark-rapids/pull/529)|[BUG] Degenerate table metas were not getting copied to the heap| -|[#530](https://github.com/NVIDIA/spark-rapids/pull/530)|Temporary workaround for RMM initial pool size bug| -|[#526](https://github.com/NVIDIA/spark-rapids/pull/526)|Fix bug with nullability reporting in GpuFilterExec| -|[#521](https://github.com/NVIDIA/spark-rapids/pull/521)|Fix typo with databricks shim classname SparkShimServiceProvider| -|[#522](https://github.com/NVIDIA/spark-rapids/pull/522)|Use SQLConf instead of SparkConf when looking up SQL configs| -|[#518](https://github.com/NVIDIA/spark-rapids/pull/518)|Fix init order issue in GpuShuffleEnv when RAPIDS shuffle configured| -|[#514](https://github.com/NVIDIA/spark-rapids/pull/514)|Added clarification of RegExpReplace, DateDiff, made descriptive text consistent| -|[#506](https://github.com/NVIDIA/spark-rapids/pull/506)|Add in basic support for running tpcds like queries| -|[#504](https://github.com/NVIDIA/spark-rapids/pull/504)|Add ability to ignore tests depending on spark shim version| -|[#503](https://github.com/NVIDIA/spark-rapids/pull/503)|Remove unused async 
buffer spill support| -|[#501](https://github.com/NVIDIA/spark-rapids/pull/501)|disable codegen in 3.1 shim for hash join| -|[#466](https://github.com/NVIDIA/spark-rapids/pull/466)|Optimize and fix Api validation script| -|[#481](https://github.com/NVIDIA/spark-rapids/pull/481)|Codeowners| -|[#439](https://github.com/NVIDIA/spark-rapids/pull/439)|Check a PR has been committed using git signoff| -|[#319](https://github.com/NVIDIA/spark-rapids/pull/319)|Update partitioning logic in ShuffledBatchRDD| -|[#491](https://github.com/NVIDIA/spark-rapids/pull/491)|Temporarily ignore AQE integration tests| -|[#490](https://github.com/NVIDIA/spark-rapids/pull/490)|Fix Spark 3.1.0 build for HashJoin changes| -|[#482](https://github.com/NVIDIA/spark-rapids/pull/482)|Prevent bad practice in python tests| -|[#485](https://github.com/NVIDIA/spark-rapids/pull/485)|Show plan in assertion message if test fails| -|[#480](https://github.com/NVIDIA/spark-rapids/pull/480)|Fix link from README to getting-started.md| -|[#448](https://github.com/NVIDIA/spark-rapids/pull/448)|Preliminary support for keeping broadcast exchanges on GPU when AQE is enabled| -|[#478](https://github.com/NVIDIA/spark-rapids/pull/478)|Fall back to CPU for binary as string in parquet| -|[#477](https://github.com/NVIDIA/spark-rapids/pull/477)|Fix special case joins in broadcast nested loop join| -|[#469](https://github.com/NVIDIA/spark-rapids/pull/469)|Update HashAggregateSuite to work with AQE| -|[#475](https://github.com/NVIDIA/spark-rapids/pull/475)|Udf compiler pom followup| -|[#434](https://github.com/NVIDIA/spark-rapids/pull/434)|Add UDF compiler skeleton| -|[#474](https://github.com/NVIDIA/spark-rapids/pull/474)|Re-enable noscaladoc check| -|[#461](https://github.com/NVIDIA/spark-rapids/pull/461)|Fix comments style to pass scala style check| -|[#468](https://github.com/NVIDIA/spark-rapids/pull/468)|fix broken link| -|[#456](https://github.com/NVIDIA/spark-rapids/pull/456)|Add closeOnExcept to clean up code that 
closes resources only on exceptions| -|[#464](https://github.com/NVIDIA/spark-rapids/pull/464)|Turn off noscaladoc rule until codebase is fixed| -|[#449](https://github.com/NVIDIA/spark-rapids/pull/449)|Enforce NoScalaDoc rule in scalastyle checks| -|[#450](https://github.com/NVIDIA/spark-rapids/pull/450)|Enable scalastyle for shuffle plugin| -|[#451](https://github.com/NVIDIA/spark-rapids/pull/451)|Databricks remove unneeded files and fix build to not fail on rm when file missing| -|[#442](https://github.com/NVIDIA/spark-rapids/pull/442)|Shim layer support for Spark 3.0.0 Databricks| -|[#447](https://github.com/NVIDIA/spark-rapids/pull/447)|Add scalastyle plugin to shim module| -|[#426](https://github.com/NVIDIA/spark-rapids/pull/426)|Update BufferMeta to support multiple codec buffers per table| -|[#440](https://github.com/NVIDIA/spark-rapids/pull/440)|Run mortgage test both with AQE on and off| -|[#445](https://github.com/NVIDIA/spark-rapids/pull/445)|Added in StringRPad and StringLPad| -|[#422](https://github.com/NVIDIA/spark-rapids/pull/422)|Documentation updates| -|[#437](https://github.com/NVIDIA/spark-rapids/pull/437)|Fix bug with InSet and Strings| -|[#435](https://github.com/NVIDIA/spark-rapids/pull/435)|Add in checks for Parquet LEGACY date/time rebase| -|[#432](https://github.com/NVIDIA/spark-rapids/pull/432)|Fix batch use-after-close in partitioning, shuffle env init| -|[#423](https://github.com/NVIDIA/spark-rapids/pull/423)|Fix duplicates includes in assembly jar| -|[#418](https://github.com/NVIDIA/spark-rapids/pull/418)|CI Add unit tests running for Spark 3.0.1| -|[#421](https://github.com/NVIDIA/spark-rapids/pull/421)|Make it easier to run TPCxBB benchmarks from spark shell| -|[#413](https://github.com/NVIDIA/spark-rapids/pull/413)|Fix download link| -|[#414](https://github.com/NVIDIA/spark-rapids/pull/414)|Shim Layer to support multiple Spark versions | -|[#406](https://github.com/NVIDIA/spark-rapids/pull/406)|Update cast handling to deal with new 
libcudf casting limitations| -|[#405](https://github.com/NVIDIA/spark-rapids/pull/405)|Change slave->worker| -|[#395](https://github.com/NVIDIA/spark-rapids/pull/395)|Databricks doc updates| -|[#401](https://github.com/NVIDIA/spark-rapids/pull/401)|Extended the FAQ| -|[#398](https://github.com/NVIDIA/spark-rapids/pull/398)|Add tests for GpuPartition| -|[#352](https://github.com/NVIDIA/spark-rapids/pull/352)|Change spark tgz package name| -|[#397](https://github.com/NVIDIA/spark-rapids/pull/397)|Fix small bug in ShuffleBufferCatalog.hasActiveShuffle| -|[#286](https://github.com/NVIDIA/spark-rapids/pull/286)|[REVIEW] Updated join tests for cache| -|[#393](https://github.com/NVIDIA/spark-rapids/pull/393)|Contributor license agreement| -|[#389](https://github.com/NVIDIA/spark-rapids/pull/389)|Added in support for RangeExec| -|[#390](https://github.com/NVIDIA/spark-rapids/pull/390)|Ucx getting started| -|[#391](https://github.com/NVIDIA/spark-rapids/pull/391)|Hide slack channel in Jenkins scripts| -|[#387](https://github.com/NVIDIA/spark-rapids/pull/387)|Remove the term whitelist| -|[#365](https://github.com/NVIDIA/spark-rapids/pull/365)|[REVIEW] Timesub tests| -|[#383](https://github.com/NVIDIA/spark-rapids/pull/383)|Test utility to compare SQL query results between CPU and GPU| -|[#380](https://github.com/NVIDIA/spark-rapids/pull/380)|Fix databricks notebook link| -|[#378](https://github.com/NVIDIA/spark-rapids/pull/378)|Added in FAQ and fixed spelling| -|[#377](https://github.com/NVIDIA/spark-rapids/pull/377)|Update heading in configs.md| -|[#373](https://github.com/NVIDIA/spark-rapids/pull/373)|Modifying branch name to conform with rapidsai branch name change| -|[#376](https://github.com/NVIDIA/spark-rapids/pull/376)|Add our session extension correctly if there are other extensions configured| -|[#374](https://github.com/NVIDIA/spark-rapids/pull/374)|Fix rat issue for notebooks| -|[#364](https://github.com/NVIDIA/spark-rapids/pull/364)|Update Databricks patch for 
changes to GpuSortMergeJoin| -|[#371](https://github.com/NVIDIA/spark-rapids/pull/371)|fix typo and use regional bucket per GCP's update| -|[#359](https://github.com/NVIDIA/spark-rapids/pull/359)|Karthik changes| -|[#353](https://github.com/NVIDIA/spark-rapids/pull/353)|Fix broadcast nested loop join for the no column case| -|[#313](https://github.com/NVIDIA/spark-rapids/pull/313)|Additional tests for broadcast hash join| -|[#342](https://github.com/NVIDIA/spark-rapids/pull/342)|Implement build-side rules for shuffle hash join| -|[#349](https://github.com/NVIDIA/spark-rapids/pull/349)|Updated join code to treat null equality properly| -|[#335](https://github.com/NVIDIA/spark-rapids/pull/335)|Integration tests on spark 3.0.1-SNAPSHOT & 3.1.0-SNAPSHOT| -|[#346](https://github.com/NVIDIA/spark-rapids/pull/346)|Update the Title Header for Fine Tuning| -|[#344](https://github.com/NVIDIA/spark-rapids/pull/344)|Fix small typo in readme| -|[#331](https://github.com/NVIDIA/spark-rapids/pull/331)|Adds iterator and client unit tests, and prepares for more fetch failure handling| -|[#337](https://github.com/NVIDIA/spark-rapids/pull/337)|Fix Scala compile phase to allow Java classes referencing Scala classes| -|[#332](https://github.com/NVIDIA/spark-rapids/pull/332)|Match GPU overwritten functions with SQL functions from FunctionRegistry| -|[#339](https://github.com/NVIDIA/spark-rapids/pull/339)|Fix databricks build| -|[#338](https://github.com/NVIDIA/spark-rapids/pull/338)|Move GpuPartitioning to a separate file| -|[#310](https://github.com/NVIDIA/spark-rapids/pull/310)|Update release Jenkinsfile for Databricks| -|[#330](https://github.com/NVIDIA/spark-rapids/pull/330)|Hide private info in Jenkins scripts| -|[#324](https://github.com/NVIDIA/spark-rapids/pull/324)|Add in basic support for GpuCartesianProductExec| -|[#328](https://github.com/NVIDIA/spark-rapids/pull/328)|Enable slack notification for Databricks build| 
-|[#321](https://github.com/NVIDIA/spark-rapids/pull/321)|update databricks patch for GpuBroadcastNestedLoopJoinExec| -|[#322](https://github.com/NVIDIA/spark-rapids/pull/322)|Add oss.sonatype.org to download the cudf jar| -|[#320](https://github.com/NVIDIA/spark-rapids/pull/320)|Don't mount passwd/group to the container| -|[#258](https://github.com/NVIDIA/spark-rapids/pull/258)|Enable running TPCH tests with AQE enabled| -|[#318](https://github.com/NVIDIA/spark-rapids/pull/318)|Build docker image with Dockerfile| -|[#309](https://github.com/NVIDIA/spark-rapids/pull/309)|Update databricks patch to latest changes| -|[#312](https://github.com/NVIDIA/spark-rapids/pull/312)|Trigger branch-0.2 integration test| -|[#307](https://github.com/NVIDIA/spark-rapids/pull/307)|[Jenkins] Update the release script and Jenkinsfile| -|[#304](https://github.com/NVIDIA/spark-rapids/pull/304)|[DOC][Minor] Fix typo in spark config name.| -|[#303](https://github.com/NVIDIA/spark-rapids/pull/303)|Update compatibility doc for -0.0 issues| -|[#301](https://github.com/NVIDIA/spark-rapids/pull/301)|Add info about branches in README.md| -|[#296](https://github.com/NVIDIA/spark-rapids/pull/296)|Added in basic support for broadcast nested loop join| -|[#297](https://github.com/NVIDIA/spark-rapids/pull/297)|Databricks CI improvements and support runtime env parameter to xfail certain tests| -|[#292](https://github.com/NVIDIA/spark-rapids/pull/292)|Move artifacts version in version-def.sh| -|[#254](https://github.com/NVIDIA/spark-rapids/pull/254)|Cleanup QA tests| -|[#289](https://github.com/NVIDIA/spark-rapids/pull/289)|Clean up GpuCollectLimitMeta and add in metrics| -|[#287](https://github.com/NVIDIA/spark-rapids/pull/287)|Add in support for right join and fix issues build right| -|[#273](https://github.com/NVIDIA/spark-rapids/pull/273)|Added releases to the README.md| -|[#285](https://github.com/NVIDIA/spark-rapids/pull/285)|modify run_pyspark_from_build.sh to be bash 3 friendly| 
-|[#281](https://github.com/NVIDIA/spark-rapids/pull/281)|Add in support for Full Outer Join on non-null keys|
-|[#274](https://github.com/NVIDIA/spark-rapids/pull/274)|Add RapidsDiskStore tests|
-|[#259](https://github.com/NVIDIA/spark-rapids/pull/259)|Add RapidsHostMemoryStore tests|
-|[#282](https://github.com/NVIDIA/spark-rapids/pull/282)|Update Databricks patch for 0.2 branch|
-|[#261](https://github.com/NVIDIA/spark-rapids/pull/261)|Add conditional xfail test for DISTINCT aggregates with NaN|
-|[#263](https://github.com/NVIDIA/spark-rapids/pull/263)|More time ops|
-|[#256](https://github.com/NVIDIA/spark-rapids/pull/256)|Remove special cases for contains, startsWith, and endWith|
-|[#253](https://github.com/NVIDIA/spark-rapids/pull/253)|Remove GpuAttributeReference and GpuSortOrder|
-|[#271](https://github.com/NVIDIA/spark-rapids/pull/271)|Update the versions for 0.2.0 properly for the databricks build|
-|[#162](https://github.com/NVIDIA/spark-rapids/pull/162)|Integration tests for corner cases in window functions.|
-|[#264](https://github.com/NVIDIA/spark-rapids/pull/264)|Add a local mvn repo for nightly pipeline|
-|[#262](https://github.com/NVIDIA/spark-rapids/pull/262)|Refer to branch-0.2|
-|[#255](https://github.com/NVIDIA/spark-rapids/pull/255)|Revert change to make dependencies of shaded jar optional|
-|[#257](https://github.com/NVIDIA/spark-rapids/pull/257)|Fix link to RAPIDS cudf in index.md|
-|[#252](https://github.com/NVIDIA/spark-rapids/pull/252)|Update to 0.2.0-SNAPSHOT and cudf-0.15-SNAPSHOT|
-
-## Release 0.1
-
-### Features
-|||
-|:---|:---|
-|[#74](https://github.com/NVIDIA/spark-rapids/issues/74)|[FEA] Support ToUnixTimestamp|
-|[#21](https://github.com/NVIDIA/spark-rapids/issues/21)|[FEA] NormalizeNansAndZeros|
-|[#105](https://github.com/NVIDIA/spark-rapids/issues/105)|[FEA] integration tests for equi-joins|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#116](https://github.com/NVIDIA/spark-rapids/issues/116)|[BUG] calling replace with a NULL throws an exception|
-|[#168](https://github.com/NVIDIA/spark-rapids/issues/168)|[BUG] GpuUnitTests Date tests leak column vectors|
-|[#209](https://github.com/NVIDIA/spark-rapids/issues/209)|[BUG] Developers section in pom need to be updated|
-|[#204](https://github.com/NVIDIA/spark-rapids/issues/204)|[BUG] Code coverage docs are out of date|
-|[#154](https://github.com/NVIDIA/spark-rapids/issues/154)|[BUG] Incorrect output from partial-only averages with nulls|
-|[#61](https://github.com/NVIDIA/spark-rapids/issues/61)|[BUG] Cannot disable Parquet, ORC, CSV reading when using FileSourceScanExec|
-
-### PRs
-|||
-|:---|:---|
-|[#249](https://github.com/NVIDIA/spark-rapids/pull/249)|Compatability -> Compatibility|
-|[#247](https://github.com/NVIDIA/spark-rapids/pull/247)|Add index.md for default doc page, fix table formatting for configs|
-|[#241](https://github.com/NVIDIA/spark-rapids/pull/241)|Let default branch to master per the release rule|
-|[#177](https://github.com/NVIDIA/spark-rapids/pull/177)|Fixed leaks in unit test and use ColumnarBatch for testing|
-|[#243](https://github.com/NVIDIA/spark-rapids/pull/243)|Jenkins file for Databricks release|
-|[#225](https://github.com/NVIDIA/spark-rapids/pull/225)|Make internal project dependencies optional for shaded artifact|
-|[#242](https://github.com/NVIDIA/spark-rapids/pull/242)|Add site pages|
-|[#221](https://github.com/NVIDIA/spark-rapids/pull/221)|Databricks Build Support|
-|[#215](https://github.com/NVIDIA/spark-rapids/pull/215)|Remove CudfColumnVector|
-|[#213](https://github.com/NVIDIA/spark-rapids/pull/213)|Add RapidsDeviceMemoryStore tests|
-|[#214](https://github.com/NVIDIA/spark-rapids/pull/214)|[REVIEW] Test failure to pass Attribute as GpuAttribute|
-|[#211](https://github.com/NVIDIA/spark-rapids/pull/211)|Add project leads to pom developer list|
-|[#210](https://github.com/NVIDIA/spark-rapids/pull/210)|Updated coverage docs|
-|[#195](https://github.com/NVIDIA/spark-rapids/pull/195)|Support public release for plugin jar|
-|[#208](https://github.com/NVIDIA/spark-rapids/pull/208)|Remove unneeded comment from pom.xml|
-|[#191](https://github.com/NVIDIA/spark-rapids/pull/191)|WindowExec handle different spark distributions|
-|[#181](https://github.com/NVIDIA/spark-rapids/pull/181)|Remove INCOMPAT for NormalizeNanAndZero, KnownFloatingPointNormalized|
-|[#196](https://github.com/NVIDIA/spark-rapids/pull/196)|Update Spark dependency to the released 3.0.0 artifacts|
-|[#206](https://github.com/NVIDIA/spark-rapids/pull/206)|Change groupID to 'com.nvidia' in IT scripts|
-|[#202](https://github.com/NVIDIA/spark-rapids/pull/202)|Fixed issue for contains when searching for an empty string|
-|[#201](https://github.com/NVIDIA/spark-rapids/pull/201)|Fix name of scan|
-|[#200](https://github.com/NVIDIA/spark-rapids/pull/200)|Fix issue with GpuAttributeReference not overrideing references|
-|[#197](https://github.com/NVIDIA/spark-rapids/pull/197)|Fix metrics for writes|
-|[#186](https://github.com/NVIDIA/spark-rapids/pull/186)|Fixed issue with nullability on concat|
-|[#193](https://github.com/NVIDIA/spark-rapids/pull/193)|Add RapidsBufferCatalog tests|
-|[#188](https://github.com/NVIDIA/spark-rapids/pull/188)|rebrand to com.nvidia instead of ai.rapids|
-|[#189](https://github.com/NVIDIA/spark-rapids/pull/189)|Handle AggregateExpression having resultIds parameter instead of a single resultId|
-|[#190](https://github.com/NVIDIA/spark-rapids/pull/190)|FileSourceScanExec can have logicalRelation parameter on some distributions|
-|[#185](https://github.com/NVIDIA/spark-rapids/pull/185)|Update type of parameter of GpuExpandExec to make it consistent|
-|[#172](https://github.com/NVIDIA/spark-rapids/pull/172)|Merge qa test to integration test|
-|[#180](https://github.com/NVIDIA/spark-rapids/pull/180)|Add MetaUtils unit tests|
-|[#171](https://github.com/NVIDIA/spark-rapids/pull/171)|Cleanup scaladoc warnings about missing links|
-|[#176](https://github.com/NVIDIA/spark-rapids/pull/176)|Updated join tests to cover more data.|
-|[#169](https://github.com/NVIDIA/spark-rapids/pull/169)|Remove dependency on shaded Spark artifact|
-|[#174](https://github.com/NVIDIA/spark-rapids/pull/174)|Added in fallback tests|
-|[#165](https://github.com/NVIDIA/spark-rapids/pull/165)|Move input metadata tests to pyspark|
-|[#173](https://github.com/NVIDIA/spark-rapids/pull/173)|Fix setting local mode for tests|
-|[#160](https://github.com/NVIDIA/spark-rapids/pull/160)|Integration tests for normalizing NaN/zeroes.|
-|[#163](https://github.com/NVIDIA/spark-rapids/pull/163)|Ignore the order locally for repartition tests|
-|[#157](https://github.com/NVIDIA/spark-rapids/pull/157)|Add partial and final only hash aggregate tests and fix nulls corner case for Average|
-|[#159](https://github.com/NVIDIA/spark-rapids/pull/159)|Add integration tests for joins|
-|[#158](https://github.com/NVIDIA/spark-rapids/pull/158)|Orc merge schema fallback and FileScan format configs|
-|[#164](https://github.com/NVIDIA/spark-rapids/pull/164)|Fix compiler warnings|
-|[#152](https://github.com/NVIDIA/spark-rapids/pull/152)|Moved cudf to 0.14 for CI|
-|[#151](https://github.com/NVIDIA/spark-rapids/pull/151)|Switch CICD pipelines to Github|
-
-## Older Releases
-Changelog of older releases can be found at [docs/archives](/docs/archives)
diff --git a/docs/archives/CHANGELOG_21.06_to_21.12.md b/docs/archives/CHANGELOG_21.06_to_21.12.md
deleted file mode 100644
index dcf201a246d..00000000000
--- a/docs/archives/CHANGELOG_21.06_to_21.12.md
+++ /dev/null
@@ -1,1237 +0,0 @@
-# Change log
-Generated on 2022-08-05
-
-## Release 21.12
-
-### Features
-|||
-|:---|:---|
-|[#1571](https://github.com/NVIDIA/spark-rapids/issues/1571)|[FEA] Better precision range for decimal multiply, and possibly others|
-|[#3953](https://github.com/NVIDIA/spark-rapids/issues/3953)|[FEA] Audit: Add array support to union by name |
-|[#4085](https://github.com/NVIDIA/spark-rapids/issues/4085)|[FEA] Decimal 128 Support: Concat|
-|[#4073](https://github.com/NVIDIA/spark-rapids/issues/4073)|[FEA] Decimal 128 Support: MapKeys, MapValues, MapEntries|
-|[#3432](https://github.com/NVIDIA/spark-rapids/issues/3432)|[FEA] Qualification tool checks if there is any "Scan JDBCRelation" and count it as "problematic"|
-|[#3824](https://github.com/NVIDIA/spark-rapids/issues/3824)|[FEA] Support MapType in ParquetCachedBatchSerializer|
-|[#4048](https://github.com/NVIDIA/spark-rapids/issues/4048)|[FEA] WindowExpression support for Decimal 128 in Spark 320|
-|[#4047](https://github.com/NVIDIA/spark-rapids/issues/4047)|[FEA] Literal support for Decimal 128 in Spark 320|
-|[#3863](https://github.com/NVIDIA/spark-rapids/issues/3863)|[FEA] Add Spark 3.3.0-SNAPSHOT Shim |
-|[#3814](https://github.com/NVIDIA/spark-rapids/issues/3814)|[FEA] stddev stddev_samp and std should be supported over a window|
-|[#3370](https://github.com/NVIDIA/spark-rapids/issues/3370)|[FEA] Add support for Databricks 9.1 runtime|
-|[#3876](https://github.com/NVIDIA/spark-rapids/issues/3876)|[FEA] Support REGEXP_REPLACE to replace null values|
-|[#3784](https://github.com/NVIDIA/spark-rapids/issues/3784)|[FEA] Support ORC write Map column(single level)|
-|[#3470](https://github.com/NVIDIA/spark-rapids/issues/3470)|[FEA] Add shims for 3.2.1-SNAPSHOT|
-|[#3855](https://github.com/NVIDIA/spark-rapids/issues/3855)|[FEA] CPU based UDF to run efficiently and transfer data back to GPU for supported operations|
-|[#3739](https://github.com/NVIDIA/spark-rapids/issues/3739)|[FEA] Provide an explicit config for fallback on CPU if plan rewrite fails|
-|[#3888](https://github.com/NVIDIA/spark-rapids/issues/3888)|[FEA] Decimal 128 Support: Add a "Trust me I know it will not overflow config"|
-|[#3088](https://github.com/NVIDIA/spark-rapids/issues/3088)|[FEA] Profile tool print problematic operations|
-|[#3886](https://github.com/NVIDIA/spark-rapids/issues/3886)|[FEA] Decimal 128 Support: Extend the range for Decimal Multiply and Divide|
-|[#79](https://github.com/NVIDIA/spark-rapids/issues/79)|[FEA] Support Size operation|
-|[#3880](https://github.com/NVIDIA/spark-rapids/issues/3880)|[FEA] Decimal 128 Support: Average aggregation|
-|[#3659](https://github.com/NVIDIA/spark-rapids/issues/3659)|[FEA] External tool integration with Qualification tool|
-|[#2](https://github.com/NVIDIA/spark-rapids/issues/2)|[FEA] RLIKE support|
-|[#3192](https://github.com/NVIDIA/spark-rapids/issues/3192)|[FEA] Support decimal type in ORC writer|
-|[#3419](https://github.com/NVIDIA/spark-rapids/issues/3419)|[FEA] Add support for org.apache.spark.sql.execution.SampleExec|
-|[#3535](https://github.com/NVIDIA/spark-rapids/issues/3535)|[FEA] Qualification tool can detect RDD APIs in SQL plan|
-|[#3494](https://github.com/NVIDIA/spark-rapids/issues/3494)|[FEA] Support structs in ORC writer|
-|[#3514](https://github.com/NVIDIA/spark-rapids/issues/3514)|[FEA] Support collect_set on struct in aggregation context|
-|[#3515](https://github.com/NVIDIA/spark-rapids/issues/3515)|[FEA] Support CreateArray to produce array(struct)|
-|[#3116](https://github.com/NVIDIA/spark-rapids/issues/3116)|[FEA] Support Maps, Lists, and Structs as non-key columns on joins|
-|[#2054](https://github.com/NVIDIA/spark-rapids/issues/2054)|[FEA] Add support for Arrays to ParquetCachedBatchSerializer|
-|[#3573](https://github.com/NVIDIA/spark-rapids/issues/3573)|[FEA] Support Cache(PCBS) Array-of-Struct|
-
-### Performance
-|||
-|:---|:---|
-|[#3768](https://github.com/NVIDIA/spark-rapids/issues/3768)|[DOC] document databricks init script required for UCX|
-|[#2867](https://github.com/NVIDIA/spark-rapids/issues/2867)|[FEA] Make LZ4_CHUNK_SIZE configurable|
-|[#3832](https://github.com/NVIDIA/spark-rapids/issues/3832)|[FEA] AST enabled GpuBroadcastNestedLoopJoin left side can't be small|
-|[#3798](https://github.com/NVIDIA/spark-rapids/issues/3798)|[FEA] bounds checking in joins can be expensive|
-|[#3603](https://github.com/NVIDIA/spark-rapids/issues/3603)|[FEA] Allocate UCX bounce buffers outside of RMM if ASYNC allocator is enabled|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#4253](https://github.com/NVIDIA/spark-rapids/issues/4253)|[BUG] Dependencies missing of spark-rapids v21.12.0 release jars|
-|[#4216](https://github.com/NVIDIA/spark-rapids/issues/4216)|[BUG] AQE Crashing Spark RAPIDS when using filter() and union()|
-|[#4188](https://github.com/NVIDIA/spark-rapids/issues/4188)|[BUG] data corruption in GpuBroadcastNestedLoopJoin with empty relations edge case|
-|[#4191](https://github.com/NVIDIA/spark-rapids/issues/4191)|[BUG] failed to read DECIMAL128 within MapType from ORC|
-|[#4175](https://github.com/NVIDIA/spark-rapids/issues/4175)|[BUG] arithmetic_ops_test failed in spark 3.2.0|
-|[#4162](https://github.com/NVIDIA/spark-rapids/issues/4162)|[BUG] isCastDecimalToStringEnabled is never called|
-|[#3894](https://github.com/NVIDIA/spark-rapids/issues/3894)|[BUG] test_pandas_scalar_udf and test_pandas_map_udf failed in UCX standalone CI run|
-|[#3970](https://github.com/NVIDIA/spark-rapids/issues/3970)|[BUG] mismatching timezone settings on executor and driver can cause ORC read data corruption|
-|[#4141](https://github.com/NVIDIA/spark-rapids/issues/4141)|[BUG] Unable to start the RapidsShuffleManager in databricks 9.1|
-|[#4102](https://github.com/NVIDIA/spark-rapids/issues/4102)|[BUG] udf-example build failed: Unknown CMake command "cpm_check_if_package_already_added".|
-|[#4084](https://github.com/NVIDIA/spark-rapids/issues/4084)|[BUG] window on unbounded preceeding and unbounded following can produce incorrect results.|
-|[#3990](https://github.com/NVIDIA/spark-rapids/issues/3990)|[BUG] Scaladoc link warnings in ParquetCachedBatchSerializer and ExplainPlan|
-|[#4108](https://github.com/NVIDIA/spark-rapids/issues/4108)|[BUG] premerge fails due to Spark 3.3.0 HadoopFsRelation after SPARK-37289|
-|[#4042](https://github.com/NVIDIA/spark-rapids/issues/4042)|[BUG] cudf_udf tests fail on nightly Integration test run|
-|[#3743](https://github.com/NVIDIA/spark-rapids/issues/3743)|[BUG] Implicitly catching all exceptions warning in GpuOverrides|
-|[#4069](https://github.com/NVIDIA/spark-rapids/issues/4069)|[BUG] parquet_test.py pytests FAILED on Databricks-9.1-ML-spark-3.1.2|
-|[#3461](https://github.com/NVIDIA/spark-rapids/issues/3461)|[BUG] Cannot build project from a sub-directory|
-|[#4053](https://github.com/NVIDIA/spark-rapids/issues/4053)|[BUG] buildall uses a stale aggregator dependency during test compilation|
-|[#3703](https://github.com/NVIDIA/spark-rapids/issues/3703)|[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed with TypeError|
-|[#3706](https://github.com/NVIDIA/spark-rapids/issues/3706)|[BUG] approx_percentile returns array of zero percentiles instead of null in some cases|
-|[#4017](https://github.com/NVIDIA/spark-rapids/issues/4017)|[BUG] Why is the hash aggregate not handling empty result expressions|
-|[#3994](https://github.com/NVIDIA/spark-rapids/issues/3994)|[BUG] can't open notebook 'docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb'|
-|[#3996](https://github.com/NVIDIA/spark-rapids/issues/3996)|[BUG] Exception happened when getting a null row|
-|[#3999](https://github.com/NVIDIA/spark-rapids/issues/3999)|[BUG] Integration cache_test failures - ArrayIndexOutOfBoundsException|
-|[#3532](https://github.com/NVIDIA/spark-rapids/issues/3532)|[BUG] DatabricksShimVersion must carry runtime version info|
-|[#3834](https://github.com/NVIDIA/spark-rapids/issues/3834)|[BUG] Approx_percentile deserialize error when calling "show" rather than "collect"|
-|[#3992](https://github.com/NVIDIA/spark-rapids/issues/3992)|[BUG] failed create-parallel-world in databricks build|
-|[#3987](https://github.com/NVIDIA/spark-rapids/issues/3987)|[BUG] "mvn clean package -DskipTests" is no longer working|
-|[#3866](https://github.com/NVIDIA/spark-rapids/issues/3866)|[BUG] RLike integration tests failing on Azure Databricks 7.3|
-|[#3980](https://github.com/NVIDIA/spark-rapids/issues/3980)|[BUG] udf-example build failed due to maven-antrun-plugin upgrade|
-|[#3966](https://github.com/NVIDIA/spark-rapids/issues/3966)|[BUG] udf-examples module fails on `mvn compile` and `mvn test`|
-|[#3977](https://github.com/NVIDIA/spark-rapids/issues/3977)|[BUG] databricks aggregator jar deployed failed|
-|[#3915](https://github.com/NVIDIA/spark-rapids/issues/3915)|[BUG] typo in verify_same_sha_for_unshimmed prevents the offending class file name from being logged. |
-|[#1304](https://github.com/NVIDIA/spark-rapids/issues/1304)|[BUG] Query fails with HostColumnarToGpu doesn't support Structs|
-|[#3924](https://github.com/NVIDIA/spark-rapids/issues/3924)|[BUG] ExpressionEncoder does not work for input in `GpuScalaUDF` |
-|[#3911](https://github.com/NVIDIA/spark-rapids/issues/3911)|[BUG] CI fails on an inconsistent set of partial builds|
-|[#2896](https://github.com/NVIDIA/spark-rapids/issues/2896)|[BUG] Extra GpuColumnarToRow when using ParquetCachedBatchSerializer on databricks|
-|[#3864](https://github.com/NVIDIA/spark-rapids/issues/3864)|[BUG] test_sample_produce_empty_batch failed in dataproc|
-|[#3823](https://github.com/NVIDIA/spark-rapids/issues/3823)|[BUG]binary-dedup.sh script fails on mac|
-|[#3658](https://github.com/NVIDIA/spark-rapids/issues/3658)|[BUG] DataFrame actions failing with error: Error : java.lang.NoClassDefFoundError: Could not initialize class com.nvidia.spark.rapids.GpuOverrides withlatest 21.10 jars|
-|[#3857](https://github.com/NVIDIA/spark-rapids/issues/3857)|[BUG] nightly build push dist packge w/ single version of spark|
-|[#3854](https://github.com/NVIDIA/spark-rapids/issues/3854)|[BUG] not found: type PoissonDistribution in databricks build|
-|[#3852](https://github.com/NVIDIA/spark-rapids/issues/3852)|spark-nightly-build deploys all modules due to typo in `-pl`|
-|[#3844](https://github.com/NVIDIA/spark-rapids/issues/3844)|[BUG] nightly spark311cdh build failed|
-|[#3843](https://github.com/NVIDIA/spark-rapids/issues/3843)|[BUG] databricks nightly deploy failed|
-|[#3705](https://github.com/NVIDIA/spark-rapids/issues/3705)|[BUG] Change `nullOnDivideByZero` from runtime parameter to aggregate expression for `stddev` and `variance` aggregation families|
-|[#3614](https://github.com/NVIDIA/spark-rapids/issues/3614)|[BUG] ParquetMaterializer.scala appears in both v1 and v2 shims|
-|[#3430](https://github.com/NVIDIA/spark-rapids/issues/3430)|[BUG] Profiling tool silently stops without producing any output on a Synapse Spark event log|
-|[#3311](https://github.com/NVIDIA/spark-rapids/issues/3311)|[BUG] cache_test.py failed w/ cache.serializer in spark 3.1.2|
-|[#3710](https://github.com/NVIDIA/spark-rapids/issues/3710)|[BUG] Usage of Class.forName without specifying a classloader|
-|[#3462](https://github.com/NVIDIA/spark-rapids/issues/3462)|[BUG] IDE complains about duplicate ShimBasePythonRunner instances|
-|[#3476](https://github.com/NVIDIA/spark-rapids/issues/3476)|[BUG] test_non_empty_ctas fails on yarn|
-
-### PRs
-|||
-|:---|:---|
-|[#4391](https://github.com/NVIDIA/spark-rapids/pull/4391)|update gcp custom dataproc image version to avoid log4j issue[skip ci]|
-|[#4379](https://github.com/NVIDIA/spark-rapids/pull/4379)|update hot fix cudf link v21.12.2|
-|[#4367](https://github.com/NVIDIA/spark-rapids/pull/4367)|update 21.12 branch for doc [skip ci]|
-|[#4245](https://github.com/NVIDIA/spark-rapids/pull/4245)|Update changelog 21.12 to latest [skip ci]|
-|[#4258](https://github.com/NVIDIA/spark-rapids/pull/4258)|Sanitize column names in ParquetCachedBatchSerializer before writing to Parquet|
-|[#4308](https://github.com/NVIDIA/spark-rapids/pull/4308)|Bump up GPU reserve memory to 640MB|
-|[#4307](https://github.com/NVIDIA/spark-rapids/pull/4307)|Update Download page for 21.12 [skip ci]|
-|[#4261](https://github.com/NVIDIA/spark-rapids/pull/4261)|Update cudfjni version to released 21.12.0|
-|[#4265](https://github.com/NVIDIA/spark-rapids/pull/4265)|Remove aggregator dependency before deploying dist artifact|
-|[#4030](https://github.com/NVIDIA/spark-rapids/pull/4030)|Support code coverage report with single version jar [skip ci]|
-|[#4287](https://github.com/NVIDIA/spark-rapids/pull/4287)|Update 21.12 compatibility guide for known regexp issue [skip ci]|
-|[#4242](https://github.com/NVIDIA/spark-rapids/pull/4242)|Fix indentation issue in getting-started-k8s guide [skip ci]|
-|[#4263](https://github.com/NVIDIA/spark-rapids/pull/4263)|Add missing ORC write tests on Map of Decimal|
-|[#4257](https://github.com/NVIDIA/spark-rapids/pull/4257)|Implement getShuffleRDD and fixup mismatched output types on shuffle reuse|
-|[#4250](https://github.com/NVIDIA/spark-rapids/pull/4250)|Update the release script [skip ci]|
-|[#4222](https://github.com/NVIDIA/spark-rapids/pull/4222)|Add arguments support to 'databricks/run-tests.py'|
-|[#4233](https://github.com/NVIDIA/spark-rapids/pull/4233)|Add databricks init script for UCX|
-|[#4231](https://github.com/NVIDIA/spark-rapids/pull/4231)|RAPIDS Shuffle Manager fallback if security is enabled|
-|[#4228](https://github.com/NVIDIA/spark-rapids/pull/4228)|Fix unconditional nested loop joins on empty tables|
-|[#4217](https://github.com/NVIDIA/spark-rapids/pull/4217)|Enable event log for qualification & profiling tools testing from IT|
-|[#4202](https://github.com/NVIDIA/spark-rapids/pull/4202)|Parameter for the Databricks zone-id [skip ci]|
-|[#4199](https://github.com/NVIDIA/spark-rapids/pull/4199)|modify some words for synapse getting started guide[skip ci]|
-|[#4200](https://github.com/NVIDIA/spark-rapids/pull/4200)|Disable approx percentile tests that intermittently fail|
-|[#4187](https://github.com/NVIDIA/spark-rapids/pull/4187)|Added a getting started guide for Synapse[skip ci]|
-|[#4192](https://github.com/NVIDIA/spark-rapids/pull/4192)|Fix ORC read DECIMAL128 inside MapType|
-|[#4173](https://github.com/NVIDIA/spark-rapids/pull/4173)|Update approx percentile docs to link to issue 4060 [skip ci]|
-|[#4174](https://github.com/NVIDIA/spark-rapids/pull/4174)|Document Bloop, Metals and VS code as an IDE option [skip ci]|
-|[#4181](https://github.com/NVIDIA/spark-rapids/pull/4181)|Fix element_at for 3.2.0 and array/struct cast|
-|[#4110](https://github.com/NVIDIA/spark-rapids/pull/4110)|Add a getting started guide on workload qualification [skip ci]|
-|[#4106](https://github.com/NVIDIA/spark-rapids/pull/4106)|Add docs for MIG on YARN [skip ci]|
-|[#4100](https://github.com/NVIDIA/spark-rapids/pull/4100)|Add PCA example to ml-integration page [skip ci]|
-|[#4177](https://github.com/NVIDIA/spark-rapids/pull/4177)|Decimal128: added missing decimal128 signature on Spark 32X|
-|[#4161](https://github.com/NVIDIA/spark-rapids/pull/4161)|More integration tests with decimal128|
-|[#4165](https://github.com/NVIDIA/spark-rapids/pull/4165)|Fix type checks for get array item in 3.2.0|
-|[#4163](https://github.com/NVIDIA/spark-rapids/pull/4163)|Enable config to check for casting decimals to strings|
-|[#4154](https://github.com/NVIDIA/spark-rapids/pull/4154)|Use num_slices to guarantee partition shape in the pandas udf tests|
-|[#4129](https://github.com/NVIDIA/spark-rapids/pull/4129)|Check executor timezone is same as driver timezone when running on GPU|
-|[#4139](https://github.com/NVIDIA/spark-rapids/pull/4139)|Decimal128 Support|
-|[#4128](https://github.com/NVIDIA/spark-rapids/pull/4128)|Fix build errors in udf-examples native build|
-|[#4063](https://github.com/NVIDIA/spark-rapids/pull/4063)|Regexp_replace support regexp|
-|[#4125](https://github.com/NVIDIA/spark-rapids/pull/4125)|Remove unused imports|
-|[#4052](https://github.com/NVIDIA/spark-rapids/pull/4052)|Support null safe host column vector|
-|[#4116](https://github.com/NVIDIA/spark-rapids/pull/4116)|Add in tests to check for overflow in unbounded window|
-|[#4111](https://github.com/NVIDIA/spark-rapids/pull/4111)|Added external doc links for JRE and Spark|
-|[#4105](https://github.com/NVIDIA/spark-rapids/pull/4105)|Enforce checks for unused imports and missed interpolation|
-|[#4107](https://github.com/NVIDIA/spark-rapids/pull/4107)|Set the task context in background reader threads|
-|[#4114](https://github.com/NVIDIA/spark-rapids/pull/4114)|Refactoring cudf_udf test setup|
-|[#4109](https://github.com/NVIDIA/spark-rapids/pull/4109)|Stop using redundant partitionSchemaOption dropped in 3.3.0|
-|[#4097](https://github.com/NVIDIA/spark-rapids/pull/4097)|Enable auto-merge from branch-21.12 to branch-22.02 [skip ci]|
-|[#4094](https://github.com/NVIDIA/spark-rapids/pull/4094)|Remove spark311db shim layer|
-|[#4082](https://github.com/NVIDIA/spark-rapids/pull/4082)|Add abfs and abfss to the cloud scheme|
-|[#4071](https://github.com/NVIDIA/spark-rapids/pull/4071)|Treat scalac warnings as errors|
-|[#4043](https://github.com/NVIDIA/spark-rapids/pull/4043)|Promote cudf as dist direct dependency, mark aggregator provided|
-|[#4076](https://github.com/NVIDIA/spark-rapids/pull/4076)|Sets the GPU device id in the UCX early start thread|
-|[#4087](https://github.com/NVIDIA/spark-rapids/pull/4087)|Regex parser improvements and bug fixes|
-|[#4079](https://github.com/NVIDIA/spark-rapids/pull/4079)|verify "Add array support to union by name " by adding an integration test|
-|[#4090](https://github.com/NVIDIA/spark-rapids/pull/4090)|Update pre-merge expression for 2022+ CI [skip ci]|
-|[#4049](https://github.com/NVIDIA/spark-rapids/pull/4049)|Change Databricks image from 8.2 to 9.1 [skip ci]|
-|[#4051](https://github.com/NVIDIA/spark-rapids/pull/4051)|Upgrade ORC version from 1.5.8 to 1.5.10|
-|[#4080](https://github.com/NVIDIA/spark-rapids/pull/4080)|Add case insensitive when clipping parquet blocks|
-|[#4083](https://github.com/NVIDIA/spark-rapids/pull/4083)|Fix compiler warning in regex transpiler|
-|[#4070](https://github.com/NVIDIA/spark-rapids/pull/4070)|Support building from sub directory|
-|[#4072](https://github.com/NVIDIA/spark-rapids/pull/4072)|Fix overflow checking on optimized decimal sum|
-|[#4067](https://github.com/NVIDIA/spark-rapids/pull/4067)|Append new authorized user to blossom-ci whitelist [skip ci]|
-|[#4066](https://github.com/NVIDIA/spark-rapids/pull/4066)|Temply disable cudf_udf test|
-|[#4057](https://github.com/NVIDIA/spark-rapids/pull/4057)|Restore original ASL 2.0 license text|
-|[#3937](https://github.com/NVIDIA/spark-rapids/pull/3937)|Qualification tool: Detect JDBCRelation in eventlog|
-|[#3925](https://github.com/NVIDIA/spark-rapids/pull/3925)|verify AQE and DPP both on|
-|[#3982](https://github.com/NVIDIA/spark-rapids/pull/3982)|Fix the issue of parquet reading with case insensitive schema|
-|[#4054](https://github.com/NVIDIA/spark-rapids/pull/4054)|Use install for the base version build thread [skip ci]|
-|[#4008](https://github.com/NVIDIA/spark-rapids/pull/4008)|[Doc] Update the getting started guide for databricks: Change from 8.2 to 9.1 runtime [skip ci]|
-|[#4010](https://github.com/NVIDIA/spark-rapids/pull/4010)|Enable MapType for ParquetCachedBatchSerializer|
-|[#4046](https://github.com/NVIDIA/spark-rapids/pull/4046)|lower GPU memory reserve to 256MB|
-|[#3770](https://github.com/NVIDIA/spark-rapids/pull/3770)|Enable approx percentile tests|
-|[#4038](https://github.com/NVIDIA/spark-rapids/pull/4038)|Change the `catalystConverter` to be a Scala `val`.|
-|[#4035](https://github.com/NVIDIA/spark-rapids/pull/4035)|Hash aggregate fix empty resultExpressions|
-|[#3998](https://github.com/NVIDIA/spark-rapids/pull/3998)|Check for CPU cores and free memory in IT script|
-|[#3984](https://github.com/NVIDIA/spark-rapids/pull/3984)|Check for data write command before inserting hash sort optimization|
-|[#4019](https://github.com/NVIDIA/spark-rapids/pull/4019)|initialize RMM with a single pool size|
-|[#3993](https://github.com/NVIDIA/spark-rapids/pull/3993)|Qualification tool: Remove "unsupported" word for nested complex types|
-|[#4033](https://github.com/NVIDIA/spark-rapids/pull/4033)|skip spark 330 tests temporarily in nightly [skip ci]|
-|[#4029](https://github.com/NVIDIA/spark-rapids/pull/4029)|Update buildall script and the build doc [skip ci]|
-|[#4014](https://github.com/NVIDIA/spark-rapids/pull/4014)|fix can't open notebook 'docs/demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb'[skip ci]|
-|[#4024](https://github.com/NVIDIA/spark-rapids/pull/4024)|Allow using a custom Spark Resource Name for a GPU|
-|[#4012](https://github.com/NVIDIA/spark-rapids/pull/4012)|Add Apache Spark 3.3.0-SNAPSHOT Shims|
-|[#4021](https://github.com/NVIDIA/spark-rapids/pull/4021)|Explicitly use the public version of ParquetCachedBatchSerializer|
-|[#3869](https://github.com/NVIDIA/spark-rapids/pull/3869)|Add Std dev samp for windowing|
-|[#3960](https://github.com/NVIDIA/spark-rapids/pull/3960)|Use a fixed RMM pool size|
-|[#3767](https://github.com/NVIDIA/spark-rapids/pull/3767)|Add shim for Databricks 9.1|
-|[#3862](https://github.com/NVIDIA/spark-rapids/pull/3862)|Prevent approx_percentile aggregate from being split between CPU and GPU|
-|[#3871](https://github.com/NVIDIA/spark-rapids/pull/3871)|Add integration test for RLike with embedded null in input|
-|[#3968](https://github.com/NVIDIA/spark-rapids/pull/3968)|Allow null character in regexp_replace pattern|
-|[#3821](https://github.com/NVIDIA/spark-rapids/pull/3821)|Support ORC write Map column|
-|[#3991](https://github.com/NVIDIA/spark-rapids/pull/3991)|Fix aggregator jar copy logic|
-|[#3973](https://github.com/NVIDIA/spark-rapids/pull/3973)|Add shims for Apache Spark 3.2.1-SNAPSHOT builds|
-|[#3967](https://github.com/NVIDIA/spark-rapids/pull/3967)|Bring back AST support for BNLJ inner joins|
-|[#3947](https://github.com/NVIDIA/spark-rapids/pull/3947)|Enable rlike tests on databricks|
-|[#3981](https://github.com/NVIDIA/spark-rapids/pull/3981)|Replace tasks w/ target of maven-antrun-plugin in udf-example|
-|[#3976](https://github.com/NVIDIA/spark-rapids/pull/3976)|Replace long artifact lists with an ant loop|
-|[#3972](https://github.com/NVIDIA/spark-rapids/pull/3972)|Revert udf-examples dependency change to restore test build phase|
-|[#3978](https://github.com/NVIDIA/spark-rapids/pull/3978)|Update aggregator jar name in databricks deploy script|
-|[#3965](https://github.com/NVIDIA/spark-rapids/pull/3965)|Add how-to resolve auto-merge conflict [skip ci]|
-|[#3963](https://github.com/NVIDIA/spark-rapids/pull/3963)|Add a dedicated RapidsConf option to tolerate GpuOverrides apply failures|
-|[#3923](https://github.com/NVIDIA/spark-rapids/pull/3923)|Prepare for 3.2.1 shim, various shim build fixes and improvements|
-|[#3969](https://github.com/NVIDIA/spark-rapids/pull/3969)|add doc on using compute-sanitizer|
-|[#3964](https://github.com/NVIDIA/spark-rapids/pull/3964)|Qualification tool: Catch exception for invalid regex patterns|
-|[#3961](https://github.com/NVIDIA/spark-rapids/pull/3961)|Avoid using HostColumnarToGpu for nested types|
-|[#3910](https://github.com/NVIDIA/spark-rapids/pull/3910)|Refactor the aggregate API|
-|[#3897](https://github.com/NVIDIA/spark-rapids/pull/3897)|Support running CPU based UDF efficiently|
-|[#3950](https://github.com/NVIDIA/spark-rapids/pull/3950)|Fix failed auto-merge #3939|
-|[#3946](https://github.com/NVIDIA/spark-rapids/pull/3946)|Document compatability of operations with side effects.|
-|[#3945](https://github.com/NVIDIA/spark-rapids/pull/3945)|Update udf-examples dependencies to use dist jar|
-|[#3938](https://github.com/NVIDIA/spark-rapids/pull/3938)|remove GDS alignment code|
-|[#3943](https://github.com/NVIDIA/spark-rapids/pull/3943)|Add artifact revisions check for nightly tests [skip ci]|
-|[#3933](https://github.com/NVIDIA/spark-rapids/pull/3933)|Profiling tool: Print potential problems|
-|[#3926](https://github.com/NVIDIA/spark-rapids/pull/3926)|Add zip unzip to integration tests dockerfiles [skip ci]|
-|[#3757](https://github.com/NVIDIA/spark-rapids/pull/3757)|Update to nvcomp-2.x JNI APIs|
-|[#3922](https://github.com/NVIDIA/spark-rapids/pull/3922)|Stop using -U in build merges aggregator jars of nightly [skip ci]|
-|[#3907](https://github.com/NVIDIA/spark-rapids/pull/3907)|Add version properties to integration tests modules|
-|[#3912](https://github.com/NVIDIA/spark-rapids/pull/3912)|Stop using -U in the build that merges all aggregator jars|
-|[#3909](https://github.com/NVIDIA/spark-rapids/pull/3909)|Fix warning when catching all throwables in GpuOverrides|
-|[#3766](https://github.com/NVIDIA/spark-rapids/pull/3766)|Use JCudfSerialization to deserialize a table to host columns|
-|[#3820](https://github.com/NVIDIA/spark-rapids/pull/3820)|Advertise CPU orderingSatisfies|
-|[#3858](https://github.com/NVIDIA/spark-rapids/pull/3858)|update emr 6.4 getting started doc and pic[skip ci]|
-|[#3899](https://github.com/NVIDIA/spark-rapids/pull/3899)|Fix sample test cases|
-|[#3896](https://github.com/NVIDIA/spark-rapids/pull/3896)|Xfail the sample tests temporarily|
-|[#3848](https://github.com/NVIDIA/spark-rapids/pull/3848)|Fix binary-dedupe failures and improve its performance on macOS|
-|[#3867](https://github.com/NVIDIA/spark-rapids/pull/3867)|Disable rlike integration tests on Databricks|
-|[#3850](https://github.com/NVIDIA/spark-rapids/pull/3850)|Add explain Plugin API for CPU plan|
-|[#3868](https://github.com/NVIDIA/spark-rapids/pull/3868)|Fix incorrect schema of nested types of union - audit SPARK-36673|
-|[#3860](https://github.com/NVIDIA/spark-rapids/pull/3860)|Add unit test for GpuKryoRegistrator|
-|[#3847](https://github.com/NVIDIA/spark-rapids/pull/3847)|Add Running Qualification App API|
-|[#3861](https://github.com/NVIDIA/spark-rapids/pull/3861)|Revert "Fix typo in nightly deploy project list (#3853)" [skip ci]|
-|[#3796](https://github.com/NVIDIA/spark-rapids/pull/3796)|Add Rlike support|
-|[#3856](https://github.com/NVIDIA/spark-rapids/pull/3856)|Fix not found: type PoissonDistribution in databricks build|
-|[#3853](https://github.com/NVIDIA/spark-rapids/pull/3853)|Fix typo in nightly deploy project list|
-|[#3831](https://github.com/NVIDIA/spark-rapids/pull/3831)|Support decimal type in ORC writer|
-|[#3789](https://github.com/NVIDIA/spark-rapids/pull/3789)|GPU sample exec|
-|[#3846](https://github.com/NVIDIA/spark-rapids/pull/3846)|Include pluginRepository for cdh build|
-|[#3819](https://github.com/NVIDIA/spark-rapids/pull/3819)|Qualification tool: Detect RDD Api's in SQL plan|
-|[#3835](https://github.com/NVIDIA/spark-rapids/pull/3835)|Minor cleanup: do not set cuda stream to null|
-|[#3845](https://github.com/NVIDIA/spark-rapids/pull/3845)|Include 'DB_SHIM_NAME' from Databricks jar path to fix nightly deploy [skip ci]|
-|[#3523](https://github.com/NVIDIA/spark-rapids/pull/3523)|Interpolate spark.version.classifier in build.dir|
-|[#3813](https://github.com/NVIDIA/spark-rapids/pull/3813)|Change `nullOnDivideByZero` from runtime parameter to aggregate expression for `stddev` and `variance` aggregations|
-|[#3791](https://github.com/NVIDIA/spark-rapids/pull/3791)|Add audit script to get list of commits from Apache Spark master branch|
-|[#3744](https://github.com/NVIDIA/spark-rapids/pull/3744)|Add developer documentation for setting up Microk8s [skip ci]|
-|[#3817](https://github.com/NVIDIA/spark-rapids/pull/3817)|Fix auto-merge conflict 3816 [skip ci]|
-|[#3804](https://github.com/NVIDIA/spark-rapids/pull/3804)|Missing statistics in GpuBroadcastNestedLoopJoin|
-|[#3799](https://github.com/NVIDIA/spark-rapids/pull/3799)|Optimize out bounds checking for joins when the gather map has only valid entries|
-|[#3801](https://github.com/NVIDIA/spark-rapids/pull/3801)|Update premerge to use the combined snapshots jar |
-|[#3696](https://github.com/NVIDIA/spark-rapids/pull/3696)|Support nested types in ORC writer|
-|[#3790](https://github.com/NVIDIA/spark-rapids/pull/3790)|Fix overflow when casting integral to neg scale decimal|
-|[#3779](https://github.com/NVIDIA/spark-rapids/pull/3779)|Enable some union of structs tests that were marked xfail|
-|[#3787](https://github.com/NVIDIA/spark-rapids/pull/3787)|Fix auto-merge conflict 3786 from branch-21.10 [skip ci]|
-|[#3782](https://github.com/NVIDIA/spark-rapids/pull/3782)|Fix auto-merge conflict 3781 [skip ci]|
-|[#3778](https://github.com/NVIDIA/spark-rapids/pull/3778)|Remove extra ParquetMaterializer.scala file|
-|[#3773](https://github.com/NVIDIA/spark-rapids/pull/3773)|Restore disabled ORC and Parquet tests|
-|[#3714](https://github.com/NVIDIA/spark-rapids/pull/3714)|Qualification tool: Error handling while processing large event logs|
-|[#3758](https://github.com/NVIDIA/spark-rapids/pull/3758)|Temporarily disable timestamp read tests for Parquet and ORC|
-|[#3748](https://github.com/NVIDIA/spark-rapids/pull/3748)|Fix merge conflict with branch-21.10|
-|[#3700](https://github.com/NVIDIA/spark-rapids/pull/3700)|CollectSet supports structs|
-|[#3740](https://github.com/NVIDIA/spark-rapids/pull/3740)|Throw Exception if failure to load ParquetCachedBatchSerializer class|
-|[#3726](https://github.com/NVIDIA/spark-rapids/pull/3726)|Replace Class.forName with ShimLoader.loadClass|
-|[#3690](https://github.com/NVIDIA/spark-rapids/pull/3690)|Added support for Array[Struct] to GpuCreateArray|
-|[#3728](https://github.com/NVIDIA/spark-rapids/pull/3728)|Qualification tool: Fix bug to process correct listeners|
-|[#3734](https://github.com/NVIDIA/spark-rapids/pull/3734)|Fix squashed merge from #3725|
-|[#3725](https://github.com/NVIDIA/spark-rapids/pull/3725)|Fix merge conflict with branch-21.10|
-|[#3680](https://github.com/NVIDIA/spark-rapids/pull/3680)|cudaMalloc UCX bounce buffers when async allocator is used|
-|[#3681](https://github.com/NVIDIA/spark-rapids/pull/3681)|Clean up and document metrics|
-|[#3674](https://github.com/NVIDIA/spark-rapids/pull/3674)|Move file TestingV2Source.Scala|
-|[#3617](https://github.com/NVIDIA/spark-rapids/pull/3617)|Update Version to 21.12.0-SNAPSHOT|
-|[#3612](https://github.com/NVIDIA/spark-rapids/pull/3612)|Add support for nested types as non-key columns on joins |
-|[#3619](https://github.com/NVIDIA/spark-rapids/pull/3619)|Added support for Array of Structs|
-
-## Release 21.10
-
-### Features
-|||
-|:---|:---|
-|[#1601](https://github.com/NVIDIA/spark-rapids/issues/1601)|[FEA] Support AggregationFunction StddevSamp|
-|[#3223](https://github.com/NVIDIA/spark-rapids/issues/3223)|[FEA] Rework the shim layer to robustly handle ABI and API incompatibilities across Spark releases|
-|[#13](https://github.com/NVIDIA/spark-rapids/issues/13)|[FEA] Percentile support|
-|[#3606](https://github.com/NVIDIA/spark-rapids/issues/3606)|[FEA] Support approx_percentile on GPU with decimal type|
-|[#3552](https://github.com/NVIDIA/spark-rapids/issues/3552)|[FEA] extend allowed datatypes for add and multiply in ANSI mode |
-|[#3450](https://github.com/NVIDIA/spark-rapids/issues/3450)|[FEA] test the UCX shuffle with the new build changes|
-|[#3043](https://github.com/NVIDIA/spark-rapids/issues/3043)|[FEA] Qualification tool: Add support to filter specific configuration values|
-|[#3413](https://github.com/NVIDIA/spark-rapids/issues/3413)|[FEA] Add in support for transform_keys|
-|[#3297](https://github.com/NVIDIA/spark-rapids/issues/3297)|[FEA] ORC reader supports reading Map columns.|
-|[#3367](https://github.com/NVIDIA/spark-rapids/issues/3367)|[FEA] Support GpuRowToColumnConverter on BinaryType|
-|[#3380](https://github.com/NVIDIA/spark-rapids/issues/3380)|[FEA] Support CollectList/CollectSet on nested input types in GroupBy aggregation|
-|[#1923](https://github.com/NVIDIA/spark-rapids/issues/1923)|[FEA] Fall back to the CPU when LEAD/LAG wants to IGNORE NULLS|
-|[#3044](https://github.com/NVIDIA/spark-rapids/issues/3044)|[FEA] Qualification tool: Report the nested data types|
-|[#3045](https://github.com/NVIDIA/spark-rapids/issues/3045)|[FEA] Qualification tool: Report the write data formats.|
-|[#3224](https://github.com/NVIDIA/spark-rapids/issues/3224)|[FEA] Add maven compile/package plugin executions, one for each supported Spark dependency version|
-|[#3047](https://github.com/NVIDIA/spark-rapids/issues/3047)|[FEA] Profiling tool: Structured output format|
-|[#2877](https://github.com/NVIDIA/spark-rapids/issues/2877)|[FEA] Support HashAggregate on struct and nested struct|
-|[#2916](https://github.com/NVIDIA/spark-rapids/issues/2916)|[FEA] Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate|
-|[#463](https://github.com/NVIDIA/spark-rapids/issues/463)|[FEA] Support NESTED_SCHEMA_PRUNING_ENABLED for ORC|
-|[#1481](https://github.com/NVIDIA/spark-rapids/issues/1481)|[FEA] ORC Predicate pushdown for Nested fields|
-|[#2879](https://github.com/NVIDIA/spark-rapids/issues/2879)|[FEA] ORC reader supports reading Struct columns.|
-|[#27](https://github.com/NVIDIA/spark-rapids/issues/27)|[FEA] test current_date and current_timestamp|
-|[#3229](https://github.com/NVIDIA/spark-rapids/issues/3229)|[FEA] Improve CreateMap to support multiple key and value expressions|
-|[#3111](https://github.com/NVIDIA/spark-rapids/issues/3111)|[FEA] Support conditional nested loop joins|
-|[#3177](https://github.com/NVIDIA/spark-rapids/issues/3177)|[FEA] Support decimal type in ORC reader|
-|[#3014](https://github.com/NVIDIA/spark-rapids/issues/3014)|[FEA] Add initial support for CreateMap|
-|[#3110](https://github.com/NVIDIA/spark-rapids/issues/3110)|[FEA] Support Map as input to explode and pos_explode|
-|[#3046](https://github.com/NVIDIA/spark-rapids/issues/3046)|[FEA] Profiling tool: Scale to run large number of event logs.|
-|[#3156](https://github.com/NVIDIA/spark-rapids/issues/3156)|[FEA] Support casting struct to struct|
-|[#2876](https://github.com/NVIDIA/spark-rapids/issues/2876)|[FEA] Support joins(SHJ and BHJ) on struct as join key with nested struct in the selected column list|
-|[#68](https://github.com/NVIDIA/spark-rapids/issues/68)|[FEA] support StringRepeat|
-|[#3042](https://github.com/NVIDIA/spark-rapids/issues/3042)|[FEA] Qualification tool: Add conjunction and disjunction filters.|
-|[#2615](https://github.com/NVIDIA/spark-rapids/issues/2615)|[FEA] support collect_list and collect_set as groupby aggregation|
-|[#2943](https://github.com/NVIDIA/spark-rapids/issues/2943)|[FEA] Support PreciseTimestampConversion when using windowing function|
-|[#2878](https://github.com/NVIDIA/spark-rapids/issues/2878)|[FEA] Support Sort on nested struct|
-|[#2133](https://github.com/NVIDIA/spark-rapids/issues/2133)|[FEA] Join support for passing MapType columns along when not join keys|
-|[#3041](https://github.com/NVIDIA/spark-rapids/issues/3041)|[FEA] Qualification tool: Add filters based on Regex and user name.|
-|[#576](https://github.com/NVIDIA/spark-rapids/issues/576)|[FEA] Spark 3.1 orc nested predicate pushdown support|
-
-### Performance
-|||
-|:---|:---|
-|[#3651](https://github.com/NVIDIA/spark-rapids/issues/3651)|[DOC] Point users to UCX 1.11.2|
-|[#2370](https://github.com/NVIDIA/spark-rapids/issues/2370)|[FEA] RAPIDS Shuffle Manager enable/disable config|
-|[#2923](https://github.com/NVIDIA/spark-rapids/issues/2923)|[FEA] Move to dispatched binops instead of JIT binops|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#3929](https://github.com/NVIDIA/spark-rapids/issues/3929)|[BUG] published rapids-4-spark dist artifact references aggregator|
-|[#3837](https://github.com/NVIDIA/spark-rapids/issues/3837)|[BUG] Spark-rapids v21.10.0 release candidate jars failed on the OSS validation check.|
-|[#3769](https://github.com/NVIDIA/spark-rapids/issues/3769)|[BUG] dedupe fails with find: './parallel-world/spark301/ ...' No such file or directory|
-|[#3783](https://github.com/NVIDIA/spark-rapids/issues/3783)|[BUG] spark-rapids v21.10.0 release build failed on script "dist/scripts/binary-dedupe.sh"|
-|[#3775](https://github.com/NVIDIA/spark-rapids/issues/3775)|[BUG] Hash aggregate with structs crashes with IllegalArgumentException|
-|[#3704](https://github.com/NVIDIA/spark-rapids/issues/3704)|[BUG] Executor-side ClassCastException when testing with Spark 3.2.1-SNAPSHOT in k8s environment|
-|[#3760](https://github.com/NVIDIA/spark-rapids/issues/3760)|[BUG] Databricks class cast exception failure |
-|[#3736](https://github.com/NVIDIA/spark-rapids/issues/3736)|[BUG] Crossjoin performance degraded a lot on RAPIDS 21.10 snapshot|
-|[#3369](https://github.com/NVIDIA/spark-rapids/issues/3369)|[BUG] UDF compiler can cause crashes with unexpected class input|
-|[#3713](https://github.com/NVIDIA/spark-rapids/issues/3713)|[BUG] AQE shuffle coalesce optimization is broken with Spark 3.2|
-|[#3720](https://github.com/NVIDIA/spark-rapids/issues/3720)|[BUG] Qualification tool warnings|
-|[#3718](https://github.com/NVIDIA/spark-rapids/issues/3718)|[BUG] plugin failing to build for CDH due to missing dependency|
-|[#3653](https://github.com/NVIDIA/spark-rapids/issues/3653)|[BUG] Issue seen with AQE on in Q5 (possibly others) using Spark 3.2 rc3|
-|[#3686](https://github.com/NVIDIA/spark-rapids/issues/3686)|[BUG] binary-dedupe doesn't fail the build on errors|
-|[#3520](https://github.com/NVIDIA/spark-rapids/issues/3520)|[BUG] Scaladoc warnings emitted during build|
-|[#3516](https://github.com/NVIDIA/spark-rapids/issues/3516)|[BUG] MultiFileParquetPartitionReader can fail while trying to write the footer|
-|[#3648](https://github.com/NVIDIA/spark-rapids/issues/3648)|[BUG] test_cast_decimal_to failing in databricks 7.3|
-|[#3670](https://github.com/NVIDIA/spark-rapids/issues/3670)|[BUG] mvn test failed compiling rapids-4-spark-tests-next-spark_2.12|
-|[#3640](https://github.com/NVIDIA/spark-rapids/issues/3640)|[BUG] q82 regression after #3288|
-|[#3642](https://github.com/NVIDIA/spark-rapids/issues/3642)|[BUG] Shims improperly overridden|
-|[#3611](https://github.com/NVIDIA/spark-rapids/issues/3611)|[BUG] test_no_fallback_when_ansi_enabled failed in databricks|
-|[#3601](https://github.com/NVIDIA/spark-rapids/issues/3601)|[BUG] Latest 21.10 snapshot jars failing with java.lang.ClassNotFoundException: com.nvidia.spark.rapids.ColumnarRdd with XGBoost|
-|[#3589](https://github.com/NVIDIA/spark-rapids/issues/3589)|[BUG] Latest 21.10 snapshot jars failing with java.lang.ClassNotFoundException: com.nvidia.spark.ExclusiveModeGpuDiscoveryPlugin|
-|[#3424](https://github.com/NVIDIA/spark-rapids/issues/3424)|[BUG] Aggregations in ANSI mode do not detect overflows|
-|[#3592](https://github.com/NVIDIA/spark-rapids/issues/3592)|[BUG] Failed to find data source: com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2|
-|[#3580](https://github.com/NVIDIA/spark-rapids/issues/3580)|[BUG] Class deduplication pulls wrong class for ProxyRapidsShuffleInternalManagerBase|
-|[#3331](https://github.com/NVIDIA/spark-rapids/issues/3331)|[BUG] Failed to read file into buffer in `CuFile.readFromFile` in gds standalone test|
-|[#3376](https://github.com/NVIDIA/spark-rapids/issues/3376)|[BUG] Unit test failures in Spark 3.2 shim build|
-|[#3382](https://github.com/NVIDIA/spark-rapids/issues/3382)|[BUG] Support years with up to 7 digits when casting from String to Date in Spark 3.2|
-|[#3266](https://github.com/NVIDIA/spark-rapids/issues/3266)|CDP - Flakiness in JoinSuite in Integration tests|
-|[#3415](https://github.com/NVIDIA/spark-rapids/issues/3415)|[BUG] Fix regressions in WindowFunctionSuite with Spark 3.2.0|
-|[#3548](https://github.com/NVIDIA/spark-rapids/issues/3548)|[BUG] GpuSum overflow on 3.2.0+|
-|[#3472](https://github.com/NVIDIA/spark-rapids/issues/3472)|[BUG] GpuAdd and GpuMultiply do not include failOnError|
-|[#3502](https://github.com/NVIDIA/spark-rapids/issues/3502)|[BUG] Spark 3.2.0 TimeAdd/TimeSub fail due to new DayTimeIntervalType|
-|[#3511](https://github.com/NVIDIA/spark-rapids/issues/3511)|[BUG] "Sequence" function fails with "java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData"|
-|[#3518](https://github.com/NVIDIA/spark-rapids/issues/3518)|[BUG] Nightly tests failed with RMM outstanding allocations on shutdown|
-|[#3383](https://github.com/NVIDIA/spark-rapids/issues/3383)|[BUG] ParseDateTime should not support special dates with Spark 3.2|
-|[#3384](https://github.com/NVIDIA/spark-rapids/issues/3384)|[BUG] AQE does not work with Spark 3.2 due to unrecognized GPU partitioning|
-|[#3478](https://github.com/NVIDIA/spark-rapids/issues/3478)|[BUG] CastOpSuite and ParseDateTimeSuite failures spark 302 and others|
-|[#3495](https://github.com/NVIDIA/spark-rapids/issues/3495)|Fix shim override config|
-|[#3482](https://github.com/NVIDIA/spark-rapids/issues/3482)|[BUG] ClassNotFound error when running a job|
-|[#1867](https://github.com/NVIDIA/spark-rapids/issues/1867)|[BUG] In Spark 3.2.0 and above dynamic partition pruning and AQE are not mutually exclusive|
-|[#3468](https://github.com/NVIDIA/spark-rapids/issues/3468)|[BUG] GpuKryoRegistrator ClassNotFoundException |
-|[#3488](https://github.com/NVIDIA/spark-rapids/issues/3488)|[BUG] databricks 8.2 runtime build failed|
-|[#3429](https://github.com/NVIDIA/spark-rapids/issues/3429)|[BUG] test_sortmerge_join_struct_mixed_key_with_null_filter LeftSemi/LeftAnti fails|
-|[#3400](https://github.com/NVIDIA/spark-rapids/issues/3400)|[BUG] Canonicalized GPU plans sometimes not consistent when using Spark 3.2|
-|[#3440](https://github.com/NVIDIA/spark-rapids/issues/3440)|[BUG] Followup comments from PR3411|
-|[#3372](https://github.com/NVIDIA/spark-rapids/issues/3372)|[BUG] 3.2.0 shim: ShuffledBatchRDD.scala:141: match may not be exhaustive.|
-|[#3434](https://github.com/NVIDIA/spark-rapids/issues/3434)|[BUG] Fix the unit test failure of KnownNotNull in Scala UDF for Spark 3.2|
-|[#3084](https://github.com/NVIDIA/spark-rapids/issues/3084)|[AUDIT] [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala|
-|[#3463](https://github.com/NVIDIA/spark-rapids/issues/3463)|[BUG] 301+-nondb is named incorrectly|
-|[#3435](https://github.com/NVIDIA/spark-rapids/issues/3435)|[BUG] tools - test dsv1 complex and decimal test fails|
-|[#3388](https://github.com/NVIDIA/spark-rapids/issues/3388)|[BUG] maven scalastyle checks don't appear to work for alternate source directories|
-|[#3416](https://github.com/NVIDIA/spark-rapids/issues/3416)|[BUG] Resource cleanup issues with Spark 3.2|
-|[#3339](https://github.com/NVIDIA/spark-rapids/issues/3339)|[BUG] Databricks test fails test_hash_groupby_collect_partial_replace_fallback|
-|[#3375](https://github.com/NVIDIA/spark-rapids/issues/3375)|[BUG] SPARK-35742 Replace semanticEquals with canonicalize|
-|[#3334](https://github.com/NVIDIA/spark-rapids/issues/3334)|[BUG] UCX join_test FAILED on spark standalone |
-|[#3058](https://github.com/NVIDIA/spark-rapids/issues/3058)|[BUG] GPU ORC reader complains errors when specifying columns that do not exist in file schema.|
-|[#3385](https://github.com/NVIDIA/spark-rapids/issues/3385)|[BUG] misc_expr_test FAILED on Dataproc|
-|[#2052](https://github.com/NVIDIA/spark-rapids/issues/2052)|[BUG] Spark 3.2.0 test fails due to SPARK-34906 Refactor TreeNode's children handling methods into specialized traits|
-|[#3401](https://github.com/NVIDIA/spark-rapids/issues/3401)|[BUG] Qualification tool failed with java.lang.ArrayIndexOutOfBoundsException|
-|[#3333](https://github.com/NVIDIA/spark-rapids/issues/3333)|[BUG] Mortgage ETL input_file_name is not correct when using CPU's CsvScan|
-|[#3391](https://github.com/NVIDIA/spark-rapids/issues/3391)|[BUG] UDF example build fail|
-|[#3379](https://github.com/NVIDIA/spark-rapids/issues/3379)|[BUG] q93 failed w/ UCX|
-|[#3364](https://github.com/NVIDIA/spark-rapids/issues/3364)|[BUG] analysis tool cannot handle a job with no tasks.|
-|[#3235](https://github.com/NVIDIA/spark-rapids/issues/3235)|Classes directly in Apache Spark packages|
-|[#3237](https://github.com/NVIDIA/spark-rapids/issues/3237)|BasicColumnWriteJobStatsTracker might be affected by spark change SPARK-34399|
-|[#3134](https://github.com/NVIDIA/spark-rapids/issues/3134)|[BUG] Add more checkings before coalescing ORC files|
-|[#3324](https://github.com/NVIDIA/spark-rapids/issues/3324)|[BUG] Databricks builds failing with missing dependency issue|
-|[#3244](https://github.com/NVIDIA/spark-rapids/issues/3244)|[BUG] join_test LeftAnti failing on Databricks|
-|[#3268](https://github.com/NVIDIA/spark-rapids/issues/3268)|[BUG] CDH ParquetCachedBatchSerializer fails to build due to api change in VectorizedColumnReader|
-|[#3305](https://github.com/NVIDIA/spark-rapids/issues/3305)|[BUG] test_case_when failed on Databricks 7.3 nightly build|
-|[#3139](https://github.com/NVIDIA/spark-rapids/issues/3139)|[BUG] case when on some nested types can produce a crash|
-|[#3253](https://github.com/NVIDIA/spark-rapids/issues/3253)|[BUG] ClassCastException for unsupported TypedImperativeAggregate functions|
-|[#3256](https://github.com/NVIDIA/spark-rapids/issues/3256)|[BUG] udf-examples native build broken |
-|[#3271](https://github.com/NVIDIA/spark-rapids/issues/3271)|[BUG] Databricks 301 shim compilation error|
-|[#3255](https://github.com/NVIDIA/spark-rapids/issues/3255)|[BUG] GpuRunningWindowExecMeta is missing ExecChecks for partitionSpec in databricks runtime|
-|[#3222](https://github.com/NVIDIA/spark-rapids/issues/3222)|[BUG] `test_running_window_function_exec_for_all_aggs` failed in the UCX EGX run|
-|[#3195](https://github.com/NVIDIA/spark-rapids/issues/3195)|[BUG] failures parquet_test test:read_round_trip|
-|[#3176](https://github.com/NVIDIA/spark-rapids/issues/3176)|[BUG] test_window_aggs_for_rows_collect_list[IGNORE_ORDER({'local': True})] FAILED on EGX Yarn cluster|
-|[#3187](https://github.com/NVIDIA/spark-rapids/issues/3187)|[BUG] NullPointerException in SLF4J on startup|
-|[#3166](https://github.com/NVIDIA/spark-rapids/issues/3166)|[BUG] Unable to build rapids-4-spark jar from source due to missing 3.0.3-SNAPSHOT for spark-sql|
-|[#3131](https://github.com/NVIDIA/spark-rapids/issues/3131)|[BUG] hash_aggregate_test TypedImperativeAggregate tests failed|
-|[#3147](https://github.com/NVIDIA/spark-rapids/issues/3147)|[BUG] window_function_test.py::test_window_ride_along failed in databricks runtime|
-|[#3094](https://github.com/NVIDIA/spark-rapids/issues/3094)|[BUG] join_test.py::test_sortmerge_join_with_conditionals failed in databricks 8.2 runtime|
-|[#3078](https://github.com/NVIDIA/spark-rapids/issues/3078)|[BUG] test_hash_join_map, test_sortmerge_join_map failed in databricks runtime|
-|[#3059](https://github.com/NVIDIA/spark-rapids/issues/3059)|[BUG] orc_test:test_pred_push_round_trip failed|
-
-### PRs
-|||
-|:---|:---|
-|[#3940](https://github.com/NVIDIA/spark-rapids/pull/3940)|Update changelog [skip ci]|
-|[#3930](https://github.com/NVIDIA/spark-rapids/pull/3930)|Dist artifact with provided aggregator dependency|
-|[#3918](https://github.com/NVIDIA/spark-rapids/pull/3918)|Update changelog [skip ci]|
-|[#3906](https://github.com/NVIDIA/spark-rapids/pull/3906)|Doc updated for v2110[skip ci]|
-|[#3840](https://github.com/NVIDIA/spark-rapids/pull/3840)|Update changelog [skip ci]|
-|[#3838](https://github.com/NVIDIA/spark-rapids/pull/3838)|Update deploy script [skip ci]|
-|[#3827](https://github.com/NVIDIA/spark-rapids/pull/3827)|Update changelog 21.10 to latest [skip ci]|
-|[#3808](https://github.com/NVIDIA/spark-rapids/pull/3808)|Rewording qualification and profiling tools doc files[skip ci]|
-|[#3815](https://github.com/NVIDIA/spark-rapids/pull/3815)|Correct 21.10 docs such as PCBS related FAQ [skip ci]|
-|[#3807](https://github.com/NVIDIA/spark-rapids/pull/3807)|Update 21.10.0 release doc [skip ci]|
-|[#3800](https://github.com/NVIDIA/spark-rapids/pull/3800)|Update approximate percentile documentation|
-|[#3810](https://github.com/NVIDIA/spark-rapids/pull/3810)|Update to include Spark 3.2.0 in nosnapshots target so it gets released officially.|
-|[#3806](https://github.com/NVIDIA/spark-rapids/pull/3806)|Update spark320.version to 3.2.0|
-|[#3795](https://github.com/NVIDIA/spark-rapids/pull/3795)|Reduce usage of escaping in xargs|
-|[#3785](https://github.com/NVIDIA/spark-rapids/pull/3785)|[BUG] Update cudf version in version-dev script [skip ci]|
-|[#3771](https://github.com/NVIDIA/spark-rapids/pull/3771)|Update cudfjni version to 21.10.0|
-|[#3777](https://github.com/NVIDIA/spark-rapids/pull/3777)|Ignore nullability when checking for need to cast aggregation input|
-|[#3763](https://github.com/NVIDIA/spark-rapids/pull/3763)|Force parallel world in Shim caller's classloader|
-|[#3756](https://github.com/NVIDIA/spark-rapids/pull/3756)|Simplify shim classloader logic|
-|[#3746](https://github.com/NVIDIA/spark-rapids/pull/3746)|Avoid using AST on inner joins and avoid coalesce after nested loop join filter|
-|[#3719](https://github.com/NVIDIA/spark-rapids/pull/3719)|Advertise CPU sort order and partitioning expressions to Catalyst|
-|[#3737](https://github.com/NVIDIA/spark-rapids/pull/3737)|Add note referencing known issues in approx_percentile implementation|
-|[#3729](https://github.com/NVIDIA/spark-rapids/pull/3729)|Update to ucx 1.11.2 for 21.10|
-|[#3711](https://github.com/NVIDIA/spark-rapids/pull/3711)|Surface problems with overrides and fallback|
-|[#3722](https://github.com/NVIDIA/spark-rapids/pull/3722)|CDH build stopped working due to missing jars in maven repo|
-|[#3691](https://github.com/NVIDIA/spark-rapids/pull/3691)|Fix issues with AQE and DPP enabled on Spark 3.2|
-|[#3373](https://github.com/NVIDIA/spark-rapids/pull/3373)|Support `stddev` and `variance` aggregations families|
-|[#3708](https://github.com/NVIDIA/spark-rapids/pull/3708)|disable percentile approx tests|
-|[#3695](https://github.com/NVIDIA/spark-rapids/pull/3695)|Remove duplicated data types for collect_list tests|
-|[#3687](https://github.com/NVIDIA/spark-rapids/pull/3687)|Improve dedupe script|
-|[#3646](https://github.com/NVIDIA/spark-rapids/pull/3646)|Debug utility method to dump a table or columnar batch to Parquet|
-|[#3683](https://github.com/NVIDIA/spark-rapids/pull/3683)|Change deploy scripts for new build system|
-|[#3301](https://github.com/NVIDIA/spark-rapids/pull/3301)|Approx Percentile|
-|[#3673](https://github.com/NVIDIA/spark-rapids/pull/3673)|Add the Scala jar as an external lib for a linkage warning|
-|[#3668](https://github.com/NVIDIA/spark-rapids/pull/3668)|Improve the diagnostics in udf compiler for try-and-catch.|
-|[#3666](https://github.com/NVIDIA/spark-rapids/pull/3666)|Recompute Parquet block metadata when estimating footer from multiple file input|
-|[#3671](https://github.com/NVIDIA/spark-rapids/pull/3671)|Fix tests-spark310+ dependency|
-|[#3663](https://github.com/NVIDIA/spark-rapids/pull/3663)|Add back the tests-spark310+|
-|[#3657](https://github.com/NVIDIA/spark-rapids/pull/3657)|Revert "Use cudf to compute exact hash join output row sizes (#3288)"|
-|[#3643](https://github.com/NVIDIA/spark-rapids/pull/3643)|Properly override Shims for int96Rebase|
-|[#3645](https://github.com/NVIDIA/spark-rapids/pull/3645)|Verify unshimmed classes are bitwise-identical|
-|[#3650](https://github.com/NVIDIA/spark-rapids/pull/3650)|Fix dist copy dependencies|
-|[#3649](https://github.com/NVIDIA/spark-rapids/pull/3649)|Add ignore_order to other fallback tests for the aggregate|
-|[#3631](https://github.com/NVIDIA/spark-rapids/pull/3631)|Change premerge to build all Spark versions|
-|[#3630](https://github.com/NVIDIA/spark-rapids/pull/3630)|Fix CDH Build |
-|[#3636](https://github.com/NVIDIA/spark-rapids/pull/3636)|Change nightly build to not deploy dist for each classifier version [skip ci]|
-|[#3632](https://github.com/NVIDIA/spark-rapids/pull/3632)|Revert disabling of ctas test|
-|[#3628](https://github.com/NVIDIA/spark-rapids/pull/3628)|Fix 313 ShuffleManager build|
-|[#3618](https://github.com/NVIDIA/spark-rapids/pull/3618)|Update changelog script to strip ambiguous annotation [skip ci]|
-|[#3626](https://github.com/NVIDIA/spark-rapids/pull/3626)|Add in support for casting decimal to other number types|
-|[#3615](https://github.com/NVIDIA/spark-rapids/pull/3615)|Ignore order for the test_no_fallback_when_ansi_enabled|
-|[#3602](https://github.com/NVIDIA/spark-rapids/pull/3602)|Dedupe proxy rapids shuffle manager byte code|
-|[#3330](https://github.com/NVIDIA/spark-rapids/pull/3330)|Support `int96RebaseModeInWrite` and `int96RebaseModeInRead`|
-|[#3438](https://github.com/NVIDIA/spark-rapids/pull/3438)|Parquet read unsigned int: uint8, uint16, uint32|
-|[#3607](https://github.com/NVIDIA/spark-rapids/pull/3607)|com.nvidia.spark.rapids.ColumnarRdd not exposed to user for XGBoost|
-|[#3566](https://github.com/NVIDIA/spark-rapids/pull/3566)|Enable String Array Max and Min|
-|[#3590](https://github.com/NVIDIA/spark-rapids/pull/3590)|Unshim ExclusiveModeGpuDiscoveryPlugin|
-|[#3597](https://github.com/NVIDIA/spark-rapids/pull/3597)|ANSI check for aggregates|
-|[#3595](https://github.com/NVIDIA/spark-rapids/pull/3595)|Update the overflow check algorithm for Subtract|
-|[#3588](https://github.com/NVIDIA/spark-rapids/pull/3588)|Disable test_non_empty_ctas test|
-|[#3577](https://github.com/NVIDIA/spark-rapids/pull/3577)|Commonize more shim module files|
-|[#3594](https://github.com/NVIDIA/spark-rapids/pull/3594)|Fix nightly integration test script for specific artifacts|
-|[#3544](https://github.com/NVIDIA/spark-rapids/pull/3544)|Add test for nested grouping sets, rollup, cube|
-|[#3587](https://github.com/NVIDIA/spark-rapids/pull/3587)|Revert shared class list modifications in PR#3545|
-|[#3570](https://github.com/NVIDIA/spark-rapids/pull/3570)|ANSI Support for Abs, UnaryMinus, and Subtract|
-|[#3574](https://github.com/NVIDIA/spark-rapids/pull/3574)|Add in ANSI date time fallback|
-|[#3578](https://github.com/NVIDIA/spark-rapids/pull/3578)|Deploy all of the classifier versions of the jars [skip ci]|
-|[#3569](https://github.com/NVIDIA/spark-rapids/pull/3569)|Add commons-lang3 dependency to tests|
-|[#3568](https://github.com/NVIDIA/spark-rapids/pull/3568)|Enable 3.2.0 unit test in premerge and nightly|
-|[#3559](https://github.com/NVIDIA/spark-rapids/pull/3559)|Commonize shim module join and shuffle files|
-|[#3565](https://github.com/NVIDIA/spark-rapids/pull/3565)|Auto-dedupe ASM-relocated shim dependencies|
-|[#3531](https://github.com/NVIDIA/spark-rapids/pull/3531)|Fall back to the CPU for date/time parsing we cannot support yet|
-|[#3561](https://github.com/NVIDIA/spark-rapids/pull/3561)|Follow on to ANSI Add|
-|[#3557](https://github.com/NVIDIA/spark-rapids/pull/3557)|Add IDEA profile switch workarounds|
-|[#3504](https://github.com/NVIDIA/spark-rapids/pull/3504)|Fix reserialization of broadcasted tables|
-|[#3556](https://github.com/NVIDIA/spark-rapids/pull/3556)|Fix databricks test.sh script for passing spark shim version|
-|[#3545](https://github.com/NVIDIA/spark-rapids/pull/3545)|Dynamic class file deduplication across shims in dist jar build |
-|[#3551](https://github.com/NVIDIA/spark-rapids/pull/3551)|Fix window sum overflow for 3.2.0+|
-|[#3537](https://github.com/NVIDIA/spark-rapids/pull/3537)|GpuAdd supports ANSI mode.|
-|[#3533](https://github.com/NVIDIA/spark-rapids/pull/3533)|Define a SPARK_SHIM_VER to pick up specific rapids-4-spark-integration-tests jars|
-|[#3547](https://github.com/NVIDIA/spark-rapids/pull/3547)|Range window supports DayTime on 3.2+|
-|[#3534](https://github.com/NVIDIA/spark-rapids/pull/3534)|Fix package name and sql string issue for GpuTimeAdd|
-|[#3536](https://github.com/NVIDIA/spark-rapids/pull/3536)|Enable auto-merge from branch 21.10 to 21.12 [skip ci]|
-|[#3521](https://github.com/NVIDIA/spark-rapids/pull/3521)|Qualification tool: Report nested complex types in Potential Problems and improve write csv identification.|
-|[#3507](https://github.com/NVIDIA/spark-rapids/pull/3507)|TimeAdd supports DayTimeIntervalType|
-|[#3529](https://github.com/NVIDIA/spark-rapids/pull/3529)|Support UnsafeArrayData in scalars|
-|[#3528](https://github.com/NVIDIA/spark-rapids/pull/3528)|Update NOTICE copyrights to 2021|
-|[#3527](https://github.com/NVIDIA/spark-rapids/pull/3527)|Ignore CBO tests that fail against Spark 3.2.0|
-|[#3439](https://github.com/NVIDIA/spark-rapids/pull/3439)|Stop parsing special dates for Spark 3.2+|
-|[#3524](https://github.com/NVIDIA/spark-rapids/pull/3524)|Update hashing to normalize -0.0 on 3.2+|
-|[#3508](https://github.com/NVIDIA/spark-rapids/pull/3508)|Auto abort dup pre-merge builds [skip ci]|
-|[#3501](https://github.com/NVIDIA/spark-rapids/pull/3501)|Add limitations for Databricks doc|
-|[#3517](https://github.com/NVIDIA/spark-rapids/pull/3517)|Update empty CTAS testing to avoid Hive if possible|
-|[#3513](https://github.com/NVIDIA/spark-rapids/pull/3513)|Allow spark320 tests to run with 320 or 321|
-|[#3493](https://github.com/NVIDIA/spark-rapids/pull/3493)|Initialize RAPIDS Shuffle Manager at driver/executor startup|
-|[#3496](https://github.com/NVIDIA/spark-rapids/pull/3496)|Update parse date to leverage cuDF support for single digit components|
-|[#3454](https://github.com/NVIDIA/spark-rapids/pull/3454)|Catch UDF compiler exceptions and fallback to CPU|
-|[#3505](https://github.com/NVIDIA/spark-rapids/pull/3505)|Remove doc references to cudf JIT|
-|[#3503](https://github.com/NVIDIA/spark-rapids/pull/3503)|Have average support nulls for 3.2.0|
-|[#3500](https://github.com/NVIDIA/spark-rapids/pull/3500)|Fix GpuSum type to match resultType|
-|[#3485](https://github.com/NVIDIA/spark-rapids/pull/3485)|Fix regressions in cast from string to date and timestamp|
-|[#3487](https://github.com/NVIDIA/spark-rapids/pull/3487)|Add databricks build tests to pre-merge CI [skip ci]|
-|[#3497](https://github.com/NVIDIA/spark-rapids/pull/3497)|Re-enable spark.rapids.shims-provider-override|
-|[#3499](https://github.com/NVIDIA/spark-rapids/pull/3499)|Fix Spark 3.2.0 test_div_by_zero_ansi failures|
-|[#3418](https://github.com/NVIDIA/spark-rapids/pull/3418)|Qualification tool: Add filtering based on configuration parameters|
-|[#3498](https://github.com/NVIDIA/spark-rapids/pull/3498)|Update the scala repl loader to avoid issues with broadcast.|
-|[#3479](https://github.com/NVIDIA/spark-rapids/pull/3479)|Test with Spark 3.2.1-SNAPSHOT|
-|[#3474](https://github.com/NVIDIA/spark-rapids/pull/3474)|Build fixes and IDE instructions|
-|[#3460](https://github.com/NVIDIA/spark-rapids/pull/3460)|Add DayTimeIntervalType/YearMonthIntervalType support|
-|[#3491](https://github.com/NVIDIA/spark-rapids/pull/3491)|Shim GpuKryoRegistrator|
-|[#3489](https://github.com/NVIDIA/spark-rapids/pull/3489)|Fix 311 databricks shim for AnsiCastOpSuite failures|
-|[#3456](https://github.com/NVIDIA/spark-rapids/pull/3456)|Fallback to CPU when datasource v2 enables RuntimeFiltering|
-|[#3417](https://github.com/NVIDIA/spark-rapids/pull/3417)|Adds pre/post steps for merge and update aggregate|
-|[#3431](https://github.com/NVIDIA/spark-rapids/pull/3431)|Reinstate test_sortmerge_join_struct_mixed_key_with_null_filter|
-|[#3477](https://github.com/NVIDIA/spark-rapids/pull/3477)|Update supported docs to clarify casting floating point to string|
-|[#3447](https://github.com/NVIDIA/spark-rapids/pull/3447)|Add CUDA async memory resource as an option|
-|[#3473](https://github.com/NVIDIA/spark-rapids/pull/3473)|Create non-shim specific version of ParquetCachedBatchSerializer|
-|[#3471](https://github.com/NVIDIA/spark-rapids/pull/3471)|Fix canonicalization of GpuScalarSubquery|
-|[#3480](https://github.com/NVIDIA/spark-rapids/pull/3480)|Temporarily disable failing cast string to date tests|
-|[#3377](https://github.com/NVIDIA/spark-rapids/pull/3377)|Fix AnsiCastOpSuite failures with Spark 3.2|
-|[#3467](https://github.com/NVIDIA/spark-rapids/pull/3467)|Update docs to better describe support for floating point aggregation and NaNs|
-|[#3459](https://github.com/NVIDIA/spark-rapids/pull/3459)|Use Shims v2 for ShuffledBatchRDD|
-|[#3457](https://github.com/NVIDIA/spark-rapids/pull/3457)|Update the children unpacking pattern for GpuIf.|
-|[#3464](https://github.com/NVIDIA/spark-rapids/pull/3464)|Add test for empty relation propagation|
-|[#3458](https://github.com/NVIDIA/spark-rapids/pull/3458)|Fix log info GPU BroadcastExchangeExec|
-|[#3466](https://github.com/NVIDIA/spark-rapids/pull/3466)|Databricks build fixes for missing shouldFailDivOverflow and removal of needed imports|
-|[#3465](https://github.com/NVIDIA/spark-rapids/pull/3465)|Fix name of 301+-nondb directory to stop at Spark 3.2.0|
-|[#3452](https://github.com/NVIDIA/spark-rapids/pull/3452)|Enable AQE/DPP test for Spark 3.2|
-|[#3436](https://github.com/NVIDIA/spark-rapids/pull/3436)|Qualification tool: Update expected result for test|
-|[#3455](https://github.com/NVIDIA/spark-rapids/pull/3455)|Decrease pre_merge_ci parallelism to 4 and reordering time-consuming tests|
-|[#3420](https://github.com/NVIDIA/spark-rapids/pull/3420)|`IntegralDivide` throws an exception on overflow in ANSI mode|
-|[#3433](https://github.com/NVIDIA/spark-rapids/pull/3433)|Batch scalastyle checks across all modules upfront|
-|[#3453](https://github.com/NVIDIA/spark-rapids/pull/3453)|Fix spark-tests script for classifier|
-|[#3445](https://github.com/NVIDIA/spark-rapids/pull/3445)|Update nightly build to pull Databricks jars| -|[#3446](https://github.com/NVIDIA/spark-rapids/pull/3446)|Format aggregator pom and commonize some configuration| -|[#3444](https://github.com/NVIDIA/spark-rapids/pull/3444)|Add in tests for unaligned parquet pages| -|[#3451](https://github.com/NVIDIA/spark-rapids/pull/3451)|Fix typo in spark-tests.sh| -|[#3443](https://github.com/NVIDIA/spark-rapids/pull/3443)|Remove 301emr shim| -|[#3441](https://github.com/NVIDIA/spark-rapids/pull/3441)|update deploy script for Databricks| -|[#3414](https://github.com/NVIDIA/spark-rapids/pull/3414)|Add in support for transform_keys| -|[#3320](https://github.com/NVIDIA/spark-rapids/pull/3320)|Add AST support for logical AND and logical OR| -|[#3425](https://github.com/NVIDIA/spark-rapids/pull/3425)|Throw an error by default if CREATE TABLE AS SELECT overwrites data| -|[#3422](https://github.com/NVIDIA/spark-rapids/pull/3422)|Stop double closing SerializeBatchDeserializeHostBuffer host buffers when running with Spark 3.2| -|[#3411](https://github.com/NVIDIA/spark-rapids/pull/3411)|Make new build default and combine into dist package| -|[#3368](https://github.com/NVIDIA/spark-rapids/pull/3368)|Extend TagForReplaceMode to adapt Databricks runtime | -|[#3428](https://github.com/NVIDIA/spark-rapids/pull/3428)|Remove commented-out semanticEquals overrides| -|[#3421](https://github.com/NVIDIA/spark-rapids/pull/3421)|Revert to CUDA runtime image for build| -|[#3381](https://github.com/NVIDIA/spark-rapids/pull/3381)|Implement per-shim parallel world jar classloader| -|[#3303](https://github.com/NVIDIA/spark-rapids/pull/3303)|Update to cudf conditional join change that removes null equality argument| -|[#3408](https://github.com/NVIDIA/spark-rapids/pull/3408)|Add leafNodeDefaultParallelism support| -|[#3426](https://github.com/NVIDIA/spark-rapids/pull/3426)|Correct grammar in qualification tool doc| 
-|[#3423](https://github.com/NVIDIA/spark-rapids/pull/3423)|Fix hash_aggregate tests that leaked configs| -|[#3412](https://github.com/NVIDIA/spark-rapids/pull/3412)|Restore AST conditional join tests| -|[#3403](https://github.com/NVIDIA/spark-rapids/pull/3403)|Fix canonicalization regression with Spark 3.2| -|[#3394](https://github.com/NVIDIA/spark-rapids/pull/3394)|Orc read map| -|[#3392](https://github.com/NVIDIA/spark-rapids/pull/3392)|Support transforming BinaryType between Row and Columnar| -|[#3393](https://github.com/NVIDIA/spark-rapids/pull/3393)|Fill with null columns for the names exist only in read schema in ORC reader| -|[#3399](https://github.com/NVIDIA/spark-rapids/pull/3399)|Fix collect_list test so it covers nested types properly| -|[#3410](https://github.com/NVIDIA/spark-rapids/pull/3410)|Specify number of RDD slices for ID tests| -|[#3363](https://github.com/NVIDIA/spark-rapids/pull/3363)|Add AST support for null literals| -|[#3396](https://github.com/NVIDIA/spark-rapids/pull/3396)|Throw exception on parse error in ANSI mode when casting String to Date| -|[#3315](https://github.com/NVIDIA/spark-rapids/pull/3315)|Add in reporting of time taken to transition plan to GPU| -|[#3409](https://github.com/NVIDIA/spark-rapids/pull/3409)|Use devel cuda image for premerge CI| -|[#3405](https://github.com/NVIDIA/spark-rapids/pull/3405)|Qualification tool: Filter empty strings from Read Schema| -|[#3387](https://github.com/NVIDIA/spark-rapids/pull/3387)|Fallback to the CPU for IGNORE NULLS on lead and lag| -|[#3398](https://github.com/NVIDIA/spark-rapids/pull/3398)|Fix NPE on string repeat when there is no data buffer| -|[#3366](https://github.com/NVIDIA/spark-rapids/pull/3366)|Fix input_file_xxx issue when FileScan is running on CPU| -|[#3397](https://github.com/NVIDIA/spark-rapids/pull/3397)|Add tests for GpuInSet| -|[#3395](https://github.com/NVIDIA/spark-rapids/pull/3395)|Fix UDF native example build| 
-|[#3389](https://github.com/NVIDIA/spark-rapids/pull/3389)|Bring back setRapidsShuffleManager in the driver side| -|[#3263](https://github.com/NVIDIA/spark-rapids/pull/3263)|Qualification tool: Report write data format and nested types| -|[#3378](https://github.com/NVIDIA/spark-rapids/pull/3378)|Make Dockerfile.cuda consistent with getting-started-kubernetes.md| -|[#3359](https://github.com/NVIDIA/spark-rapids/pull/3359)|UnionExec array and nested array support| -|[#3342](https://github.com/NVIDIA/spark-rapids/pull/3342)|Profiling tool add CSV output option and add new combined mode| -|[#3365](https://github.com/NVIDIA/spark-rapids/pull/3365)|fix databricks builds| -|[#3323](https://github.com/NVIDIA/spark-rapids/pull/3323)|Enable optional Spark 3.2.0 shim build| -|[#3361](https://github.com/NVIDIA/spark-rapids/pull/3361)|Fix databricks 3.1.1 arrow dependency version| -|[#3354](https://github.com/NVIDIA/spark-rapids/pull/3354)|Support HashAggregate on struct and nested struct| -|[#3341](https://github.com/NVIDIA/spark-rapids/pull/3341)|ArrayMax and ArrayMin support plus map_entries, map_keys, map_values| -|[#3356](https://github.com/NVIDIA/spark-rapids/pull/3356)|Support Databricks 3.0.1 with new build profiles| -|[#3344](https://github.com/NVIDIA/spark-rapids/pull/3344)|Move classes out of Apache Spark packages| -|[#3345](https://github.com/NVIDIA/spark-rapids/pull/3345)|Add job commit time to task tracker stats| -|[#3357](https://github.com/NVIDIA/spark-rapids/pull/3357)|Avoid RAT checks on any CSV file| -|[#3355](https://github.com/NVIDIA/spark-rapids/pull/3355)|Add new authorized user to blossom-ci whitelist [skip ci]| -|[#3340](https://github.com/NVIDIA/spark-rapids/pull/3340)|xfail AST nested loop join tests until cudf empty left table bug is fixed| -|[#3276](https://github.com/NVIDIA/spark-rapids/pull/3276)|Use child type in some places to make it more clear| -|[#3346](https://github.com/NVIDIA/spark-rapids/pull/3346)|Mark more tests as premerge_ci_1| 
-|[#3353](https://github.com/NVIDIA/spark-rapids/pull/3353)|Fix automerge conflict 3349 [skip ci]| -|[#3335](https://github.com/NVIDIA/spark-rapids/pull/3335)|Support Databricks 3.1.1 in new build profiles| -|[#3317](https://github.com/NVIDIA/spark-rapids/pull/3317)|Adds in support for the transform_values SQL function| -|[#3299](https://github.com/NVIDIA/spark-rapids/pull/3299)|Insert buffer converters for TypedImperativeAggregate| -|[#3325](https://github.com/NVIDIA/spark-rapids/pull/3325)|Fix spark version classifier being applied properly| -|[#3288](https://github.com/NVIDIA/spark-rapids/pull/3288)|Use cudf to compute exact hash join output row sizes| -|[#3318](https://github.com/NVIDIA/spark-rapids/pull/3318)|Fix LeftAnti nested loop join missing condition case| -|[#3316](https://github.com/NVIDIA/spark-rapids/pull/3316)|Fix GpuProjectAstExec when projecting only literals| -|[#3262](https://github.com/NVIDIA/spark-rapids/pull/3262)|Re-enable the struct support for the ORC reader.| -|[#3312](https://github.com/NVIDIA/spark-rapids/pull/3312)|Fix inconsistent function name and add backward compatibility support for premerge job [skip ci]| -|[#3319](https://github.com/NVIDIA/spark-rapids/pull/3319)|Temporarily disable cache test except for spark 3.1.1| -|[#3308](https://github.com/NVIDIA/spark-rapids/pull/3308)|Branch 21.10 FAQ update forward compatibility, update Spark and CUDA versions| -|[#3309](https://github.com/NVIDIA/spark-rapids/pull/3309)|Prepare Spark 3.2.0 related changes| -|[#3289](https://github.com/NVIDIA/spark-rapids/pull/3289)|Support for ArrayTransform| -|[#3307](https://github.com/NVIDIA/spark-rapids/pull/3307)|Fix generation of null scalars in tests| -|[#3306](https://github.com/NVIDIA/spark-rapids/pull/3306)|Update guava to be 30.0-jre| -|[#3304](https://github.com/NVIDIA/spark-rapids/pull/3304)|Fix nested cast type checks| -|[#3302](https://github.com/NVIDIA/spark-rapids/pull/3302)|Fix shim aggregator dependencies when snapshot-shims profile 
provided| -|[#3291](https://github.com/NVIDIA/spark-rapids/pull/3291)|Bump guava from 28.0-jre to 29.0-jre in /tests| -|[#3292](https://github.com/NVIDIA/spark-rapids/pull/3292)|Bump guava from 28.0-jre to 29.0-jre in /integration_tests| -|[#3293](https://github.com/NVIDIA/spark-rapids/pull/3293)|Bump guava from 28.0-jre to 29.0-jre in /udf-compiler| -|[#3294](https://github.com/NVIDIA/spark-rapids/pull/3294)|Update Qualification and Profiling tool documentation for gh-pages| -|[#3282](https://github.com/NVIDIA/spark-rapids/pull/3282)|Test for `current_date`, `current_timestamp` and `now`| -|[#3298](https://github.com/NVIDIA/spark-rapids/pull/3298)|Minor parent pom fixes| -|[#3296](https://github.com/NVIDIA/spark-rapids/pull/3296)|Support map type in case when expression| -|[#3295](https://github.com/NVIDIA/spark-rapids/pull/3295)|Rename pytest 'slow_test' tag as 'premerge_ci_1' to avoid confusion| -|[#3274](https://github.com/NVIDIA/spark-rapids/pull/3274)|Add m2 cache to fast premerge build| -|[#3283](https://github.com/NVIDIA/spark-rapids/pull/3283)|Fix ClassCastException for unsupported TypedImperativeAggregate functions| -|[#3251](https://github.com/NVIDIA/spark-rapids/pull/3251)|CreateMap support for multiple key-value pairs| -|[#3234](https://github.com/NVIDIA/spark-rapids/pull/3234)|Parquet support for MapType| -|[#3277](https://github.com/NVIDIA/spark-rapids/pull/3277)|Build changes for Spark 3.0.3, 3.0.4, 3.1.1, 3.1.2, 3.1.3, 3.1.1cdh and 3.0.1emr| -|[#3275](https://github.com/NVIDIA/spark-rapids/pull/3275)|Improve over-estimating for ORC coalescing reading| -|[#3280](https://github.com/NVIDIA/spark-rapids/pull/3280)|Update project URL to the public doc website| -|[#3285](https://github.com/NVIDIA/spark-rapids/pull/3285)|Qualification tool: Check for metadata being null| -|[#3281](https://github.com/NVIDIA/spark-rapids/pull/3281)|Decrease parallelism for pre-merge pod to avoid potential OOM kill| 
-|[#3264](https://github.com/NVIDIA/spark-rapids/pull/3264)|Add parallel support to nightly spark standalone tests| -|[#3257](https://github.com/NVIDIA/spark-rapids/pull/3257)|Add maven compile/package plugin executions for Spark302 and Spark301| -|[#3272](https://github.com/NVIDIA/spark-rapids/pull/3272)|Fix Databricks shim build| -|[#3270](https://github.com/NVIDIA/spark-rapids/pull/3270)|Remove reference to old maven-scala-plugin| -|[#3259](https://github.com/NVIDIA/spark-rapids/pull/3259)|Generate docs for AST from checks| -|[#3164](https://github.com/NVIDIA/spark-rapids/pull/3164)|Support Union on Map types| -|[#3261](https://github.com/NVIDIA/spark-rapids/pull/3261)|Fix some typos[skip ci]| -|[#3242](https://github.com/NVIDIA/spark-rapids/pull/3242)|Support for LeftOuter/BuildRight and RightOuter/BuildLeft nested loop joins| -|[#3239](https://github.com/NVIDIA/spark-rapids/pull/3239)|Support decimal type in orc reader| -|[#3258](https://github.com/NVIDIA/spark-rapids/pull/3258)|Add ExecChecks to Databricks shims for RunningWindowFunctionExec| -|[#3230](https://github.com/NVIDIA/spark-rapids/pull/3230)|Initial support for CreateMap on GPU| -|[#3252](https://github.com/NVIDIA/spark-rapids/pull/3252)|Update to new cudf AST API| -|[#3249](https://github.com/NVIDIA/spark-rapids/pull/3249)|Fix typo in Spark311dbShims| -|[#3183](https://github.com/NVIDIA/spark-rapids/pull/3183)|Add TypeSig checks for join keys and other special cases| -|[#3246](https://github.com/NVIDIA/spark-rapids/pull/3246)|Disable test_broadcast_nested_loop_join_condition_missing_count on Databricks| -|[#3241](https://github.com/NVIDIA/spark-rapids/pull/3241)|Split pytest by 'slow_test' tag and run from different k8s pods to reduce premerge job duration| -|[#3184](https://github.com/NVIDIA/spark-rapids/pull/3184)|Support broadcast nested loop join for LeftSemi and LeftAnti| -|[#3236](https://github.com/NVIDIA/spark-rapids/pull/3236)|Fix Scaladoc warnings in GpuScalaUDF and BufferSendState| 
-|[#2846](https://github.com/NVIDIA/spark-rapids/pull/2846)|default rmm alloc fraction to the max to avoid unnecessary fragmentation| -|[#3231](https://github.com/NVIDIA/spark-rapids/pull/3231)|Fix some resource leaks in GpuCast and RapidsShuffleServerSuite| -|[#3179](https://github.com/NVIDIA/spark-rapids/pull/3179)|Support GpuFirst/GpuLast on more data types| -|[#3228](https://github.com/NVIDIA/spark-rapids/pull/3228)|Fix unreachable code warnings in GpuCast| -|[#3200](https://github.com/NVIDIA/spark-rapids/pull/3200)|Enable a smoke test for UCX in pre-merge| -|[#3203](https://github.com/NVIDIA/spark-rapids/pull/3203)|Fix Parquet test_round_trip to avoid CPU write exception| -|[#3220](https://github.com/NVIDIA/spark-rapids/pull/3220)|Use LongRangeGen instead of IntegerGen| -|[#3218](https://github.com/NVIDIA/spark-rapids/pull/3218)|Add UCX 1.11.0 to the pre-merge Docker image| -|[#3204](https://github.com/NVIDIA/spark-rapids/pull/3204)|Decrease parallelism for pre-merge integration tests| -|[#3212](https://github.com/NVIDIA/spark-rapids/pull/3212)|Fix merge conflict 3211 [skip ci]| -|[#3188](https://github.com/NVIDIA/spark-rapids/pull/3188)|Exclude slf4j classes from the spark-rapids jar| -|[#3189](https://github.com/NVIDIA/spark-rapids/pull/3189)|Disable snapshot shims by default| -|[#3178](https://github.com/NVIDIA/spark-rapids/pull/3178)|Fix hash_aggregate test failures due to TypedImperativeAggregate| -|[#3190](https://github.com/NVIDIA/spark-rapids/pull/3190)|Update GpuInSet for SPARK-35422 changes| -|[#3193](https://github.com/NVIDIA/spark-rapids/pull/3193)|Append res-life to blossom-ci whitelist [skip ci]| -|[#3175](https://github.com/NVIDIA/spark-rapids/pull/3175)|Add in support for explode on maps| -|[#3171](https://github.com/NVIDIA/spark-rapids/pull/3171)|Refine upload log stage naming in workflow file [skip ci]| -|[#3173](https://github.com/NVIDIA/spark-rapids/pull/3173)|Profile tool: Fix reporting app contains Dataset| 
-|[#3165](https://github.com/NVIDIA/spark-rapids/pull/3165)|Add optional projection via AST expression evaluation| -|[#3113](https://github.com/NVIDIA/spark-rapids/pull/3113)|Fix order of operations when using mkString in typeConversionInfo| -|[#3161](https://github.com/NVIDIA/spark-rapids/pull/3161)|Rework Profile tool to not require Spark to run and process files faster| -|[#3169](https://github.com/NVIDIA/spark-rapids/pull/3169)|Fix auto-merge conflict 3167 [skip ci]| -|[#3162](https://github.com/NVIDIA/spark-rapids/pull/3162)|Add in more generalized support for casting nested types| -|[#3158](https://github.com/NVIDIA/spark-rapids/pull/3158)|Enable joins on nested structs| -|[#3099](https://github.com/NVIDIA/spark-rapids/pull/3099)|Decimal_128 type checks| -|[#3155](https://github.com/NVIDIA/spark-rapids/pull/3155)|Simple nested additions v2| -|[#2728](https://github.com/NVIDIA/spark-rapids/pull/2728)|Support string `repeat` SQL| -|[#3148](https://github.com/NVIDIA/spark-rapids/pull/3148)|Updated RunningWindow to support extended types too| -|[#3112](https://github.com/NVIDIA/spark-rapids/pull/3112)|Qualification tool: Add conjunction and disjunction filters| -|[#3117](https://github.com/NVIDIA/spark-rapids/pull/3117)|First pass at enabling structs, arrays, and maps for more parts of the plan| -|[#3109](https://github.com/NVIDIA/spark-rapids/pull/3109)|Cudf agg type changes| -|[#2971](https://github.com/NVIDIA/spark-rapids/pull/2971)|Support GpuCollectList and GpuCollectSet as TypedImperativeAggregate| -|[#3107](https://github.com/NVIDIA/spark-rapids/pull/3107)|Add setting to enable/disable RAPIDS Shuffle Manager dynamically| -|[#3105](https://github.com/NVIDIA/spark-rapids/pull/3105)|Add filter in query plan for conditional nested loop and cartesian joins| -|[#3096](https://github.com/NVIDIA/spark-rapids/pull/3096)|add spark311db GpuSortMergeJoinExec conditional joins filter| -|[#3086](https://github.com/NVIDIA/spark-rapids/pull/3086)|Fix Support of MapType in 
joins on Databricks| -|[#3089](https://github.com/NVIDIA/spark-rapids/pull/3089)|Add filter node in the query plan for conditional joins| -|[#3074](https://github.com/NVIDIA/spark-rapids/pull/3074)|Partial support for time windows| -|[#3061](https://github.com/NVIDIA/spark-rapids/pull/3061)|Support Union on Struct of Map| -|[#3034](https://github.com/NVIDIA/spark-rapids/pull/3034)| Support Sort on nested struct | -|[#3011](https://github.com/NVIDIA/spark-rapids/pull/3011)|Support MapType in joins| -|[#3031](https://github.com/NVIDIA/spark-rapids/pull/3031)|add doc for PR status checks [skip ci]| -|[#3028](https://github.com/NVIDIA/spark-rapids/pull/3028)|Enable parallel build for pre-merge job to reduce overall duration [skip ci]| -|[#3025](https://github.com/NVIDIA/spark-rapids/pull/3025)|Qualification tool: Add regex and username filters.| -|[#2980](https://github.com/NVIDIA/spark-rapids/pull/2980)|Init version 21.10.0| -|[#3000](https://github.com/NVIDIA/spark-rapids/pull/3000)|Merge branch-21.08 to branch-21.10| - -## Release 21.08.1 - -### Bugs Fixed -||| -|:---|:---| -|[#3350](https://github.com/NVIDIA/spark-rapids/issues/3350)|[BUG] Qualification tool: check for metadata being null| - -### PRs -||| -|:---|:---| -|[#3351](https://github.com/NVIDIA/spark-rapids/pull/3351)|Update changelog for tools v21.08.1 release [skip CI]| -|[#3348](https://github.com/NVIDIA/spark-rapids/pull/3348)|Change tool version to 21.08.1 [skip ci]| -|[#3343](https://github.com/NVIDIA/spark-rapids/pull/3343)|Qualification tool backport: Check for metadata being null (#3285)| - -## Release 21.08 - -### Features -||| -|:---|:---| -|[#1584](https://github.com/NVIDIA/spark-rapids/issues/1584)|[FEA] Support rank as window function| -|[#1859](https://github.com/NVIDIA/spark-rapids/issues/1859)|[FEA] Optimize row_number/rank for memory usage| -|[#2976](https://github.com/NVIDIA/spark-rapids/issues/2976)|[FEA] support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec| 
-|[#2398](https://github.com/NVIDIA/spark-rapids/issues/2398)|[FEA] `GpuIf ` and `GpuCoalesce` supports `ArrayType`| -|[#2445](https://github.com/NVIDIA/spark-rapids/issues/2445)|[FEA] Support literal arrays in case/when statements| -|[#2757](https://github.com/NVIDIA/spark-rapids/issues/2757)|[FEA] Profiling tool display input data types| -|[#2860](https://github.com/NVIDIA/spark-rapids/issues/2860)|[FEA] Minimal support for LEGACY timeParserPolicy| -|[#2693](https://github.com/NVIDIA/spark-rapids/issues/2693)|[FEA] Profiling Tool: Print GDS + UCX related parameters | -|[#2334](https://github.com/NVIDIA/spark-rapids/issues/2334)|[FEA] Record GPU time and Fetch time separately, instead of recording Total Time| -|[#2685](https://github.com/NVIDIA/spark-rapids/issues/2685)|[FEA] Profiling compare mode for table SQL Duration and Executor CPU Time Percent| -|[#2742](https://github.com/NVIDIA/spark-rapids/issues/2742)|[FEA] include App Name from profiling tool output| -|[#2712](https://github.com/NVIDIA/spark-rapids/issues/2712)|[FEA] Display job and stage info in the dot graph for profiling tool| -|[#2562](https://github.com/NVIDIA/spark-rapids/issues/2562)|[FEA] Implement KnownNotNull on the GPU| -|[#2557](https://github.com/NVIDIA/spark-rapids/issues/2557)|[FEA] support sort_array on GPU| -|[#2307](https://github.com/NVIDIA/spark-rapids/issues/2307)|[FEA] Enable Parquet writing for arrays| -|[#1856](https://github.com/NVIDIA/spark-rapids/issues/1856)|[FEA] Create a batch chunking iterator and integrate it with GpuWindowExec| - -### Performance -||| -|:---|:---| -|[#866](https://github.com/NVIDIA/spark-rapids/issues/866)|[FEA] combine window operations into single call| -|[#2800](https://github.com/NVIDIA/spark-rapids/issues/2800)|[FEA] Support ORC small files coalescing reading| -|[#737](https://github.com/NVIDIA/spark-rapids/issues/737)|[FEA] handle peer timeouts in shuffle| -|[#1590](https://github.com/NVIDIA/spark-rapids/issues/1590)|Rapids Shuffle - UcpListener| 
-|[#2275](https://github.com/NVIDIA/spark-rapids/issues/2275)|[FEA] UCP error callback deal with cleanup| -|[#2799](https://github.com/NVIDIA/spark-rapids/issues/2799)|[FEA] Support ORC multi-file cloud reading| - -### Bugs Fixed -||| -|:---|:---| -|[#3135](https://github.com/NVIDIA/spark-rapids/issues/3135)|[BUG] Regression seen in `concatenate` in NDS with RAPIDS Shuffle Manager enabled| -|[#3017](https://github.com/NVIDIA/spark-rapids/issues/3017)|[BUG] orc_write_test failed in databricks runtime| -|[#3060](https://github.com/NVIDIA/spark-rapids/issues/3060)|[BUG] ORC read can corrupt data when specified schema does not match file schema ordering| -|[#3065](https://github.com/NVIDIA/spark-rapids/issues/3065)|[BUG] window exec tries to do too much on the GPU| -|[#3066](https://github.com/NVIDIA/spark-rapids/issues/3066)|[BUG] Profiling tool generate dot file fails to convert| -|[#3038](https://github.com/NVIDIA/spark-rapids/issues/3038)|[BUG] leak in `getDeviceMemoryBuffer` for the unspill case| -|[#3007](https://github.com/NVIDIA/spark-rapids/issues/3007)|[BUG] data mess up reading from ORC| -|[#3029](https://github.com/NVIDIA/spark-rapids/issues/3029)|[BUG] udf_test failed in ucx standalone env| -|[#2723](https://github.com/NVIDIA/spark-rapids/issues/2723)|[BUG] test failures in CI build (observed in UCX job) after starting to use 21.08| -|[#3016](https://github.com/NVIDIA/spark-rapids/issues/3016)|[BUG] databricks script failed to return correct exit code| -|[#3002](https://github.com/NVIDIA/spark-rapids/issues/3002)|[BUG] writing parquet with partitionBy() loses sort order| -|[#2959](https://github.com/NVIDIA/spark-rapids/issues/2959)|[BUG] Resolve common code source incompatibility with supported Spark versions| -|[#2589](https://github.com/NVIDIA/spark-rapids/issues/2589)|[BUG] RapidsShuffleHeartbeatManager needs to remove executors that are stale| -|[#2964](https://github.com/NVIDIA/spark-rapids/issues/2964)|[BUG] IGNORE ORDER, WITH DECIMALS: [Window] 
[MIXED WINDOW SPECS] FAILED in spark 3.0.3+| -|[#2942](https://github.com/NVIDIA/spark-rapids/issues/2942)|[BUG] Cache of Array using ParquetCachedBatchSerializer failed with "DATA ACCESS MUST BE ON A HOST VECTOR"| -|[#2965](https://github.com/NVIDIA/spark-rapids/issues/2965)|[BUG] test_round_robin_sort_fallback failed with ValueError: 'a_1' is not in list| -|[#2891](https://github.com/NVIDIA/spark-rapids/issues/2891)|[BUG] Discrepancy in getting count before and after caching| -|[#2972](https://github.com/NVIDIA/spark-rapids/issues/2972)|[BUG] When using timeout option(-t) of qualification tool, it does not print anything in output after timeout.| -|[#2958](https://github.com/NVIDIA/spark-rapids/issues/2958)|[BUG] When AQE=on, SMJ with a Map in SELECTed list fails with "key not found: numPartitions"| -|[#2929](https://github.com/NVIDIA/spark-rapids/issues/2929)|[BUG] No validation of format strings when formatting dates in legacy timeParserPolicy mode| -|[#2900](https://github.com/NVIDIA/spark-rapids/issues/2900)|[BUG] CAST string to float/double produces incorrect results in some cases| -|[#2957](https://github.com/NVIDIA/spark-rapids/issues/2957)|[BUG] Builds failing due to breaking changes in SPARK-36034| -|[#2901](https://github.com/NVIDIA/spark-rapids/issues/2901)|[BUG] `GpuCompressedColumnVector` cannot be cast to `GpuColumnVector` with AQE| -|[#2899](https://github.com/NVIDIA/spark-rapids/issues/2899)|[BUG] CAST string to integer produces incorrect results in some cases| -|[#2937](https://github.com/NVIDIA/spark-rapids/issues/2937)|[BUG] Fix more edge cases when parsing dates in legacy timeParserPolicy| -|[#2939](https://github.com/NVIDIA/spark-rapids/issues/2939)|[BUG] Window integration tests failing with `Lead expected at least 3 but found 0`| -|[#2912](https://github.com/NVIDIA/spark-rapids/issues/2912)|[BUG] Profiling compare mode fails when comparing spark 2 eventlog to spark 3 event log| 
-|[#2892](https://github.com/NVIDIA/spark-rapids/issues/2892)|[BUG] UCX error `Message truncated` observed with UCX 1.11 RC in Q77 NDS| -|[#2807](https://github.com/NVIDIA/spark-rapids/issues/2807)|[BUG] Use UCP_AM_FLAG_WHOLE_MSG and UCP_AM_FLAG_PERSISTENT_DATA for receive handlers| -|[#2930](https://github.com/NVIDIA/spark-rapids/issues/2930)|[BUG] Profiling tool does not show "Potential Problems" for dataset API in section "SQL Duration and Executor CPU Time Percent"| -|[#2902](https://github.com/NVIDIA/spark-rapids/issues/2902)|[BUG] CAST string to bool produces incorrect results in some cases| -|[#2850](https://github.com/NVIDIA/spark-rapids/issues/2850)|[BUG] "java.io.InterruptedIOException: getFileStatus on s3a://xxx" for ORC reading in Databricks 8.2 runtime| -|[#2856](https://github.com/NVIDIA/spark-rapids/issues/2856)|[BUG] cache of struct does not work on databricks 8.2ML| -|[#2790](https://github.com/NVIDIA/spark-rapids/issues/2790)|[BUG] In Comparison mode health check does not show the application id| -|[#2713](https://github.com/NVIDIA/spark-rapids/issues/2713)|[BUG] profiling tool does not error or warn if incompatible options are given| -|[#2477](https://github.com/NVIDIA/spark-rapids/issues/2477)|[BUG] test_single_sort_in_part is failed in nightly UCX and AQE (no UCX) integration | -|[#2868](https://github.com/NVIDIA/spark-rapids/issues/2868)|[BUG] to_date produces wrong value on GPU for some corner cases| -|[#2907](https://github.com/NVIDIA/spark-rapids/issues/2907)|[BUG] incorrect expression to detect previously set `--master`| -|[#2893](https://github.com/NVIDIA/spark-rapids/issues/2893)|[BUG] TransferRequest request transactions are getting leaked| -|[#120](https://github.com/NVIDIA/spark-rapids/issues/120)|[BUG] GPU InitCap supports too much white space.| -|[#2786](https://github.com/NVIDIA/spark-rapids/issues/2786)|[BUG][initCap function]There is an issue converting the uppercase character to lowercase on GPU.| 
-|[#2754](https://github.com/NVIDIA/spark-rapids/issues/2754)|[BUG] cudf_udf tests failed w/ 21.08| -|[#2820](https://github.com/NVIDIA/spark-rapids/issues/2820)|[BUG] Metrics are inconsistent for GpuRowToColumnarToExec| -|[#2710](https://github.com/NVIDIA/spark-rapids/issues/2710)|[BUG] dot file generation can go over the limits of dot| -|[#2772](https://github.com/NVIDIA/spark-rapids/issues/2772)|[BUG] new integration test failures w/ maxFailures=1| -|[#2739](https://github.com/NVIDIA/spark-rapids/issues/2739)|[BUG] CBO causes less efficient plan for NDS q84| -|[#2717](https://github.com/NVIDIA/spark-rapids/issues/2717)|[BUG] CBO forces joins back onto CPU in some cases| -|[#2718](https://github.com/NVIDIA/spark-rapids/issues/2718)|[BUG] CBO falls back to CPU to write to Parquet in some cases| -|[#2692](https://github.com/NVIDIA/spark-rapids/issues/2692)|[BUG] Profiling tool: Add error handling for comparison functions | -|[#2711](https://github.com/NVIDIA/spark-rapids/issues/2711)|[BUG] reused stages should not appear multiple times in dot| -|[#2746](https://github.com/NVIDIA/spark-rapids/issues/2746)|[BUG] test_single_nested_sort_in_part integration test failure 21.08| -|[#2690](https://github.com/NVIDIA/spark-rapids/issues/2690)|[BUG] Profiling tool doesn't properly read rolled log files| -|[#2546](https://github.com/NVIDIA/spark-rapids/issues/2546)|[BUG] Build Failure when building from source| -|[#2750](https://github.com/NVIDIA/spark-rapids/issues/2750)|[BUG] nightly test failed with lists: `testStringReplaceWithBackrefs`| -|[#2644](https://github.com/NVIDIA/spark-rapids/issues/2644)|[BUG] test event logs should be compressed| -|[#2725](https://github.com/NVIDIA/spark-rapids/issues/2725)|[BUG] Heartbeat from unknown executor when running with UCX shuffle in local mode| -|[#2715](https://github.com/NVIDIA/spark-rapids/issues/2715)|[BUG] Part of the plan is not columnar class com.databricks.sql.execution.window.RunningWindowFunc| 
-|[#2521](https://github.com/NVIDIA/spark-rapids/issues/2521)|[BUG] cudf_udf failed in all spark release intermittently| -|[#1712](https://github.com/NVIDIA/spark-rapids/issues/1712)|[BUG] Scala UDF compiler can decompile UDFs with RAPIDS implementation| - -### PRs -||| -|:---|:---| -|[#3216](https://github.com/NVIDIA/spark-rapids/pull/3216)|Update changelog to include download doc update [skip ci]| -|[#3214](https://github.com/NVIDIA/spark-rapids/pull/3214)|Update download and databricks doc for 21.06.2 [skip ci]| -|[#3210](https://github.com/NVIDIA/spark-rapids/pull/3210)|Update 21.08.0 changelog to latest [skip ci]| -|[#3197](https://github.com/NVIDIA/spark-rapids/pull/3197)|Databricks parquetFilters api change in db 8.2 runtime| -|[#3168](https://github.com/NVIDIA/spark-rapids/pull/3168)|Update 21.08 changelog to latest [skip ci]| -|[#3146](https://github.com/NVIDIA/spark-rapids/pull/3146)|update cudf Java binding version to 21.08.2| -|[#3080](https://github.com/NVIDIA/spark-rapids/pull/3080)|Update docs for 21.08 release| -|[#3136](https://github.com/NVIDIA/spark-rapids/pull/3136)|Update tool docs to explain default filesystem [skip ci]| -|[#3128](https://github.com/NVIDIA/spark-rapids/pull/3128)|Fix merge conflict 3126 from branch-21.06 [skip ci]| -|[#3124](https://github.com/NVIDIA/spark-rapids/pull/3124)|Fix merge conflict 3122 from branch-21.06 [skip ci]| -|[#3100](https://github.com/NVIDIA/spark-rapids/pull/3100)|Update databricks 3.0.1 shim to new ParquetFilter api| -|[#3083](https://github.com/NVIDIA/spark-rapids/pull/3083)|Initial CHANGELOG.md update for 21.08| -|[#3079](https://github.com/NVIDIA/spark-rapids/pull/3079)|Remove the struct support in ORC reader| -|[#3062](https://github.com/NVIDIA/spark-rapids/pull/3062)|Fix ORC read corruption when specified schema does not match file order| -|[#3064](https://github.com/NVIDIA/spark-rapids/pull/3064)|Tweak scaladoc to callout the GDS+unspill case in copyBuffer| 
-|[#3049](https://github.com/NVIDIA/spark-rapids/pull/3049)|Handle mmap exception more gracefully in RapidsShuffleServer| -|[#3067](https://github.com/NVIDIA/spark-rapids/pull/3067)|Update to UCX 1.11.0| -|[#3024](https://github.com/NVIDIA/spark-rapids/pull/3024)|Check validity of any() or all() results that could be null| -|[#3069](https://github.com/NVIDIA/spark-rapids/pull/3069)|Fall back to the CPU on window partition by struct or array| -|[#3068](https://github.com/NVIDIA/spark-rapids/pull/3068)|Profiling tool generate dot file fails on unescaped html characters| -|[#3048](https://github.com/NVIDIA/spark-rapids/pull/3048)|Apply unique committer job ID fix from SPARK-33230| -|[#3050](https://github.com/NVIDIA/spark-rapids/pull/3050)|Updates for google analytics [skip ci]| -|[#3015](https://github.com/NVIDIA/spark-rapids/pull/3015)|Fix ORC read error when read schema reorders file schema columns| -|[#3053](https://github.com/NVIDIA/spark-rapids/pull/3053)|cherry-pick #3028 [skip ci]| -|[#2887](https://github.com/NVIDIA/spark-rapids/pull/2887)|ORC reader supports struct| -|[#3032](https://github.com/NVIDIA/spark-rapids/pull/3032)|Add disorder read schema test case for Parquet| -|[#3022](https://github.com/NVIDIA/spark-rapids/pull/3022)|Add in docs to describe window performance| -|[#3018](https://github.com/NVIDIA/spark-rapids/pull/3018)|[BUG] fix db script hides error issue| -|[#2953](https://github.com/NVIDIA/spark-rapids/pull/2953)|Add in support for rank and dense_rank| -|[#3009](https://github.com/NVIDIA/spark-rapids/pull/3009)|Propagate child output ordering in GpuCoalesceBatches| -|[#2989](https://github.com/NVIDIA/spark-rapids/pull/2989)|Re-enable Array support in Cartesian Joins, Broadcast Nested Loop Joins| -|[#2999](https://github.com/NVIDIA/spark-rapids/pull/2999)|Remove unused configuration setting spark.rapids.sql.castStringToInteger.enabled| -|[#2967](https://github.com/NVIDIA/spark-rapids/pull/2967)|Resolve hidden source incompatibility between 
Spark30x and Spark31x Shims| -|[#2982](https://github.com/NVIDIA/spark-rapids/pull/2982)|Add FAQ entry for timezone error| -|[#2839](https://github.com/NVIDIA/spark-rapids/pull/2839)|GpuIf and GpuCoalesce support array and struct types| -|[#2987](https://github.com/NVIDIA/spark-rapids/pull/2987)|Update documentation for unsupported edge cases when casting from string to timestamp| -|[#2977](https://github.com/NVIDIA/spark-rapids/pull/2977)|Expire executors from the RAPIDS shuffle heartbeat manager on timeout| -|[#2985](https://github.com/NVIDIA/spark-rapids/pull/2985)|Move tools README to docs/additional-functionality/qualification-profiling-tools.md with some modification| -|[#2992](https://github.com/NVIDIA/spark-rapids/pull/2992)|Remove commented/redundant window-function tests.| -|[#2994](https://github.com/NVIDIA/spark-rapids/pull/2994)|Tweak RAPIDS Shuffle Manager configs for 21.08| -|[#2984](https://github.com/NVIDIA/spark-rapids/pull/2984)|Avoid comparing window range canonicalized plans on Spark 3.0.x| -|[#2970](https://github.com/NVIDIA/spark-rapids/pull/2970)|Put the GPU data back on host before processing cache on CPU| -|[#2986](https://github.com/NVIDIA/spark-rapids/pull/2986)|Avoid struct aliasing in test_round_robin_sort_fallback| -|[#2935](https://github.com/NVIDIA/spark-rapids/pull/2935)|Read the complete batch before returning when selectedAttributes is empty| -|[#2826](https://github.com/NVIDIA/spark-rapids/pull/2826)|CaseWhen supports scalar of list and struct| -|[#2978](https://github.com/NVIDIA/spark-rapids/pull/2978)|enable auto-merge from branch 21.08 to 21.10 [skip ci]| -|[#2946](https://github.com/NVIDIA/spark-rapids/pull/2946)|ORC reader supports list| -|[#2947](https://github.com/NVIDIA/spark-rapids/pull/2947)|Qualification tool: Filter based on timestamp in event logs| -|[#2973](https://github.com/NVIDIA/spark-rapids/pull/2973)|Assert that CPU and GPU row fields match when present| 
-|[#2974](https://github.com/NVIDIA/spark-rapids/pull/2974)|Qualification tool: fix performance regression| -|[#2948](https://github.com/NVIDIA/spark-rapids/pull/2948)|Remove unnecessary copies of ParquetCachedBatchSerializer| -|[#2968](https://github.com/NVIDIA/spark-rapids/pull/2968)|Fix AQE CustomShuffleReaderExec not seeing ShuffleQueryStageExec| -|[#2969](https://github.com/NVIDIA/spark-rapids/pull/2969)|Make the dir for spark301 shuffle shim match package name| -|[#2933](https://github.com/NVIDIA/spark-rapids/pull/2933)|Improve CAST string to float implementation to handle more edge cases| -|[#2963](https://github.com/NVIDIA/spark-rapids/pull/2963)|Add override getParquetFilters for shim 304| -|[#2956](https://github.com/NVIDIA/spark-rapids/pull/2956)|Profile Tool: make order consistent between runs| -|[#2924](https://github.com/NVIDIA/spark-rapids/pull/2924)|Fix bug when collecting directly from a GPU shuffle query stage with AQE on| -|[#2950](https://github.com/NVIDIA/spark-rapids/pull/2950)|Fix shutdown bugs in the RAPIDS Shuffle Manager| -|[#2922](https://github.com/NVIDIA/spark-rapids/pull/2922)|Improve UCX assertion to show the failed assertion| -|[#2961](https://github.com/NVIDIA/spark-rapids/pull/2961)|Fix ParquetFilters issue| -|[#2951](https://github.com/NVIDIA/spark-rapids/pull/2951)|Qualification tool: Allow app start and app name filtering and test with filesystem filters| -|[#2941](https://github.com/NVIDIA/spark-rapids/pull/2941)|Make test event log compression codec configurable| -|[#2919](https://github.com/NVIDIA/spark-rapids/pull/2919)|Fix bugs in CAST string to integer| -|[#2944](https://github.com/NVIDIA/spark-rapids/pull/2944)|Fix childExprs list for GpuWindowExpression, for Spark 3.1.x.| -|[#2917](https://github.com/NVIDIA/spark-rapids/pull/2917)|Refine GpuHashAggregateExec.setupReference| -|[#2909](https://github.com/NVIDIA/spark-rapids/pull/2909)|Support orc coalescing reading| 
-|[#2938](https://github.com/NVIDIA/spark-rapids/pull/2938)|Qualification tool: Add negation filter| -|[#2940](https://github.com/NVIDIA/spark-rapids/pull/2940)|qualification tool: add filtering by app start time| -|[#2928](https://github.com/NVIDIA/spark-rapids/pull/2928)|Qualification tool support recognizing decimal operations| -|[#2934](https://github.com/NVIDIA/spark-rapids/pull/2934)|Qualification tool: Add filter based on appName| -|[#2904](https://github.com/NVIDIA/spark-rapids/pull/2904)|Qualification and Profiling tool handle Read formats and datatypes| -|[#2927](https://github.com/NVIDIA/spark-rapids/pull/2927)|Restore aggregation sorted data hint| -|[#2932](https://github.com/NVIDIA/spark-rapids/pull/2932)|Profiling tool: Fix comparing spark2 and spark3 event logs| -|[#2926](https://github.com/NVIDIA/spark-rapids/pull/2926)|GPU Active Messages for all buffer types| -|[#2888](https://github.com/NVIDIA/spark-rapids/pull/2888)|Type check with the information from RapidsMeta| -|[#2903](https://github.com/NVIDIA/spark-rapids/pull/2903)|Fix cast string to bool| -|[#2895](https://github.com/NVIDIA/spark-rapids/pull/2895)|Add in running window optimization using scan| -|[#2859](https://github.com/NVIDIA/spark-rapids/pull/2859)|Add spillable batch caching and sort fallback to hash aggregation| -|[#2898](https://github.com/NVIDIA/spark-rapids/pull/2898)|Add fuzz tests for cast from string to other types| -|[#2881](https://github.com/NVIDIA/spark-rapids/pull/2881)|fix orc readers leak issue for ORC PERFILE type| -|[#2842](https://github.com/NVIDIA/spark-rapids/pull/2842)|Support STRUCT/STRING for LEAD()/LAG()| -|[#2880](https://github.com/NVIDIA/spark-rapids/pull/2880)|Added ParquetCachedBatchSerializer support for Databricks| -|[#2911](https://github.com/NVIDIA/spark-rapids/pull/2911)|Add in ID as sort for Job + Stage level aggregated task metrics| -|[#2914](https://github.com/NVIDIA/spark-rapids/pull/2914)|Profiling tool: add app index to tables that don't have 
it| -|[#2906](https://github.com/NVIDIA/spark-rapids/pull/2906)|Fix compiler warning| -|[#2890](https://github.com/NVIDIA/spark-rapids/pull/2890)|Fix cast to date bug| -|[#2908](https://github.com/NVIDIA/spark-rapids/pull/2908)|Fixes bad string contains in run_pyspark_from_build| -|[#2886](https://github.com/NVIDIA/spark-rapids/pull/2886)|Use UCP Listener for UCX connections and enable peer error handling| -|[#2875](https://github.com/NVIDIA/spark-rapids/pull/2875)|Add support for timeParserPolicy=LEGACY| -|[#2894](https://github.com/NVIDIA/spark-rapids/pull/2894)|Fixes a JVM leak for UCX TransactionRequests| -|[#2854](https://github.com/NVIDIA/spark-rapids/pull/2854)|Qualification Tool to output only the 'k' highest-ranked or 'k' lowest-ranked applications | -|[#2873](https://github.com/NVIDIA/spark-rapids/pull/2873)|Fix infinite loop in MultiFileCloudPartitionReaderBase| -|[#2838](https://github.com/NVIDIA/spark-rapids/pull/2838)|Replace `toTitle` with `capitalize` for GpuInitCap| -|[#2870](https://github.com/NVIDIA/spark-rapids/pull/2870)|Avoid readers acquiring GPU on next batch query if not first batch| -|[#2882](https://github.com/NVIDIA/spark-rapids/pull/2882)|Refactor window operations to do them in the exec| -|[#2874](https://github.com/NVIDIA/spark-rapids/pull/2874)|Update audit script to clone branch-3.2 instead of master| -|[#2843](https://github.com/NVIDIA/spark-rapids/pull/2843)|Qualification/Profiling tool add tests for Spark2 event logs| -|[#2828](https://github.com/NVIDIA/spark-rapids/pull/2828)|add cloud reading for orc| -|[#2721](https://github.com/NVIDIA/spark-rapids/pull/2721)|Check-list for corner cases in testing.| -|[#2675](https://github.com/NVIDIA/spark-rapids/pull/2675)|Support for Decimals with negative scale for Parquet Cached Batch Serializer| -|[#2849](https://github.com/NVIDIA/spark-rapids/pull/2849)|Update release notes to include qualification and profiling tool| -|[#2852](https://github.com/NVIDIA/spark-rapids/pull/2852)|Fix hash 
aggregate tests leaking configs into other tests| -|[#2845](https://github.com/NVIDIA/spark-rapids/pull/2845)|Split window exec into multiple stages if needed| -|[#2853](https://github.com/NVIDIA/spark-rapids/pull/2853)|Tag last batch when coalescing| -|[#2851](https://github.com/NVIDIA/spark-rapids/pull/2851)|Fix build failure - update ucx profiling test to fix parameter type to getEventLogInfo| -|[#2785](https://github.com/NVIDIA/spark-rapids/pull/2785)|Profiling tool: Print UCX and GDS parameters| -|[#2840](https://github.com/NVIDIA/spark-rapids/pull/2840)|Fix Gpu -> GPU| -|[#2844](https://github.com/NVIDIA/spark-rapids/pull/2844)|Document Qualification tool Spark requirements| -|[#2787](https://github.com/NVIDIA/spark-rapids/pull/2787)|Add metrics definition link to tool README.md[skip ci] | -|[#2841](https://github.com/NVIDIA/spark-rapids/pull/2841)|Add a threadpool to Qualification tool to process logs in parallel| -|[#2833](https://github.com/NVIDIA/spark-rapids/pull/2833)|Stop running so many versions of Spark unit tests for premerge| -|[#2837](https://github.com/NVIDIA/spark-rapids/pull/2837)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#2822](https://github.com/NVIDIA/spark-rapids/pull/2822)|Rewrite Qualification tool for better performance| -|[#2823](https://github.com/NVIDIA/spark-rapids/pull/2823)|Add semaphoreWaitTime and gpuOpTime for GpuRowToColumnarExec| -|[#2829](https://github.com/NVIDIA/spark-rapids/pull/2829)|Fix filtering directories on compression extension match| -|[#2720](https://github.com/NVIDIA/spark-rapids/pull/2720)|Add metrics documentation to the tuning guide| -|[#2816](https://github.com/NVIDIA/spark-rapids/pull/2816)|Improve some existing collectTime handling| -|[#2821](https://github.com/NVIDIA/spark-rapids/pull/2821)|Truncate long plan labels and refer to "print-plans"| -|[#2827](https://github.com/NVIDIA/spark-rapids/pull/2827)|Update cmake to build udf native [skip ci]| 
-|[#2793](https://github.com/NVIDIA/spark-rapids/pull/2793)|Report equivilant stages/sql ids as a part of compare| -|[#2810](https://github.com/NVIDIA/spark-rapids/pull/2810)|Use SecureRandom for UCPListener TCP port choice| -|[#2798](https://github.com/NVIDIA/spark-rapids/pull/2798)|Mirror apache repos to urm| -|[#2788](https://github.com/NVIDIA/spark-rapids/pull/2788)|Update the type signatures for some expressions| -|[#2792](https://github.com/NVIDIA/spark-rapids/pull/2792)|Automatically set spark.task.maxFailures and local[*, maxFailures]| -|[#2805](https://github.com/NVIDIA/spark-rapids/pull/2805)|Revert "Use UCX Active Messages for all shuffle transfers (#2735)"| -|[#2796](https://github.com/NVIDIA/spark-rapids/pull/2796)|show disk bytes spilled when GDS spill is enabled| -|[#2801](https://github.com/NVIDIA/spark-rapids/pull/2801)|Update pre-merge to use reserved_pool [skip ci]| -|[#2795](https://github.com/NVIDIA/spark-rapids/pull/2795)|Improve CBO debug logging| -|[#2794](https://github.com/NVIDIA/spark-rapids/pull/2794)|Prevent integer overflow when estimating data sizes in cost-based optimizer| -|[#2784](https://github.com/NVIDIA/spark-rapids/pull/2784)|Make spark303 shim version w/o snapshot and add shim layer for spark304| -|[#2744](https://github.com/NVIDIA/spark-rapids/pull/2744)|Cost-based optimizer: Implement simple cost model that demonstrates benefits with NDS queries| -|[#2762](https://github.com/NVIDIA/spark-rapids/pull/2762)| Profiling tool: Update comparison mode output format and add error handling| -|[#2761](https://github.com/NVIDIA/spark-rapids/pull/2761)|Update dot graph to include stages and remove some duplication| -|[#2760](https://github.com/NVIDIA/spark-rapids/pull/2760)|Add in application timeline to profiling tool| -|[#2735](https://github.com/NVIDIA/spark-rapids/pull/2735)|Use UCX Active Messages for all shuffle transfers| -|[#2732](https://github.com/NVIDIA/spark-rapids/pull/2732)|qualification and profiling tool support rolled 
and compressed event logs for CSPs and Apache Spark| -|[#2768](https://github.com/NVIDIA/spark-rapids/pull/2768)|Make window function test results deterministic.| -|[#2769](https://github.com/NVIDIA/spark-rapids/pull/2769)|Add developer documentation for Adaptive Query Execution| -|[#2532](https://github.com/NVIDIA/spark-rapids/pull/2532)|date_format should not suggest enabling incompatibleDateFormats for formats we cannot support| -|[#2743](https://github.com/NVIDIA/spark-rapids/pull/2743)|Disable dynamicAllocation and set maxFailures to 1 in integration tests| -|[#2749](https://github.com/NVIDIA/spark-rapids/pull/2749)|Revert "Add in support for lists in some joins (#2702)"| -|[#2181](https://github.com/NVIDIA/spark-rapids/pull/2181)|abstract the parquet coalescing reading| -|[#2753](https://github.com/NVIDIA/spark-rapids/pull/2753)|Merge branch-21.06 to branch-21.08 [skip ci]| -|[#2751](https://github.com/NVIDIA/spark-rapids/pull/2751)|remove invalid blossom-ci users [skip ci]| -|[#2707](https://github.com/NVIDIA/spark-rapids/pull/2707)|Support `KnownNotNull` running on GPU| -|[#2747](https://github.com/NVIDIA/spark-rapids/pull/2747)|Fix num_slices for test_single_nested_sort_in_part| -|[#2729](https://github.com/NVIDIA/spark-rapids/pull/2729)|fix 301db-shim typecheck typo| -|[#2726](https://github.com/NVIDIA/spark-rapids/pull/2726)|Fix local mode starting RAPIDS shuffle heartbeats| -|[#2722](https://github.com/NVIDIA/spark-rapids/pull/2722)|Support aggregation on NullType in RunningWindowExec| -|[#2719](https://github.com/NVIDIA/spark-rapids/pull/2719)|Avoid executing child plan twice in CoalesceExec| -|[#2586](https://github.com/NVIDIA/spark-rapids/pull/2586)|Update metrics use in GpuUnionExec and GpuCoalesceExec| -|[#2716](https://github.com/NVIDIA/spark-rapids/pull/2716)|Add file size check to pre-merge CI| -|[#2554](https://github.com/NVIDIA/spark-rapids/pull/2554)|Upload build failure log to Github for external contributors access| 
-|[#2596](https://github.com/NVIDIA/spark-rapids/pull/2596)|Initial running window memory optimization| -|[#2702](https://github.com/NVIDIA/spark-rapids/pull/2702)|Add in support for arrays in BroadcastNestedLoopJoinExec and CartesianProductExec| -|[#2699](https://github.com/NVIDIA/spark-rapids/pull/2699)|Add a pre-commit hook to reject large files| -|[#2700](https://github.com/NVIDIA/spark-rapids/pull/2700)|Set numSlices and use parallelize to build dataframe for partition-se…| -|[#2548](https://github.com/NVIDIA/spark-rapids/pull/2548)|support collect_set in rolling window| -|[#2661](https://github.com/NVIDIA/spark-rapids/pull/2661)|Make tools inherit common dependency versions from parent pom| -|[#2668](https://github.com/NVIDIA/spark-rapids/pull/2668)|Remove CUDA 10.x from getting started guide [skip ci]| -|[#2676](https://github.com/NVIDIA/spark-rapids/pull/2676)|Profiling tool: Print Job Information in compare mode| -|[#2679](https://github.com/NVIDIA/spark-rapids/pull/2679)|Merge branch-21.06 to branch-21.08 [skip ci]| -|[#2677](https://github.com/NVIDIA/spark-rapids/pull/2677)|Add pre-merge independent stage timeout [skip ci]| -|[#2616](https://github.com/NVIDIA/spark-rapids/pull/2616)|support GpuSortArray| -|[#2582](https://github.com/NVIDIA/spark-rapids/pull/2582)|support parquet write arrays| -|[#2609](https://github.com/NVIDIA/spark-rapids/pull/2609)|Fix automerge failure from branch-21.06 to branch-21.08| -|[#2570](https://github.com/NVIDIA/spark-rapids/pull/2570)|Added nested structs to UnionExec| -|[#2581](https://github.com/NVIDIA/spark-rapids/pull/2581)|Fix merge conflict 2580 [skip ci]| -|[#2458](https://github.com/NVIDIA/spark-rapids/pull/2458)|Split batch by key for window operations| -|[#2565](https://github.com/NVIDIA/spark-rapids/pull/2565)|Merge branch-21.06 into branch-21.08| -|[#2563](https://github.com/NVIDIA/spark-rapids/pull/2563)|Document: git commit twice when copyright year updated by hook| 
-|[#2561](https://github.com/NVIDIA/spark-rapids/pull/2561)|Fixing the merge of 21.06 to 21.08 for comment changes in Profiling tool| -|[#2558](https://github.com/NVIDIA/spark-rapids/pull/2558)|Fix cdh shim version in 21.08 [skip ci]| -|[#2543](https://github.com/NVIDIA/spark-rapids/pull/2543)|Init branch-21.08| - -## Release 21.06.2 - -### Bugs Fixed -||| -|:---|:---| -|[#3191](https://github.com/NVIDIA/spark-rapids/issues/3191)|[BUG] Databricks parquetFilters build failure in db 8.2 runtime| - -### PRs -||| -|:---|:---| -|[#3209](https://github.com/NVIDIA/spark-rapids/pull/3209)|Update 21.06.2 changelog [skip ci]| -|[#3208](https://github.com/NVIDIA/spark-rapids/pull/3208)|Update rapids plugin version to 21.06.2 [skip ci]| -|[#3207](https://github.com/NVIDIA/spark-rapids/pull/3207)|Disable auto-merge from 21.06 to 21.08 [skip ci]| -|[#3205](https://github.com/NVIDIA/spark-rapids/pull/3205)|Branch 21.06 databricks update [skip ci]| -|[#3198](https://github.com/NVIDIA/spark-rapids/pull/3198)|Databricks parquetFilters api change in db 8.2 runtime| - -## Release 21.06.1 - -### Bugs Fixed -||| -|:---|:---| -|[#3098](https://github.com/NVIDIA/spark-rapids/issues/3098)|[BUG] Databricks parquetFilters build failure| - -### PRs -||| -|:---|:---| -|[#3127](https://github.com/NVIDIA/spark-rapids/pull/3127)|Update CHANGELOG for the release v21.06.1 [skip ci]| -|[#3123](https://github.com/NVIDIA/spark-rapids/pull/3123)|Update rapids plugin version to 21.06.1 [skip ci]| -|[#3118](https://github.com/NVIDIA/spark-rapids/pull/3118)|Fix databricks 3.0.1 for ParquetFilters api change| -|[#3119](https://github.com/NVIDIA/spark-rapids/pull/3119)|Branch 21.06 databricks update [skip ci]| - -## Release 21.06 - -### Features -||| -|:---|:---| -|[#2483](https://github.com/NVIDIA/spark-rapids/issues/2483)|[FEA] Profiling and qualification tool| -|[#951](https://github.com/NVIDIA/spark-rapids/issues/951)|[FEA] Create Cloudera shim layer| 
-|[#2481](https://github.com/NVIDIA/spark-rapids/issues/2481)|[FEA] Support Spark 3.1.2| -|[#2530](https://github.com/NVIDIA/spark-rapids/issues/2530)|[FEA] Add support for Struct columns in CoalesceExec| -|[#2512](https://github.com/NVIDIA/spark-rapids/issues/2512)|[FEA] Report gpuOpTime not totalTime for expand, generate, and range execs| -|[#63](https://github.com/NVIDIA/spark-rapids/issues/63)|[FEA] support ConcatWs sql function| -|[#2501](https://github.com/NVIDIA/spark-rapids/issues/2501)|[FEA] Add support for scalar structs to named_struct| -|[#2286](https://github.com/NVIDIA/spark-rapids/issues/2286)|[FEA] update UCX documentation for branch 21.06| -|[#2436](https://github.com/NVIDIA/spark-rapids/issues/2436)|[FEA] Support nested types in CreateNamedStruct| -|[#2461](https://github.com/NVIDIA/spark-rapids/issues/2461)|[FEA] Report gpuOpTime instead of totalTime for project, filter, window, limit| -|[#2465](https://github.com/NVIDIA/spark-rapids/issues/2465)|[FEA] GpuFilterExec should report gpuOpTime not totalTime| -|[#2013](https://github.com/NVIDIA/spark-rapids/issues/2013)|[FEA] Support concatenating ArrayType columns| -|[#2425](https://github.com/NVIDIA/spark-rapids/issues/2425)|[FEA] Support for casting array of floats to array of doubles| -|[#2012](https://github.com/NVIDIA/spark-rapids/issues/2012)|[FEA] Support Window functions(lead & lag) for ArrayType| -|[#2011](https://github.com/NVIDIA/spark-rapids/issues/2011)|[FEA] Support creation of 2D array type| -|[#1582](https://github.com/NVIDIA/spark-rapids/issues/1582)|[FEA] Allow StructType as input and output type to InMemoryTableScan and InMemoryRelation| -|[#216](https://github.com/NVIDIA/spark-rapids/issues/216)|[FEA] Range window-functions must support non-timestamp order-by expressions| -|[#2390](https://github.com/NVIDIA/spark-rapids/issues/2390)|[FEA] CI/CD for databricks 8.2 runtime| -|[#2273](https://github.com/NVIDIA/spark-rapids/issues/2273)|[FEA] Enable struct type columns for 
GpuHashAggregateExec| -|[#20](https://github.com/NVIDIA/spark-rapids/issues/20)|[FEA] Support out of core joins| -|[#2160](https://github.com/NVIDIA/spark-rapids/issues/2160)|[FEA] Support Databricks 8.2 ML Runtime| -|[#2330](https://github.com/NVIDIA/spark-rapids/issues/2330)|[FEA] Enable hash partitioning with arrays| -|[#1103](https://github.com/NVIDIA/spark-rapids/issues/1103)|[FEA] Support date_format on GPU| -|[#1125](https://github.com/NVIDIA/spark-rapids/issues/1125)|[FEA] explode() can take expressions that generate arrays| -|[#1605](https://github.com/NVIDIA/spark-rapids/issues/1605)|[FEA] Support sorting on struct type keys| - -### Performance -||| -|:---|:---| -|[#1445](https://github.com/NVIDIA/spark-rapids/issues/1445)|[FEA] GDS Integration| -|[#1588](https://github.com/NVIDIA/spark-rapids/issues/1588)|Rapids shuffle - UCX active messages| -|[#2367](https://github.com/NVIDIA/spark-rapids/issues/2367)|[FEA] CBO: Implement costs for memory access and launching kernels| -|[#2431](https://github.com/NVIDIA/spark-rapids/issues/2431)|[FEA] CBO should show benefits with q24b with decimals enabled| - -### Bugs Fixed -||| -|:---|:---| -|[#2652](https://github.com/NVIDIA/spark-rapids/issues/2652)|[BUG] No Job Found. 
Exiting.| -|[#2659](https://github.com/NVIDIA/spark-rapids/issues/2659)|[FEA] Group profiling tool "Potential Problems"| -|[#2680](https://github.com/NVIDIA/spark-rapids/issues/2680)|[BUG] cast can throw NPE| -|[#2628](https://github.com/NVIDIA/spark-rapids/issues/2628)|[BUG] failed to build plugin in databricks runtime 8.2| -|[#2605](https://github.com/NVIDIA/spark-rapids/issues/2605)|[BUG] test_pandas_map_udf_nested_type failed in Yarn integration| -|[#2622](https://github.com/NVIDIA/spark-rapids/issues/2622)|[BUG] compressed event logs are not processed| -|[#2478](https://github.com/NVIDIA/spark-rapids/issues/2478)|[BUG] When tasks complete, cancel pending UCX requests| -|[#1953](https://github.com/NVIDIA/spark-rapids/issues/1953)|[BUG] Could not allocate native memory when running DLRM ETL with --output_ordering input on A100| -|[#2495](https://github.com/NVIDIA/spark-rapids/issues/2495)|[BUG] scaladoc warning GpuParquetScan.scala:727 "discarding unmoored doc comment"| -|[#2368](https://github.com/NVIDIA/spark-rapids/issues/2368)|[BUG] Mismatched number of columns while performing `GpuSort`| -|[#2407](https://github.com/NVIDIA/spark-rapids/issues/2407)|[BUG] test_round_robin_sort_fallback failed| -|[#2497](https://github.com/NVIDIA/spark-rapids/issues/2497)|[BUG] GpuExec failed to find metric totalTime in databricks env| -|[#2473](https://github.com/NVIDIA/spark-rapids/issues/2473)|[BUG] enable test_window_aggs_for_rows_lead_lag_on_arrays and make the order unambiguous| -|[#2489](https://github.com/NVIDIA/spark-rapids/issues/2489)|[BUG] Queries with window expressions fail when cost-based optimizer is enabled| -|[#2457](https://github.com/NVIDIA/spark-rapids/issues/2457)|[BUG] test_window_aggs_for_rows_lead_lag_on_arrays failed| -|[#2371](https://github.com/NVIDIA/spark-rapids/issues/2371)|[BUG] Performance regression for crossjoin on 0.6 comparing to 0.5| -|[#2372](https://github.com/NVIDIA/spark-rapids/issues/2372)|[BUG] FAILED 
../../src/main/python/udf_cudf_test.py::test_window| -|[#2404](https://github.com/NVIDIA/spark-rapids/issues/2404)|[BUG] test_hash_pivot_groupby_nan_fallback failed on Dataproc | -|[#2474](https://github.com/NVIDIA/spark-rapids/issues/2474)|[BUG] when ucp listener enabled we bind 16 times always| -|[#2427](https://github.com/NVIDIA/spark-rapids/issues/2427)|[BUG] test_union_struct_missing_children[(Struct(not_null) failed in databricks310 and spark 311| -|[#2455](https://github.com/NVIDIA/spark-rapids/issues/2455)|[BUG] CaseWhen crashes on literal arrays| -|[#2421](https://github.com/NVIDIA/spark-rapids/issues/2421)|[BUG] NPE when running mapInPandas Pandas UDF in 0.5GA| -|[#2428](https://github.com/NVIDIA/spark-rapids/issues/2428)|[BUG] Intermittent ValueError in test_struct_groupby_count| -|[#1628](https://github.com/NVIDIA/spark-rapids/issues/1628)|[BUG] TPC-DS-like query 24a and 24b at scale=3TB fails with OOM| -|[#2276](https://github.com/NVIDIA/spark-rapids/issues/2276)|[BUG] SPARK-33386 - ansi-mode changed ElementAt/Elt/GetArray behavior in Spark 3.1.1 - fallback to cpu| -|[#2309](https://github.com/NVIDIA/spark-rapids/issues/2309)|[BUG] legacy cast of a struct column to string with a single nested null column yields null instead of '[]' | -|[#2315](https://github.com/NVIDIA/spark-rapids/issues/2315)|[BUG] legacy struct cast to string crashes on a two field struct| -|[#2406](https://github.com/NVIDIA/spark-rapids/issues/2406)|[BUG] test_struct_groupby_count failed| -|[#2378](https://github.com/NVIDIA/spark-rapids/issues/2378)|[BUG] java.lang.ClassCastException: GpuCompressedColumnVector cannot be cast to GpuColumnVector| -|[#2355](https://github.com/NVIDIA/spark-rapids/issues/2355)|[BUG] convertDecimal64ToDecimal32Wrapper leaks ColumnView instances| -|[#2346](https://github.com/NVIDIA/spark-rapids/issues/2346)|[BUG] segfault when using `UcpListener` in TCP-only setup| -|[#2364](https://github.com/NVIDIA/spark-rapids/issues/2364)|[BUG] 
qa_nightly_select_test.py::test_select integration test fails | -|[#2302](https://github.com/NVIDIA/spark-rapids/issues/2302)|[BUG] Int96 are not being written as expected| -|[#2359](https://github.com/NVIDIA/spark-rapids/issues/2359)|[BUG] Alias is different in spark 3.1.0 but our canonicalization code doesn't handle| -|[#2277](https://github.com/NVIDIA/spark-rapids/issues/2277)|[BUG] spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED or LEGACY still fails to read LEGACY date from parquet| -|[#2320](https://github.com/NVIDIA/spark-rapids/issues/2320)|[BUG] TypeChecks diagnostics outputs column ids instead of unsupported types | -|[#2238](https://github.com/NVIDIA/spark-rapids/issues/2238)|[BUG] Unnecessary to cache the batches that will be sent to Python in `FlatMapGroupInPandas`.| -|[#1811](https://github.com/NVIDIA/spark-rapids/issues/1811)|[BUG] window_function_test.py::test_multi_types_window_aggs_for_rows_lead_lag[partBy failed| - -### PRs -||| -|:---|:---| -|[#2817](https://github.com/NVIDIA/spark-rapids/pull/2817)|Update changelog for v21.06.0 release [skip ci]|| -|[#2806](https://github.com/NVIDIA/spark-rapids/pull/2806)|Noted testing for A10, noted that min driver ver is HW specific| -|[#2797](https://github.com/NVIDIA/spark-rapids/pull/2797)|Update documentation for InitCap incompatibility| -|[#2774](https://github.com/NVIDIA/spark-rapids/pull/2774)|Update changelog for 21.06 release [skip ci]| -|[#2770](https://github.com/NVIDIA/spark-rapids/pull/2770)|[Doc] add more for Alluxio page [skip ci]| -|[#2745](https://github.com/NVIDIA/spark-rapids/pull/2745)|Add link to Mellanox RoCE documentation and mention --without-ucx installation option| -|[#2740](https://github.com/NVIDIA/spark-rapids/pull/2740)|Update cudf Java bindings to 21.06.1| -|[#2664](https://github.com/NVIDIA/spark-rapids/pull/2664)|Update changelog for 21.06 release [skip ci]| -|[#2697](https://github.com/NVIDIA/spark-rapids/pull/2697)|fix GDS spill bug when copying from the batch 
write buffer| -|[#2691](https://github.com/NVIDIA/spark-rapids/pull/2691)|Update properties to check if table there| -|[#2687](https://github.com/NVIDIA/spark-rapids/pull/2687)|Remove CUDA 10.x from getting started guide (#2668)| -|[#2686](https://github.com/NVIDIA/spark-rapids/pull/2686)|Profiling tool: Print Job Information in compare mode| -|[#2657](https://github.com/NVIDIA/spark-rapids/pull/2657)|Print CPU and GPU output when _assert_equal fails to help debug given…| -|[#2681](https://github.com/NVIDIA/spark-rapids/pull/2681)|Avoid NPE when casting empty strings to ints| -|[#2669](https://github.com/NVIDIA/spark-rapids/pull/2669)|Fix multiple problems reported and improve error handling| -|[#2666](https://github.com/NVIDIA/spark-rapids/pull/2666)|[DOC]Update custom image guide in GCP dataproc to reduce cluster startup time| -|[#2665](https://github.com/NVIDIA/spark-rapids/pull/2665)|Update docs to move RAPIDS Shuffle out of beta [skip ci]| -|[#2671](https://github.com/NVIDIA/spark-rapids/pull/2671)|Clean profiling&qualification tool README| -|[#2673](https://github.com/NVIDIA/spark-rapids/pull/2673)|Profiling tool: Enable tests and update compressed event log| -|[#2672](https://github.com/NVIDIA/spark-rapids/pull/2672)|Update cudfjni dependency version to 21.06.0| -|[#2663](https://github.com/NVIDIA/spark-rapids/pull/2663)|Qualification tool - add in estimating the App end time when the event log missing application end event| -|[#2600](https://github.com/NVIDIA/spark-rapids/pull/2600)|Accelerate `RunningWindow` queries on GPU| -|[#2651](https://github.com/NVIDIA/spark-rapids/pull/2651)|Profiling tool - fix reporting contains dataset when sql time 0| -|[#2623](https://github.com/NVIDIA/spark-rapids/pull/2623)|Fixed minor mistakes in documentation| -|[#2631](https://github.com/NVIDIA/spark-rapids/pull/2631)|Update docs for Databricks 8.2 ML| -|[#2638](https://github.com/NVIDIA/spark-rapids/pull/2638)|Add an init script for databricks 7.3ML with CUDA11.0 
installed| -|[#2643](https://github.com/NVIDIA/spark-rapids/pull/2643)|Profiling tool: Health check follow on| -|[#2640](https://github.com/NVIDIA/spark-rapids/pull/2640)|Add physical plan to the dot file as the graph label| -|[#2637](https://github.com/NVIDIA/spark-rapids/pull/2637)|Fix databricks for 3.1.1| -|[#2577](https://github.com/NVIDIA/spark-rapids/pull/2577)|Update download.md and FAQ.md for 21.06.0| -|[#2636](https://github.com/NVIDIA/spark-rapids/pull/2636)|Profiling tool - Fix file writer for generating dot graphs, supporting writing sql plans to a file, change output to subdirectory| -|[#2625](https://github.com/NVIDIA/spark-rapids/pull/2625)|Exclude failed jobs/queries from Qualification tool output| -|[#2626](https://github.com/NVIDIA/spark-rapids/pull/2626)|Enable processing of compressed Spark event logs| -|[#2632](https://github.com/NVIDIA/spark-rapids/pull/2632)|Profiling tool: Add support for health check.| -|[#2627](https://github.com/NVIDIA/spark-rapids/pull/2627)|Ignore order for map udf test| -|[#2620](https://github.com/NVIDIA/spark-rapids/pull/2620)|Change aggregation of executor CPU and run time for Qualification tool to speed up query| -|[#2618](https://github.com/NVIDIA/spark-rapids/pull/2618)|Correct an issue for README for tools and also correct s3 solution in Args.scala| -|[#2612](https://github.com/NVIDIA/spark-rapids/pull/2612)|Profiling tool, add in job to stage, duration, executor cpu time, fix writing to HDFS| -|[#2614](https://github.com/NVIDIA/spark-rapids/pull/2614)|change rapids-4-spark-tools directory to tools in deploy script [skip ci]| -|[#2611](https://github.com/NVIDIA/spark-rapids/pull/2611)|Revert "disable cudf_udf tests for #2521"| -|[#2604](https://github.com/NVIDIA/spark-rapids/pull/2604)|Profile/qualification tool error handling improvements and support spark < 3.1.1| -|[#2598](https://github.com/NVIDIA/spark-rapids/pull/2598)|Rename rapids-4-spark-tools directory to tools| 
-|[#2576](https://github.com/NVIDIA/spark-rapids/pull/2576)|Add filter support for qualification and profiling tool.| -|[#2603](https://github.com/NVIDIA/spark-rapids/pull/2603)|Add the doc for -g option of the profiling tool.| -|[#2594](https://github.com/NVIDIA/spark-rapids/pull/2594)|Change the README of the qualification and profiling tool to match the current version.| -|[#2591](https://github.com/NVIDIA/spark-rapids/pull/2591)|Implement test for qualification tool sql metric aggregates| -|[#2590](https://github.com/NVIDIA/spark-rapids/pull/2590)|Profiling tool support for collection and analysis| -|[#2587](https://github.com/NVIDIA/spark-rapids/pull/2587)|Handle UCX connection timeouts from heartbeats more gracefully| -|[#2588](https://github.com/NVIDIA/spark-rapids/pull/2588)|Fix package name| -|[#2574](https://github.com/NVIDIA/spark-rapids/pull/2574)|Add Qualification tool support| -|[#2571](https://github.com/NVIDIA/spark-rapids/pull/2571)|Change test_single_sort_in_part to print source data frame on failure| -|[#2569](https://github.com/NVIDIA/spark-rapids/pull/2569)|Remove -SNAPSHOT in documentation in preparation for release| -|[#2429](https://github.com/NVIDIA/spark-rapids/pull/2429)|Change RMM_ALLOC_FRACTION to represent percentage of available memory, rather than total memory, for initial allocation| -|[#2553](https://github.com/NVIDIA/spark-rapids/pull/2553)|Cancel requests that are queued for a client/handler on error| -|[#2566](https://github.com/NVIDIA/spark-rapids/pull/2566)|expose unspill config option| -|[#2460](https://github.com/NVIDIA/spark-rapids/pull/2460)|align GDS reads/writes to 4 KiB| -|[#2515](https://github.com/NVIDIA/spark-rapids/pull/2515)|Remove fetchTime and standardize on collectTime| -|[#2523](https://github.com/NVIDIA/spark-rapids/pull/2523)|Not compile RapidsUDF when udf compiler is enabled| -|[#2538](https://github.com/NVIDIA/spark-rapids/pull/2538)|Fixed code indentation in ParquetCachedBatchSerializer| 
-|[#2559](https://github.com/NVIDIA/spark-rapids/pull/2559)|Release profiling tool jar to maven central| -|[#2423](https://github.com/NVIDIA/spark-rapids/pull/2423)|Add cloudera shim layer| -|[#2520](https://github.com/NVIDIA/spark-rapids/pull/2520)|Add event logs for integration tests| -|[#2525](https://github.com/NVIDIA/spark-rapids/pull/2525)|support interval.microseconds for range window TimeStampType| -|[#2536](https://github.com/NVIDIA/spark-rapids/pull/2536)|Don't do an extra shuffle in some TopN cases| -|[#2508](https://github.com/NVIDIA/spark-rapids/pull/2508)|Refactor the code for conditional expressions| -|[#2542](https://github.com/NVIDIA/spark-rapids/pull/2542)|enable auto-merge from 21.06 to 21.08 [skip ci]| -|[#2540](https://github.com/NVIDIA/spark-rapids/pull/2540)|Update spark 312 shim, and Add spark 313-SNAPSHOT shim| -|[#2539](https://github.com/NVIDIA/spark-rapids/pull/2539)|disable cudf_udf tests for #2521| -|[#2514](https://github.com/NVIDIA/spark-rapids/pull/2514)|Add Struct support for ParquetWriter| -|[#2534](https://github.com/NVIDIA/spark-rapids/pull/2534)|Remove scaladoc on an internal method to avoid warning during build| -|[#2537](https://github.com/NVIDIA/spark-rapids/pull/2537)|Add CentOS documentation and improve dockerfiles for UCX| -|[#2531](https://github.com/NVIDIA/spark-rapids/pull/2531)|Add nested types and decimals to CoalesceExec| -|[#2513](https://github.com/NVIDIA/spark-rapids/pull/2513)|Report opTime not totalTime for expand, range, and generate execs| -|[#2533](https://github.com/NVIDIA/spark-rapids/pull/2533)|Fix concat_ws test specifying only a separator for databricks| -|[#2528](https://github.com/NVIDIA/spark-rapids/pull/2528)|Make GenerateDot test more robust| -|[#2529](https://github.com/NVIDIA/spark-rapids/pull/2529)|Change Databricks 310 shim to be 311 to match reported spark.version| -|[#2479](https://github.com/NVIDIA/spark-rapids/pull/2479)|Support concat with separator on GPU| 
-|[#2507](https://github.com/NVIDIA/spark-rapids/pull/2507)|Improve test coverage for sorting structs|
-|[#2526](https://github.com/NVIDIA/spark-rapids/pull/2526)|Improve debug print to include addresses and null counts|
-|[#2463](https://github.com/NVIDIA/spark-rapids/pull/2463)|Add EMR 6.3 documentation|
-|[#2516](https://github.com/NVIDIA/spark-rapids/pull/2516)|Avoid listener race collecting wrong plan in assert_gpu_fallback_collect|
-|[#2505](https://github.com/NVIDIA/spark-rapids/pull/2505)|Qualification tool updates for datasets, udf, and misc fixes|
-|[#2509](https://github.com/NVIDIA/spark-rapids/pull/2509)|Added in basic support for scalar structs to named_struct|
-|[#2449](https://github.com/NVIDIA/spark-rapids/pull/2449)|Add code for generating dot file visualizations|
-|[#2475](https://github.com/NVIDIA/spark-rapids/pull/2475)|Update shuffle documentation for branch-21.06 and UCX 1.10.1|
-|[#2500](https://github.com/NVIDIA/spark-rapids/pull/2500)|Update Dockerfile for native UDF|
-|[#2506](https://github.com/NVIDIA/spark-rapids/pull/2506)|Support creating Scalars/ColumnVectors from utf8 strings directly.|
-|[#2502](https://github.com/NVIDIA/spark-rapids/pull/2502)|Remove work around for nulls in semi-anti joins|
-|[#2503](https://github.com/NVIDIA/spark-rapids/pull/2503)|Remove temporary logging and adjust test column names|
-|[#2499](https://github.com/NVIDIA/spark-rapids/pull/2499)|Fix regression in TOTAL_TIME metrics for Databricks|
-|[#2498](https://github.com/NVIDIA/spark-rapids/pull/2498)|Add in basic support for scalar maps and allow nesting in named_struct|
-|[#2496](https://github.com/NVIDIA/spark-rapids/pull/2496)|Add comments for lazy binding in WindowInPandas|
-|[#2493](https://github.com/NVIDIA/spark-rapids/pull/2493)|improve window agg test for range numeric types|
-|[#2491](https://github.com/NVIDIA/spark-rapids/pull/2491)|Fix regression in cost-based optimizer when calculating cost for Window operations|
-|[#2482](https://github.com/NVIDIA/spark-rapids/pull/2482)|Window tests with smaller batches|
-|[#2490](https://github.com/NVIDIA/spark-rapids/pull/2490)|Add temporary logging for Dataproc round robin fallback issue|
-|[#2486](https://github.com/NVIDIA/spark-rapids/pull/2486)|Remove the null replacement in `computePredicate`|
-|[#2469](https://github.com/NVIDIA/spark-rapids/pull/2469)|Adding additional functionalities to profiling tool |
-|[#2462](https://github.com/NVIDIA/spark-rapids/pull/2462)|Report gpuOpTime instead of totalTime for project, filter, limit, and window|
-|[#2484](https://github.com/NVIDIA/spark-rapids/pull/2484)|Fix the failing test `test_window` on Databricks|
-|[#2472](https://github.com/NVIDIA/spark-rapids/pull/2472)|Fix hash_aggregate_test|
-|[#2476](https://github.com/NVIDIA/spark-rapids/pull/2476)|Fix for UCP Listener created spark.port.maxRetries times|
-|[#2471](https://github.com/NVIDIA/spark-rapids/pull/2471)|skip test_window_aggs_for_rows_lead_lag_on_arrays|
-|[#2446](https://github.com/NVIDIA/spark-rapids/pull/2446)|Update plugin version to 21.06.0|
-|[#2409](https://github.com/NVIDIA/spark-rapids/pull/2409)|Change shuffle metadata messages to use UCX Active Messages|
-|[#2397](https://github.com/NVIDIA/spark-rapids/pull/2397)|Include memory access costs in cost models (cost-based optimizer)|
-|[#2442](https://github.com/NVIDIA/spark-rapids/pull/2442)|fix GpuCreateNamedStruct not serializable issue|
-|[#2379](https://github.com/NVIDIA/spark-rapids/pull/2379)|support GpuConcat on ArrayType|
-|[#2456](https://github.com/NVIDIA/spark-rapids/pull/2456)|Fall back to the CPU for literal array values on case/when|
-|[#2447](https://github.com/NVIDIA/spark-rapids/pull/2447)|Filter out the nulls after slicing the batches.|
-|[#2426](https://github.com/NVIDIA/spark-rapids/pull/2426)|Implement cast of nested arrays|
-|[#2299](https://github.com/NVIDIA/spark-rapids/pull/2299)|support creating array of array|
-|[#2451](https://github.com/NVIDIA/spark-rapids/pull/2451)|Update tuning docs to add batch size recommendations.|
-|[#2435](https://github.com/NVIDIA/spark-rapids/pull/2435)|support lead/lag on arrays|
-|[#2448](https://github.com/NVIDIA/spark-rapids/pull/2448)|support creating list ColumnVector for Literal(ArrayType(NullType))|
-|[#2402](https://github.com/NVIDIA/spark-rapids/pull/2402)|Add profiling tool|
-|[#2313](https://github.com/NVIDIA/spark-rapids/pull/2313)|Supports `GpuLiteral` of array type|
-
-## Older Releases
-Changelog of older releases can be found at [docs/archives](/docs/archives)
diff --git a/docs/archives/CHANGELOG_22.02_to_22.12.md b/docs/archives/CHANGELOG_22.02_to_22.12.md
deleted file mode 100644
index e10aeb2c1d5..00000000000
--- a/docs/archives/CHANGELOG_22.02_to_22.12.md
+++ /dev/null
@@ -1,1906 +0,0 @@
-# Change log
-Generated on 2023-02-08
-
-## Release 22.12
-
-### Features
-|||
-|:---|:---|
-|[#7275](https://github.com/NVIDIA/spark-rapids/issues/7275)|[FEA] Support SaveIntoDataSourceCommand for Delta Lake|
-|[#5225](https://github.com/NVIDIA/spark-rapids/issues/5225)|[FEA] Support array_remove|
-|[#6781](https://github.com/NVIDIA/spark-rapids/issues/6781)|[FEA] Create demo notebook on Databricks for qualification tool usage|
-|[#6782](https://github.com/NVIDIA/spark-rapids/issues/6782)|[FEA] Create demo notebook on Databricks for profiler tool usage|
-|[#6024](https://github.com/NVIDIA/spark-rapids/issues/6024)|[FEA] Add support for Spark 3.2.3 SNAPSHOT|
-|[#6887](https://github.com/NVIDIA/spark-rapids/issues/6887)|[FEA] support expressions parameter in substr function|
-|[#7078](https://github.com/NVIDIA/spark-rapids/issues/7078)|[FEA] Add shims for Spark 3.2.3|
-|[#3037](https://github.com/NVIDIA/spark-rapids/issues/3037)|[FEA] Support ZSTD compression with Parquet and Orc|
-|[#6916](https://github.com/NVIDIA/spark-rapids/issues/6916)|[FEA] Support Coalesce on map column(s)|
-|[#6902](https://github.com/NVIDIA/spark-rapids/issues/6902)|[FEA] Add shims for Spark 3.3.2|
-|[#6896](https://github.com/NVIDIA/spark-rapids/issues/6896)|[FEA] Support Apache Spark 3.3.1|
-|[#6884](https://github.com/NVIDIA/spark-rapids/issues/6884)|[FEA] Support instr|
-|[#6313](https://github.com/NVIDIA/spark-rapids/issues/6313)|[FEA] Support mapInArrow introduced by pyspark 3.3.0+ |
-|[#6064](https://github.com/NVIDIA/spark-rapids/issues/6064)|[FEA] Qualification tool support parsing expressions (part 2)|
-|[#6645](https://github.com/NVIDIA/spark-rapids/issues/6645)|[FEA] Qualification Tool: Print timestamp related functions. |
-
-### Performance
-|||
-|:---|:---|
-|[#6794](https://github.com/NVIDIA/spark-rapids/issues/6794)|Investigate other compression codecs and other serializers.|
-|[#6528](https://github.com/NVIDIA/spark-rapids/issues/6528)|[FEA] Identify additional opportunities for using tiered projections|
-|[#6430](https://github.com/NVIDIA/spark-rapids/issues/6430)|[FEA] look into using the new CUDF like operator|
-|[#7020](https://github.com/NVIDIA/spark-rapids/issues/7020)|Fallback to CPU for Delta lake delta_log parquet checkpoint files|
-|[#6254](https://github.com/NVIDIA/spark-rapids/issues/6254)|[FEA] Support z-ordering acceleration|
-|[#6524](https://github.com/NVIDIA/spark-rapids/issues/6524)|[FEA] Improve tiered project by eliminating eclipsed columns in each tier|
-|[#6130](https://github.com/NVIDIA/spark-rapids/issues/6130)|[FEA] More efficient bound check for `GpuCast`|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#6455](https://github.com/NVIDIA/spark-rapids/issues/6455)|[BUG] Rapids tools test on Databricks fail|
-|[#6890](https://github.com/NVIDIA/spark-rapids/issues/6890)|[BUG] RUN_DIR change fail some CI pipelines|
-|[#7085](https://github.com/NVIDIA/spark-rapids/issues/7085)|[BUG] GPU Hive Text reader fails to read floating point input as integral types|
-|[#7271](https://github.com/NVIDIA/spark-rapids/issues/7271)|[BUG] failed to build in Databricks runtime due to alluxio utils |
-|[#6636](https://github.com/NVIDIA/spark-rapids/issues/6636)|[BUG] casting to string and list, and concat can cause overflow issues|
-|[#7234](https://github.com/NVIDIA/spark-rapids/issues/7234)|[BUG] Integration test script failed on: '/tmp/20221204/python/lib': No such file or directory|
-|[#7198](https://github.com/NVIDIA/spark-rapids/issues/7198)|[BUG] RapidsShuffleManager fails to unregister UCX-mode shuffle|
-|[#7168](https://github.com/NVIDIA/spark-rapids/issues/7168)|[BUG] mismatch cpu and gpu result in test_aqe_join_reused_exchange_inequality_condition failed|
-|[#7066](https://github.com/NVIDIA/spark-rapids/issues/7066)|[SPARK-39432][BUG] The test `test_array_element_at_zero_index_fail` fails on Spark 3.4|
-|[#7179](https://github.com/NVIDIA/spark-rapids/issues/7179)|[BUG] Executors killed for out of memory with multithreaded RapidsShuffleManager|
-|[#7054](https://github.com/NVIDIA/spark-rapids/issues/7054)|[BUG] Some tests in the `AdaptiveQueryExecSuite` fail on Spark 340|
-|[#7037](https://github.com/NVIDIA/spark-rapids/issues/7037)|[BUG] AQE on Databricks failed the query with error "UnsupportedOperationException: ColumnarToRow does not implement doExecuteBroadcast"|
-|[#7150](https://github.com/NVIDIA/spark-rapids/issues/7150)|[BUG] Spark 3.4 build fails|
-|[#7092](https://github.com/NVIDIA/spark-rapids/issues/7092)|[BUG] java gateway crashed due to hash_aggregate_test case intermittently|
-|[#7140](https://github.com/NVIDIA/spark-rapids/issues/7140)|[BUG] failed to echo PROJECT_VERSION in premerge CI|
-|[#7111](https://github.com/NVIDIA/spark-rapids/issues/7111)|[BUG] Multithreaded shuffle keeps files around after RDDs are GCed|
-|[#7059](https://github.com/NVIDIA/spark-rapids/issues/7059)|[BUG] Qualification - Incorrect parsing of conditional expressions |
-|[#6983](https://github.com/NVIDIA/spark-rapids/issues/6983)|[BUG] query95 @ 30TB negative allocation from `BaseHashJoinIterator.countGroups` with default 200 partitions|
-|[#7036](https://github.com/NVIDIA/spark-rapids/issues/7036)|[BUG] 30TB query95 fails on the join with illegal memory access with 200 partitions|
-|[#7065](https://github.com/NVIDIA/spark-rapids/issues/7065)|[SPARK-38976][SPARK-40066][BUG] Some tests in the `array_test.py` fail on Spark 3.4 because the conf `strictIndexOperator` has been removed|
-|[#7044](https://github.com/NVIDIA/spark-rapids/issues/7044)|[BUG] Qualification tool skips applications due to failure in expression parsing|
-|[#7026](https://github.com/NVIDIA/spark-rapids/issues/7026)|[BUG] AnsiCastOpSuite 340 failures|
-|[#7039](https://github.com/NVIDIA/spark-rapids/issues/7039)|[BUG] `nz timestamp (MILLIS AND MICROS)` fails on Spark 3.4|
-|[#7033](https://github.com/NVIDIA/spark-rapids/issues/7033)|[BUG] GPU and CPU `substring` output different rows when `pos + len < 0 && len >= 0`|
-|[#7041](https://github.com/NVIDIA/spark-rapids/issues/7041)|[BUG] regexp_test and many other test failures|
-|[#6425](https://github.com/NVIDIA/spark-rapids/issues/6425)|[BUG] Host column leak detected in ParquetCachedBatchSerializer tests|
-|[#6906](https://github.com/NVIDIA/spark-rapids/issues/6906)|[FEAT] Add tests for parquet reader code for all possible types|
-|[#6963](https://github.com/NVIDIA/spark-rapids/issues/6963)|[BUG] Dynamic partition writer prevents GPU memory from being freed during write|
-|[#7014](https://github.com/NVIDIA/spark-rapids/issues/7014)|[BUG] The unit test `avg literals bools fail` fails in Spark 340|
-|[#7003](https://github.com/NVIDIA/spark-rapids/issues/7003)|[BUG] Alluxio config `pathsToReplace` does not overwrite `automount` config.|
-|[#6779](https://github.com/NVIDIA/spark-rapids/issues/6779)|[BUG] Always read old data from alluxio regardless of S3 changes when using CONVERT_TIME replacement algorithm |
-|[#7010](https://github.com/NVIDIA/spark-rapids/issues/7010)|[BUG] Parquet multi-threaded reader bufferTime is wrong|
-|[#6949](https://github.com/NVIDIA/spark-rapids/issues/6949)|[BUG] Negative allocation error while stress testing with NDSv2 Query 9|
-|[#6995](https://github.com/NVIDIA/spark-rapids/issues/6995)|[BUG] HostToGpuCoalesceIterator can sometimes close input batches|
-|[#4884](https://github.com/NVIDIA/spark-rapids/issues/4884)|[BUG] Split by regular expressions with `?` and `*` repetition are not consistent with Spark|
-|[#6452](https://github.com/NVIDIA/spark-rapids/issues/6452)|[BUG] GPU writes more records than `maxRecordsPerFile` limit while CPU performs well|
-|[#6951](https://github.com/NVIDIA/spark-rapids/issues/6951)|[BUG] cast_test.py::test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark 3.3.0+|
-|[#6880](https://github.com/NVIDIA/spark-rapids/issues/6880)|[BUG] Regular expressions should support escaped forward slash `\/` (and any other "invalid" escape chars) |
-|[#6537](https://github.com/NVIDIA/spark-rapids/issues/6537)|[BUG] per-sql unit-tests need to be added to the test generator|
-|[#6933](https://github.com/NVIDIA/spark-rapids/issues/6933)|[BUG] Tools run with filter arguments should handle corrupted log that doesn't have SparkListenerApplicationStart event |
-|[#3143](https://github.com/NVIDIA/spark-rapids/issues/3143)|[BUG] DPP is not working in Databricks env|
-|[#6895](https://github.com/NVIDIA/spark-rapids/issues/6895)|[BUG] Profile tool fails in getMaxTaskInputSizeBytes|
-|[#6871](https://github.com/NVIDIA/spark-rapids/issues/6871)|[BUG] Parquet reader - Found no metadata for schema index|
-|[#6883](https://github.com/NVIDIA/spark-rapids/issues/6883)|[BUG] integration test fail in CDH env due us trying to change permissions on /tmp/hive|
-|[#6752](https://github.com/NVIDIA/spark-rapids/issues/6752)|[BUG] StringOperatorsSuite failed when building with JDK17|
-|[#6671](https://github.com/NVIDIA/spark-rapids/issues/6671)|[Audit][BUG] Handle updated messageParameters for any thrown Spark exceptions in Spark 3.4.x|
-|[#6865](https://github.com/NVIDIA/spark-rapids/issues/6865)|[BUG] parquet_write_test is failing when reading on the CPU parquet that was written on the GPU|
-|[#6856](https://github.com/NVIDIA/spark-rapids/issues/6856)|[BUG] Can not switch Alluxio auto-mount option on the fly|
-|[#6869](https://github.com/NVIDIA/spark-rapids/issues/6869)|[BUG] Building databricks failed|
-|[#6848](https://github.com/NVIDIA/spark-rapids/issues/6848)|[BUG] github workflow actions use deprecated API "to be removed soon"|
-|[#6825](https://github.com/NVIDIA/spark-rapids/issues/6825)|[BUG] pytests should configure hive.scratch.dir under RUN_DIR |
-|[#6818](https://github.com/NVIDIA/spark-rapids/issues/6818)|[BUG] `RapidsShuffleThreadedReader` is not found when building the plugin with Spark 340|
-|[#6718](https://github.com/NVIDIA/spark-rapids/issues/6718)|[BUG] test_iceberg_parquet_read_round_trip FAILED "TypeError: object of type 'NoneType' has no len()"|
-|[#6762](https://github.com/NVIDIA/spark-rapids/issues/6762)|[BUG] The concurrent writer throws a class casting error when enabling AQE.|
-|[#6146](https://github.com/NVIDIA/spark-rapids/issues/6146)|[BUG] intermittent orc test_read_round_trip failed due to /tmp/hive location|
-|[#2654](https://github.com/NVIDIA/spark-rapids/issues/2654)|[BUG] --help at the end does not print out help for tools|
-
-### PRs
-|||
-|:---|:---|
-|[#7337](https://github.com/NVIDIA/spark-rapids/pull/7337)|Update 22.12 changelog to latest [skip ci]|
-|[#7316](https://github.com/NVIDIA/spark-rapids/pull/7316)|Update jni version 22.12.0|
-|[#7237](https://github.com/NVIDIA/spark-rapids/pull/7237)|[Doc]update download docs for v22.12 release[skip ci]|
-|[#7330](https://github.com/NVIDIA/spark-rapids/pull/7330)|xfail all delta-write fallback cases [skip ci]|
-|[#7288](https://github.com/NVIDIA/spark-rapids/pull/7288)|Add support for SaveIntoDataSource for Delta Lake 2.x|
-|[#7306](https://github.com/NVIDIA/spark-rapids/pull/7306)|Cherry pick #7293 to 22.12 [skip ci]|
-|[#7270](https://github.com/NVIDIA/spark-rapids/pull/7270)|Update 22.12 changelog to latest [skip ci]|
-|[#7264](https://github.com/NVIDIA/spark-rapids/pull/7264)|Update columnar stats tracker API to pass file path for new batches|
-|[#7273](https://github.com/NVIDIA/spark-rapids/pull/7273)|Fix AlluxioUtilsSuite build on Databricks for 22.12|
-|[#7250](https://github.com/NVIDIA/spark-rapids/pull/7250)|Change tools hadoop version to 3.3.4|
-|[#7172](https://github.com/NVIDIA/spark-rapids/pull/7172)|Add a document for how to view Alluxio metrics on UI [skip ci]|
-|[#7238](https://github.com/NVIDIA/spark-rapids/pull/7238)|Add branch-specific premerge jenkinsfile|
-|[#7243](https://github.com/NVIDIA/spark-rapids/pull/7243)|[Doc]fix broken links[skip ci]|
-|[#7155](https://github.com/NVIDIA/spark-rapids/pull/7155)|Add unit tests for alluxio utils|
-|[#7080](https://github.com/NVIDIA/spark-rapids/pull/7080)|[Doc] Document Alluxio does not sync metadata from S3 by default [skip ci]|
-|[#7235](https://github.com/NVIDIA/spark-rapids/pull/7235)|Create tmp path to make python path explicit [skip ci]|
-|[#7084](https://github.com/NVIDIA/spark-rapids/pull/7084)|[Doc]Update databricks doc for 22.12[skip ci]|
-|[#7166](https://github.com/NVIDIA/spark-rapids/pull/7166)|Sync up spark2 explain code|
-|[#6903](https://github.com/NVIDIA/spark-rapids/pull/6903)|Support projectV2 for changelog tooling [skip ci]|
-|[#7203](https://github.com/NVIDIA/spark-rapids/pull/7203)|[Doc]add a Contact Us page at the top-level menu[skip ci]|
-|[#7174](https://github.com/NVIDIA/spark-rapids/pull/7174)|Fix dependencies in jenkins-test script to support DB11.3|
-|[#7034](https://github.com/NVIDIA/spark-rapids/pull/7034)|Read directly from S3 instead of reading from Alluxio caches if files are large and disk is slow|
-|[#7199](https://github.com/NVIDIA/spark-rapids/pull/7199)|Fixes unregisterShuffle bugs in the driver and a missed match for the GpuResolver|
-|[#7156](https://github.com/NVIDIA/spark-rapids/pull/7156)|Add scripts to run integration test on Databricks by leveraging Jenkins parallelism [skip ci]|
-|[#7195](https://github.com/NVIDIA/spark-rapids/pull/7195)|Fix non-deterministic query in test_aqe_join_reused_exchange_inequality_condition|
-|[#7176](https://github.com/NVIDIA/spark-rapids/pull/7176)|Copying common ThreadFactoryBuilder to tools to remove dependency|
-|[#7188](https://github.com/NVIDIA/spark-rapids/pull/7188)|Remove "SNAPSHOT" for 323 shim|
-|[#7189](https://github.com/NVIDIA/spark-rapids/pull/7189)|specify shim versions to build [skip ci]|
-|[#7165](https://github.com/NVIDIA/spark-rapids/pull/7165)|Try/catch cudf file scan exceptions and re-throw with file metadata in message|
-|[#7164](https://github.com/NVIDIA/spark-rapids/pull/7164)|Search for `CudaFatalException` in causes of `failureReason` in function `onTaskFailed`|
-|[#7169](https://github.com/NVIDIA/spark-rapids/pull/7169)|Remove snapshot shims build in premerge script|
-|[#7180](https://github.com/NVIDIA/spark-rapids/pull/7180)|multithreaded RapidsShuffleManager change when we release memory|
-|[#7115](https://github.com/NVIDIA/spark-rapids/pull/7115)|Support `array_remove` operator|
-|[#7171](https://github.com/NVIDIA/spark-rapids/pull/7171)|Add tests for 331 and 332|
-|[#7099](https://github.com/NVIDIA/spark-rapids/pull/7099)|Update AQE tests to support Spark 3.4|
-|[#7110](https://github.com/NVIDIA/spark-rapids/pull/7110)|Add GpuBroadcastToRowExec to handle columnar broadcast in cpu broadcast join with AQE enabled|
-|[#7153](https://github.com/NVIDIA/spark-rapids/pull/7153)|Add `SchemaUtilsShims`|
-|[#7142](https://github.com/NVIDIA/spark-rapids/pull/7142)|Restore hash aggregate tests after cub segmented sort fix|
-|[#7141](https://github.com/NVIDIA/spark-rapids/pull/7141)|Get PROJECT_VERSION from version-def.sh [skip ci]|
-|[#7123](https://github.com/NVIDIA/spark-rapids/pull/7123)|Reduce the duplication of `RegExpShim` and `getFileScanRDD`|
-|[#7145](https://github.com/NVIDIA/spark-rapids/pull/7145)|Remove inaccurate warnings about fallbacks when using multithreaded shuffle|
-|[#7135](https://github.com/NVIDIA/spark-rapids/pull/7135)|Revert "Suffix artifactId with amd64/arm64 for the dist jars [skip ci]] (#7070)(#7120)"|
-|[#7103](https://github.com/NVIDIA/spark-rapids/pull/7103)|Add support to DB 11.3 ML LTS in databricks build script|
-|[#7125](https://github.com/NVIDIA/spark-rapids/pull/7125)|Add missing cleanup of shuffle data when using multi-threaded shuffle|
-|[#6934](https://github.com/NVIDIA/spark-rapids/pull/6934)|Add support for chunked parquet reading|
-|[#7120](https://github.com/NVIDIA/spark-rapids/pull/7120)|Build noSnapshots without cdh shims on arm CPU [skip ci]|
-|[#7013](https://github.com/NVIDIA/spark-rapids/pull/7013)|Hive delimited textfile read support|
-|[#7077](https://github.com/NVIDIA/spark-rapids/pull/7077)|Add shims for Spark 3.2.3|
-|[#7070](https://github.com/NVIDIA/spark-rapids/pull/7070)|Suffix artifactId with amd64/arm64 for the dist jars [skip ci]|
-|[#7088](https://github.com/NVIDIA/spark-rapids/pull/7088)|Fix ConditionalExpr parser in Qualification tool|
-|[#7097](https://github.com/NVIDIA/spark-rapids/pull/7097)|Use Databricks instance Spark version as default|
-|[#7107](https://github.com/NVIDIA/spark-rapids/pull/7107)|Skip test_hash_groupby_collect_with_single_distinct [skip ci]|
-|[#7102](https://github.com/NVIDIA/spark-rapids/pull/7102)|Skip test_hash_groupby_collect_partial_replace_with_distinct_fallback for #7092|
-|[#7051](https://github.com/NVIDIA/spark-rapids/pull/7051)|Support non literal position and length for substring|
-|[#7067](https://github.com/NVIDIA/spark-rapids/pull/7067)|Update the tests in `array_test.py` to adapt the removal of `strictIndexOperator` in Spark 3.4|
-|[#7071](https://github.com/NVIDIA/spark-rapids/pull/7071)|Exception in SQLParser should not cause Qualification tool to skip app|
-|[#7025](https://github.com/NVIDIA/spark-rapids/pull/7025)|Spark-3.4 - Fix cast unit tests|
-|[#7045](https://github.com/NVIDIA/spark-rapids/pull/7045)|Fix parquet test for nztimestamp on spark 3.4.0|
-|[#7048](https://github.com/NVIDIA/spark-rapids/pull/7048)|Enable tiered projections for GpuProjectExec|
-|[#7055](https://github.com/NVIDIA/spark-rapids/pull/7055)|[Doc]update a typo for iceberg readme[skip ci]|
-|[#7049](https://github.com/NVIDIA/spark-rapids/pull/7049)|add parenthesis around delta_log check to short circuit|
-|[#7052](https://github.com/NVIDIA/spark-rapids/pull/7052)|Enable automerge 22.12 to 23.02 [skip ci]|
-|[#7040](https://github.com/NVIDIA/spark-rapids/pull/7040)|Fix a substring issue for a corner case|
-|[#6960](https://github.com/NVIDIA/spark-rapids/pull/6960)|Use cudf like operator in GpuLike operator|
-|[#7027](https://github.com/NVIDIA/spark-rapids/pull/7027)|Include unit tests and integration tests in mvn-verify-check|
-|[#7022](https://github.com/NVIDIA/spark-rapids/pull/7022)|Fallback to CPU when reading Delta delta_log parquet checkpoint files|
-|[#7031](https://github.com/NVIDIA/spark-rapids/pull/7031)|Add skip test options [skip ci]|
-|[#7002](https://github.com/NVIDIA/spark-rapids/pull/7002)|ParquetCachedBatchSerializer: Close the hostBatch in ColumnBatchToCachedBatchIterator when the iterator has exhausted|
-|[#6914](https://github.com/NVIDIA/spark-rapids/pull/6914)|Add in tests to verify corner cases in parquet|
-|[#7016](https://github.com/NVIDIA/spark-rapids/pull/7016)|Parse out positive and negative lookahead explicitly to fallback to GPU|
-|[#6999](https://github.com/NVIDIA/spark-rapids/pull/6999)|Enable snapshot builds as optional PR checks |
-|[#6977](https://github.com/NVIDIA/spark-rapids/pull/6977)|Close the `batch` in the `writeBatch` function of `GpuDynamicPartitionDataSingleWriter`|
-|[#7015](https://github.com/NVIDIA/spark-rapids/pull/7015)|Fix the test failure of `avg literals bools fail` on Spark 3.4.0|
-|[#6362](https://github.com/NVIDIA/spark-rapids/pull/6362)|[FEA] Add support for using nvcomp ZSTD compression|
-|[#7004](https://github.com/NVIDIA/spark-rapids/pull/7004)|Alluxio pathsToReplace should has higher priority|
-|[#6806](https://github.com/NVIDIA/spark-rapids/pull/6806)|Fix read old data from alluxio regardless of S3 changes when using CONVERT_TIME replacement algorithm|
-|[#7006](https://github.com/NVIDIA/spark-rapids/pull/7006)|Revert "Fix a minor potential issue when rebatching for GpuArrowEvalP…|
-|[#7011](https://github.com/NVIDIA/spark-rapids/pull/7011)|Fix buffertime for multi-threaded reader|
-|[#6950](https://github.com/NVIDIA/spark-rapids/pull/6950)|Throw when onAllocFailure is invoked with invalid arguments|
-|[#7009](https://github.com/NVIDIA/spark-rapids/pull/7009)|Work around column vectors reporting incorrect data type|
-|[#6996](https://github.com/NVIDIA/spark-rapids/pull/6996)|Fix HostToGpuCoalesceIterator sometimes closing input batches|
-|[#6998](https://github.com/NVIDIA/spark-rapids/pull/6998)|Make shim revision check opt-out|
-|[#6976](https://github.com/NVIDIA/spark-rapids/pull/6976)|Update the docs of `write` and `writebatch` of `ColumnOutputWriter`|
-|[#7000](https://github.com/NVIDIA/spark-rapids/pull/7000)|Spark-3.4: Update DecimalArithmeticOverrides to object|
-|[#6937](https://github.com/NVIDIA/spark-rapids/pull/6937)|Removed PromotePrecision for Spark 3.4|
-|[#6959](https://github.com/NVIDIA/spark-rapids/pull/6959)|Allow `*`, `?`, and `{0,...}` variants in StringSplit in non-empty match situations|
-|[#6972](https://github.com/NVIDIA/spark-rapids/pull/6972)|Add regular expression support for `\d` inside character classes on the GPU|
-|[#6922](https://github.com/NVIDIA/spark-rapids/pull/6922)|Fix CastBase issues not related to PromotePrecision and CheckOverflow|
-|[#6966](https://github.com/NVIDIA/spark-rapids/pull/6966)|Extract pre/post projections from columnar transitions|
-|[#6974](https://github.com/NVIDIA/spark-rapids/pull/6974)|Add doc for `mapInArrow` [skip ci]|
-|[#6931](https://github.com/NVIDIA/spark-rapids/pull/6931)|mergeSort late batch materialization and free already merged batches eagerly|
-|[#6971](https://github.com/NVIDIA/spark-rapids/pull/6971)|Spark-3.4 : Fix build error in DataSourceV2ScanExec|
-|[#6901](https://github.com/NVIDIA/spark-rapids/pull/6901)|Add JDK11 to mvn-verify-check|
-|[#6801](https://github.com/NVIDIA/spark-rapids/pull/6801)|Enable the config `MaxRecordsPerFile` on the `GpuDynamicDirectoryConcurrentWriter`|
-|[#6952](https://github.com/NVIDIA/spark-rapids/pull/6952)|Fix the `failOnError not found` error when building Spark 3.4.0|
-|[#6962](https://github.com/NVIDIA/spark-rapids/pull/6962)|Stop using deprecated JDK API javax.xml.bind|
-|[#6957](https://github.com/NVIDIA/spark-rapids/pull/6957)|Fix leak in GpuBroadcastExchangeExec|
-|[#6924](https://github.com/NVIDIA/spark-rapids/pull/6924)|Shim for shaded protobuf orc-core|
-|[#6943](https://github.com/NVIDIA/spark-rapids/pull/6943)|Mechanism to reduce redundancy in Maven profiles for shims|
-|[#6956](https://github.com/NVIDIA/spark-rapids/pull/6956)|Throw SparkDateTimeException for invalid cast in Spark3.3+ versions|
-|[#6948](https://github.com/NVIDIA/spark-rapids/pull/6948)|Pass through escaped punctuation in Regular Expression Transpiler|
-|[#6953](https://github.com/NVIDIA/spark-rapids/pull/6953)|Remove unsupported format when converting dates/timestamps to strings [skip ci]|
-|[#6944](https://github.com/NVIDIA/spark-rapids/pull/6944)|Update to a valid cuda docker image for k8s run [skip ci]|
-|[#6938](https://github.com/NVIDIA/spark-rapids/pull/6938)|Spark-3.4 - Fix build errors in DataSourceStrategy and SparkDateTimeException|
-|[#6925](https://github.com/NVIDIA/spark-rapids/pull/6925)|Only warn when hive scratch creation fails|
-|[#6923](https://github.com/NVIDIA/spark-rapids/pull/6923)|[BUG] Fix qualification-test-result generators and update csv files|
-|[#6939](https://github.com/NVIDIA/spark-rapids/pull/6939)|Support Coalesce on map column|
-|[#6936](https://github.com/NVIDIA/spark-rapids/pull/6936)|Fixing exception when appStartInfo isn't available due to incomplete event log|
-|[#6824](https://github.com/NVIDIA/spark-rapids/pull/6824)|Use alluxio Java API to mount instead of cmd|
-|[#6918](https://github.com/NVIDIA/spark-rapids/pull/6918)|Added shim for Spark 3.3.2|
-|[#6919](https://github.com/NVIDIA/spark-rapids/pull/6919)|Enable DPP and DPP+AQE on|
-|[#6920](https://github.com/NVIDIA/spark-rapids/pull/6920)|Support Spark 3.3.1|
-|[#6905](https://github.com/NVIDIA/spark-rapids/pull/6905)|Fix Spark 340 build error related to `checkForNumericExpr`|
-|[#6899](https://github.com/NVIDIA/spark-rapids/pull/6899)|Add ApplicationSummaryInfo wrapper to allow mock tests|
-|[#6910](https://github.com/NVIDIA/spark-rapids/pull/6910)|[FEA] Support string Instr function|
-|[#6913](https://github.com/NVIDIA/spark-rapids/pull/6913)|[BUG] GpuPartitioning should close CVs before releasing semaphore|
-|[#6833](https://github.com/NVIDIA/spark-rapids/pull/6833)|Flatten simple 4+ nesting of withResource|
-|[#6757](https://github.com/NVIDIA/spark-rapids/pull/6757)|Add startupOnly tag to configs|
-|[#6893](https://github.com/NVIDIA/spark-rapids/pull/6893)|Add different codepoint for unicode 13.0|
-|[#6892](https://github.com/NVIDIA/spark-rapids/pull/6892)|Fix the Spark340 build error related to `mapKeyNotExistError`|
-|[#6897](https://github.com/NVIDIA/spark-rapids/pull/6897)|Avoid coalescing files with mismatched schemas|
-|[#6889](https://github.com/NVIDIA/spark-rapids/pull/6889)|Create target folder before attempting to add unique RUN_DIR|
-|[#6891](https://github.com/NVIDIA/spark-rapids/pull/6891)|Remove invalid members from allow list [skip ci]|
-|[#6827](https://github.com/NVIDIA/spark-rapids/pull/6827)|Follow on from recent regexp fixes to reject patterns that cuDF no longer rejects|
-|[#6876](https://github.com/NVIDIA/spark-rapids/pull/6876)|Fix Spark 3.4 build issues|
-|[#6866](https://github.com/NVIDIA/spark-rapids/pull/6866)|Use a unique run directory for each run when testing in run_pyspark_from_build|
-|[#6877](https://github.com/NVIDIA/spark-rapids/pull/6877)|Plugin fixes after cuDF removed INT8 for binary columns in parquet writer|
-|[#6873](https://github.com/NVIDIA/spark-rapids/pull/6873)|Add in support for zorder operators on databricks|
-|[#6857](https://github.com/NVIDIA/spark-rapids/pull/6857)|Fix bug that can not switch Alluxio auto-mount option on the fly|
-|[#6860](https://github.com/NVIDIA/spark-rapids/pull/6860)|Adjust to cudf removal of checks in scatter and repeat|
-|[#6823](https://github.com/NVIDIA/spark-rapids/pull/6823)|Support columnar processing for mapInArrow|
-|[#6813](https://github.com/NVIDIA/spark-rapids/pull/6813)|Move `_databricks_internal` check to shim layer|
-|[#6796](https://github.com/NVIDIA/spark-rapids/pull/6796)|Qualification tool: Parse expressions in Join execs|
-|[#6861](https://github.com/NVIDIA/spark-rapids/pull/6861)|Add check for is_spark_330cdh and update orc test to skip zstd for cdh|
-|[#6849](https://github.com/NVIDIA/spark-rapids/pull/6849)|Cuda.deviceSynchronize as a last resort if we cannot spill enough|
-|[#6859](https://github.com/NVIDIA/spark-rapids/pull/6859)|Reduce memory usage in aggregate.scala|
-|[#6870](https://github.com/NVIDIA/spark-rapids/pull/6870)|Update the db hadoop jars version to 0007 for 10.4|
-|[#6867](https://github.com/NVIDIA/spark-rapids/pull/6867)|Temporarily disable the failing tests of parquet writing.|
-|[#6855](https://github.com/NVIDIA/spark-rapids/pull/6855)|Add the `FileIndexOptions` shims for Spark340|
-|[#6847](https://github.com/NVIDIA/spark-rapids/pull/6847)|Fix integration builds failing with current directory not found|
-|[#6854](https://github.com/NVIDIA/spark-rapids/pull/6854)|Fix setup-java step of blossom-ci [skip ci]|
-|[#6852](https://github.com/NVIDIA/spark-rapids/pull/6852)|Fix deprecated Github actions API [skip ci]|
-|[#6700](https://github.com/NVIDIA/spark-rapids/pull/6700)|Support zorder for deltalake and improve perf of range partitioning|
-|[#6826](https://github.com/NVIDIA/spark-rapids/pull/6826)|Place hive scratch files under pytest $RUN_DIR|
-|[#6819](https://github.com/NVIDIA/spark-rapids/pull/6819)|Move the `RapidsShuffleThreadedReader` from 330~340 shims to 330+ shims|
-|[#6810](https://github.com/NVIDIA/spark-rapids/pull/6810)|Dump stack traces for tasks with the semaphore held when OOM goes unhandled|
-|[#6815](https://github.com/NVIDIA/spark-rapids/pull/6815)|Update castPartValue function to fix ClassCastException|
-|[#6766](https://github.com/NVIDIA/spark-rapids/pull/6766)|Adding timestamp functions into potential problems for qual tool|
-|[#6809](https://github.com/NVIDIA/spark-rapids/pull/6809)|Relocate Scala files placed in the java/ directory|
-|[#6804](https://github.com/NVIDIA/spark-rapids/pull/6804)|Fix auto merge conflict 6802 [skip ci]|
-|[#6751](https://github.com/NVIDIA/spark-rapids/pull/6751)|Support columnar processing for FlatMapCoGroupInPandas|
-|[#6783](https://github.com/NVIDIA/spark-rapids/pull/6783)|Revert "Temporarily xfail failing test_iceberg_parquet_read_round_trip test"|
-|[#6780](https://github.com/NVIDIA/spark-rapids/pull/6780)|Fix auto merge conflict 6776|
-|[#6763](https://github.com/NVIDIA/spark-rapids/pull/6763)|Fix a class casting error in concurrent writer when enabling AQE|
-|[#6760](https://github.com/NVIDIA/spark-rapids/pull/6760)|Clean run directory before running tests in run_pyspark_from_build|
-|[#6716](https://github.com/NVIDIA/spark-rapids/pull/6716)|Improve tiered project by eliminating eclipsed columns in each tier|
-|[#6764](https://github.com/NVIDIA/spark-rapids/pull/6764)|Add supervisor(like systemd stuff) to auto restart Alluxio processes … [skip ci]|
-|[#6726](https://github.com/NVIDIA/spark-rapids/pull/6726)|Provision hive scratch dir before test execution|
-|[#6730](https://github.com/NVIDIA/spark-rapids/pull/6730)|Fix an unchecked conversion warning|
-|[#6756](https://github.com/NVIDIA/spark-rapids/pull/6756)|Temporarily xfail failing test_iceberg_parquet_read_round_trip test|
-|[#6743](https://github.com/NVIDIA/spark-rapids/pull/6743)|Add spark-rapids pulls to GitHub project [skip ci]|
-|[#6681](https://github.com/NVIDIA/spark-rapids/pull/6681)|Fixes for more efficient bound checks for GpuCast|
-|[#6742](https://github.com/NVIDIA/spark-rapids/pull/6742)|Rework for adding event log info for profiler output|
-|[#6717](https://github.com/NVIDIA/spark-rapids/pull/6717)|Qualification tool: Parse expressions in Expand, Generate and TakeOrderedAndProject Execs|
-|[#6741](https://github.com/NVIDIA/spark-rapids/pull/6741)|Reverse normalizing `nan` in the GpuSortArray|
-|[#6728](https://github.com/NVIDIA/spark-rapids/pull/6728)|Disable maven-compiler-plugin|
-|[#6644](https://github.com/NVIDIA/spark-rapids/pull/6644)|Simplify how we transpile negated character classes and add more tests|
-|[#6706](https://github.com/NVIDIA/spark-rapids/pull/6706)|Adding new profiler output to map app with event log path|
-|[#6704](https://github.com/NVIDIA/spark-rapids/pull/6704)|Removing --help tools tests that trigger System.exit()|
-|[#6675](https://github.com/NVIDIA/spark-rapids/pull/6675)|Adding error handling to print help out when at end of command|
-|[#6667](https://github.com/NVIDIA/spark-rapids/pull/6667)|Retain all heap dumps per JVM lifecycle|
-|[#6583](https://github.com/NVIDIA/spark-rapids/pull/6583)|Update the `GpuSingleDirectoryDataWriter` and `GpuDynamicDirectorySingleDataWriter` to split ColumnarBatch when writing to match the `maxRecordsPerFile`|
-|[#6649](https://github.com/NVIDIA/spark-rapids/pull/6649)|Update CUDF_VER to 22.12 for CI|
-|[#6613](https://github.com/NVIDIA/spark-rapids/pull/6613)|Update project version to 22.12.0-SNAPSHOT|
-
-## Release 22.10
-
-### Features
-|||
-|:---|:---|
-|[#6323](https://github.com/NVIDIA/spark-rapids/issues/6323)|[FEA] AutoTuner Profiling Tool|
-|[#6544](https://github.com/NVIDIA/spark-rapids/issues/6544)|[FEA] Update spark2 explain api code for 22.10|
-|[#6322](https://github.com/NVIDIA/spark-rapids/issues/6322)|[FEA] Integrate AutoTuner into DataProc Rapids environment|
-|[#6401](https://github.com/NVIDIA/spark-rapids/issues/6401)|[FEA] Support cast string to decimal(38,2)|
-|[#6170](https://github.com/NVIDIA/spark-rapids/issues/6170)|[FEA] Qualification tool support plugin for running application|
-|[#6067](https://github.com/NVIDIA/spark-rapids/issues/6067)|[FEA] Qualification Tool: For Databricks eventlog capture more information in output csv file|
-|[#6632](https://github.com/NVIDIA/spark-rapids/issues/6632)|[FEA] Profiling tool: Suggest parameters to tune|
-|[#5305](https://github.com/NVIDIA/spark-rapids/issues/5305)|[FEA] Qualification tool: Operator mapping, check if execs/expressions off by default|
-|[#5589](https://github.com/NVIDIA/spark-rapids/issues/5589)|[FEA] `GpuGlobalLimitExec` and `GpuCollectLimitExec` support `offset`|
-|[#6264](https://github.com/NVIDIA/spark-rapids/issues/6264)|[FEA] Qualification tool print unsupported execs and expressions|
-|[#5409](https://github.com/NVIDIA/spark-rapids/issues/5409)|[FEA] Binary Data Write support for Parquet|
-|[#6400](https://github.com/NVIDIA/spark-rapids/issues/6400)|[FEA] Windowing with decimal in orderBy.|
-|[#6529](https://github.com/NVIDIA/spark-rapids/issues/6529)|[FEA] Update qualification speedup factors for CSP environments|
-|[#5096](https://github.com/NVIDIA/spark-rapids/issues/5096)|[FEA] Support GroupBy Array[INT]|
-|[#6496](https://github.com/NVIDIA/spark-rapids/issues/6496)|Allow filtering blocks to be done multithreaded in the parquet coalescing reader|
-|[#6392](https://github.com/NVIDIA/spark-rapids/issues/6392)|[FEA] Support OptimizedCreateHiveTableAsSelectCommand (Hive CTAS with parquet)|
-|[#6395](https://github.com/NVIDIA/spark-rapids/issues/6395)|[FEA] Remove the `hasNans` config from `GpuCollectSet`|
-|[#5416](https://github.com/NVIDIA/spark-rapids/issues/5416)|[FEA] Support reading binary data types from Parquet as binary (not strings)|
-|[#4656](https://github.com/NVIDIA/spark-rapids/issues/4656)|[FEA] Support Group-By on Array[String]|
-|[#5942](https://github.com/NVIDIA/spark-rapids/issues/5942)|[FEA] Support multithreaded and coalescing read strategies for Apache Iceberg|
-|[#3974](https://github.com/NVIDIA/spark-rapids/issues/3974)|[FEA] Fully implement multiply and divide for decimal128|
-|[#6164](https://github.com/NVIDIA/spark-rapids/issues/6164)|[FEA] Add `Nan` handling in the `GpuMin`|
-|[#6142](https://github.com/NVIDIA/spark-rapids/issues/6142)|[FEA] GpuAverage cannot guarantee proper overflow checks for a precision larger than 23|
-|[#6144](https://github.com/NVIDIA/spark-rapids/issues/6144)|[FEA] Support FromUTCTimestamp|
-|[#5559](https://github.com/NVIDIA/spark-rapids/issues/5559)|[FEA] Add `GpuMapConcat` support for nested (array, struct, map) types.|
-|[#6143](https://github.com/NVIDIA/spark-rapids/issues/6143)|[FEA] Avoid CPU fallback due to intermediate precision overflow when handling decimal|
-|[#4061](https://github.com/NVIDIA/spark-rapids/issues/4061)|[FEA] Validate the size/complexity of regular expressions|
-|[#6145](https://github.com/NVIDIA/spark-rapids/issues/6145)|[FEA] Avoid CPU fallback due to date_format:Failed to convert Unsupported word: SSS null.|
-|[#6300](https://github.com/NVIDIA/spark-rapids/issues/6300)|[FEA] Profiling Tool supports recommendations for tuning|
-|[#6267](https://github.com/NVIDIA/spark-rapids/issues/6267)|[FEA] Support ShuffleExchangeExec with BinaryType as input and output|
-
-### Performance
-|||
-|:---|:---|
-|[#6708](https://github.com/NVIDIA/spark-rapids/issues/6708)|[BUG] Regression in NDSv2 of 4% because of spillable broadcast|
-|[#5999](https://github.com/NVIDIA/spark-rapids/issues/5999)|[FEA] [improvement] Investigate DynamicPartitionDataConcurrentWriter to avoid full sort when writing partitioned data|
-|[#6061](https://github.com/NVIDIA/spark-rapids/issues/6061)|[FEA] PoC shuffle read/decompress performance|
-|[#4713](https://github.com/NVIDIA/spark-rapids/issues/4713)|[FEA] Running window optimization for percent rank|
-|[#5085](https://github.com/NVIDIA/spark-rapids/issues/5085)|Could we evaluate once the child expressions of `GpuExtractChunk32`|
-|[#6209](https://github.com/NVIDIA/spark-rapids/issues/6209)|revisit locality wait = 0 setting|
-|[#5320](https://github.com/NVIDIA/spark-rapids/issues/5320)|[FEA] fix issues so we can remove hasNans config|
-|[#6219](https://github.com/NVIDIA/spark-rapids/issues/6219)|[FEA] Do not read the real data when `readDataSchema` is empty in Avro multi-threaded reading.|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#6727](https://github.com/NVIDIA/spark-rapids/issues/6727)|[BUG] On SPARK-3.2.1 : java.lang.ClassCastException |
-|[#6748](https://github.com/NVIDIA/spark-rapids/issues/6748)|[BUG] Casting strings CudfException: strings column has no children|
-|[#6614](https://github.com/NVIDIA/spark-rapids/issues/6614)|[BUG] test_iceberg_read_parquet_compression_codec CPU and GPU output mismatched in PASCAL GPU|
-|[#6723](https://github.com/NVIDIA/spark-rapids/issues/6723)|[BUG] null pointer exception selecting single column from iceberg table|
-|[#6693](https://github.com/NVIDIA/spark-rapids/issues/6693)|[BUG] test_cast_string_to_negative_scale_decimal failed in nightly|
-|[#6692](https://github.com/NVIDIA/spark-rapids/issues/6692)|[BUG] compile error deprecated method w/ jdk11|
-|[#6431](https://github.com/NVIDIA/spark-rapids/issues/6431)|[BUG] Like does not work how we would like it to.|
-|[#6659](https://github.com/NVIDIA/spark-rapids/issues/6659)|[BUG] Potential memory leaks in regexp_extract on the GPU|
-|[#6515](https://github.com/NVIDIA/spark-rapids/issues/6515)|[BUG] RapidsShuffleThreadedWriterSuite failed to delete intermittent failure|
-|[#6621](https://github.com/NVIDIA/spark-rapids/issues/6621)|[BUG] setting multi-threaded writer threads to 0 leads to divide-by-zero exception|
-|[#6508](https://github.com/NVIDIA/spark-rapids/issues/6508)|[BUG] delta lake deletes/updates on Databricks can fail when using alluxio|
-|[#6637](https://github.com/NVIDIA/spark-rapids/issues/6637)|[BUG] Qualification tool application time calculation can count stages twice if in separate sql queries |
-|[#6578](https://github.com/NVIDIA/spark-rapids/issues/6578)|[BUG] Autotuner does not load worker-info from remote storage|
-|[#6592](https://github.com/NVIDIA/spark-rapids/issues/6592)|[BUG] Delta Lake Deletes on Databricks broken with PERFILE parquet reader|
-|[#6593](https://github.com/NVIDIA/spark-rapids/issues/6593)|[BUG] Avro tests using `packages` feature needs to enable snapshot repositories|
-|[#6539](https://github.com/NVIDIA/spark-rapids/issues/6539)|Delta Lake and AQE on Databricks 10.4 workaround|
-|[#3328](https://github.com/NVIDIA/spark-rapids/issues/3328)|[BUG] Segfault when partitioning empty batch|
-|[#6572](https://github.com/NVIDIA/spark-rapids/issues/6572)|[BUG] UCX smoke tests can fail with OOM when initializing UCX|
-|[#6312](https://github.com/NVIDIA/spark-rapids/issues/6312)|[BUG] Timestamp from GPU ORC reading is different from CPU ORC reading|
-|[#6270](https://github.com/NVIDIA/spark-rapids/issues/6270)|[BUG] `UPDATE` on a Databricks (10.4) DELTA table leads to JVM crash|
-|[#6404](https://github.com/NVIDIA/spark-rapids/issues/6404)|[BUG] DMLC XGBoost train FAILED against rapids-4-spark 22.10.0-SNAPSHOT FAILED|
-|[#6531](https://github.com/NVIDIA/spark-rapids/issues/6531)|[BUG] window function of window function queries fail on Databricks 10.4|
-|[#6559](https://github.com/NVIDIA/spark-rapids/issues/6559)|[BUG] EmptyHashedRelation$ cannot be cast to org.apache.spark.sql.rapids.execution.SerializeConcatHostBuffersDeserializeBatch|
-|[#6501](https://github.com/NVIDIA/spark-rapids/issues/6501)|[BUG] cgroup directory permission get reverted on reboot|
-|[#6558](https://github.com/NVIDIA/spark-rapids/issues/6558)|[BUG] orc_write_test.py::test_write_ cases failed|
-|[#6519](https://github.com/NVIDIA/spark-rapids/issues/6519)|[BUG] Windowing skew caused GPU run OOM|
-|[#135](https://github.com/NVIDIA/spark-rapids/issues/135)|[BUG] mergeSchema on ORC reads does not work|
-|[#6302](https://github.com/NVIDIA/spark-rapids/issues/6302)|[BUG] `spark.sql.parquet.outputTimestampType` is not considered during read/write parquet for nested types containing timestamp|
-|[#1059](https://github.com/NVIDIA/spark-rapids/issues/1059)|[BUG] adaptive query executor and delta optimized table writes don't work on databricks|
-|[#6416](https://github.com/NVIDIA/spark-rapids/issues/6416)|[BUG] Example Jupyter notebook fails to parse and contains errors|
-|[#5657](https://github.com/NVIDIA/spark-rapids/issues/5657)|[BUG] Documented deployment of spark-avro is not tested|
-|[#6520](https://github.com/NVIDIA/spark-rapids/issues/6520)|[BUG] NoClassDefFoundError: com/nvidia/spark/rapids/shims/PlanShims in UCX tests|
-|[#6397](https://github.com/NVIDIA/spark-rapids/issues/6397)|[BUG] GpuBringBackToHost doExecute needs columnar conversion|
-|[#6460](https://github.com/NVIDIA/spark-rapids/issues/6460)|[BUG] test_hash_grpby_sum_full_decimal fails|
-|[#6465](https://github.com/NVIDIA/spark-rapids/issues/6465)|[BUG] orc_cast_test fails on CDH|
-|[#6478](https://github.com/NVIDIA/spark-rapids/issues/6478)|[BUG] test_cast_float_to_timestamp_side_effect intermittently fails|
-|[#6372](https://github.com/NVIDIA/spark-rapids/issues/6372)|[BUG] Decimal average excessively checks for overflow|
-|[#6467](https://github.com/NVIDIA/spark-rapids/issues/6467)|[BUG] Fix DOP calculations for xdist |
-|[#6428](https://github.com/NVIDIA/spark-rapids/issues/6428)|[BUG] IntervalDivisionSuite has memory leak|
-|[#6438](https://github.com/NVIDIA/spark-rapids/issues/6438)|[BUG] `GpuSortArray` doesn't match the behavior of Spark when handling `Nan`s|
-|[#6442](https://github.com/NVIDIA/spark-rapids/issues/6442)|[BUG] java.lang.ClassNotFoundException: org.apache.spark.sql.rapids.execution.SerializeConcatHostBuffersDeserializeBatch|
-|[#6417](https://github.com/NVIDIA/spark-rapids/issues/6417)|[BUG] CDH integration tests ClassNotFoundException: com.nvidia.spark.rapids.spark321cdh.RapidsShuffleManager|
-|[#6471](https://github.com/NVIDIA/spark-rapids/issues/6471)|[BUG] Encrypted Parquet writes are not falling back if configs are set in configuration|
-|[#6433](https://github.com/NVIDIA/spark-rapids/issues/6433)|[BUG] dist module "install" should install reduced pom |
-|[#6240](https://github.com/NVIDIA/spark-rapids/issues/6240)|[BUG] shuffle file can not be deleted correctly when use RapidsShuffleManager.|
-|[#6446](https://github.com/NVIDIA/spark-rapids/issues/6446)|[BUG] test_casting_from_integer[timestamp] fails on databricks321|
-|[#6426](https://github.com/NVIDIA/spark-rapids/issues/6426)|[BUG] GpuShuffledHashJoinExecSuite has leaks|
-|[#6447](https://github.com/NVIDIA/spark-rapids/issues/6447)|[BUG] Python UDF triggered java.lang.NullPointerException|
-|[#6406](https://github.com/NVIDIA/spark-rapids/issues/6406)|[BUG] integration tests arithmetic_ops_test.test_day_time_interval_multiply_number failing|
-|[#6340](https://github.com/NVIDIA/spark-rapids/issues/6340)|[BUG] test_hash_grpby_sum_full_decimal can fail with negative numbers|
-|[#6368](https://github.com/NVIDIA/spark-rapids/issues/6368)|[BUG] It's confusing that BASE_SPARK_VERSION in jenkins/databricks/build.sh, but BASE_SPARK_VER in databricks/test.sh|
-|[#6351](https://github.com/NVIDIA/spark-rapids/issues/6351)|[BUG] Implement escape characters for spark property encoding in PYSP_TEST env variables|
-|[#6284](https://github.com/NVIDIA/spark-rapids/issues/6284)|[BUG] `date_format` cannot output with subsecond|
-|[#6341](https://github.com/NVIDIA/spark-rapids/issues/6341)|[BUG] test_decimal_multiplication_mixed_no_overflow_guarantees fails for some negative values|
-|[#6303](https://github.com/NVIDIA/spark-rapids/issues/6303)|[BUG] Coalescing readers don't include filterblock time in scan time metric|
-|[#6363](https://github.com/NVIDIA/spark-rapids/issues/6363)|[BUG] missing zip utility on CI |
-|[#6073](https://github.com/NVIDIA/spark-rapids/issues/6073)|[SPARK-39806][SQL] Accessing _metadata on partitioned table can crash a query|
-|[#6330](https://github.com/NVIDIA/spark-rapids/issues/6330)|[BUG] withPsNote on ArrayMin does not appear in generated docs|
-|[#6332](https://github.com/NVIDIA/spark-rapids/issues/6332)|[BUG] `array_min` does not fall back to CPU when `hasNan = true`|
-|[#6352](https://github.com/NVIDIA/spark-rapids/issues/6352)|[BUG] Reading Binary Type in Iceberg table fallback to CPU|
-|[#6347](https://github.com/NVIDIA/spark-rapids/issues/6347)|[BUG] test_delta_metadata_query_fallback failed in spark32X|
-|[#6359](https://github.com/NVIDIA/spark-rapids/issues/6359)|[BUG] test_from_json_map failed|
-|[#5619](https://github.com/NVIDIA/spark-rapids/issues/5619)|[BUG] Mixing parquet input files with different schemas results in crashes|
-|[#6344](https://github.com/NVIDIA/spark-rapids/issues/6344)|[BUG] Iceberg tests fail due to duplication of spark.jars conf via PYSP_TEST and on the command line|
-|[#3851](https://github.com/NVIDIA/spark-rapids/issues/3851)|[BUG] ShimLoader.updateSparkClassLoader fails with openjdk Java11|
-|[#5714](https://github.com/NVIDIA/spark-rapids/issues/5714)|[BUG] discrepancy in the plugin jar deployment in run_pyspark_from_build.sh depending on TEST_PARALLEL|
-|[#6294](https://github.com/NVIDIA/spark-rapids/issues/6294)|[BUG] Incorrect result when casting timestamp to string|
-|[#6165](https://github.com/NVIDIA/spark-rapids/issues/6165)|[BUG] AnsiCastOpSuite fail in spark331 shim|
-|[#6308](https://github.com/NVIDIA/spark-rapids/issues/6308)|[BUG] Integration tests failing on Spark 3.2 due to BinaryType|
-|[#6243](https://github.com/NVIDIA/spark-rapids/issues/6243)|[BUG] AST fuzz test regexp find, replace fail|
-|[#6236](https://github.com/NVIDIA/spark-rapids/issues/6236)|[BUG] integration tests corrupt executorEnv names containing underscore|
-|[#5706](https://github.com/NVIDIA/spark-rapids/issues/5706)|[BUG] buildall --generate-bloop creates projects that Metals/Bloop does not recognize in VS code|
-
-### PRs
-|||
-|:---|:---|
-|[#6907](https://github.com/NVIDIA/spark-rapids/pull/6907)|[Doc]a hot fix for download links versions[skip ci]|
-|[#6803](https://github.com/NVIDIA/spark-rapids/pull/6803)|Updated 22.10 changelog to latest [skip ci]|
-|[#6799](https://github.com/NVIDIA/spark-rapids/pull/6799)|Update JNI version to released 22.10.0|
-|[#6755](https://github.com/NVIDIA/spark-rapids/pull/6755)|[doc] Add diagnostic tool section to GCP Dataproc getting started page [skip ci]|
-|[#6734](https://github.com/NVIDIA/spark-rapids/pull/6734)|Init 22.10 changelog [skip ci]|
-|[#6770](https://github.com/NVIDIA/spark-rapids/pull/6770)|Revert "Docker container for ease of deployment to Databricks [skip ci]"|
-|[#6754](https://github.com/NVIDIA/spark-rapids/pull/6754)|[Doc] update getting started guide for emr 6.8.0 release[skip ci]|
-|[#6772](https://github.com/NVIDIA/spark-rapids/pull/6772)|[Doc]remove group on array in 22.10, target in 22.12[skip ci]|
-|[#6767](https://github.com/NVIDIA/spark-rapids/pull/6767)|Avoid any issues with scalar values returned by evalColumnar|
-|[#6765](https://github.com/NVIDIA/spark-rapids/pull/6765)|[DOC] Add gcp dataproc gpu limit [skip ci]|
-|[#6703](https://github.com/NVIDIA/spark-rapids/pull/6703)|Docker container for ease of deployment to Databricks [skip ci]|
-|[#6750](https://github.com/NVIDIA/spark-rapids/pull/6750)|Enabling decimal 38,2 casting|
-|[#6729](https://github.com/NVIDIA/spark-rapids/pull/6729)|Fix NullPointerException in iceberg schema parsing code when selecting single column|
-|[#6724](https://github.com/NVIDIA/spark-rapids/pull/6724)|Qualification tool: Read SQL function names for parsing expressions|
-|[#6695](https://github.com/NVIDIA/spark-rapids/pull/6695)|[Doc] Adding Dataproc quick start steps to use new user tools package [skip ci]|
-|[#6719](https://github.com/NVIDIA/spark-rapids/pull/6719)|Document that we test on JDK8 and JDK11, other versions are untested [skip ci]|
-|[#6721](https://github.com/NVIDIA/spark-rapids/pull/6721)|Fix a couple of markdown links that are now permanently moved [skip ci]|
-|[#6701](https://github.com/NVIDIA/spark-rapids/pull/6701)|Add AutoTuner documentation [skip ci]|
-|[#6709](https://github.com/NVIDIA/spark-rapids/pull/6709)|Take semaphore after first stream batch is materialized (broadcast)|
-|[#6697](https://github.com/NVIDIA/spark-rapids/pull/6697)|Fix AutoTuner yaml error handling and discovery script rounding|
-|[#6705](https://github.com/NVIDIA/spark-rapids/pull/6705)|Suppress warning for jdk11 Finalize method deprecation|
-|[#6691](https://github.com/NVIDIA/spark-rapids/pull/6691)|Fix validity checks for large decimal window bounds|
-|[#6670](https://github.com/NVIDIA/spark-rapids/pull/6670)|[Doc]Add 22.10 download page[skip ci]|
-|[#6690](https://github.com/NVIDIA/spark-rapids/pull/6690)|Update spark2 code for Revert "Add support for arrays in hashaggregate"|
-|[#6689](https://github.com/NVIDIA/spark-rapids/pull/6689)|Fix the maxPartitionBytes recommendation by AutoTuner to use the max task input bytes|
-|[#6652](https://github.com/NVIDIA/spark-rapids/pull/6652)|Revise AutoTuner to match the BootStrap tool|
-|[#6616](https://github.com/NVIDIA/spark-rapids/pull/6616)|String to decimal casting custom kernel|
-|[#6679](https://github.com/NVIDIA/spark-rapids/pull/6679)|Revert "Add support for arrays in hashaggregate (#6066)"|
-|[#6631](https://github.com/NVIDIA/spark-rapids/pull/6631)|Fixes split estimation in explode/explode_outer|
-|[#6604](https://github.com/NVIDIA/spark-rapids/pull/6604)|Make broadcast tables spillable|
-|[#6666](https://github.com/NVIDIA/spark-rapids/pull/6666)|Fix resource leaks in regexp_extract_all|
-|[#6657](https://github.com/NVIDIA/spark-rapids/pull/6657)|Add Qualification tool support for running application - per sql output|
-|[#6662](https://github.com/NVIDIA/spark-rapids/pull/6662)|update spark2 code|
-|[#6643](https://github.com/NVIDIA/spark-rapids/pull/6643)|Workflow to add new issues to Github global project [skip ci]|
-|[#6648](https://github.com/NVIDIA/spark-rapids/pull/6648)|Update iceberg doc for split size options [docs]|
-|[#6640](https://github.com/NVIDIA/spark-rapids/pull/6640)|Avoid failing test on cleanup when filesystem has issues|
-|[#6641](https://github.com/NVIDIA/spark-rapids/pull/6641)|Fix case where number of shuffle writer threads is set to 0|
-|[#6638](https://github.com/NVIDIA/spark-rapids/pull/6638)|Qualification tool: Print cluster usage tags to csv and log file|
-|[#6651](https://github.com/NVIDIA/spark-rapids/pull/6651)|Changing toList to toIterator to improve memory optimization and runt…|
-|[#6601](https://github.com/NVIDIA/spark-rapids/pull/6601)|delta lake deletes/updates on Databricks fail when using alluxio|
-|[#6642](https://github.com/NVIDIA/spark-rapids/pull/6642)|Qualification tool application time calculation can count stages twice if in separate sql queries |
-|[#6606](https://github.com/NVIDIA/spark-rapids/pull/6606)|Print nvidia-smi output when a task fails due to a cuda fatal exception.|
-|[#6630](https://github.com/NVIDIA/spark-rapids/pull/6630)|Allow AutoTuner to accept remote path for WorkerInfo|
-|[#6627](https://github.com/NVIDIA/spark-rapids/pull/6627)|Move spark331 back to list of snapshot shims|
-|[#6617](https://github.com/NVIDIA/spark-rapids/pull/6617)|Fix a Delta Lake Deletes issue|
-|[#6612](https://github.com/NVIDIA/spark-rapids/pull/6612)|Tolerate event log folder existence when to create it to avoid raisin…|
-|[#6610](https://github.com/NVIDIA/spark-rapids/pull/6610)|Disable 22.10 snapshot builds|
-|[#6607](https://github.com/NVIDIA/spark-rapids/pull/6607)|Enable tests that were missed when binary support was extended|
-|[#6584](https://github.com/NVIDIA/spark-rapids/pull/6584)|Fix spark2-sql-plugin|
-|[#6506](https://github.com/NVIDIA/spark-rapids/pull/6506)|Add alluxio reliability doc|
-|[#6609](https://github.com/NVIDIA/spark-rapids/pull/6609)|Enable automerge from 22.10 to 22.12 [skip ci]|
-|[#6569](https://github.com/NVIDIA/spark-rapids/pull/6569)|Add dynamic partition concurrent writer to avoid full sort|
-|[#6602](https://github.com/NVIDIA/spark-rapids/pull/6602)|Fix version-def script to correctly set list of shims|
-|[#6599](https://github.com/NVIDIA/spark-rapids/pull/6599)|Add in support for casting binary to string|
-|[#6432](https://github.com/NVIDIA/spark-rapids/pull/6432)|[Doc]Add archived release page[skip ci]|
-|[#6594](https://github.com/NVIDIA/spark-rapids/pull/6594)|Add Apache snapshot repository when running Avro tests|
-|[#6574](https://github.com/NVIDIA/spark-rapids/pull/6574)|Add shim layer for Cloudera CDS 3.3|
-|[#6412](https://github.com/NVIDIA/spark-rapids/pull/6412)|Qualification tool: Print unsupported Execs and expressions|
-|[#6590](https://github.com/NVIDIA/spark-rapids/pull/6590)|Parallelize tests using spark packages feature|
-|[#6589](https://github.com/NVIDIA/spark-rapids/pull/6589)|Update doc to indicate ORC and Parquet zstd read support [skip ci]|
-|[#6437](https://github.com/NVIDIA/spark-rapids/pull/6437)|Use dist/pom file as source of truth for spark versions|
-|[#6587](https://github.com/NVIDIA/spark-rapids/pull/6587)|Delta Lake and AQE on Databricks 10.4 workaround|
-|[#6573](https://github.com/NVIDIA/spark-rapids/pull/6573)|Update UCX to 1.13.1 in CI and sets UCX_TLS=^posix|
-|[#6586](https://github.com/NVIDIA/spark-rapids/pull/6586)|Adds link to spark supporting shuffle classes and fix copyright|
-|[#6545](https://github.com/NVIDIA/spark-rapids/pull/6545)|Allow ORC tests to run with wider range of timestamp input|
-|[#6511](https://github.com/NVIDIA/spark-rapids/pull/6511)|Multi-threaded shuffle reader for RapidsShuffleManager|
-|[#6576](https://github.com/NVIDIA/spark-rapids/pull/6576)|Bump snakeyaml version to 1.32|
-|[#6577](https://github.com/NVIDIA/spark-rapids/pull/6577)|Work around multiprocess issues with updating Ivy cache|
-|[#6579](https://github.com/NVIDIA/spark-rapids/pull/6579)|Disable UCX smoke test temporarily|
-|[#6564](https://github.com/NVIDIA/spark-rapids/pull/6564)|Fix the check of empty batches for partitioning|
-|[#6534](https://github.com/NVIDIA/spark-rapids/pull/6534)|Add GpuColumnVectorUtils to access GpuColumnVector|
-|[#6575](https://github.com/NVIDIA/spark-rapids/pull/6575)|Fix maxPartitionBytes bounds checking in AutoTuner|
-|[#6553](https://github.com/NVIDIA/spark-rapids/pull/6553)|Update handling for projectList based WindowExecs to handle window function of window function|
-|[#6562](https://github.com/NVIDIA/spark-rapids/pull/6562)|Handle EmptyRelation in GpuSubqueryBroadcastExec|
-|[#6504](https://github.com/NVIDIA/spark-rapids/pull/6504)|[DOC] Add notes for cgroup permission reverted[skip ci]|
-|[#6554](https://github.com/NVIDIA/spark-rapids/pull/6554)|Support Decimal ordering column for `RANGE` window functions|
-|[#6550](https://github.com/NVIDIA/spark-rapids/pull/6550)|Allow percent_rank to not need an entire group in memory|
-|[#6557](https://github.com/NVIDIA/spark-rapids/pull/6557)|Mitigate non-test failure and remove 21.xx premerge support|
-|[#6566](https://github.com/NVIDIA/spark-rapids/pull/6566)|Fix map gen for `orc_write_test.py`|
-|[#6563](https://github.com/NVIDIA/spark-rapids/pull/6563)|Add missing closing ``` for a code block [skip ci]|
-|[#6512](https://github.com/NVIDIA/spark-rapids/pull/6512)|Remove the `hasNans` config and update the doc|
-|[#6542](https://github.com/NVIDIA/spark-rapids/pull/6542)|[Doc]Doc update for databricks single node cluster[skip ci]|
-|[#6555](https://github.com/NVIDIA/spark-rapids/pull/6555)|Document a safe unshimming algorithm [skip ci]|
-|[#6549](https://github.com/NVIDIA/spark-rapids/pull/6549)|Update SnakeYaml version for bug fixes|
-|[#6523](https://github.com/NVIDIA/spark-rapids/pull/6523)|ORC reading supports mergeSchema|
-|[#6522](https://github.com/NVIDIA/spark-rapids/pull/6522)|Nightly spark-tests script to follow PYSP_TEST pattern [skip ci]|
-|[#6548](https://github.com/NVIDIA/spark-rapids/pull/6548)|Fixes for recent cuDF regexp changes|
-|[#6541](https://github.com/NVIDIA/spark-rapids/pull/6541)|Add another alluxio path replacement algorithm|
-|[#6547](https://github.com/NVIDIA/spark-rapids/pull/6547)|Append new authorized user to blossom-ci whitelist [skip ci]|
-|[#6429](https://github.com/NVIDIA/spark-rapids/pull/6429)|Fix up `buffer time` for multi-file readers|
-|[#6473](https://github.com/NVIDIA/spark-rapids/pull/6473)|Fix parquet write when the input column is nested type containing timestamp|
-|[#6461](https://github.com/NVIDIA/spark-rapids/pull/6461)|Enabling AQE on|
-|[#6436](https://github.com/NVIDIA/spark-rapids/pull/6436)|Switch to gpu string to integer casts|
-|[#6538](https://github.com/NVIDIA/spark-rapids/pull/6538)|Updating qual tool speedup factors from latest CSP benchmarks|
-|[#6421](https://github.com/NVIDIA/spark-rapids/pull/6421)|Fix notebook and getting started examples [skip ci]|
-|[#6505](https://github.com/NVIDIA/spark-rapids/pull/6505)|Include avro test by using '--packages' option [skip ci]|
-|[#6525](https://github.com/NVIDIA/spark-rapids/pull/6525)|Fix typo in file name|
-|[#6527](https://github.com/NVIDIA/spark-rapids/pull/6527)|Use ShimLoader to access PlanShims|
-|[#6466](https://github.com/NVIDIA/spark-rapids/pull/6466)|Use tiered projections for hash aggregates|
-|[#6510](https://github.com/NVIDIA/spark-rapids/pull/6510)|Revert "Added in very specific support for from_json to a Map (#6211)"|
-|[#6319](https://github.com/NVIDIA/spark-rapids/pull/6319)|Support float/double castings for ORC reading|
-|[#6498](https://github.com/NVIDIA/spark-rapids/pull/6498)|Allow filtering blocks to be done multithreaded in the Parquet coalescing reader|
-|[#6507](https://github.com/NVIDIA/spark-rapids/pull/6507)|Perform columnar-to-row transition in GpuBringBackToHost.doExecute|
-|[#6491](https://github.com/NVIDIA/spark-rapids/pull/6491)|[DOC] Change recommend setting of spark.locality.wait to 3s [skip ci]|
-|[#6476](https://github.com/NVIDIA/spark-rapids/pull/6476)|Add GPU acceleration for OptimizedCreateHiveTableAsSelect|
-|[#6499](https://github.com/NVIDIA/spark-rapids/pull/6499)|Fix non-deterministic overflows in test_hash_grpby_sum_full_decimal|
-|[#6490](https://github.com/NVIDIA/spark-rapids/pull/6490)|Fix: orc_cast_test fails on CDH|
-|[#6486](https://github.com/NVIDIA/spark-rapids/pull/6486)|Remove the `hasNans` config from `GpuCollectSet`|
-|[#6484](https://github.com/NVIDIA/spark-rapids/pull/6484)|Fixes excessive ShuffleBlockId object creation due to missing map index bounds|
-|[#6492](https://github.com/NVIDIA/spark-rapids/pull/6492)|Fix intermittent failure on test_cast_float_to_timestamp_side_effect|
-|[#6483](https://github.com/NVIDIA/spark-rapids/pull/6483)|Fix DOP calculation for xdist|
-|[#6479](https://github.com/NVIDIA/spark-rapids/pull/6479)|Remove KnownFloatingPointNormalized from allow_non_gpu|
-|[#6482](https://github.com/NVIDIA/spark-rapids/pull/6482)|Fix leak in interval divide|
-|[#6451](https://github.com/NVIDIA/spark-rapids/pull/6451)|Normalize nans in GpuSortArray|
-|[#6066](https://github.com/NVIDIA/spark-rapids/pull/6066)|Add support for arrays in hashaggregate|
-|[#6475](https://github.com/NVIDIA/spark-rapids/pull/6475)|Change GpuKryoRegistrator to load the classes we want to register with the ShimLoader|
-|[#6472](https://github.com/NVIDIA/spark-rapids/pull/6472)|Check more places for Parquet encryption configs|
-|[#6468](https://github.com/NVIDIA/spark-rapids/pull/6468)|Use non-capture groups in LIKE regexp pattern|
-|[#6434](https://github.com/NVIDIA/spark-rapids/pull/6434)|Install reduced pom for dist module|
-|[#6462](https://github.com/NVIDIA/spark-rapids/pull/6462)|Increase stability of pytest run with PVC storage|
-|[#6454](https://github.com/NVIDIA/spark-rapids/pull/6454)|Support bool/int8/16/32/64 castings for ORC reading|
-|[#6422](https://github.com/NVIDIA/spark-rapids/pull/6422)|Iceberg supports coalescing reading for Parquet |
-|[#6450](https://github.com/NVIDIA/spark-rapids/pull/6450)|Add new github ID to blossom-ci allow list [skip ci]|
-|[#6458](https://github.com/NVIDIA/spark-rapids/pull/6458)|Change some Alluxio log messages to be debug|
-|[#6457](https://github.com/NVIDIA/spark-rapids/pull/6457)|Reading delta log Table Checkpoint files should fallback the entire plan|
-|[#6439](https://github.com/NVIDIA/spark-rapids/pull/6439)|Fix leaks in GpuShuffledHashJoinExecSuite|
-|[#6251](https://github.com/NVIDIA/spark-rapids/pull/6251)|Add `Nan` handling in the `GpuMin`|
-|[#6449](https://github.com/NVIDIA/spark-rapids/pull/6449)|Remove caching of needles in GpuInSet|
-|[#6414](https://github.com/NVIDIA/spark-rapids/pull/6414)|Add support for full 128-bit decimal divide|
-|[#6448](https://github.com/NVIDIA/spark-rapids/pull/6448)|Revert patch that caused failing test on databricks 321|
-|[#6441](https://github.com/NVIDIA/spark-rapids/pull/6441)|Skip decimal gens that overflow on Spark 3.3.0+|
-|[#6273](https://github.com/NVIDIA/spark-rapids/pull/6273)|Support bool/int8/int16/int32/int64 castings for ORC reading.|
-|[#6370](https://github.com/NVIDIA/spark-rapids/pull/6370)|Support simple pass-through for `FromUTCTimestamp`|
-|[#6290](https://github.com/NVIDIA/spark-rapids/pull/6290)|Add `GpuMapConcat` support for nested type keys.|
-|[#6405](https://github.com/NVIDIA/spark-rapids/pull/6405)|Support more timestamp format when casting string to timestamp|
-|[#6418](https://github.com/NVIDIA/spark-rapids/pull/6418)|Fix tests for DateTimeInterval that were overflowing on CPU|
-|[#6410](https://github.com/NVIDIA/spark-rapids/pull/6410)|Fix handling of older array encodings in Parquet|
-|[#6398](https://github.com/NVIDIA/spark-rapids/pull/6398)|Fix DecimalGen to generate full range and fix failing test cases|
-|[#6396](https://github.com/NVIDIA/spark-rapids/pull/6396)|Make the variable "BASE_SPARK_VERSION" consistent|
-|[#6409](https://github.com/NVIDIA/spark-rapids/pull/6409)|Fix test_dpp_from_swizzled_hash_keys on CDH|
-|[#6407](https://github.com/NVIDIA/spark-rapids/pull/6407)|Remove empty unreferenced file unshimmed-spark311.txt|
-|[#6379](https://github.com/NVIDIA/spark-rapids/pull/6379)|Rebalance time of parallel stages for pre-merge CI|
-|[#6358](https://github.com/NVIDIA/spark-rapids/pull/6358)|Support _ in spark conf of integration tests|
-|[#6387](https://github.com/NVIDIA/spark-rapids/pull/6387)|Use new custom kernel for large decimal multiply|
-|[#6355](https://github.com/NVIDIA/spark-rapids/pull/6355)|Include filterblock time in scan time metric for Coalescing readers|
-|[#6393](https://github.com/NVIDIA/spark-rapids/pull/6393)|Add zip&unzip in pre-merge dockerfile|
-|[#6374](https://github.com/NVIDIA/spark-rapids/pull/6374)|Remove anthony-chang [skip ci]|
-|[#6349](https://github.com/NVIDIA/spark-rapids/pull/6349)|Add `Nan` handling in `GpuArrayMin`|
-|[#6371](https://github.com/NVIDIA/spark-rapids/pull/6371)|Fix datetime name collision in cast_test|
-|[#6361](https://github.com/NVIDIA/spark-rapids/pull/6361)|Binary type support in Iceberg read|
-|[#6306](https://github.com/NVIDIA/spark-rapids/pull/6306)|Struct null aware equality comparator <=> support|
-|[#6350](https://github.com/NVIDIA/spark-rapids/pull/6350)|Allow writing Binary data in Parquet|
-|[#6365](https://github.com/NVIDIA/spark-rapids/pull/6365)|Honor delta_lake marker for pytest|
-|[#6271](https://github.com/NVIDIA/spark-rapids/pull/6271)|Add format `SSS` for `date_format` function|
-|[#6338](https://github.com/NVIDIA/spark-rapids/pull/6338)|Adding AutoTuner to Profiling Tool|
-|[#6356](https://github.com/NVIDIA/spark-rapids/pull/6356)|Fix auto merge conflict 6353 [skip ci]|
-|[#6342](https://github.com/NVIDIA/spark-rapids/pull/6342)|Avoid passing duplicate conf to spark_init_internal|
-|[#6286](https://github.com/NVIDIA/spark-rapids/pull/6286)|Change `TimestampGen` unit in integration test from millisecond to microsecond|
-|[#6335](https://github.com/NVIDIA/spark-rapids/pull/6335)|Add missing subnet option to dataproc cluster example [skip ci]|
-|[#6307](https://github.com/NVIDIA/spark-rapids/pull/6307)|Add more information in FileSourceScanExec log when timezone is not UTC|
-|[#5981](https://github.com/NVIDIA/spark-rapids/pull/5981)|Run Delta Lake tests with Spark 3.2.x|
-|[#5646](https://github.com/NVIDIA/spark-rapids/pull/5646)|Use Spark's `Utils.getContextOrSparkClassLoader` to load Shims|
-|[#6333](https://github.com/NVIDIA/spark-rapids/pull/6333)|Make run_pyspark to report fail and error as default|
-|[#6044](https://github.com/NVIDIA/spark-rapids/pull/6044)|[BUG] Fix IT discrepancy which depending on TEST_PARALLEL|
-|[#6311](https://github.com/NVIDIA/spark-rapids/pull/6311)|Re-implement cast timestamp to string and add more tests|
-|[#6316](https://github.com/NVIDIA/spark-rapids/pull/6316)|Add Nan handling for `GpuArrayMax`|
-|[#6256](https://github.com/NVIDIA/spark-rapids/pull/6256)|[Bug] Add Expr OverflowInTableInsert to fix AnsiCastOpSuite|
-|[#6314](https://github.com/NVIDIA/spark-rapids/pull/6314)|Increase robustness of mvn commands in nightly scripts|
-|[#6318](https://github.com/NVIDIA/spark-rapids/pull/6318)|[BugFix]Change the RapidsDiskBlockManager in ShuffleBufferCatalog to guarantee the shuffle files can be cleaned successfully|
-|[#6006](https://github.com/NVIDIA/spark-rapids/pull/6006)|Estimate and validate regular expression complexities|
-|[#6305](https://github.com/NVIDIA/spark-rapids/pull/6305)|Increase robustness of MVN commands in pre-merge scripts|
-|[#6309](https://github.com/NVIDIA/spark-rapids/pull/6309)|Add BinaryType to some shimmed expressions|
-|[#6062](https://github.com/NVIDIA/spark-rapids/pull/6062)|Nested struct binary comparison operator support|
-|[#6298](https://github.com/NVIDIA/spark-rapids/pull/6298)|Add BinaryType support to operations that already support arrays|
-|[#6297](https://github.com/NVIDIA/spark-rapids/pull/6297)|Fix merge conflict with branch-22.08|
-|[#6241](https://github.com/NVIDIA/spark-rapids/pull/6241)|Read metadata only when read schema is empty in Avro multi-threaded reading|
-|[#5989](https://github.com/NVIDIA/spark-rapids/pull/5989)|Add `NaN` handling in `GpuMax`|
-|[#6203](https://github.com/NVIDIA/spark-rapids/pull/6203)|Add config option to log all query transformations|
-|[#6246](https://github.com/NVIDIA/spark-rapids/pull/6246)|Fix merge conflict with 22.08|
-|[#6247](https://github.com/NVIDIA/spark-rapids/pull/6247)|regexp: Catch "nothing to repeat" errors nested in groups|
-|[#6237](https://github.com/NVIDIA/spark-rapids/pull/6237)|Preserve underscore in executorEnv in integration tests|
-|[#6235](https://github.com/NVIDIA/spark-rapids/pull/6235)|Fix merge conflict with branch-22.08|
-|[#6110](https://github.com/NVIDIA/spark-rapids/pull/6110)|Iceberg Parquet supports multi-threaded reading.|
-|[#6227](https://github.com/NVIDIA/spark-rapids/pull/6227)|Configurable task failures in integration tests|
-|[#6194](https://github.com/NVIDIA/spark-rapids/pull/6194)|Make dist jar compression opt-out optional|
-|[#6211](https://github.com/NVIDIA/spark-rapids/pull/6211)|Added in very specific support for from_json to a Map|
-|[#6218](https://github.com/NVIDIA/spark-rapids/pull/6218)|Disable overflow tableInsert tests for 331+|
-|[#6210](https://github.com/NVIDIA/spark-rapids/pull/6210)|Fix merge conflict with branch-22.08|
-|[#6152](https://github.com/NVIDIA/spark-rapids/pull/6152)|Improve coverage in mvn verify check github workflow|
-|[#6156](https://github.com/NVIDIA/spark-rapids/pull/6156)|Fix Bloop project generation in buildall [skip ci]|
-|[#5946](https://github.com/NVIDIA/spark-rapids/pull/5946)|GpuGlobalLimitExec and GpuCollectLimitExec support offset|
-|[#6162](https://github.com/NVIDIA/spark-rapids/pull/6162)|Remove hard-coded versions from buildall [skip ci]|
-|[#6055](https://github.com/NVIDIA/spark-rapids/pull/6055)|Add tests for .count() in the file readers|
-|[#6129](https://github.com/NVIDIA/spark-rapids/pull/6129)|Init 22.10.0-SNAPSHOT|
-
-## Release 22.08
-
-### Features
-|||
-|:---|:---|
-|[#6081](https://github.com/NVIDIA/spark-rapids/issues/6081)|[FEA] Update spark2 code for 22.08|
-|[#5508](https://github.com/NVIDIA/spark-rapids/issues/5508)|[FEA] collect_set on struct[Array]|
-|[#5222](https://github.com/NVIDIA/spark-rapids/issues/5222)|[FEA] Support function array_except |
-|[#5228](https://github.com/NVIDIA/spark-rapids/issues/5228)|[FEA] Support array_union|
-|[#5188](https://github.com/NVIDIA/spark-rapids/issues/5188)|[FEA] Support arrays_overlap|
-|[#4932](https://github.com/NVIDIA/spark-rapids/issues/4932)|[FEA] Support ArrayIntersect on at least Arrays of String|
-|[#4005](https://github.com/NVIDIA/spark-rapids/issues/4005)|[FEA] Support First() in windowing context with Integer type|
-|[#5061](https://github.com/NVIDIA/spark-rapids/issues/5061)|[FEA] Support last in windowing context for Integer type.|
-|[#6059](https://github.com/NVIDIA/spark-rapids/issues/6059)|[FEA] Add SQL table to Qualification's app-details view|
-|[#5617](https://github.com/NVIDIA/spark-rapids/issues/5617)|[FEA] Qualification tool support parsing expressions (part 1)|
-|[#4719](https://github.com/NVIDIA/spark-rapids/issues/4719)|[FEA] GpuStringSplit: Add support for line and string anchors in regular expressions|
-|[#5502](https://github.com/NVIDIA/spark-rapids/issues/5502)|[FEA] Qualification tool should use SQL ID of each Application ID like profiling tool|
-|[#5524](https://github.com/NVIDIA/spark-rapids/issues/5524)|[FEA] Automatically adjust spark.rapids.sql.format.parquet.multiThreadedRead.numThreads to the same as spark.executor.cores|
-|[#4817](https://github.com/NVIDIA/spark-rapids/issues/4817)|[FEA] Support Iceberg batch reads|
-|[#5510](https://github.com/NVIDIA/spark-rapids/issues/5510)|[FEA] Support Iceberg for data INSERT, DELETE operations|
-|[#5890](https://github.com/NVIDIA/spark-rapids/issues/5890)|[FEA] Mount the alluxio buckets/paths on the fly when the query is being executed|
-|[#6018](https://github.com/NVIDIA/spark-rapids/issues/6018)|[FEA] Support Spark 3.2.2 |
-|[#5417](https://github.com/NVIDIA/spark-rapids/issues/5417)|[FEA] Fully support reading parquet binary as string|
-|[#4283](https://github.com/NVIDIA/spark-rapids/issues/4283)|[FEA] Implement regexp_extract_all on GPU for idx > 0|
-|[#4353](https://github.com/NVIDIA/spark-rapids/issues/4353)|[FEA] Implement regexp_extract_all on GPU for idx = 0|
-|[#5813](https://github.com/NVIDIA/spark-rapids/issues/5813)|[FEA] Set sql.json.read.double.enabled and sql.csv.read.double.enabled to `true` by default|
-|[#4720](https://github.com/NVIDIA/spark-rapids/issues/4720)|[FEA] GpuStringSplit: Add support for limit = 0 and limit =1|
-|[#5953](https://github.com/NVIDIA/spark-rapids/issues/5953)|[FEA] Support Rocky Linux release|
-|[#5204](https://github.com/NVIDIA/spark-rapids/issues/5204)|[FEA] Support Key vectors for `GetMapValue` and `ElementAt` for maps.|
-|[#4323](https://github.com/NVIDIA/spark-rapids/issues/4323)|[FEA] Profiling tool add option to filter based on filesystem date|
-|[#5846](https://github.com/NVIDIA/spark-rapids/issues/5846)|[FEA] Support null characters in regular expressions|
-|[#5904](https://github.com/NVIDIA/spark-rapids/issues/5904)|[FEA] Add support for negated POSIX character classes in regular expressions|
-|[#5702](https://github.com/NVIDIA/spark-rapids/issues/5702)|[FEA] Set spark.rapids.sql.explain=NOT_ON_GPU by default|
-|[#5867](https://github.com/NVIDIA/spark-rapids/issues/5867)|[FEA] Add shim for Spark 3.3.1|
-|[#5628](https://github.com/NVIDIA/spark-rapids/issues/5628)|[FEA] Enable Application detailed view in Qualification UI|
-|[#5831](https://github.com/NVIDIA/spark-rapids/issues/5831)|[FEA] Update default speedup factors used for qualification tool|
-|[#4519](https://github.com/NVIDIA/spark-rapids/issues/4519)|[FEA] Add regular expression support for Form Feed, Alert, and Escape control characters|
-|[#4040](https://github.com/NVIDIA/spark-rapids/issues/4040)|[FEA] Support spark.sql.parquet.binaryAsString=true|
-|[#5797](https://github.com/NVIDIA/spark-rapids/issues/5797)|[FEA] Support RoundCeil and RoundFloor when scale is zero|
-|[#4468](https://github.com/NVIDIA/spark-rapids/issues/4468)|[FEA] Support repetition quantifiers `?` and `*` with regexp_replace|
-|[#5679](https://github.com/NVIDIA/spark-rapids/issues/5679)|[FEA] Support MMyyyy date/timestamp format|
-|[#4413](https://github.com/NVIDIA/spark-rapids/issues/4413)|[FEA] Add support for POSIX characters in regular expressions|
-|[#4289](https://github.com/NVIDIA/spark-rapids/issues/4289)|[FEA] Regexp: Add support for word and non-word boundaries in regexp pattern|
-|[#4517](https://github.com/NVIDIA/spark-rapids/issues/4517)|[FEA] Add support for word boundaries `\b` and `\B` in regular expressions|
-
-### Performance
-|||
-|:---|:---|
-|[#6060](https://github.com/NVIDIA/spark-rapids/issues/6060)|[FEA] Add experimental multi-threaded BypassMergeSortShuffleWriter|
-|[#5453](https://github.com/NVIDIA/spark-rapids/issues/5453)|[FEA] Support runtime filters for BatchScanExec|
-|[#5075](https://github.com/NVIDIA/spark-rapids/issues/5075)|Performance can be very slow when reading just a few columns out of many on parquet|
-|[#5624](https://github.com/NVIDIA/spark-rapids/issues/5624)|[FEA] Let CPU handle Delta table's metadata related queries|
-|[#4837](https://github.com/NVIDIA/spark-rapids/issues/4837)|[FEA] Optimize JSON reading of floating-point values|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#6112](https://github.com/NVIDIA/spark-rapids/issues/6112)|[BUG] UCX ubuntu dockerfile build failed|
-|[#6281](https://github.com/NVIDIA/spark-rapids/issues/6281)|[BUG] Reading binary columns from nested types does not work.|
-|[#6282](https://github.com/NVIDIA/spark-rapids/issues/6282)|[BUG] Missing CPU fallback for GetMapValue on scalar map, vector key|
-|[#6208](https://github.com/NVIDIA/spark-rapids/issues/6208)|[BUG] test_array_intersect failed in databricks 10.4 runtime and Spark 3.3+|
-|[#6249](https://github.com/NVIDIA/spark-rapids/issues/6249)|[BUG] test_array_union_before_spark313 failed in UCX job|
-|[#6232](https://github.com/NVIDIA/spark-rapids/issues/6232)|[BUG] Query failed with java.lang.NullPointerException when doing GpuSubqueryBroadcastExec|
-|[#6230](https://github.com/NVIDIA/spark-rapids/issues/6230)|[BUG] AQE does not respect `entirePlanWillNotWork`|
-|[#6131](https://github.com/NVIDIA/spark-rapids/issues/6131)|[BUG] count() in avro failed when reader_types is coalescing|
-|[#6220](https://github.com/NVIDIA/spark-rapids/issues/6220)|[BUG] Host buffer leak occurred when executing `count` with Avro multi-threaded reader |
-|[#6160](https://github.com/NVIDIA/spark-rapids/issues/6160)|[BUG] When Hive table's actual data has varchar, but the DDL is string, then query fails to do varchar to string conversion|
-|[#6183](https://github.com/NVIDIA/spark-rapids/issues/6183)|[BUG] Qualification UI uses single precision floating point|
-|[#6005](https://github.com/NVIDIA/spark-rapids/issues/6005)|[BUG] When old Hive partition has different schema than new partition& Hive Schema, read old partition fails with "Found no metadata for schema index"|
-|[#6158](https://github.com/NVIDIA/spark-rapids/issues/6158)|[BUG] AQE being used on Databricks even when its disabled|
-|[#6179](https://github.com/NVIDIA/spark-rapids/issues/6179)|[BUG] Qualfication tool per sql output --num-output-rows option broken|
-|[#6157](https://github.com/NVIDIA/spark-rapids/issues/6157)|[BUG] Pandas UDF hang in Databricks|
-|[#6167](https://github.com/NVIDIA/spark-rapids/issues/6167)|[BUG] iceberg_test failed in nightly|
-|[#6128](https://github.com/NVIDIA/spark-rapids/issues/6128)|[BUG] Can not ansi cast decimal type to long type while fetching decimal column from data table|
-|[#6029](https://github.com/NVIDIA/spark-rapids/issues/6029)|[BUG] Query failed if reading a Hive partition table with partition key column is a Boolean data type, and if spark.rapids.alluxio.pathsToReplace is set|
-|[#6054](https://github.com/NVIDIA/spark-rapids/issues/6054)|[BUG] Test Parquet nested unsigned int: uint8, uint16, uint32 FAILED in spark 320+|
-|[#6086](https://github.com/NVIDIA/spark-rapids/issues/6086)|[BUG] `checkValue` does not work in `RapidsConf`|
-|[#6127](https://github.com/NVIDIA/spark-rapids/issues/6127)|[BUG] regex_test failed in nightly|
-|[#6026](https://github.com/NVIDIA/spark-rapids/issues/6026)|[BUG] Failed to cast value `false` to `BooleanType` for partition column `k1`|
-|[#5984](https://github.com/NVIDIA/spark-rapids/issues/5984)|[BUG] DATABRICKS: NullPointerException: format is null in 22.08 (works fine with 22.06)|
-|[#6089](https://github.com/NVIDIA/spark-rapids/issues/6089)|[BUG] orc_test is failing on Spark 3.2+|
-|[#5892](https://github.com/NVIDIA/spark-rapids/issues/5892)|[BUG] When using Alluxio+Spark RAPIDS, if the S3 bucket is not mounted, then query will return nothing|
-|[#6056](https://github.com/NVIDIA/spark-rapids/issues/6056)|[BUG] zstd integration tests failed for orc on Cloudera|
-|[#5957](https://github.com/NVIDIA/spark-rapids/issues/5957)|[BUG] Exception calling `collect()` when partitioning using with arrays with null values using `array_union(...)`|
-|[#6017](https://github.com/NVIDIA/spark-rapids/issues/6017)|[BUG] test_parquet_read_round_trip hanging forever in spark 32x standalone mode|
-|[#6035](https://github.com/NVIDIA/spark-rapids/issues/6035)|[BUG] cache tests throws ClassCastException on Databricks|
-|[#6032](https://github.com/NVIDIA/spark-rapids/issues/6032)|[BUG] Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec failure|
-|[#6028](https://github.com/NVIDIA/spark-rapids/issues/6028)|[BUG] regexp_test is failing in nightly tests|
-|[#3677](https://github.com/NVIDIA/spark-rapids/issues/3677)|[BUG] PCBS does not fully follow the pattern for public classes|
-|[#6022](https://github.com/NVIDIA/spark-rapids/issues/6022)|[BUG] test_iceberg_fallback_not_unsafe_row failed in databricks 10.4 runtime|
-|[#109](https://github.com/NVIDIA/spark-rapids/issues/109)|[BUG] GPU degreees function does not overflow|
-|[#5959](https://github.com/NVIDIA/spark-rapids/issues/5959)|[BUG] test_parquet_read_encryption fails|
-|[#5493](https://github.com/NVIDIA/spark-rapids/issues/5493)|[BUG] test_parquet_read_merge_schema failed w/ TITAN V|
-|[#5521](https://github.com/NVIDIA/spark-rapids/issues/5521)|[BUG] Investigate regexp failures with unicode input|
-|[#5629](https://github.com/NVIDIA/spark-rapids/issues/5629)|[BUG] regexp unicode tests require LANG=en_US.UTF-8 to pass|
-|[#5448](https://github.com/NVIDIA/spark-rapids/issues/5448)|[BUG] partitioned writes require single batches and sorting, causing gpu OOM in some cases|
-|[#6003](https://github.com/NVIDIA/spark-rapids/issues/6003)|[BUG] join_test failed in integration tests|
-|[#5979](https://github.com/NVIDIA/spark-rapids/issues/5979)|[BUG] executors shutdown intermittently during integrations test parallel run|
-|[#5948](https://github.com/NVIDIA/spark-rapids/issues/5948)|[BUG] GPU ORC reading fails when positional schema is enabled and more columns are required.|
-|[#5909](https://github.com/NVIDIA/spark-rapids/issues/5909)|[BUG] Null characters do not work in regular expression character classes|
-|[#5956](https://github.com/NVIDIA/spark-rapids/issues/5956)|[BUG] Warnings in build for GpuRegExpUtils with group_index|
-|[#4676](https://github.com/NVIDIA/spark-rapids/issues/4676)|[BUG] Research associating MemoryCleaner to Spark's ShutdownHookManager|
-|[#5854](https://github.com/NVIDIA/spark-rapids/issues/5854)|[BUG] Memory leaked in some test cases|
-|[#5937](https://github.com/NVIDIA/spark-rapids/issues/5937)|[BUG] test_get_map_value_string_col_keys_ansi_fail in databricks321 runtime|
-|[#5891](https://github.com/NVIDIA/spark-rapids/issues/5891)|[BUG] GpuShuffleCoalesce op time metric doesn't include concat batch time|
-|[#5896](https://github.com/NVIDIA/spark-rapids/issues/5896)|[BUG] Profiling tool on taking a really long time for integration tests|
-|[#5939](https://github.com/NVIDIA/spark-rapids/issues/5939)|[BUG] Qualification tool UI. Read Schema column is broken|
-|[#5711](https://github.com/NVIDIA/spark-rapids/issues/5711)|[BUG] regexp: Build fails on CI when more characters added to fuzzer but not locally|
-|[#5929](https://github.com/NVIDIA/spark-rapids/issues/5929)|[BUG] test_sorted_groupby_first_last failed in nightly tests|
-|[#5914](https://github.com/NVIDIA/spark-rapids/issues/5914)|[BUG] test_parquet_compress_read_round_trip tests failed in spark320+|
-|[#5859](https://github.com/NVIDIA/spark-rapids/issues/5859)|[BUG] Qualification tools csv order is not in sync|
-|[#5648](https://github.com/NVIDIA/spark-rapids/issues/5648)|[BUG] compile-time references to classes potentially unavailable at run time|
-|[#5838](https://github.com/NVIDIA/spark-rapids/issues/5838)|[BUG] Qualification ui output goes to wrong folder|
-|[#5855](https://github.com/NVIDIA/spark-rapids/issues/5855)|[BUG] MortgageSparkSuite.scala set spark.rapids.sql.explain as true, which is invalid|
-|[#5630](https://github.com/NVIDIA/spark-rapids/issues/5630)|[BUG] Qualification UI cannot render long strings|
-|[#5732](https://github.com/NVIDIA/spark-rapids/issues/5732)|[BUG] fix estimated speed-up for not-applicable apps in Qualification results|
-|[#5788](https://github.com/NVIDIA/spark-rapids/issues/5788)|[BUG] Qualification UI Sanitize template content|
-|[#5836](https://github.com/NVIDIA/spark-rapids/issues/5836)|[BUG] string_test.py::test_re_replace_repetition failed IT |
-|[#5837](https://github.com/NVIDIA/spark-rapids/issues/5837)|[BUG] test_parquet_read_round_trip_binary_as_string failures on YARN and Dataproc|
-|[#5726](https://github.com/NVIDIA/spark-rapids/issues/5726)|[BUG] CastChecks.sparkIntegralSig has BINARY in it twice|
-|[#5775](https://github.com/NVIDIA/spark-rapids/issues/5775)|[BUG] TimestampSuite is run on Spark 3.3.0 only|
-|[#5678](https://github.com/NVIDIA/spark-rapids/issues/5678)|[BUG] Inconsistency between the time zone in the fallback reason and the actual time zone checked in RapidsMeta.checkTImeZoneId|
-|[#5688](https://github.com/NVIDIA/spark-rapids/issues/5688)|[BUG] AnsiCast is merged into Cast in Spark 340, failing the 340 build|
-|[#5480](https://github.com/NVIDIA/spark-rapids/issues/5480)|[BUG] Some arithmetic tests are failing on Spark 3.4.0|
-|[#5777](https://github.com/NVIDIA/spark-rapids/issues/5777)|[BUG] repeated runs of `mvn package` without `clean` lead to missing spark-rapids-jni-version-info.properties in dist jar|
-|[#5456](https://github.com/NVIDIA/spark-rapids/issues/5456)|[BUG] Handle regexp_replace inconsistency from https://issues.apache.org/jira/browse/SPARK-39107|
-|[#5683](https://github.com/NVIDIA/spark-rapids/issues/5683)|[BUG] test_cast_neg_to_decimal_err failed in recent 22.08 tests|
-|[#5525](https://github.com/NVIDIA/spark-rapids/issues/5525)|[BUG] Investigate more edge cases in regexp support|
-|[#5744](https://github.com/NVIDIA/spark-rapids/issues/5744)|[BUG] Compile failure with Spark 3.2.2|
-|[#5707](https://github.com/NVIDIA/spark-rapids/issues/5707)|[BUG] Fix shim-related bugs |
-
-### PRs
-|||
-|:---|:---|
-|[#6376](https://github.com/NVIDIA/spark-rapids/pull/6376)|Update 22.08 changelog to latest|
-|[#6367](https://github.com/NVIDIA/spark-rapids/pull/6367)|Revert "Enable Strings as a supported type for GpuColumnarToRow transitions"|
-|[#6354](https://github.com/NVIDIA/spark-rapids/pull/6354)|Update 22.08 changelog to latest [skip ci]|
-|[#6348](https://github.com/NVIDIA/spark-rapids/pull/6348)|Update plugin jni version to released 22.08.0|
-|[#6234](https://github.com/NVIDIA/spark-rapids/pull/6234)|[Doc] Add 22.08 docs' links [skip ci]|
-|[#6288](https://github.com/NVIDIA/spark-rapids/pull/6288)|CPU fallback for Map scalars with key vectors|
-|[#6292](https://github.com/NVIDIA/spark-rapids/pull/6292)|Fix parquet binary reads to do the transformation in the plugin|
-|[#6257](https://github.com/NVIDIA/spark-rapids/pull/6257)|Fallback to CPU for Parquet reads with `_databricks_internal` columns|
-|[#6274](https://github.com/NVIDIA/spark-rapids/pull/6274)|Use schema instead of row field count during columnar conversion|
-|[#6268](https://github.com/NVIDIA/spark-rapids/pull/6268)|Apply BroadcastMode key projections before interpreting key expressions in subqueries|
-|[#6250](https://github.com/NVIDIA/spark-rapids/pull/6250)|Fix bug where AQE does not respect `entirePlanWillNotWork`|
-|[#6248](https://github.com/NVIDIA/spark-rapids/pull/6248)|Fix some issues with reading binary from parquet|
-|[#6239](https://github.com/NVIDIA/spark-rapids/pull/6239)|Add rocky Dockerfiles and refine docker documentation|
-|[#6079](https://github.com/NVIDIA/spark-rapids/pull/6079)|Add support for nested types to `collect_set(...)` on the GPU|
-|[#6215](https://github.com/NVIDIA/spark-rapids/pull/6215)|Update Spark2 Explain API code for 22.08|
-|[#6161](https://github.com/NVIDIA/spark-rapids/pull/6161)|Added binary read support for Parquet [Databricks]|
-|[#6222](https://github.com/NVIDIA/spark-rapids/pull/6222)|Init 22.08 changelog [skip ci]|
-|[#6225](https://github.com/NVIDIA/spark-rapids/pull/6225)|Fix count() in avro failed when reader_types is coalescing|
-|[#6216](https://github.com/NVIDIA/spark-rapids/pull/6216)|[Doc] Update 22.08 documentation|
-|[#6223](https://github.com/NVIDIA/spark-rapids/pull/6223)|Temporary fix for test_array_intersect failures on Spark 3.3.0|
-|[#6221](https://github.com/NVIDIA/spark-rapids/pull/6221)|Release host buffers when Avro read schema is empty|
-|[#6132](https://github.com/NVIDIA/spark-rapids/pull/6132)|[DOC]update outofdate mortgage notebooks and update docs for xgboost161 jar[skip ci]|
-|[#6188](https://github.com/NVIDIA/spark-rapids/pull/6188)|Allow ORC conversion from VARCHAR to STRING|
-|[#6013](https://github.com/NVIDIA/spark-rapids/pull/6013)|Add fixed issues to regex fuzzer|
-|[#5958](https://github.com/NVIDIA/spark-rapids/pull/5958)|Add set based operations for arrays: `array_intersect`, `array_union`, `array_except`, and `arrays_overlap` for running on GPU|
-|[#6189](https://github.com/NVIDIA/spark-rapids/pull/6189)|Qualification UI change floating precision [skip ci]|
-|[#6063](https://github.com/NVIDIA/spark-rapids/pull/6063)|Fix Parquet schema evolution when missing column is in a nested type|
-|[#6159](https://github.com/NVIDIA/spark-rapids/pull/6159)|Workaround for Databricks using AQE even when disabled|
-|[#6181](https://github.com/NVIDIA/spark-rapids/pull/6181)|Fix the qualification tool per sql number output rows option|
-|[#6166](https://github.com/NVIDIA/spark-rapids/pull/6166)|Update the configs used to choose the Python runner for flat-map Pandas UDF|
-|[#6169](https://github.com/NVIDIA/spark-rapids/pull/6169)|Fix IcebergProvider classname in unshim exceptions|
-|[#6103](https://github.com/NVIDIA/spark-rapids/pull/6103)|Fix crash when casting decimals to long|
-|[#6071](https://github.com/NVIDIA/spark-rapids/pull/6071)|Update `test_add_overflow_with_ansi_enabled` and `test_subtraction_overflow_with_ansi_enabled` to check the exception type for Integral case.|
-|[#6136](https://github.com/NVIDIA/spark-rapids/pull/6136)|Fix Alluxio inferring partitions for BooleanType with Hive|
-|[#6027](https://github.com/NVIDIA/spark-rapids/pull/6027)|Re-enable "transpile complex regex 2" scala test|
-|[#6140](https://github.com/NVIDIA/spark-rapids/pull/6140)|Update profile names in unit tests docs [skip ci]|
-|[#6141](https://github.com/NVIDIA/spark-rapids/pull/6141)|Fixes threaded shuffle writer test mocks for spark 3.3.0+|
-|[#6147](https://github.com/NVIDIA/spark-rapids/pull/6147)|Revert "Temporarily disable Parquet unsigned int test in ParquetScanS…|
-|[#6133](https://github.com/NVIDIA/spark-rapids/pull/6133)|[DOC]update getting started guide doc for aws-emr670 release[skip ci]|
-|[#6007](https://github.com/NVIDIA/spark-rapids/pull/6007)|Add doc for parsing expressions in qualification tool [skip ci]|
-|[#6125](https://github.com/NVIDIA/spark-rapids/pull/6125)|Add SQL table to Qualification's app-details view [skip ci]|
-|[#6116](https://github.com/NVIDIA/spark-rapids/pull/6116)|Fix: check validity before setting the default value|
-|[#6120](https://github.com/NVIDIA/spark-rapids/pull/6120)|Qualification Tool add test for SQL Description escaping commas for csv|
-|[#6106](https://github.com/NVIDIA/spark-rapids/pull/6106)|Qualification tool: Parse expressions in WindowExec|
-|[#6040](https://github.com/NVIDIA/spark-rapids/pull/6040)|Enable anchors in regexp string split|
-|[#6052](https://github.com/NVIDIA/spark-rapids/pull/6052)|Multi-threaded shuffle writer for RapidsShuffleManager|
-|[#5998](https://github.com/NVIDIA/spark-rapids/pull/5998)|Enable Strings as a supported type for GpuColumnarToRow transitions|
-|[#6092](https://github.com/NVIDIA/spark-rapids/pull/6092)|Qualification tool output recommendations on a per sql query basis|
-|[#6104](https://github.com/NVIDIA/spark-rapids/pull/6104)|Revert to only supporting Apache Iceberg 0.13.x|
-|[#6111](https://github.com/NVIDIA/spark-rapids/pull/6111)|Fix missed gnupg2 in ucx example dockerfiles [skip ci]|
-|[#6107](https://github.com/NVIDIA/spark-rapids/pull/6107)|Disable snapshot shims build in 22.08|
-|[#6016](https://github.com/NVIDIA/spark-rapids/pull/6016)|Automatically adjust `spark.rapids.sql.multiThreadedRead.numThreads` to the same as `spark.executor.cores`|
-|[#6098](https://github.com/NVIDIA/spark-rapids/pull/6098)|Support Apache Iceberg 0.14.0|
-|[#6097](https://github.com/NVIDIA/spark-rapids/pull/6097)|Fix 3.3 shim to include castTo handling AnyTimestampType and minor spacing|
-|[#6057](https://github.com/NVIDIA/spark-rapids/pull/6057)|Tag `GpuWindow` child expressions for GPU execution|
-|[#6090](https://github.com/NVIDIA/spark-rapids/pull/6090)|Add missing is_spark_321cdh import in orc_test|
-|[#6048](https://github.com/NVIDIA/spark-rapids/pull/6048)|Port whole parsePartitions method from Spark3.3 to Gpu side|
-|[#5941](https://github.com/NVIDIA/spark-rapids/pull/5941)|GPU accelerate Apache Iceberg reads|
-|[#5925](https://github.com/NVIDIA/spark-rapids/pull/5925)|Add Alluxio auto mount feature|
-|[#6004](https://github.com/NVIDIA/spark-rapids/pull/6004)|Check the existence of alluxio path|
-|[#6082](https://github.com/NVIDIA/spark-rapids/pull/6082)|Enable auto-merge from branch-22.08 to branch-22.10 [skip ci]|
-|[#6058](https://github.com/NVIDIA/spark-rapids/pull/6058)|Disable zstd orc tests in cdh|
-|[#6078](https://github.com/NVIDIA/spark-rapids/pull/6078)|Temporarily disable Parquet unsigned int test in ParquetScanSuite|
-|[#6049](https://github.com/NVIDIA/spark-rapids/pull/6049)|Fix test hang caused by parquet hadoop test jar log4j file|
-|[#6042](https://github.com/NVIDIA/spark-rapids/pull/6042)|Qualification tool: Parse expressions in Aggregates and Sort execs.|
-|[#6041](https://github.com/NVIDIA/spark-rapids/pull/6041)|Improve check for UTF-8 in integration tests by testing from the JVM|
-|[#5970](https://github.com/NVIDIA/spark-rapids/pull/5970)|Address feedback in "Improve regular expression error messages" PR|
-|[#6000](https://github.com/NVIDIA/spark-rapids/pull/6000)|Support nth_value, first and last in window context|
-|[#6031](https://github.com/NVIDIA/spark-rapids/pull/6031)|Update spark322shim dependency to released lib|
-|[#6033](https://github.com/NVIDIA/spark-rapids/pull/6033)|Refactor: Fix PCBS does not fully follow the pattern for public classes|
-|[#6019](https://github.com/NVIDIA/spark-rapids/pull/6019)|Update the interval division to throw same type exceptions as Spark|
-|[#6030](https://github.com/NVIDIA/spark-rapids/pull/6030)|Cleans up some of the redundant code in proxy/internal RAPIDS Shuffle Managers|
-|[#5988](https://github.com/NVIDIA/spark-rapids/pull/5988)|[FEA] Add a progress bar in Qualification tool when it is running|
-|[#6020](https://github.com/NVIDIA/spark-rapids/pull/6020)|Unify test modes in databricks test script|
-|[#6025](https://github.com/NVIDIA/spark-rapids/pull/6025)|Skip Iceberg tests on Databricks|
-|[#5983](https://github.com/NVIDIA/spark-rapids/pull/5983)|Adding AUTO native parquet support and legacy tests|
-|[#6010](https://github.com/NVIDIA/spark-rapids/pull/6010)|Update docs to better explain limitations of Dataset support|
-|[#5996](https://github.com/NVIDIA/spark-rapids/pull/5996)|Fix GPU degrees function does not overflow|
-|[#5994](https://github.com/NVIDIA/spark-rapids/pull/5994)|Skip Parquet encryption read tests if Parquet version is less than 1.12|
-|[#5776](https://github.com/NVIDIA/spark-rapids/pull/5776)|Enable regular expression support based on whether UTF-8 is in the current locale|
-|[#6009](https://github.com/NVIDIA/spark-rapids/pull/6009)|Fix issue where spark-tests was producing an unintended error code|
-|[#5903](https://github.com/NVIDIA/spark-rapids/pull/5903)|Avoid requiring single batch when using out-of-core sort|
-|[#6008](https://github.com/NVIDIA/spark-rapids/pull/6008)|Rename test modes in spark-tests.sh [skip ci]|
-|[#5991](https://github.com/NVIDIA/spark-rapids/pull/5991)|Enable zstd integration tests for parquet and orc|
-|[#5997](https://github.com/NVIDIA/spark-rapids/pull/5997)|support testing parquet encryption|
-|[#5968](https://github.com/NVIDIA/spark-rapids/pull/5968)|Add support for regexp_extract_all on GPU|
-|[#5995](https://github.com/NVIDIA/spark-rapids/pull/5995)|Fix a minor potential issue when rebatching for GpuArrowEvalPythonExec|
-|[#5960](https://github.com/NVIDIA/spark-rapids/pull/5960)|Set up the framework of type casting for ORC reading|
-|[#5987](https://github.com/NVIDIA/spark-rapids/pull/5987)|Document how to check if finalized plan on GPU from user code / REPLs [skip ci]|
-|[#5982](https://github.com/NVIDIA/spark-rapids/pull/5982)|Use the new native parquet footer API instead of the old one|
-|[#5972](https://github.com/NVIDIA/spark-rapids/pull/5972)|[DOC] add app-details to qualification tools doc [skip ci]|
-|[#5976](https://github.com/NVIDIA/spark-rapids/pull/5976)|Enable null in regex character classes|
-|[#5974](https://github.com/NVIDIA/spark-rapids/pull/5974)|Remove scaladoc warning |
-|[#5912](https://github.com/NVIDIA/spark-rapids/pull/5912)|Fall back to CPU for Delta Lake metadata queries|
-|[#5955](https://github.com/NVIDIA/spark-rapids/pull/5955)|Fix fake memory leaks in some test cases|
-|[#5915](https://github.com/NVIDIA/spark-rapids/pull/5915)|Make the error message of changing decimal type the same as Spark's|
-|[#5971](https://github.com/NVIDIA/spark-rapids/pull/5971)|Append new authorized user to blossom-ci whitelist [skip ci]|
-|[#5967](https://github.com/NVIDIA/spark-rapids/pull/5967)|[Doc]In Databricks doc, disable DPP config[skip ci]|
-|[#5871](https://github.com/NVIDIA/spark-rapids/pull/5871)|Improve regular expression error messages|
-|[#5952](https://github.com/NVIDIA/spark-rapids/pull/5952)|Qualification tool: Parse expressions in ProjectExec|
-|[#5961](https://github.com/NVIDIA/spark-rapids/pull/5961)|Don't set spark.sql.ansi.strictIndexOperator to false for array subscript test|
-|[#5935](https://github.com/NVIDIA/spark-rapids/pull/5935)|Enable reading double values on GPU when reading CSV and JSON|
-|[#5950](https://github.com/NVIDIA/spark-rapids/pull/5950)|Fix GpuShuffleCoalesce op time metric doesn't include concat batch time|
-|[#5932](https://github.com/NVIDIA/spark-rapids/pull/5932)|Add string split support for limit = 0 and limit =1 |
-|[#5951](https://github.com/NVIDIA/spark-rapids/pull/5951)|Fix issue with Profiling tool taking a long time due to finding stage ids that maps to sql nodes|
-|[#5954](https://github.com/NVIDIA/spark-rapids/pull/5954)|Add IT dockerfile for rockylinux8 [skip ci]|
-|[#5949](https://github.com/NVIDIA/spark-rapids/pull/5949)|Update `GpuAdd` and `GpuSubtract` to throw same type exception as Spark|
-|[#5878](https://github.com/NVIDIA/spark-rapids/pull/5878)|Fix misleading documentation for `approx_percentile` and some other functions|
-|[#5913](https://github.com/NVIDIA/spark-rapids/pull/5913)|Update gcp cluster init option [skip ci]|
-|[#5940](https://github.com/NVIDIA/spark-rapids/pull/5940)|Qualification tool UI. fix Read-Schema column broken [skip ci]|
-|[#5938](https://github.com/NVIDIA/spark-rapids/pull/5938)|Fix leaks in the test cases of CachedBatchWriterSuite|
-|[#5934](https://github.com/NVIDIA/spark-rapids/pull/5934)|Add underscore to regexp fuzzer|
-|[#5936](https://github.com/NVIDIA/spark-rapids/pull/5936)|[BUG] Fix databricks test report location|
-|[#5883](https://github.com/NVIDIA/spark-rapids/pull/5883)|Add support for `element_at` and `GetMapValue`|
-|[#5918](https://github.com/NVIDIA/spark-rapids/pull/5918)|Filter profiling tool based on start time. |
-|[#5926](https://github.com/NVIDIA/spark-rapids/pull/5926)|Collect databricks test report|
-|[#5924](https://github.com/NVIDIA/spark-rapids/pull/5924)|Changes made to the Audit process for prioritizing the commits [skip-ci]|
-|[#5834](https://github.com/NVIDIA/spark-rapids/pull/5834)|Add support for null characters in regular expressions|
-|[#5930](https://github.com/NVIDIA/spark-rapids/pull/5930)|Make first/last test for sorted deterministic|
-|[#5917](https://github.com/NVIDIA/spark-rapids/pull/5917)|Improve sort removal heuristic for sort aggregate|
-|[#5916](https://github.com/NVIDIA/spark-rapids/pull/5916)|Revert "Enable testing zstd for spark releases 3.2.0 and later (#5898)"|
-|[#5686](https://github.com/NVIDIA/spark-rapids/pull/5686)|Add `GpuMapConcat` support for nested-type values|
-|[#5905](https://github.com/NVIDIA/spark-rapids/pull/5905)|Add support for negated POSIX character classes `\P`|
-|[#5898](https://github.com/NVIDIA/spark-rapids/pull/5898)|Enable testing parquet with zstd for spark releases 3.2.0 and later|
-|[#5900](https://github.com/NVIDIA/spark-rapids/pull/5900)|Optimize some common if/else cases|
-|[#5869](https://github.com/NVIDIA/spark-rapids/pull/5869)|Qualification: fix sorting and add unit-tests script| -|[#5819](https://github.com/NVIDIA/spark-rapids/pull/5819)|Modify the default value of spark.rapids.sql.explain as NOT_ON_GPU| -|[#5723](https://github.com/NVIDIA/spark-rapids/pull/5723)|Dynamically load hive and avro using reflection to avoid potential class not found exception| -|[#5886](https://github.com/NVIDIA/spark-rapids/pull/5886)|Avoid serializing plan in GpuCoalesceBatches, GpuHashAggregateExec, and GpuTopN| -|[#5897](https://github.com/NVIDIA/spark-rapids/pull/5897)|GpuBatchScanExec partitions should be marked transient| -|[#5894](https://github.com/NVIDIA/spark-rapids/pull/5894)|[Doc]fix a typo with double "("[skip ci] | -|[#5880](https://github.com/NVIDIA/spark-rapids/pull/5880)|Qualification tool: Parse expressions in FilterExec| -|[#5885](https://github.com/NVIDIA/spark-rapids/pull/5885)|[Doc] Fix alluxio doc link issue[skip ci]| -|[#5879](https://github.com/NVIDIA/spark-rapids/pull/5879)|Avoid duplicate sanitization step when reading JSON floats| -|[#5877](https://github.com/NVIDIA/spark-rapids/pull/5877)|Add Apache Spark 3.3.1-SNAPSHOT Shims| -|[#5783](https://github.com/NVIDIA/spark-rapids/pull/5783)|`assertMinValueOverflow` should throw same type of exception as Spark| -|[#5875](https://github.com/NVIDIA/spark-rapids/pull/5875)|Qualification ui output goes to wrong folder| -|[#5870](https://github.com/NVIDIA/spark-rapids/pull/5870)|Use a common thread pool across formats for multithreaded reads| -|[#5868](https://github.com/NVIDIA/spark-rapids/pull/5868)|Profiling tool add wholestagecodegen to execs mapping, sql to stage info and job end time| -|[#5873](https://github.com/NVIDIA/spark-rapids/pull/5873)|Correct the value of spark.rapids.sql.explain| -|[#5695](https://github.com/NVIDIA/spark-rapids/pull/5695)|Verify DPP over LIKE ANY/ALL expression| -|[#5856](https://github.com/NVIDIA/spark-rapids/pull/5856)|Update unit test doc| 
-|[#5866](https://github.com/NVIDIA/spark-rapids/pull/5866)|Fix CsvScanForIntervalSuite leak issues| -|[#5810](https://github.com/NVIDIA/spark-rapids/pull/5810)|Qualification UI - add application details view| -|[#5860](https://github.com/NVIDIA/spark-rapids/pull/5860)|[Doc]Add Spark3.3 support in doc[skip ci]| -|[#5858](https://github.com/NVIDIA/spark-rapids/pull/5858)|Remove SNAPSHOT support from Spark 3.3.0 shim| -|[#5857](https://github.com/NVIDIA/spark-rapids/pull/5857)|Remove user sperlingxx[skip ci]| -|[#5841](https://github.com/NVIDIA/spark-rapids/pull/5841)|Enable regexp empty string short circuit on shim version 3.1.3| -|[#5853](https://github.com/NVIDIA/spark-rapids/pull/5853)|Fix auto merge conflict 5850| -|[#5845](https://github.com/NVIDIA/spark-rapids/pull/5845)|Update Parquet binaryAsString integration to use a static parquet file| -|[#5842](https://github.com/NVIDIA/spark-rapids/pull/5842)|Update default speedup factors for qualification tool| -|[#5829](https://github.com/NVIDIA/spark-rapids/pull/5829)|Add regexp support for Alert, and Escape control characters| -|[#5833](https://github.com/NVIDIA/spark-rapids/pull/5833)|Add test for GpuCast canonicalization with timezone | -|[#5822](https://github.com/NVIDIA/spark-rapids/pull/5822)|Configure log4j version 2.x for test cases| -|[#5830](https://github.com/NVIDIA/spark-rapids/pull/5830)|Enable the `spark.sql.parquet.binaryAsString=true` configuration option on the GPU| -|[#5805](https://github.com/NVIDIA/spark-rapids/pull/5805)|[Issue 5726] Removing duplicate BINARY keyword| -|[#5828](https://github.com/NVIDIA/spark-rapids/pull/5828)|Update tools module to latest Hadoop version| -|[#5809](https://github.com/NVIDIA/spark-rapids/pull/5809)|Disable Spark 3.4.0 premerge for 22.08 and enable for 22.10 | -|[#5767](https://github.com/NVIDIA/spark-rapids/pull/5767)|Fix the time zone check issue| -|[#5814](https://github.com/NVIDIA/spark-rapids/pull/5814)|Fix auto merge conflict 5812 [skip ci]| 
-|[#5804](https://github.com/NVIDIA/spark-rapids/pull/5804)|Support RoundCeil and RoundFloor when scale is zero| -|[#5696](https://github.com/NVIDIA/spark-rapids/pull/5696)|Support Parquet field IDs| -|[#5749](https://github.com/NVIDIA/spark-rapids/pull/5749)|Add shims for `AnsiCast`| -|[#5780](https://github.com/NVIDIA/spark-rapids/pull/5780)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#5350](https://github.com/NVIDIA/spark-rapids/pull/5350)|Halt Spark executor when encountering unrecoverable CUDA errors| -|[#5779](https://github.com/NVIDIA/spark-rapids/pull/5779)|Fix repeated runs mvn package without clean lead to missing spark-rapids spark-rapids-jni-version-info.properties in dist jar| -|[#5800](https://github.com/NVIDIA/spark-rapids/pull/5800)|Fix auto merge conflict 5799| -|[#5794](https://github.com/NVIDIA/spark-rapids/pull/5794)|Fix auto merge conflict 5789| -|[#5740](https://github.com/NVIDIA/spark-rapids/pull/5740)|Handle regexp_replace inconsistency with empty strings and zero-repetition patterns| -|[#5790](https://github.com/NVIDIA/spark-rapids/pull/5790)|Fix auto merge conflict 5789| -|[#5690](https://github.com/NVIDIA/spark-rapids/pull/5690)|Update the error checking of `test_cast_neg_to_decimal_err`| -|[#5774](https://github.com/NVIDIA/spark-rapids/pull/5774)|Fix merge conflict with branch-22.06| -|[#5768](https://github.com/NVIDIA/spark-rapids/pull/5768)|Support MMyyyy date/timestamp format| -|[#5692](https://github.com/NVIDIA/spark-rapids/pull/5692)|Add support for POSIX predefined character classes| -|[#5762](https://github.com/NVIDIA/spark-rapids/pull/5762)|Fix auto merge conflict 5759| -|[#5754](https://github.com/NVIDIA/spark-rapids/pull/5754)|Fix auto merge conflict 5752| -|[#5450](https://github.com/NVIDIA/spark-rapids/pull/5450)|Handle `?`, `*`, `{0,}` and `{0,n}` based repetitions in regexp_replace on the GPU| -|[#5479](https://github.com/NVIDIA/spark-rapids/pull/5479)|Add support for word boundaries `\b` and `\B`| 
-|[#5745](https://github.com/NVIDIA/spark-rapids/pull/5745)|Move `RapidsErrorUtils` to `org.apache.spark.sql.shims` package| -|[#5610](https://github.com/NVIDIA/spark-rapids/pull/5610)|Fall back to CPU for unsupported regular expression edge cases with end of line/string anchors and newlines| -|[#5725](https://github.com/NVIDIA/spark-rapids/pull/5725)|Fix auto merge conflict 5724| -|[#5687](https://github.com/NVIDIA/spark-rapids/pull/5687)|Minor: Clean up GpuConcat| -|[#5710](https://github.com/NVIDIA/spark-rapids/pull/5710)|Fix auto merge conflict 5709| -|[#5708](https://github.com/NVIDIA/spark-rapids/pull/5708)|Fix shim-related bugs| -|[#5700](https://github.com/NVIDIA/spark-rapids/pull/5700)|Fix auto merge conflict 5699| -|[#5675](https://github.com/NVIDIA/spark-rapids/pull/5675)|Update the error messages for the failing arithmetic tests.| -|[#5689](https://github.com/NVIDIA/spark-rapids/pull/5689)|Disable 340 for premerge and nightly| -|[#5603](https://github.com/NVIDIA/spark-rapids/pull/5603)|Skip unshim and dedup of external spark-rapids-jni and jucx| -|[#5472](https://github.com/NVIDIA/spark-rapids/pull/5472)|Add shims for Spark 3.4.0| -|[#5647](https://github.com/NVIDIA/spark-rapids/pull/5647)|Init version 22.08.0-SNAPSHOT| - -## Release 22.06 - -### Features -||| -|:---|:---| -|[#5451](https://github.com/NVIDIA/spark-rapids/issues/5451)|[FEA] Update Spark2 explain code for 22.06| -|[#5261](https://github.com/NVIDIA/spark-rapids/issues/5261)|[FEA] Create MIG with Cgroups on YARN Dataproc scripts| -|[#5476](https://github.com/NVIDIA/spark-rapids/issues/5476)|[FEA] extend concat on arrays to all nested types.| -|[#5113](https://github.com/NVIDIA/spark-rapids/issues/5113)|[FEA] ANSI mode: Support CAST between types| -|[#5112](https://github.com/NVIDIA/spark-rapids/issues/5112)|[FEA] ANSI mode: allow casting between numeric type and timestamp type| -|[#5323](https://github.com/NVIDIA/spark-rapids/issues/5323)|[FEA] Enable floating point by default| 
-|[#4518](https://github.com/NVIDIA/spark-rapids/issues/4518)|[FEA] Add support for escaped unicode hex in regular expressions| -|[#5405](https://github.com/NVIDIA/spark-rapids/issues/5405)|[FEA] Support map_concat function| -|[#5547](https://github.com/NVIDIA/spark-rapids/issues/5547)|[FEA] Regexp: Can we transpile `\W` and `\D` to Java's definition so we can support on GPU?| -|[#5512](https://github.com/NVIDIA/spark-rapids/issues/5512)|[FEA] Qualification tool, hook up final output and output execs table| -|[#5507](https://github.com/NVIDIA/spark-rapids/issues/5507)|[FEA] Support GpuRaiseError| -|[#5325](https://github.com/NVIDIA/spark-rapids/issues/5325)|[FEA] Support spark.sql.mapKeyDedupPolicy=LAST_WIN for `TransformKeys`| -|[#3682](https://github.com/NVIDIA/spark-rapids/issues/3682)|[FEA] Use conventional jar layout in dist jar if there is only one input shim| -|[#1556](https://github.com/NVIDIA/spark-rapids/issues/1556)|[FEA] Implement ANSI mode tests for string to timestamp functions| -|[#4425](https://github.com/NVIDIA/spark-rapids/issues/4425)|[FEA] Support line anchor `$` and string anchors `\z` and `\Z` in regexp_replace| -|[#5176](https://github.com/NVIDIA/spark-rapids/issues/5176)|[FEA] Qualification tool UI| -|[#5111](https://github.com/NVIDIA/spark-rapids/issues/5111)|[FEA] ANSI mode: CAST between ANSI intervals and IntegralType| -|[#4605](https://github.com/NVIDIA/spark-rapids/issues/4605)|[FEA] Add regular expression support for new character classes introduced in Java 8| -|[#5273](https://github.com/NVIDIA/spark-rapids/issues/5273)|[FEA] Support map_filter| -|[#1557](https://github.com/NVIDIA/spark-rapids/issues/1557)|[FEA] Enable ANSI mode for CAST string to date| -|[#5446](https://github.com/NVIDIA/spark-rapids/issues/5446)|[FEA] Remove hasNans check for array_contains| -|[#5445](https://github.com/NVIDIA/spark-rapids/issues/5445)|[FEA] Support reading Int as Byte/Short/Date from parquet | 
-|[#5449](https://github.com/NVIDIA/spark-rapids/issues/5449)|[FEA] QualificationTool. Add speedup information to AppSummaryInfo| -|[#5322](https://github.com/NVIDIA/spark-rapids/issues/5322)|[FEA] remove hasNans for Pivot| -|[#4800](https://github.com/NVIDIA/spark-rapids/issues/4800)|[FEA] Enable support for more regular expressions with \A and \Z| -|[#5404](https://github.com/NVIDIA/spark-rapids/issues/5404)|[FEA] Add Shim for the Spark version shipped with Cloudera CDH 7.1.7| -|[#5226](https://github.com/NVIDIA/spark-rapids/issues/5226)|[FEA] Support array_repeat| -|[#5229](https://github.com/NVIDIA/spark-rapids/issues/5229)|[FEA] Support arrays_zip| -|[#5119](https://github.com/NVIDIA/spark-rapids/issues/5119)|[FEA] Support ANSI mode for SQL functions/operators| -|[#4532](https://github.com/NVIDIA/spark-rapids/issues/4532)|[FEA] Re-enable support for `\Z` in regular expressions| -|[#3985](https://github.com/NVIDIA/spark-rapids/issues/3985)|[FEA] UDF-Compiler: Translation of simple predicate UDF should allow predicate pushdown| -|[#5034](https://github.com/NVIDIA/spark-rapids/issues/5034)|[FEA] Implement ExistenceJoin for BroadcastNestedLoopJoin Exec| -|[#4533](https://github.com/NVIDIA/spark-rapids/issues/4533)|[FEA] Re-enable support for `$` in regular expressions| -|[#5263](https://github.com/NVIDIA/spark-rapids/issues/5263)|[FEA] Write out operator mapping from plugin to CSV file for use in qualification tool| -|[#5095](https://github.com/NVIDIA/spark-rapids/issues/5095)|[FEA] Support collect_set on struct in reduction context| -|[#4811](https://github.com/NVIDIA/spark-rapids/issues/4811)|[FEA] Support ANSI intervals for Cast and Sample| -|[#2062](https://github.com/NVIDIA/spark-rapids/issues/2062)|[FEA] support collect aggregations| -|[#5060](https://github.com/NVIDIA/spark-rapids/issues/5060)|[FEA] Support Count on Struct of [ Struct of [String, Map(String,String)], Array(String), Map(String,String) ]| 
-|[#4528](https://github.com/NVIDIA/spark-rapids/issues/4528)|[FEA] Add support for regular expressions containing `\s` and `\S`| -|[#4557](https://github.com/NVIDIA/spark-rapids/issues/4557)|[FEA] Add support for regexp_replace with back-references| - -### Performance -||| -|:---|:---| -|[#5148](https://github.com/NVIDIA/spark-rapids/issues/5148)|Add the MULTI-THREADED reading support for avro| -|[#5304](https://github.com/NVIDIA/spark-rapids/issues/5304)|[FEA] Optimize remote Avro reading for a PartitionFile| -|[#5257](https://github.com/NVIDIA/spark-rapids/issues/5257)|[FEA][Audit] - [SPARK-34863][SQL] Support complex types for Parquet vectorized reader| -|[#5149](https://github.com/NVIDIA/spark-rapids/issues/5149)|Add the COALESCING reading support for avro| - -### Bugs Fixed -||| -|:---|:---| -|[#5769](https://github.com/NVIDIA/spark-rapids/issues/5769)|[BUG] arithmetic ops tests failing on Spark 3.3.0| -|[#5785](https://github.com/NVIDIA/spark-rapids/issues/5785)|[BUG] Tests module build failed in OrcEncryptionSuite for 321cdh| -|[#5765](https://github.com/NVIDIA/spark-rapids/issues/5765)|[BUG] Container decimal overflow when casting float/double to decimal | -|[#5246](https://github.com/NVIDIA/spark-rapids/issues/5246)|Verify Parquet columnar encryption is handled safely| -|[#5770](https://github.com/NVIDIA/spark-rapids/issues/5770)|[BUG] test_buckets failed| -|[#5733](https://github.com/NVIDIA/spark-rapids/issues/5733)|[BUG] Integration test test_orc_write_encryption_fallback fail| -|[#5719](https://github.com/NVIDIA/spark-rapids/issues/5719)|[BUG] test_cast_float_to_timestamp_ansi_for_nan_inf failed in spark330| -|[#5739](https://github.com/NVIDIA/spark-rapids/issues/5739)|[BUG] Spark 3.3 build failure - QueryExecutionErrors package scope changed| -|[#5670](https://github.com/NVIDIA/spark-rapids/issues/5670)|[BUG] Job failed when parsing "java.lang.reflect.InvocationTargetException: org.apache.spark.sql.catalyst.parser.ParseException:" | 
-|[#4860](https://github.com/NVIDIA/spark-rapids/issues/4860)|[BUG] GPU writing ORC columns statistics| -|[#5717](https://github.com/NVIDIA/spark-rapids/issues/5717)|[BUG] `div_by_zero` test is failing on Spark 330 on 22.06| -|[#5632](https://github.com/NVIDIA/spark-rapids/issues/5632)|[BUG] udf_cudf tests failed: EOFException DataInputStream.readInt(DataInputStream.java:392)| -|[#5672](https://github.com/NVIDIA/spark-rapids/issues/5672)|[BUG] Read exception occurs when clipped schema is empty| -|[#5694](https://github.com/NVIDIA/spark-rapids/issues/5694)|[BUG] Inconsistent behavior with Spark when reading a non-existent column from Parquet| -|[#5562](https://github.com/NVIDIA/spark-rapids/issues/5562)|[BUG] read ORC file with various file schemas| -|[#5654](https://github.com/NVIDIA/spark-rapids/issues/5654)|[BUG] Transpiler produces regex pattern that cuDF cannot compile| -|[#5655](https://github.com/NVIDIA/spark-rapids/issues/5655)|[BUG] Regular expression pattern `[&&1]` produces incorrect results on GPU| -|[#4862](https://github.com/NVIDIA/spark-rapids/issues/4862)|[FEA] Add support for regular expressions containing octal digits inside character classes , eg`[\0177]`| -|[#5615](https://github.com/NVIDIA/spark-rapids/issues/5615)|[BUG] GpuBatchScanExec only reports output row metrics| -|[#4505](https://github.com/NVIDIA/spark-rapids/issues/4505)|[BUG] RegExp parse fails to parse character ranges containing escaped characters| -|[#4865](https://github.com/NVIDIA/spark-rapids/issues/4865)|[BUG] Add support for regular expressions containing hexadecimal digits inside character classes, eg `[\x7f]`| -|[#5513](https://github.com/NVIDIA/spark-rapids/issues/5513)|[BUG] NoClassDefFoundError with caller classloader off in GpuShuffleCoalesceIterator in local-cluster| -|[#5530](https://github.com/NVIDIA/spark-rapids/issues/5530)|[BUG] regexp: `\d`, `\w` inconsistencies with non-latin unicode input| -|[#5594](https://github.com/NVIDIA/spark-rapids/issues/5594)|[BUG] 3.3 
test_div_overflow_exception_when_ansi test failures| -|[#5596](https://github.com/NVIDIA/spark-rapids/issues/5596)|[BUG] Shim service provider failure when using jar built with -DallowConventionalDistJar| -|[#5582](https://github.com/NVIDIA/spark-rapids/issues/5582)|[BUG] Nightly CI failed with : 'dist/target/rapids-4-spark_2.12-22.06.0-SNAPSHOT.jar' not exists| -|[#5577](https://github.com/NVIDIA/spark-rapids/issues/5577)|[BUG] test_cast_neg_to_decimal_err failing in databricks| -|[#5557](https://github.com/NVIDIA/spark-rapids/issues/5557)|[BUG] dist jar does not contain reduced pom, creates an unnecessary jar| -|[#5474](https://github.com/NVIDIA/spark-rapids/issues/5474)|[BUG] Spark 3.2.1 arithmetic_ops_test failures| -|[#5497](https://github.com/NVIDIA/spark-rapids/issues/5497)|[BUG] 3 tests in `IntervalSuite` are faling on 330| -|[#5544](https://github.com/NVIDIA/spark-rapids/issues/5544)|[BUG] GpuCreateMap needs to set hasSideEffects in some cases| -|[#5469](https://github.com/NVIDIA/spark-rapids/issues/5469)|[BUG] NPE during serialization for shuffle in array-aggregation-with-limit query| -|[#5496](https://github.com/NVIDIA/spark-rapids/issues/5496)|[BUG] `avg literals bools` is failing on 330| -|[#5511](https://github.com/NVIDIA/spark-rapids/issues/5511)|[BUG] orc_test failures on 321cdh| -|[#5439](https://github.com/NVIDIA/spark-rapids/issues/5439)|[BUG] Encrypted Parquet writes are being replaced with a GPU unencrypted write| -|[#5108](https://github.com/NVIDIA/spark-rapids/issues/5108)|[BUG] GpuArrayExists encounters a CudfException on an input partition consisting of just empty lists | -|[#5492](https://github.com/NVIDIA/spark-rapids/issues/5492)|[BUG] com.nvidia.spark.rapids.RegexCharacterClass cannot be cast to com.nvidia.spark.rapids.RegexCharacterClassComponent| -|[#4818](https://github.com/NVIDIA/spark-rapids/issues/4818)|[BUG] ASYNC: the spill store needs to synchronize on spills against the allocating stream| 
-|[#5481](https://github.com/NVIDIA/spark-rapids/issues/5481)|[BUG] test_parquet_check_schema_compatibility failed in databricks runtimes| -|[#5482](https://github.com/NVIDIA/spark-rapids/issues/5482)|[BUG] test_cast_string_date_invalid_ansi_before_320 failed in databricks runtime| -|[#5457](https://github.com/NVIDIA/spark-rapids/issues/5457)|[BUG] 330 AnsiCastOpSuite Unit tests failed 22 cases| -|[#5098](https://github.com/NVIDIA/spark-rapids/issues/5098)|[BUG] Harden calls to `RapidsBuffer.free`| -|[#5464](https://github.com/NVIDIA/spark-rapids/issues/5464)|[BUG] Query failure with java.lang.AssertionError when using partitioned Iceberg tables| -|[#4746](https://github.com/NVIDIA/spark-rapids/issues/4746)|[FEA] Add support for regular expressions containing octal digits in range `\200` to `377`| -|[#5200](https://github.com/NVIDIA/spark-rapids/issues/5200)|[BUG] More detailed logs to show which parquet file and which data type has mismatch.| -|[#4866](https://github.com/NVIDIA/spark-rapids/issues/4866)|[BUG] Add support for regular expressions containing hexadecimal digits greater than `0x7f`| -|[#5140](https://github.com/NVIDIA/spark-rapids/issues/5140)|[BUG] NPE on array_max of transformed empty array| -|[#5444](https://github.com/NVIDIA/spark-rapids/issues/5444)|[BUG] build failed on Databricks| -|[#5357](https://github.com/NVIDIA/spark-rapids/issues/5357)|[BUG] Spark 3.3 cache_test test_passing_gpuExpr_as_Expr[failures| -|[#5429](https://github.com/NVIDIA/spark-rapids/issues/5429)|[BUG] test_cache_expand_exec fails on Spark 3.3| -|[#5312](https://github.com/NVIDIA/spark-rapids/issues/5312)|[BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced.| -|[#5415](https://github.com/NVIDIA/spark-rapids/issues/5415)|[BUG] Regular Expressions: matching the dot `.` doesn't fully exclude all unicode line terminator characters| -|[#5413](https://github.com/NVIDIA/spark-rapids/issues/5413)|[BUG] 
Databricks 321 build fails - not found: type OrcShims320untilAllBase| -|[#5286](https://github.com/NVIDIA/spark-rapids/issues/5286)|[BUG] assert failed test_struct_self_join and test_computation_in_grpby_columns| -|[#5351](https://github.com/NVIDIA/spark-rapids/issues/5351)|[BUG] Build fails for Spark 3.3 due to extra arguments to mapKeyNotExistError| -|[#5260](https://github.com/NVIDIA/spark-rapids/issues/5260)|[BUG] map_test failures on Spark 3.3.0| -|[#5189](https://github.com/NVIDIA/spark-rapids/issues/5189)|[BUG] Reading from iceberg table will fail.| -|[#5130](https://github.com/NVIDIA/spark-rapids/issues/5130)|[BUG] string_split does not respect spark.rapids.sql.regexp.enabled config| -|[#5267](https://github.com/NVIDIA/spark-rapids/issues/5267)|[BUG] markdown link check failed issue| -|[#5295](https://github.com/NVIDIA/spark-rapids/issues/5295)|[BUG] Build fails for Spark 3.3 due to extra arguments to `mapKeyNotExistError`| -|[#5264](https://github.com/NVIDIA/spark-rapids/issues/5264)|[BUG] Delete unused generic type.| -|[#5275](https://github.com/NVIDIA/spark-rapids/issues/5275)|[BUG] rlike cannot run on GPU because invalid or unsupported escape character ']' near index 14| -|[#5278](https://github.com/NVIDIA/spark-rapids/issues/5278)|[BUG] build 311cdh failed: unable to find valid certification path to requested target| -|[#5211](https://github.com/NVIDIA/spark-rapids/issues/5211)|[BUG] csv_test:test_basic_csv_read FAILED | -|[#5244](https://github.com/NVIDIA/spark-rapids/issues/5244)|[BUG] Spark 3.3 integration test failures logic_test.py::test_logical_with_side_effect| -|[#5041](https://github.com/NVIDIA/spark-rapids/issues/5041)|[BUG] Implement hasSideEffects for all expressions that have side-effects| -|[#4980](https://github.com/NVIDIA/spark-rapids/issues/4980)|[BUG] window_function_test FAILED on PASCAL GPU| -|[#5240](https://github.com/NVIDIA/spark-rapids/issues/5240)|[BUG] EGX integration test_collect_list_reductions failures| 
-|[#5242](https://github.com/NVIDIA/spark-rapids/issues/5242)|[BUG] Executor falls back to cudaMalloc if the pool can't be initialized| -|[#5215](https://github.com/NVIDIA/spark-rapids/issues/5215)|[BUG] Coalescing reading is not working for v2 parquet/orc datasource| -|[#5104](https://github.com/NVIDIA/spark-rapids/issues/5104)|[BUG] Unconditional warning in UDF Plugin "The compiler is disabled by default"| -|[#5099](https://github.com/NVIDIA/spark-rapids/issues/5099)|[BUG] Profiling tool should not sum gettingResultTime| -|[#5182](https://github.com/NVIDIA/spark-rapids/issues/5182)|[BUG] Spark 3.3 integration tests arithmetic_ops_test.py::test_div_overflow_exception_when_ansi failures| -|[#5147](https://github.com/NVIDIA/spark-rapids/issues/5147)|[BUG] object LZ4Compressor is not a member of package ai.rapids.cudf.nvcomp| -|[#4695](https://github.com/NVIDIA/spark-rapids/issues/4695)|[BUG] Segfault with UCX and ASYNC allocator| -|[#5138](https://github.com/NVIDIA/spark-rapids/issues/5138)|[BUG] xgboost job failed if we enable PCBS| -|[#5135](https://github.com/NVIDIA/spark-rapids/issues/5135)|[BUG] GpuRegExExtract is not align with RegExExtract| -|[#5084](https://github.com/NVIDIA/spark-rapids/issues/5084)|[BUG] GpuWriteTaskStatsTracker complains for all writes in local mode| -|[#5123](https://github.com/NVIDIA/spark-rapids/issues/5123)|[BUG] Compile error for Spark330 because of VectorizedColumnReader constructor added a new parameter.| -|[#5133](https://github.com/NVIDIA/spark-rapids/issues/5133)|[BUG] Compile error for Spark330 because of Spark changed the method signature: QueryExecutionErrors.mapKeyNotExistError| -|[#4959](https://github.com/NVIDIA/spark-rapids/issues/4959)|[BUG] Test case in OpcodeSuite failed on Spark 3.3.0| - -### PRs -||| -|:---|:---| -|[#5863](https://github.com/NVIDIA/spark-rapids/pull/5863)|Update 22.06 changelog to include new commits [skip ci]| -|[#5861](https://github.com/NVIDIA/spark-rapids/pull/5861)|[Doc]Add Spark3.3 support in 
doc for 22.06 branch[skip ci]| -|[#5851](https://github.com/NVIDIA/spark-rapids/pull/5851)|Update 22.06 changelog to include new commits [skip ci]| -|[#5848](https://github.com/NVIDIA/spark-rapids/pull/5848)|Update spark330shim to use released lib| -|[#5840](https://github.com/NVIDIA/spark-rapids/pull/5840)|[DOC] Updated RapidsConf to reflect the default value of `spark.rapids.sql.improvedFloatOps.enabled` [skip ci]| -|[#5816](https://github.com/NVIDIA/spark-rapids/pull/5816)|Update 22.06.0 changelog to latest [skip ci]| -|[#5795](https://github.com/NVIDIA/spark-rapids/pull/5795)|Update FAQ to include local jar deployment via extraClassPath [skip ci]| -|[#5802](https://github.com/NVIDIA/spark-rapids/pull/5802)|Update spark-rapids-jni.version to release 22.06.0| -|[#5798](https://github.com/NVIDIA/spark-rapids/pull/5798)|Fall back to CPU for RoundCeil and RoundFloor expressions| -|[#5791](https://github.com/NVIDIA/spark-rapids/pull/5791)|Remove ORC encryption test from 321cdh| -|[#5766](https://github.com/NVIDIA/spark-rapids/pull/5766)|Fix the overflow of container type when casting floats to decimal| -|[#5786](https://github.com/NVIDIA/spark-rapids/pull/5786)|Fix rounds over decimal in Spark 330+| -|[#5761](https://github.com/NVIDIA/spark-rapids/pull/5761)|Throw an exception when attempting to read columnar encrypted Parquet files on the GPU| -|[#5784](https://github.com/NVIDIA/spark-rapids/pull/5784)|Update the error string for test_cast_neg_to_decimal_err on 330| -|[#5781](https://github.com/NVIDIA/spark-rapids/pull/5781)|Correct the exception string for test_mod_pmod_by_zero on Spark 3.3.0| -|[#5764](https://github.com/NVIDIA/spark-rapids/pull/5764)|Add test for encrypted ORC write| -|[#5760](https://github.com/NVIDIA/spark-rapids/pull/5760)|Enable avrotest in nightly tests [skip ci]| -|[#5746](https://github.com/NVIDIA/spark-rapids/pull/5746)|Init 22.06 changelog [skip ci]| -|[#5716](https://github.com/NVIDIA/spark-rapids/pull/5716)|Disable Avro support when 
spark-avro classes not loadable by Shim classloader| -|[#5737](https://github.com/NVIDIA/spark-rapids/pull/5737)|Remove the ORC encryption tests| -|[#5753](https://github.com/NVIDIA/spark-rapids/pull/5753)|[DOC] Update regexp compatibility for 22.06 [skip ci]| -|[#5738](https://github.com/NVIDIA/spark-rapids/pull/5738)|Update Spark2 explain code for 22.06| -|[#5731](https://github.com/NVIDIA/spark-rapids/pull/5731)|Throw SparkDateTimeException for InvalidInput while casting in ANSI mode| -|[#5742](https://github.com/NVIDIA/spark-rapids/pull/5742)|Spark-3.3 build fix - Move QueryExecutionErrors to sql package| -|[#5641](https://github.com/NVIDIA/spark-rapids/pull/5641)|[Doc]Update 22.06 documentation[skip ci]| -|[#5701](https://github.com/NVIDIA/spark-rapids/pull/5701)|Update docs for qualification tool to reflect recommendations and UI [skip ci]| -|[#5283](https://github.com/NVIDIA/spark-rapids/pull/5283)|Add documentation for MIG on Dataproc [skip ci]| -|[#5728](https://github.com/NVIDIA/spark-rapids/pull/5728)|Qualification tool: Add test for stage failures| -|[#5681](https://github.com/NVIDIA/spark-rapids/pull/5681)|Branch 22.06 nvcomp notice binary [skip ci]| -|[#5713](https://github.com/NVIDIA/spark-rapids/pull/5713)|Fix GpuCast losing the timezoneId during canonicalization| -|[#5715](https://github.com/NVIDIA/spark-rapids/pull/5715)|Update GPU ORC statistics write support| -|[#5718](https://github.com/NVIDIA/spark-rapids/pull/5718)|Update the error message for div_by_zero test| -|[#5604](https://github.com/NVIDIA/spark-rapids/pull/5604)|ORC encrypted write should fallback to CPU| -|[#5674](https://github.com/NVIDIA/spark-rapids/pull/5674)|Fix reading ORC/PARQUET over empty clipped schema| -|[#5676](https://github.com/NVIDIA/spark-rapids/pull/5676)|Fix ORC reading over different schemas| -|[#5693](https://github.com/NVIDIA/spark-rapids/pull/5693)|Temporarily allow 3.3.1 for 3.3.0 shims.| -|[#5591](https://github.com/NVIDIA/spark-rapids/pull/5591)|Enable 
regular expressions by default| -|[#5664](https://github.com/NVIDIA/spark-rapids/pull/5664)|Fix edge case where one side of regexp choice ends in duplicate string anchors | -|[#5542](https://github.com/NVIDIA/spark-rapids/pull/5542)|Support arrays of arrays and structs for concat on arrays| -|[#5677](https://github.com/NVIDIA/spark-rapids/pull/5677)|Qualification tool Enable UI by default| -|[#5575](https://github.com/NVIDIA/spark-rapids/pull/5575)|Regexp: Transpile `\D`, `\W` to Java's definitions| -|[#5668](https://github.com/NVIDIA/spark-rapids/pull/5668)|Add user as CI owner [skip ci]| -|[#5627](https://github.com/NVIDIA/spark-rapids/pull/5627)|Install locales and generate en_US.UTF-8| -|[#5514](https://github.com/NVIDIA/spark-rapids/pull/5514)|ANSI mode: allow casting between numeric type and timestamp type| -|[#5600](https://github.com/NVIDIA/spark-rapids/pull/5600)|Qualification tool UI cosmetics and CSV output changes| -|[#5658](https://github.com/NVIDIA/spark-rapids/pull/5658)|Fallback to CPU when `&&` found in character class| -|[#5644](https://github.com/NVIDIA/spark-rapids/pull/5644)|Qualification tool: Enable UDF reporting in potential problems| -|[#5645](https://github.com/NVIDIA/spark-rapids/pull/5645)|Add support for octal digits in character classes| -|[#5643](https://github.com/NVIDIA/spark-rapids/pull/5643)|Fix missing GpuBatchScanExec metrics in SQL UI| -|[#5441](https://github.com/NVIDIA/spark-rapids/pull/5441)|Enable optional float confs and update docs mentioning them| -|[#5532](https://github.com/NVIDIA/spark-rapids/pull/5532)|Support hex digits in character classes and escaped characters in character class ranges| -|[#5625](https://github.com/NVIDIA/spark-rapids/pull/5625)|[DOC]update links for 2206 release[skip ci]| -|[#5623](https://github.com/NVIDIA/spark-rapids/pull/5623)|Handle duplicates in negated character classes| -|[#5533](https://github.com/NVIDIA/spark-rapids/pull/5533)|Support `GpuMapConcat` | 
-|[#5614](https://github.com/NVIDIA/spark-rapids/pull/5614)|Move HostConcatResultUtil out of unshimmed classes| -|[#5612](https://github.com/NVIDIA/spark-rapids/pull/5612)|Qualification tool: update SQL Df value used and look at jobs in SQL| -|[#5526](https://github.com/NVIDIA/spark-rapids/pull/5526)|Fix whitespace `\s` and `\S` tests| -|[#5541](https://github.com/NVIDIA/spark-rapids/pull/5541)|Regexp: Transpile `\d`, `\w` to Java's definitions| -|[#5598](https://github.com/NVIDIA/spark-rapids/pull/5598)|Qualification tool: Update RunningQualificationApp tests| -|[#5601](https://github.com/NVIDIA/spark-rapids/pull/5601)|Update test_div_overflow_exception_when_ansi test for Spark-3.3| -|[#5588](https://github.com/NVIDIA/spark-rapids/pull/5588)|Update Databricks build scripts| -|[#5599](https://github.com/NVIDIA/spark-rapids/pull/5599)|Move ShimServiceProvider file re-init/truncate| -|[#5531](https://github.com/NVIDIA/spark-rapids/pull/5531)|Filter rows with null keys when coalescing due to reaching cuDF row limits| -|[#5550](https://github.com/NVIDIA/spark-rapids/pull/5550)|Qualification tool hook up final output based on per exec analysis| -|[#5540](https://github.com/NVIDIA/spark-rapids/pull/5540)|Support RaiseError| -|[#5505](https://github.com/NVIDIA/spark-rapids/pull/5505)|Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys| -|[#5583](https://github.com/NVIDIA/spark-rapids/pull/5583)|Disable spark snapshot shims build for pre-merge| -|[#5584](https://github.com/NVIDIA/spark-rapids/pull/5584)|Enable automerge from branch-22.06 to 22.08 [skip ci]| -|[#5581](https://github.com/NVIDIA/spark-rapids/pull/5581)|nightly CI to install and deploy cuda11 classifier dist jar [skip ci]| -|[#5579](https://github.com/NVIDIA/spark-rapids/pull/5579)|Update test_cast_neg_to_decimal_err to work with Databricks 10.4 where exception is different| -|[#5578](https://github.com/NVIDIA/spark-rapids/pull/5578)|Fix unfiltered partitions being used to create GpuBatchScanExec 
RDD| -|[#5560](https://github.com/NVIDIA/spark-rapids/pull/5560)|Minor: Clean up the tests of `concat_list`| -|[#5528](https://github.com/NVIDIA/spark-rapids/pull/5528)|Enable build and test with JDK11| -|[#5571](https://github.com/NVIDIA/spark-rapids/pull/5571)|Update array_min and array_max to use new cudf operations| -|[#5558](https://github.com/NVIDIA/spark-rapids/pull/5558)|Fix target file for update from extra-resources in dist module| -|[#5556](https://github.com/NVIDIA/spark-rapids/pull/5556)|Move FsInput creation into AvroFileReader| -|[#5483](https://github.com/NVIDIA/spark-rapids/pull/5483)|Don't distinguish between types of `ArithmeticException` for Spark 3.2.x| -|[#5539](https://github.com/NVIDIA/spark-rapids/pull/5539)|Fix IntervalSuite cases failure| -|[#5421](https://github.com/NVIDIA/spark-rapids/pull/5421)|Support multi-threaded reading for avro| -|[#5538](https://github.com/NVIDIA/spark-rapids/pull/5538)|Add tests for string to timestamp functions in ANSI mode| -|[#5546](https://github.com/NVIDIA/spark-rapids/pull/5546)|Set hasSideEffects correctly for GpuCreateMap| -|[#5529](https://github.com/NVIDIA/spark-rapids/pull/5529)|Fix failing bool agg test in Spark 3.3| -|[#5500](https://github.com/NVIDIA/spark-rapids/pull/5500)|Fallback parquet reading with merged schema and native footer reader| -|[#5534](https://github.com/NVIDIA/spark-rapids/pull/5534)|MVN_OPT to last, as it is empty in most cases| -|[#5523](https://github.com/NVIDIA/spark-rapids/pull/5523)|Enable forcePositionEvolution for 321cdh| -|[#5501](https://github.com/NVIDIA/spark-rapids/pull/5501)|Build against specified spark-rapids-jni snapshot jar [skip ci]| -|[#5489](https://github.com/NVIDIA/spark-rapids/pull/5489)|Fallback to the CPU if Parquet encryption keys are set| -|[#5527](https://github.com/NVIDIA/spark-rapids/pull/5527)|Fix bug with character class immediately following a string anchor| -|[#5506](https://github.com/NVIDIA/spark-rapids/pull/5506)|Fix ClassCastException in 
regular expression transpiler| -|[#5519](https://github.com/NVIDIA/spark-rapids/pull/5519)|Address feedback in "string anchors regexp replace" PR| -|[#5520](https://github.com/NVIDIA/spark-rapids/pull/5520)|[DOC] Remove Spark from our naming of Tools [skip ci]| -|[#5491](https://github.com/NVIDIA/spark-rapids/pull/5491)|Enables `$`, `\z`, and `\Z` in `REGEXP_REPLACE` on the GPU| -|[#5470](https://github.com/NVIDIA/spark-rapids/pull/5470)|Qualification tool support UI code generation| -|[#5353](https://github.com/NVIDIA/spark-rapids/pull/5353)|Supports casting between ANSI interval types and integral types| -|[#5487](https://github.com/NVIDIA/spark-rapids/pull/5487)|Add limited support for captured vars and athrow| -|[#5499](https://github.com/NVIDIA/spark-rapids/pull/5499)|[DOC]update doc for emr6.6[skip ci]| -|[#5485](https://github.com/NVIDIA/spark-rapids/pull/5485)|Add cudaStreamSynchronize when a new device buffer is added to the spill framework| -|[#5477](https://github.com/NVIDIA/spark-rapids/pull/5477)|Add support for `\h`, `\H`, `\v`, `\V`, and `\R` character classes| -|[#5490](https://github.com/NVIDIA/spark-rapids/pull/5490)|Qualification tool: Update speedup factor for few operators| -|[#5494](https://github.com/NVIDIA/spark-rapids/pull/5494)|Fix databrick Shim to support Ansi mode when casting from string to date| -|[#5498](https://github.com/NVIDIA/spark-rapids/pull/5498)|Enable 330 unit tests for nightly| -|[#5504](https://github.com/NVIDIA/spark-rapids/pull/5504)|Fix printing of split information when dumping debug data| -|[#5486](https://github.com/NVIDIA/spark-rapids/pull/5486)|Fix regression in AnsiCastOpSuite with Spark 3.3.0| -|[#5436](https://github.com/NVIDIA/spark-rapids/pull/5436)|Support `map_filter` operator| -|[#5471](https://github.com/NVIDIA/spark-rapids/pull/5471)|Add implicit `safeFree` for `RapidsBuffer`| -|[#5465](https://github.com/NVIDIA/spark-rapids/pull/5465)|Fix query planning issue when Iceberg is used with DPP and AQE| 
-|[#5459](https://github.com/NVIDIA/spark-rapids/pull/5459)|Add test cases for casting string to date in ANSI mode| -|[#5443](https://github.com/NVIDIA/spark-rapids/pull/5443)|Add support for regular expressions containing octal digits greater than `\200`| -|[#5468](https://github.com/NVIDIA/spark-rapids/pull/5468)|Qualification tool: Add support for join, pandas, aggregate execs| -|[#5473](https://github.com/NVIDIA/spark-rapids/pull/5473)|Remove hasNan check over array_contains| -|[#5434](https://github.com/NVIDIA/spark-rapids/pull/5434)|Check schema compatibility when building parquet readers| -|[#5442](https://github.com/NVIDIA/spark-rapids/pull/5442)|Add support for regular expressions containing hexadecimal digits greater than `0x7f`| -|[#5466](https://github.com/NVIDIA/spark-rapids/pull/5466)|[Doc] Change the picture of the query plan to text format. [skip ci]| -|[#5310](https://github.com/NVIDIA/spark-rapids/pull/5310)|Use C++ to parse and filter parquet footers.| -|[#5454](https://github.com/NVIDIA/spark-rapids/pull/5454)|QualificationTool. 
Add speedup information to AppSummaryInfo| -|[#5455](https://github.com/NVIDIA/spark-rapids/pull/5455)|Moved ShimCurrentBatchIterator so it's visible to db312 and db321| -|[#5354](https://github.com/NVIDIA/spark-rapids/pull/5354)|Plugin should throw same arithmetic exceptions as Spark part1| -|[#5440](https://github.com/NVIDIA/spark-rapids/pull/5440)|Qualification tool support for read and write execs and more, add mapping stage times to sql execs| -|[#5431](https://github.com/NVIDIA/spark-rapids/pull/5431)|[DOC] Update the ubuntu repo key [skip ci]| -|[#5425](https://github.com/NVIDIA/spark-rapids/pull/5425)|Handle readBatch changes for Spark 3.3.0| -|[#5438](https://github.com/NVIDIA/spark-rapids/pull/5438)|Add tests for all-null data for array_max| -|[#5428](https://github.com/NVIDIA/spark-rapids/pull/5428)|Make the sync marker uniform for the Avro coalescing reader| -|[#5432](https://github.com/NVIDIA/spark-rapids/pull/5432)|Test case insensitive reading for Parquet and CSV| -|[#5433](https://github.com/NVIDIA/spark-rapids/pull/5433)|[DOC] Removed mention of 30x from shims.md [skip ci]| -|[#5424](https://github.com/NVIDIA/spark-rapids/pull/5424)|Exclude all unicode line terminator characters from matching dot| -|[#5426](https://github.com/NVIDIA/spark-rapids/pull/5426)|Qualification tool: Parsing Execs to get the ExecInfo #2| -|[#5427](https://github.com/NVIDIA/spark-rapids/pull/5427)|Workaround to fix cuda repo key rotation in ubuntu images [skip ci]| -|[#5419](https://github.com/NVIDIA/spark-rapids/pull/5419)|Append my id to blossom-ci whitelist [skip ci]| -|[#5422](https://github.com/NVIDIA/spark-rapids/pull/5422)|xfail tests for spark 3.3.0 due to changes in readBatch| -|[#5420](https://github.com/NVIDIA/spark-rapids/pull/5420)|Qualification tool: Parsing Execs to get the ExecInfo #1 | -|[#5418](https://github.com/NVIDIA/spark-rapids/pull/5418)|Add GpuEqualToNoNans and update GpuPivotFirst to use to handle PivotFirst with NaN support enabled on GPU| 
-|[#5306](https://github.com/NVIDIA/spark-rapids/pull/5306)|Support coalescing reading for avro| -|[#5410](https://github.com/NVIDIA/spark-rapids/pull/5410)|Update docs for removal of 311cdh| -|[#5414](https://github.com/NVIDIA/spark-rapids/pull/5414)|Add 320+-noncdh to Databricks to fix 321db build| -|[#5349](https://github.com/NVIDIA/spark-rapids/pull/5349)|Enable some repetitions for `\A` and `\Z`| -|[#5346](https://github.com/NVIDIA/spark-rapids/pull/5346)|ADD 321cdh shim to rapids and remove 311cdh shim| -|[#5408](https://github.com/NVIDIA/spark-rapids/pull/5408)|[DOC] Add rebase mode notes for databricks doc [skip ci]| -|[#5348](https://github.com/NVIDIA/spark-rapids/pull/5348)|Qualification tool: Skip GPU event logs| -|[#5400](https://github.com/NVIDIA/spark-rapids/pull/5400)|Restore test_computation_in_grpby_columns and test_struct_self_join| -|[#5399](https://github.com/NVIDIA/spark-rapids/pull/5399)|Update New Issue template to recommend a Discussion or Question [skip ci]| -|[#5293](https://github.com/NVIDIA/spark-rapids/pull/5293)|Support array_repeat| -|[#5359](https://github.com/NVIDIA/spark-rapids/pull/5359)|Qualification tool base plan parsing infrastructure| -|[#5360](https://github.com/NVIDIA/spark-rapids/pull/5360)|Revert "skip failing tests for Spark 3.3.0 (#5313)"| -|[#5326](https://github.com/NVIDIA/spark-rapids/pull/5326)|Update GCP doc and scripts [skip ci]| -|[#5352](https://github.com/NVIDIA/spark-rapids/pull/5352)|Fix spark330 build due to mapKeyNotExistError changed| -|[#5317](https://github.com/NVIDIA/spark-rapids/pull/5317)|Support arrays_zip| -|[#5316](https://github.com/NVIDIA/spark-rapids/pull/5316)|Support ANSI mode for `ToUnixTimestamp, UnixTimestamp, GetTimestamp, DateAddInterval`| -|[#5319](https://github.com/NVIDIA/spark-rapids/pull/5319)|Re-enable support for `\Z` in regular expressions on the GPU| -|[#5315](https://github.com/NVIDIA/spark-rapids/pull/5315)|Simplify conditional catalyst expressions generated by udf-compiler| 
-|[#5301](https://github.com/NVIDIA/spark-rapids/pull/5301)|Support existence join type for broadcast nested loop join| -|[#5313](https://github.com/NVIDIA/spark-rapids/pull/5313)|skip failing tests for Spark 3.3.0| -|[#5311](https://github.com/NVIDIA/spark-rapids/pull/5311)|Add information about the discussion board to the README and FAQ [skip ci]| -|[#5308](https://github.com/NVIDIA/spark-rapids/pull/5308)|Remove unused ColumnViewUtil| -|[#5289](https://github.com/NVIDIA/spark-rapids/pull/5289)|Re-enable dollar ($) line anchor in regular expressions in find mode | -|[#5274](https://github.com/NVIDIA/spark-rapids/pull/5274)|Perform explicit UnsafeRow projection in ColumnarToRow transition| -|[#5297](https://github.com/NVIDIA/spark-rapids/pull/5297)|GpuStringSplit now honors the`spark.rapids.sql.regexp.enabled` configuration option| -|[#5307](https://github.com/NVIDIA/spark-rapids/pull/5307)|Remove compatibility guide reference to issue #4060| -|[#5298](https://github.com/NVIDIA/spark-rapids/pull/5298)|Qualification tool: Operator mapping from plugin to CSV file| -|[#5266](https://github.com/NVIDIA/spark-rapids/pull/5266)|Update Outdated GCP getting started guide[skip ci]| -|[#5300](https://github.com/NVIDIA/spark-rapids/pull/5300)|Fix DIST_JAR PATH in coverage-report [skip ci]| -|[#5290](https://github.com/NVIDIA/spark-rapids/pull/5290)|Add documentation about reporting security issues [skip ci]| -|[#5277](https://github.com/NVIDIA/spark-rapids/pull/5277)|Support multiple datatypes in `TypeSig.withPsNote()`| -|[#5296](https://github.com/NVIDIA/spark-rapids/pull/5296)|Fix spark330 build due to removal of isElementAt parameter from mapKeyNotExistError| -|[#5291](https://github.com/NVIDIA/spark-rapids/pull/5291)|fix dead links in shims.md [skip ci]| -|[#5276](https://github.com/NVIDIA/spark-rapids/pull/5276)|fix markdown check issue[skip ci]| -|[#5270](https://github.com/NVIDIA/spark-rapids/pull/5270)|Include dependency of common jar in tools jar| 
-|[#5265](https://github.com/NVIDIA/spark-rapids/pull/5265)|Remove unused generic types| -|[#5288](https://github.com/NVIDIA/spark-rapids/pull/5288)|Temporarily xfail tests to restore premerge builds| -|[#5287](https://github.com/NVIDIA/spark-rapids/pull/5287)|Fix nightly scripts to deploy w/ classifier correctly [skip ci]| -|[#5134](https://github.com/NVIDIA/spark-rapids/pull/5134)|Support division on ANSI interval types| -|[#5279](https://github.com/NVIDIA/spark-rapids/pull/5279)|Add test case for ANSI pmod and ANSI Remainder| -|[#5284](https://github.com/NVIDIA/spark-rapids/pull/5284)|Enable support for escaping the right square bracket| -|[#5280](https://github.com/NVIDIA/spark-rapids/pull/5280)|[BUG] Fix incorrect plugin nightly deployment and release [skip ci]| -|[#5249](https://github.com/NVIDIA/spark-rapids/pull/5249)|Use a bundled spark-rapids-jni dependency instead of external cudf dependency| -|[#5268](https://github.com/NVIDIA/spark-rapids/pull/5268)|[BUG] When ASYNC is enabled GDS needs to handle cudaMalloced bounce buffers| -|[#5230](https://github.com/NVIDIA/spark-rapids/pull/5230)|Update csv float tests to reflect changes in precision in cuDF| -|[#5001](https://github.com/NVIDIA/spark-rapids/pull/5001)|Add fuzzing test for JSON reader| -|[#5155](https://github.com/NVIDIA/spark-rapids/pull/5155)|Support casting between day-time interval and string| -|[#5247](https://github.com/NVIDIA/spark-rapids/pull/5247)|Fix test failure caused by change in Spark 3.3 exception| -|[#5254](https://github.com/NVIDIA/spark-rapids/pull/5254)|Fix the integration test of collect_list_reduction| -|[#5243](https://github.com/NVIDIA/spark-rapids/pull/5243)|Throw again after logging that RMM could not initialize| -|[#5105](https://github.com/NVIDIA/spark-rapids/pull/5105)|Support multiplication on ANSI interval types| -|[#5171](https://github.com/NVIDIA/spark-rapids/pull/5171)|Fix the bug where COALESCING reading does not work for v2 parquet/orc datasource| 
-|[#5157](https://github.com/NVIDIA/spark-rapids/pull/5157)|Update the log warning of UDF compiler| -|[#5213](https://github.com/NVIDIA/spark-rapids/pull/5213)|Support sample on ANSI interval types| -|[#5218](https://github.com/NVIDIA/spark-rapids/pull/5218)|XFAIL tests that are failing due to issue 5211| -|[#5202](https://github.com/NVIDIA/spark-rapids/pull/5202)|Profiling tool: Remove gettingResultTime from stages & jobs aggregation| -|[#5201](https://github.com/NVIDIA/spark-rapids/pull/5201)|Fix merge conflict from branch-22.04| -|[#5195](https://github.com/NVIDIA/spark-rapids/pull/5195)|Refactor Spark33XShims to avoid code duplication| -|[#5185](https://github.com/NVIDIA/spark-rapids/pull/5185)|Fix test failure with Spark 3.3 by looking for less specific error message| -|[#4992](https://github.com/NVIDIA/spark-rapids/pull/4992)|Support Collect-like Reduction Aggregations| -|[#5193](https://github.com/NVIDIA/spark-rapids/pull/5193)|Fix auto merge conflict 5192 [skip ci]| -|[#5020](https://github.com/NVIDIA/spark-rapids/pull/5020)|Support arithmetic operators on ANSI interval types| -|[#5174](https://github.com/NVIDIA/spark-rapids/pull/5174)|Fix auto merge conflict 5173 [skip ci]| -|[#5168](https://github.com/NVIDIA/spark-rapids/pull/5168)|Fix auto merge conflict 5166| -|[#5151](https://github.com/NVIDIA/spark-rapids/pull/5151)|Remove NvcompLZ4CompressionCodec single-buffer APIs| -|[#5132](https://github.com/NVIDIA/spark-rapids/pull/5132)|Add `count` support for all types| -|[#5141](https://github.com/NVIDIA/spark-rapids/pull/5141)|Upgrade to UCX 1.12.1 for 22.06| -|[#5143](https://github.com/NVIDIA/spark-rapids/pull/5143)|Fix merge conflict with branch-22.04| -|[#5144](https://github.com/NVIDIA/spark-rapids/pull/5144)|Adapt to storage-partitioned join additions in SPARK-37377| -|[#5139](https://github.com/NVIDIA/spark-rapids/pull/5139)|Make mvn-verify check name more descriptive [skip ci]| -|[#5136](https://github.com/NVIDIA/spark-rapids/pull/5136)|Fix 
GpuRegExpExtract inconsistency with Spark| -|[#5107](https://github.com/NVIDIA/spark-rapids/pull/5107)|Fix GpuFileFormatDataWriter failing to stat file after commit| -|[#5124](https://github.com/NVIDIA/spark-rapids/pull/5124)|Fix ShimVectorizedColumnReader construction for recent Spark 3.3.0 changes| -|[#5047](https://github.com/NVIDIA/spark-rapids/pull/5047)|Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode| -|[#5089](https://github.com/NVIDIA/spark-rapids/pull/5089)|Enable regular expressions containing `\s` and `\S`| -|[#5087](https://github.com/NVIDIA/spark-rapids/pull/5087)|Add support for regexp_replace with back-references| -|[#5110](https://github.com/NVIDIA/spark-rapids/pull/5110)|Appending my id (mattahrens) to the blossom-ci whitelist [skip ci]| -|[#5090](https://github.com/NVIDIA/spark-rapids/pull/5090)|Add nvtx ranges around pre, agg, and post steps in hash aggregate| -|[#5092](https://github.com/NVIDIA/spark-rapids/pull/5092)|Remove single-buffer compression codec APIs| -|[#5093](https://github.com/NVIDIA/spark-rapids/pull/5093)|Fix leak when GDS buffer store closes| -|[#5067](https://github.com/NVIDIA/spark-rapids/pull/5067)|Premerge databricks CI autotrigger [skip ci]| -|[#5083](https://github.com/NVIDIA/spark-rapids/pull/5083)|Remove EMRShimVersion| -|[#5076](https://github.com/NVIDIA/spark-rapids/pull/5076)|Unshim cache serializer and other 311+-all code| -|[#5074](https://github.com/NVIDIA/spark-rapids/pull/5074)|Make ASYNC the default allocator for 22.06| -|[#5073](https://github.com/NVIDIA/spark-rapids/pull/5073)|Add in nvtx ranges for parquet filterBlocks| -|[#5077](https://github.com/NVIDIA/spark-rapids/pull/5077)|Change Scala style continuation indentation to be 2 spaces to match guide [skip ci]| -|[#5070](https://github.com/NVIDIA/spark-rapids/pull/5070)|Fix merge from 22.04 to 22.06| -|[#5046](https://github.com/NVIDIA/spark-rapids/pull/5046)|Init 22.06.0-SNAPSHOT| 
-|[#5059](https://github.com/NVIDIA/spark-rapids/pull/5059)|Fix merge from 22.04 to 22.06| -|[#5036](https://github.com/NVIDIA/spark-rapids/pull/5036)|Unshim many expressions| -|[#4993](https://github.com/NVIDIA/spark-rapids/pull/4993)|PCBS and Parquet support ANSI year month interval type| -|[#5031](https://github.com/NVIDIA/spark-rapids/pull/5031)|Unshim many SparkShim interfaces| -|[#5027](https://github.com/NVIDIA/spark-rapids/pull/5027)|Fix merge of branch-22.04 to branch-22.06| -|[#5022](https://github.com/NVIDIA/spark-rapids/pull/5022)|Unshim many Pandas execs| -|[#5013](https://github.com/NVIDIA/spark-rapids/pull/5013)|Unshim GpuRowBasedScalaUDF| -|[#5012](https://github.com/NVIDIA/spark-rapids/pull/5012)|Unshim GpuOrcScan and GpuParquetScan| -|[#5010](https://github.com/NVIDIA/spark-rapids/pull/5010)|Unshim GpuSumDefaults| -|[#5007](https://github.com/NVIDIA/spark-rapids/pull/5007)|Remove schema utils, case class copying, file partition, and legacy statistical aggregate shims| -|[#4999](https://github.com/NVIDIA/spark-rapids/pull/4999)|Enable automerge from branch-22.04 to branch-22.06 [skip ci]| - -## Release 22.04 - -### Features -||| -|:---|:---| -|[#4734](https://github.com/NVIDIA/spark-rapids/issues/4734)|[FEA] Support approx_percentile in reduction context| -|[#1922](https://github.com/NVIDIA/spark-rapids/issues/1922)|[FEA] Support ORC forced positional evolution| -|[#123](https://github.com/NVIDIA/spark-rapids/issues/123)|[FEA] add in support for dayfirst formats in the CSV parser| -|[#4863](https://github.com/NVIDIA/spark-rapids/issues/4863)|[FEA] Improve timestamp support in JSON and CSV readers| -|[#4935](https://github.com/NVIDIA/spark-rapids/issues/4935)|[FEA] Support reading Avro: primitive types| -|[#4915](https://github.com/NVIDIA/spark-rapids/issues/4915)|[FEA] Drop support for Spark 3.0.1, 3.0.2, 3.0.3, Databricks 7.3 ML LTS| -|[#4815](https://github.com/NVIDIA/spark-rapids/issues/4815)|[FEA] Support 
org.apache.spark.sql.catalyst.expressions.ArrayExists| -|[#3245](https://github.com/NVIDIA/spark-rapids/issues/3245)|[FEA] GpuGetMapValue should support all valid value data types and non-complex key types| -|[#4914](https://github.com/NVIDIA/spark-rapids/issues/4914)|[FEA] Support for Databricks 10.4 ML LTS| -|[#4945](https://github.com/NVIDIA/spark-rapids/issues/4945)|[FEA] Support filter and comparisons on ANSI day time interval type| -|[#4004](https://github.com/NVIDIA/spark-rapids/issues/4004)|[FEA] Add support for percent_rank| -|[#1111](https://github.com/NVIDIA/spark-rapids/issues/1111)|[FEA] support `spark.sql.legacy.timeParserPolicy` when parsing CSV files| -|[#4849](https://github.com/NVIDIA/spark-rapids/issues/4849)|[FEA] Support parsing dates in JSON reader| -|[#4789](https://github.com/NVIDIA/spark-rapids/issues/4789)|[FEA] Add Spark 3.1.4 shim| -|[#4646](https://github.com/NVIDIA/spark-rapids/issues/4646)|[FEA] Make JSON parsing of `NaN` and `Infinity` values fully compatible with Spark| -|[#4824](https://github.com/NVIDIA/spark-rapids/issues/4824)|[FEA] Support reading decimals from JSON and CSV| -|[#4814](https://github.com/NVIDIA/spark-rapids/issues/4814)|[FEA] Support element_at with non-literal index| -|[#4816](https://github.com/NVIDIA/spark-rapids/issues/4816)|[FEA] Support org.apache.spark.sql.catalyst.expressions.GetArrayStructFields| -|[#3542](https://github.com/NVIDIA/spark-rapids/issues/3542)|[FEA] Support str_to_map function| -|[#4721](https://github.com/NVIDIA/spark-rapids/issues/4721)|[FEA] Support regular expression delimiters for `str_to_map`| -|[#4791](https://github.com/NVIDIA/spark-rapids/issues/4791)|Update Spark 3.1.3 to be released| -|[#4712](https://github.com/NVIDIA/spark-rapids/issues/4712)|[FEA] Allow to partition on Decimal 128 when running on the GPU| -|[#4762](https://github.com/NVIDIA/spark-rapids/issues/4762)|[FEA] Improve support for reading JSON integer types| 
-|[#4696](https://github.com/NVIDIA/spark-rapids/issues/4696)|[FEA] Support casting map to string| -|[#1572](https://github.com/NVIDIA/spark-rapids/issues/1572)|[FEA] Add in decimal support for pmod, remainder and divide| -|[#4763](https://github.com/NVIDIA/spark-rapids/issues/4763)|[FEA] Improve support for reading JSON boolean types| -|[#4003](https://github.com/NVIDIA/spark-rapids/issues/4003)|[FEA] Add regular expression support to GPU implementation of StringSplit| -|[#4626](https://github.com/NVIDIA/spark-rapids/issues/4626)|[FEA] cannot run on GPU because unsupported data types in 'partitionSpec'| -|[#33](https://github.com/NVIDIA/spark-rapids/issues/33)|[FEA] hypot SQL function| -|[#4515](https://github.com/NVIDIA/spark-rapids/issues/4515)|[FEA] Set RMM async allocator as default| - -### Performance -||| -|:---|:---| -|[#3026](https://github.com/NVIDIA/spark-rapids/issues/3026)|[FEA] [Audit]: Set the list of read columns in the task configuration to reduce reading of ORC data| -|[#4895](https://github.com/NVIDIA/spark-rapids/issues/4895)|Add support for structs in GpuScalarSubquery | -|[#4393](https://github.com/NVIDIA/spark-rapids/issues/4393)|[BUG] Columnar to Columnar transfers are very slow| -|[#589](https://github.com/NVIDIA/spark-rapids/issues/589)|[FEA] Support ExistenceJoin| -|[#4784](https://github.com/NVIDIA/spark-rapids/issues/4784)|[FEA] Improve copying decimal data from CPU columnar data | -|[#4685](https://github.com/NVIDIA/spark-rapids/issues/4685)|[FEA] Avoid regexp cost in string_split for escaped characters| -|[#4777](https://github.com/NVIDIA/spark-rapids/issues/4777)|Remove input upcast in GpuExtractChunk32| -|[#4722](https://github.com/NVIDIA/spark-rapids/issues/4722)|Optimize DECIMAL128 average aggregations| -|[#4645](https://github.com/NVIDIA/spark-rapids/issues/4645)|[FEA] Investigate ASYNC allocator performance with additional queries| -|[#4539](https://github.com/NVIDIA/spark-rapids/issues/4539)|[FEA] semaphore optimization in 
shuffled hash join| -|[#2441](https://github.com/NVIDIA/spark-rapids/issues/2441)|[FEA] Use AST for filter in join APIs| - -### Bugs Fixed -||| -|:---|:---| -|[#5233](https://github.com/NVIDIA/spark-rapids/issues/5233)|[BUG] rapids-tools v22.04.0 release jar reports maven dependency issue : rapids-4-spark-common_2.12:jar:22.04.0 NOT FOUND| -|[#5183](https://github.com/NVIDIA/spark-rapids/issues/5183)|[BUG] UCX EGX integration test array_test.py::test_array_exists failures| -|[#5180](https://github.com/NVIDIA/spark-rapids/issues/5180)|[BUG] create_map failed with java.lang.IllegalStateException: This is not supported yet| -|[#5181](https://github.com/NVIDIA/spark-rapids/issues/5181)|[BUG] Dataproc tests failing when trying to detect for accelerated row conversions| -|[#5154](https://github.com/NVIDIA/spark-rapids/issues/5154)|[BUG] build failed in databricks 10.4 runtime (updated recently)| -|[#5159](https://github.com/NVIDIA/spark-rapids/issues/5159)|[BUG] Approx percentile query fails with UnsupportedOperationException| -|[#5164](https://github.com/NVIDIA/spark-rapids/issues/5164)|[BUG] Databricks 9.1ML failed with "java.lang.NoSuchMethodError: org.apache.spark.sql.execution.metric.SQLMetrics$.createSizeMetric"| -|[#5125](https://github.com/NVIDIA/spark-rapids/issues/5125)|[BUG] GpuCast.hasSideEffects does not check if child expression has side effects| -|[#5091](https://github.com/NVIDIA/spark-rapids/issues/5091)|[BUG] Profiling tool fails to process custom task accumulators of type CollectionAccumulator| -|[#5050](https://github.com/NVIDIA/spark-rapids/issues/5050)|[BUG] Release build of v22.04.0 FAILED on "Execution attach-javadoc failed: NullPointerException" with maven option '-P source-javadoc'| -|[#5035](https://github.com/NVIDIA/spark-rapids/issues/5035)|[BUG] Different CSV parsing behavior between 22.04 and 22.02| -|[#5065](https://github.com/NVIDIA/spark-rapids/issues/5065)|[BUG] spark330+ build error due to SPARK-37463| 
-|[#5019](https://github.com/NVIDIA/spark-rapids/issues/5019)|[BUG] udf compiler failed to translate UDF in spark-shell | -|[#5048](https://github.com/NVIDIA/spark-rapids/issues/5048)|[BUG] OOM for q18 of TPC-DS benchmark testing on Spark2a| -|[#5038](https://github.com/NVIDIA/spark-rapids/issues/5038)|[BUG] When spark.rapids.sql.regexp.enabled is on in 22.04 snapshot jars, Reading a Delta table in Databricks may cause driver error| -|[#5023](https://github.com/NVIDIA/spark-rapids/issues/5023)|[BUG] When+sequence could trigger "Illegal sequence boundaries" error| -|[#5021](https://github.com/NVIDIA/spark-rapids/issues/5021)|[BUG] test_cache_reverse_order failed| -|[#5003](https://github.com/NVIDIA/spark-rapids/issues/5003)|[BUG] Cloudera 3.1.1 tests fail due to ClouderaShimVersion| -|[#4960](https://github.com/NVIDIA/spark-rapids/issues/4960)|[BUG] Spark 3.3 IT cache_test:test_passing_gpuExpr_as_Expr failure| -|[#4913](https://github.com/NVIDIA/spark-rapids/issues/4913)|[BUG] Fall back to the CPU if we see a scale on Ceil or Floor| -|[#4806](https://github.com/NVIDIA/spark-rapids/issues/4806)|[BUG] When running xgboost training, if PCBS is enabled, it fails with java.lang.AssertionError| -|[#4542](https://github.com/NVIDIA/spark-rapids/issues/4542)|[BUG] test_write_round_trip failed Maximum pool size exceeded | -|[#4911](https://github.com/NVIDIA/spark-rapids/issues/4911)|[BUG][Audit] [SPARK-38314] - Fail to read parquet files after writing the hidden file metadata| -|[#4936](https://github.com/NVIDIA/spark-rapids/issues/4936)|[BUG] databricks nightly window_function_test failures| -|[#4931](https://github.com/NVIDIA/spark-rapids/issues/4931)|[BUG] Spark 3.3 IT test cache_test.py::test_passing_gpuExpr_as_Expr fails with IllegalArgumentException| -|[#4710](https://github.com/NVIDIA/spark-rapids/issues/4710)|[BUG] cudaErrorIllegalAddress for q95 (3TB) on GCP with ASYNC allocator| -|[#4918](https://github.com/NVIDIA/spark-rapids/issues/4918)|[BUG] databricks nightly 
build failed| -|[#4826](https://github.com/NVIDIA/spark-rapids/issues/4826)|[BUG] cache_test failures when testing with 128-bit decimal| -|[#4855](https://github.com/NVIDIA/spark-rapids/issues/4855)|[BUG] Shim tests in sql-plugin module are not running| -|[#4487](https://github.com/NVIDIA/spark-rapids/issues/4487)|[BUG] regexp_find hangs with some patterns| -|[#4486](https://github.com/NVIDIA/spark-rapids/issues/4486)|[BUG] Regular expressions with hex digits not working as expected| -|[#4879](https://github.com/NVIDIA/spark-rapids/issues/4879)|[BUG] [SPARK-38237][SQL] ClusteredDistribution clustering keys break build with wrong arguments| -|[#4883](https://github.com/NVIDIA/spark-rapids/issues/4883)|[BUG] row-based_udf_test.py::test_hive_empty_* fail nightly tests| -|[#4876](https://github.com/NVIDIA/spark-rapids/issues/4876)|[BUG] Nightly build failed on Databricks with "pip: No such file or directory"| -|[#4739](https://github.com/NVIDIA/spark-rapids/issues/4739)|[BUG] Plugin will crash with query > 100 columns on pascal GPU| -|[#4840](https://github.com/NVIDIA/spark-rapids/issues/4840)|[BUG] test_dpp_via_aggregate_subquery_aqe_off failed with table already exists| -|[#4841](https://github.com/NVIDIA/spark-rapids/issues/4841)|[BUG] test_compress_write_round_trip failed on Spark 3.3| -|[#4668](https://github.com/NVIDIA/spark-rapids/issues/4668)|[FEA][Audit] - [SPARK-37750][SQL] ANSI mode: optionally return null result if element not exists in array/map| -|[#3971](https://github.com/NVIDIA/spark-rapids/issues/3971)|[BUG] udf-examples dependencies are incorrect| -|[#4022](https://github.com/NVIDIA/spark-rapids/issues/4022)|[BUG] Ensure shims.v2.ParquetCachedBatchSerializer and similar classes are at most package-private | -|[#4526](https://github.com/NVIDIA/spark-rapids/issues/4526)|[BUG] Short circuit AND/OR in ANSI mode| -|[#4787](https://github.com/NVIDIA/spark-rapids/issues/4787)|[BUG] Dataproc notebook IT test failure - NoSuchMethodError: 
org.apache.spark.network.util.ByteUnit.toBytes| -|[#4704](https://github.com/NVIDIA/spark-rapids/issues/4704)|[BUG] Update the premerge and nightly tests after moving the UDF example to external repository| -|[#4795](https://github.com/NVIDIA/spark-rapids/issues/4795)|[BUG] Read ORC does not ignoreCorruptFiles| -|[#4802](https://github.com/NVIDIA/spark-rapids/issues/4802)|[BUG] GPU CSV read does not honor ignoreCorruptFiles or ignoreMissingFiles| -|[#4803](https://github.com/NVIDIA/spark-rapids/issues/4803)|[BUG] GPU JSON read does not honor ignoreCorruptFiles or ignoreMissingFiles| -|[#1986](https://github.com/NVIDIA/spark-rapids/issues/1986)|[BUG] CSV reading null inconsistent between spark.rapids.sql.format.csv.enabled=true&false| -|[#126](https://github.com/NVIDIA/spark-rapids/issues/126)|[BUG] CSV parsing large number values overflow| -|[#4759](https://github.com/NVIDIA/spark-rapids/issues/4759)|[BUG] Profiling tool can miss datasources when they are GPU reads| -|[#4798](https://github.com/NVIDIA/spark-rapids/issues/4798)|[BUG] Integration test builds failing with worker_id not found| -|[#4727](https://github.com/NVIDIA/spark-rapids/issues/4727)|[BUG] Read Parquet does not ignoreCorruptFiles| -|[#4744](https://github.com/NVIDIA/spark-rapids/issues/4744)|[BUG] test_groupby_std_variance_partial_replace_fallback failed| -|[#4761](https://github.com/NVIDIA/spark-rapids/issues/4761)|[BUG] test_simple_partitioned_read failed on Spark 3.3| -|[#2071](https://github.com/NVIDIA/spark-rapids/issues/2071)|[BUG] parsing invalid boolean CSV values return true instead of null| -|[#4749](https://github.com/NVIDIA/spark-rapids/issues/4749)|[BUG] test_write_empty_parquet_round_trip failed| -|[#4730](https://github.com/NVIDIA/spark-rapids/issues/4730)|[BUG] python UDF tests are leaking| -|[#4290](https://github.com/NVIDIA/spark-rapids/issues/4290)|[BUG] Investigate q32 and q67 for decimals potential regression| -|[#4409](https://github.com/NVIDIA/spark-rapids/issues/4409)|[BUG] 
Possible race condition in regular expression support for octal digits| -|[#4728](https://github.com/NVIDIA/spark-rapids/issues/4728)|[BUG] test_mixed_compress_read orc_test.py failures| -|[#4736](https://github.com/NVIDIA/spark-rapids/issues/4736)|[BUG] buildall --profile=321 fails on missing spark301 rapids-4-spark-sql dependency | -|[#4702](https://github.com/NVIDIA/spark-rapids/issues/4702)|[BUG] cache_test.py failed w/ cache.serializer in spark 3.3.0| -|[#4031](https://github.com/NVIDIA/spark-rapids/issues/4031)|[BUG] Spark 3.3.0 test failure: NoSuchMethodError org.apache.orc.TypeDescription.getAttributeValue| -|[#4664](https://github.com/NVIDIA/spark-rapids/issues/4664)|[BUG] MortgageAdaptiveSparkSuite failed with duplicate buffer exception| -|[#4564](https://github.com/NVIDIA/spark-rapids/issues/4564)|[BUG] map_test ansi failed in spark330| -|[#119](https://github.com/NVIDIA/spark-rapids/issues/119)|[BUG] LIKE does not work if null chars are in the string| -|[#124](https://github.com/NVIDIA/spark-rapids/issues/124)|[BUG] CSV/JSON Parsing some float values results in overflow| -|[#4045](https://github.com/NVIDIA/spark-rapids/issues/4045)|[BUG] q93 failed in this week's NDS runs| -|[#4488](https://github.com/NVIDIA/spark-rapids/issues/4488)|[BUG] isCastingStringToNegDecimalScaleSupported seems set wrong for some Spark versions| - -### PRs -||| -|:---|:---| -|[#5251](https://github.com/NVIDIA/spark-rapids/pull/5251)|Update 22.04 changelog to latest [skip ci]| -|[#5232](https://github.com/NVIDIA/spark-rapids/pull/5232)|Fix issue in GpuArrayExists where a parent view outlived the child| -|[#5239](https://github.com/NVIDIA/spark-rapids/pull/5239)|Fix tools depending on the common jar| -|[#5205](https://github.com/NVIDIA/spark-rapids/pull/5205)|Update 22.04 changelog to latest [skip ci]| -|[#5190](https://github.com/NVIDIA/spark-rapids/pull/5190)|Fix column->row conversion GPU check:| -|[#5184](https://github.com/NVIDIA/spark-rapids/pull/5184)|Fix CPU fallback for 
Map lookup| -|[#5191](https://github.com/NVIDIA/spark-rapids/pull/5191)|Update version-def to use released cudfjni 22.04.0 [skip ci]| -|[#5167](https://github.com/NVIDIA/spark-rapids/pull/5167)|Update cudfjni version to released 22.04.0| -|[#5169](https://github.com/NVIDIA/spark-rapids/pull/5169)|Terminate test earlier if pytest ENV issue [skip ci]| -|[#5160](https://github.com/NVIDIA/spark-rapids/pull/5160)|Fix approximate percentile reduction UnsupportedOperationException| -|[#5165](https://github.com/NVIDIA/spark-rapids/pull/5165)|Update Databricks 10.4 for changes to the QueryStageExec and ClusteredDistribution| -|[#4997](https://github.com/NVIDIA/spark-rapids/pull/4997)|Update docs for the 22.04 release[skip ci]| -|[#5146](https://github.com/NVIDIA/spark-rapids/pull/5146)|Support env var INTEGRATION_TEST_VERSION to override shim version| -|[#5103](https://github.com/NVIDIA/spark-rapids/pull/5103)|Init 22.04 changelog [skip ci]| -|[#5122](https://github.com/NVIDIA/spark-rapids/pull/5122)|Disable GPU accelerated row-column transpose for Pascal GPUs:| -|[#5127](https://github.com/NVIDIA/spark-rapids/pull/5127)|GpuCast.hasSideEffects now checks to see if the child expression has side-effects| -|[#5118](https://github.com/NVIDIA/spark-rapids/pull/5118)|On task failure catch some CUDA exceptions and kill executor| -|[#5069](https://github.com/NVIDIA/spark-rapids/pull/5069)|Update for the public release [skip ci]| -|[#5097](https://github.com/NVIDIA/spark-rapids/pull/5097)|Implement hasSideEffects for GpuGetArrayItem, GpuElementAt, GpuGetMapValue, GpuUnaryMinus, and GpuAbs| -|[#5079](https://github.com/NVIDIA/spark-rapids/pull/5079)|Disable spark snapshot shims pre-merge build in 22.04| -|[#5094](https://github.com/NVIDIA/spark-rapids/pull/5094)|Fix profiling tool reading collectionAccumulator| -|[#5078](https://github.com/NVIDIA/spark-rapids/pull/5078)|Disable JSON and CSV floating-point reads by default| 
-|[#4961](https://github.com/NVIDIA/spark-rapids/pull/4961)|Support approx_percentile in reduction context| -|[#5062](https://github.com/NVIDIA/spark-rapids/pull/5062)|Update Spark 2.x explain API with changes in 22.04| -|[#5066](https://github.com/NVIDIA/spark-rapids/pull/5066)|Add getOrcSchemaString for OrcShims| -|[#5030](https://github.com/NVIDIA/spark-rapids/pull/5030)|Fix regression from 21.12 where udfs defined in repl no longer worked| -|[#5051](https://github.com/NVIDIA/spark-rapids/pull/5051)|Revert "Replace ParquetFileReader.readFooter with open() and getFooter "| -|[#5052](https://github.com/NVIDIA/spark-rapids/pull/5052)|Work around incompatibility between Databricks Delta loads and GpuRegExpExtract| -|[#4972](https://github.com/NVIDIA/spark-rapids/pull/4972)|Add support for ORC forced positional evolution| -|[#5042](https://github.com/NVIDIA/spark-rapids/pull/5042)|Implement hasSideEffects for GpuSequence| -|[#5040](https://github.com/NVIDIA/spark-rapids/pull/5040)|Fix missing imports for 321db shim| -|[#5033](https://github.com/NVIDIA/spark-rapids/pull/5033)|Removed limit from the test| -|[#4938](https://github.com/NVIDIA/spark-rapids/pull/4938)|Improve compatibility when reading timestamps from JSON and CSV sources| -|[#5026](https://github.com/NVIDIA/spark-rapids/pull/5026)|Update RoCE doc URL [skip ci]| -|[#4976](https://github.com/NVIDIA/spark-rapids/pull/4976)|Replace ParquetFileReader.readFooter with open() and getFooter| -|[#4989](https://github.com/NVIDIA/spark-rapids/pull/4989)|Use conf.useCompression config to decide if we should be compressing the cache| -|[#4956](https://github.com/NVIDIA/spark-rapids/pull/4956)|Add avro reader support| -|[#5009](https://github.com/NVIDIA/spark-rapids/pull/5009)|Remove references of `shims` folder in docs [skip ci]| -|[#5004](https://github.com/NVIDIA/spark-rapids/pull/5004)|Add ClouderaShimVersion to unshimmed files| -|[#4971](https://github.com/NVIDIA/spark-rapids/pull/4971)|Fall back to the CPU for 
non-zero scale on Ceil or Floor functions| -|[#4996](https://github.com/NVIDIA/spark-rapids/pull/4996)|Fix collect_set on struct type| -|[#4998](https://github.com/NVIDIA/spark-rapids/pull/4998)|Added the id back for struct children to make them unique| -|[#4995](https://github.com/NVIDIA/spark-rapids/pull/4995)|Include 321db shim in distribution build [skip ci]| -|[#4981](https://github.com/NVIDIA/spark-rapids/pull/4981)|Update doc for CSV reading interval| -|[#4973](https://github.com/NVIDIA/spark-rapids/pull/4973)|Implement support for ArrayExists expression| -|[#4988](https://github.com/NVIDIA/spark-rapids/pull/4988)|Remove support for Spark 3.0.x| -|[#4955](https://github.com/NVIDIA/spark-rapids/pull/4955)|Add UDT support to ParquetCachedBatchSerializer (CPU)| -|[#4994](https://github.com/NVIDIA/spark-rapids/pull/4994)|Add databricks 10.4 build in pre-merge| -|[#4990](https://github.com/NVIDIA/spark-rapids/pull/4990)|Remove 30X permerge support for version 22.04 and above [skip ci]| -|[#4958](https://github.com/NVIDIA/spark-rapids/pull/4958)|Add independent mvn verify check [skip ci]| -|[#4933](https://github.com/NVIDIA/spark-rapids/pull/4933)|Set OrcConf.INCLUDE_COLUMNS for ORC reading| -|[#4944](https://github.com/NVIDIA/spark-rapids/pull/4944)|Support for non-string key-types for `GetMapValue` and `element_at()`| -|[#4974](https://github.com/NVIDIA/spark-rapids/pull/4974)|Add shim for Databricks 10.4| -|[#4907](https://github.com/NVIDIA/spark-rapids/pull/4907)|Add markdown check action| -|[#4977](https://github.com/NVIDIA/spark-rapids/pull/4977)|Add missing 314 to buildall script| -|[#4927](https://github.com/NVIDIA/spark-rapids/pull/4927)|Support reading ANSI day time interval type from CSV source| -|[#4965](https://github.com/NVIDIA/spark-rapids/pull/4965)|Documentation: add example python api call for ExplainPlan.explainPotentialGpuPlan [skip ci]| -|[#4957](https://github.com/NVIDIA/spark-rapids/pull/4957)|Document agg pushdown on ORC file limitation 
[skip ci]| -|[#4946](https://github.com/NVIDIA/spark-rapids/pull/4946)|Support predictors on ANSI day time interval type| -|[#4952](https://github.com/NVIDIA/spark-rapids/pull/4952)|Have a fixed GPU memory size for integration tests| -|[#4954](https://github.com/NVIDIA/spark-rapids/pull/4954)|Fix of failing to read parquet files after writing the hidden file metadata in| -|[#4953](https://github.com/NVIDIA/spark-rapids/pull/4953)|Add Decimal 128 as a supported type in partition by for databricks running window| -|[#4941](https://github.com/NVIDIA/spark-rapids/pull/4941)|Use new list reduction API to improve performance| -|[#4926](https://github.com/NVIDIA/spark-rapids/pull/4926)|Support `DayTimeIntervalType` in `ParquetCachedBatchSerializer`| -|[#4947](https://github.com/NVIDIA/spark-rapids/pull/4947)|Fallback to ARENA if ASYNC configured and driver < 11.5.0| -|[#4934](https://github.com/NVIDIA/spark-rapids/pull/4934)|Replace MetadataAttribute with FileSourceMetadataAttribute to follow the update in Spark for 3.3.0+| -|[#4942](https://github.com/NVIDIA/spark-rapids/pull/4942)|Fix window rank integration tests on| -|[#4928](https://github.com/NVIDIA/spark-rapids/pull/4928)|Disable regular expressions on GPU by default| -|[#4923](https://github.com/NVIDIA/spark-rapids/pull/4923)|Support GpuScalarSubquery on nested types| -|[#4924](https://github.com/NVIDIA/spark-rapids/pull/4924)|Implement `percent_rank()` on GPU| -|[#4853](https://github.com/NVIDIA/spark-rapids/pull/4853)|Improve date support in JSON and CSV readers| -|[#4930](https://github.com/NVIDIA/spark-rapids/pull/4930)|Add in support for sorting arrays with structs in sort_array| -|[#4861](https://github.com/NVIDIA/spark-rapids/pull/4861)|Add Apache Spark 3.1.4-SNAPSHOT Shims| -|[#4925](https://github.com/NVIDIA/spark-rapids/pull/4925)|Remove unused Spark322PlusShims| -|[#4921](https://github.com/NVIDIA/spark-rapids/pull/4921)|Add DatabricksShimVersion to unshimmed class list| 
-|[#4917](https://github.com/NVIDIA/spark-rapids/pull/4917)|Default some configs to protect against cluster settings in integration tests| -|[#4922](https://github.com/NVIDIA/spark-rapids/pull/4922)|Add support for decimal 128 for db and spark 320+| -|[#4919](https://github.com/NVIDIA/spark-rapids/pull/4919)|Case-insensitive PR title check [skip ci]| -|[#4796](https://github.com/NVIDIA/spark-rapids/pull/4796)|Implement ExistenceJoin Iterator using an auxiliary left semijoin | -|[#4857](https://github.com/NVIDIA/spark-rapids/pull/4857)|Transition to v2 shims [Databricks]| -|[#4899](https://github.com/NVIDIA/spark-rapids/pull/4899)|Fixed Decimal 128 bug in ParquetCachedBatchSerializer| -|[#4810](https://github.com/NVIDIA/spark-rapids/pull/4810)|Support ANSI intervals to/from Parquet| -|[#4909](https://github.com/NVIDIA/spark-rapids/pull/4909)|Make ARENA the default allocator for 22.04| -|[#4856](https://github.com/NVIDIA/spark-rapids/pull/4856)|Enable shim tests in sql-plugin module| -|[#4880](https://github.com/NVIDIA/spark-rapids/pull/4880)|Bump hadoop-client dependency to 3.1.4| -|[#4825](https://github.com/NVIDIA/spark-rapids/pull/4825)|Initial support for reading decimal types from JSON and CSV| -|[#4859](https://github.com/NVIDIA/spark-rapids/pull/4859)|Fallback to CPU when Spark pushes down Aggregates (Min/Max/Count) for ORC| -|[#4872](https://github.com/NVIDIA/spark-rapids/pull/4872)|Speed up copying decimal column from parquet buffer to GPU buffer| -|[#4904](https://github.com/NVIDIA/spark-rapids/pull/4904)|Relocate Hive UDF Classes| -|[#4871](https://github.com/NVIDIA/spark-rapids/pull/4871)|Minor changes to print revision differences when building shims| -|[#4882](https://github.com/NVIDIA/spark-rapids/pull/4882)|Disable write/read Parquet when Parquet field IDs are used| -|[#4858](https://github.com/NVIDIA/spark-rapids/pull/4858)|Support non-literal index for `GpuElementAt` and `GpuGetArrayItem`| 
-|[#4875](https://github.com/NVIDIA/spark-rapids/pull/4875)|Support running `GetArrayStructFields` on GPU| -|[#4885](https://github.com/NVIDIA/spark-rapids/pull/4885)|Enable fuzz testing for Regular Expression repetitions and move remaining edge cases to CPU| -|[#4869](https://github.com/NVIDIA/spark-rapids/pull/4869)|Support for hexadecimal digits in regular expressions on the GPU| -|[#4854](https://github.com/NVIDIA/spark-rapids/pull/4854)|Avoid regexp_cost with stringSplit on the GPU using transpilation| -|[#4888](https://github.com/NVIDIA/spark-rapids/pull/4888)|Clean up leak detection code| -|[#4901](https://github.com/NVIDIA/spark-rapids/pull/4901)|fix a broken link in CONTRIBUTING.md[skip ci]| -|[#4891](https://github.com/NVIDIA/spark-rapids/pull/4891)|update getting started doc because aws-emr 6.5.0 released[skip ci]| -|[#4881](https://github.com/NVIDIA/spark-rapids/pull/4881)|Fix compilation error caused by ClusteredDistribution parameters| -|[#4890](https://github.com/NVIDIA/spark-rapids/pull/4890)|Integration-test tests jar for hive UDF tests| -|[#4878](https://github.com/NVIDIA/spark-rapids/pull/4878)|Set conda/mamba default to Python version to 3.8 [skip ci]| -|[#4874](https://github.com/NVIDIA/spark-rapids/pull/4874)|Fix spark-tests syntax issue [skip ci]| -|[#4850](https://github.com/NVIDIA/spark-rapids/pull/4850)|Also check cuda runtime version when using the ASYNC allocator| -|[#4851](https://github.com/NVIDIA/spark-rapids/pull/4851)|Add worker ID to temporary table names in tests| -|[#4847](https://github.com/NVIDIA/spark-rapids/pull/4847)|Fix test_compress_write_round_trip failure on Spark 3.3| -|[#4848](https://github.com/NVIDIA/spark-rapids/pull/4848)|Profile tool: fix printing of task failed reason| -|[#4636](https://github.com/NVIDIA/spark-rapids/pull/4636)|Support `str_to_map`| -|[#4835](https://github.com/NVIDIA/spark-rapids/pull/4835)|Trim parquet_write_test to reduce integration test runtime| 
-|[#4819](https://github.com/NVIDIA/spark-rapids/pull/4819)|Throw exception if casting from double to datetime | -|[#4838](https://github.com/NVIDIA/spark-rapids/pull/4838)|Trim cache tests to improve integration test time| -|[#4839](https://github.com/NVIDIA/spark-rapids/pull/4839)|Optionally return null if element not exists map/array| -|[#4822](https://github.com/NVIDIA/spark-rapids/pull/4822)|Push decimal workarounds to cuDF| -|[#4619](https://github.com/NVIDIA/spark-rapids/pull/4619)|Move the udf-examples module to the external repository spark-rapids-examples| -|[#4844](https://github.com/NVIDIA/spark-rapids/pull/4844)|Update spark313 dep to released one| -|[#4827](https://github.com/NVIDIA/spark-rapids/pull/4827)|Make InternalExclusiveModeGpuDiscoveryPlugin and ExplainPlanImpl as protected class.| -|[#4836](https://github.com/NVIDIA/spark-rapids/pull/4836)|Support WindowExec partitioning by Decimal 128 on the GPU| -|[#4760](https://github.com/NVIDIA/spark-rapids/pull/4760)|Short circuit AND/OR in ANSI mode| -|[#4829](https://github.com/NVIDIA/spark-rapids/pull/4829)|Make bloopInstall version configurable in buildall| -|[#4823](https://github.com/NVIDIA/spark-rapids/pull/4823)|Reduce redundancy of decimal testing| -|[#4715](https://github.com/NVIDIA/spark-rapids/pull/4715)|Patterns such (3?)+ should now fall back to CPU| -|[#4809](https://github.com/NVIDIA/spark-rapids/pull/4809)|Add ignoreCorruptFiles for ORC readers| -|[#4790](https://github.com/NVIDIA/spark-rapids/pull/4790)|Improve JSON and CSV parsing of integer values| -|[#4812](https://github.com/NVIDIA/spark-rapids/pull/4812)|Default integration test configs to allow negative decimal scale| -|[#4805](https://github.com/NVIDIA/spark-rapids/pull/4805)|Avoid output cast by using unsigned type output for GpuExtractChunk32| -|[#4804](https://github.com/NVIDIA/spark-rapids/pull/4804)|Profiling tool can miss datasources when they are GPU reads| -|[#4797](https://github.com/NVIDIA/spark-rapids/pull/4797)|Do 
not check for metadata during schema comparison| -|[#4785](https://github.com/NVIDIA/spark-rapids/pull/4785)|Support casting Map to String| -|[#4794](https://github.com/NVIDIA/spark-rapids/pull/4794)|Decimal-128 support for mod and pmod| -|[#4799](https://github.com/NVIDIA/spark-rapids/pull/4799)|Fix failure to generate worker_id when xdist is not present| -|[#4742](https://github.com/NVIDIA/spark-rapids/pull/4742)|Add ignoreCorruptFiles feature for Parquet reader| -|[#4792](https://github.com/NVIDIA/spark-rapids/pull/4792)|Ensure GpuM2 merge aggregation does not produce a null mean or m2| -|[#4770](https://github.com/NVIDIA/spark-rapids/pull/4770)|Improve columnarCopy for HostColumnarToGpu| -|[#4776](https://github.com/NVIDIA/spark-rapids/pull/4776)|Improve aggregation performance of average on DECIMAL128 columns| -|[#4786](https://github.com/NVIDIA/spark-rapids/pull/4786)|Add shims to compare ORC TypeDescription| -|[#4780](https://github.com/NVIDIA/spark-rapids/pull/4780)|Improve JSON and CSV support for boolean values| -|[#4778](https://github.com/NVIDIA/spark-rapids/pull/4778)|Decrease chance of random collisions in test temporary paths| -|[#4782](https://github.com/NVIDIA/spark-rapids/pull/4782)|Check in host leak detection code| -|[#4781](https://github.com/NVIDIA/spark-rapids/pull/4781)|Add Spark properties table to profiling tool output| -|[#4714](https://github.com/NVIDIA/spark-rapids/pull/4714)|Add regular expression support to string_split| -|[#4754](https://github.com/NVIDIA/spark-rapids/pull/4754)|Close SpillableBatch to avoid leaks| -|[#4758](https://github.com/NVIDIA/spark-rapids/pull/4758)|Fix merge conflict with branch-22.02 [skip ci]| -|[#4694](https://github.com/NVIDIA/spark-rapids/pull/4694)|Add clarifications and details to integration-tests README [skip ci]| -|[#4740](https://github.com/NVIDIA/spark-rapids/pull/4740)|Enable regular expressions on GPU by default| -|[#4735](https://github.com/NVIDIA/spark-rapids/pull/4735)|Re-enables partial 
regex support for octal digits on the GPU| -|[#4737](https://github.com/NVIDIA/spark-rapids/pull/4737)|Check for a null compression codec when creating ORC OutStream| -|[#4738](https://github.com/NVIDIA/spark-rapids/pull/4738)|Change resume-from to aggregator in buildall [skip ci]| -|[#4698](https://github.com/NVIDIA/spark-rapids/pull/4698)|Add tests for few json options| -|[#4731](https://github.com/NVIDIA/spark-rapids/pull/4731)|Trim join tests to improve runtime of tests| -|[#4732](https://github.com/NVIDIA/spark-rapids/pull/4732)|Fix failing serializer tests on Spark 3.3.0| -|[#4709](https://github.com/NVIDIA/spark-rapids/pull/4709)|Update centos 8 dockerfile to handle EOL issue [skip ci]| -|[#4724](https://github.com/NVIDIA/spark-rapids/pull/4724)|Debug dump to Parquet support for DECIMAL128 columns| -|[#4688](https://github.com/NVIDIA/spark-rapids/pull/4688)|Optimize DECIMAL128 sum aggregations| -|[#4692](https://github.com/NVIDIA/spark-rapids/pull/4692)|Add FAQ entry to discuss executor task concurrency configuration [skip ci]| -|[#4588](https://github.com/NVIDIA/spark-rapids/pull/4588)|Optimize semaphore acquisition in GpuShuffledHashJoinExec| -|[#4697](https://github.com/NVIDIA/spark-rapids/pull/4697)|Add preliminary test and test framework changes for ExistanceJoin| -|[#4716](https://github.com/NVIDIA/spark-rapids/pull/4716)|`GpuStringSplit` should return an array on not-null elements| -|[#4611](https://github.com/NVIDIA/spark-rapids/pull/4611)|Support BitLength and OctetLength| -|[#4408](https://github.com/NVIDIA/spark-rapids/pull/4408)|Use the ORC version that corresponds to the Spark version| -|[#4686](https://github.com/NVIDIA/spark-rapids/pull/4686)|Fall back to CPU for queries referencing hidden metadata columns| -|[#4669](https://github.com/NVIDIA/spark-rapids/pull/4669)|Prevent deadlock between RapidsBufferStore and RapidsBufferBase on close| -|[#4707](https://github.com/NVIDIA/spark-rapids/pull/4707)|Fix auto merge conflict 4705 [skip ci]| 
-|[#4690](https://github.com/NVIDIA/spark-rapids/pull/4690)|Fix map_test ANSI failure in Spark 3.3.0| -|[#4681](https://github.com/NVIDIA/spark-rapids/pull/4681)|Reimplement check for non-regexp strings using RegexParser| -|[#4683](https://github.com/NVIDIA/spark-rapids/pull/4683)|Fix documentation link, clarify documentation [skip ci]| -|[#4677](https://github.com/NVIDIA/spark-rapids/pull/4677)|Make Collect, first and last as deterministic aggregate functions for Spark-3.3| -|[#4682](https://github.com/NVIDIA/spark-rapids/pull/4682)|Enable test for LIKE with embedded null character| -|[#4673](https://github.com/NVIDIA/spark-rapids/pull/4673)|Allow GpuWindowExec to partition on structs| -|[#4637](https://github.com/NVIDIA/spark-rapids/pull/4637)|Improve support for reading CSV and JSON floating-point values| -|[#4629](https://github.com/NVIDIA/spark-rapids/pull/4629)|Remove shims module| -|[#4648](https://github.com/NVIDIA/spark-rapids/pull/4648)|Append new authorized user to blossom-ci safelist| -|[#4623](https://github.com/NVIDIA/spark-rapids/pull/4623)|Fallback to CPU when aggregate push down used for parquet| -|[#4606](https://github.com/NVIDIA/spark-rapids/pull/4606)|Set default RMM pool to ASYNC for cuda 11.2+| -|[#4531](https://github.com/NVIDIA/spark-rapids/pull/4531)|Use libcudf mixed joins for conditional hash semi and anti joins| -|[#4624](https://github.com/NVIDIA/spark-rapids/pull/4624)|Enable integration test results report on Jenkins [skip ci]| -|[#4597](https://github.com/NVIDIA/spark-rapids/pull/4597)|Update plugin version to 22.04.0-SNAPSHOT| -|[#4592](https://github.com/NVIDIA/spark-rapids/pull/4592)|Adds SQL function HYPOT using the GPU| -|[#4504](https://github.com/NVIDIA/spark-rapids/pull/4504)|Implement AST-based regular expression fuzz tests| -|[#4560](https://github.com/NVIDIA/spark-rapids/pull/4560)|Make shims.v2.ParquetCachedBatchSerializer as protected| - -## Release 22.02 - -### Features -||| -|:---|:---| 
-|[#4305](https://github.com/NVIDIA/spark-rapids/issues/4305)|[FEA] write nvidia tool wrappers to allow old YARN versions to work with MIG| -|[#4410](https://github.com/NVIDIA/spark-rapids/issues/4410)|[FEA] ReplicateRows - Support ReplicateRows for decimal 128 type| -|[#4360](https://github.com/NVIDIA/spark-rapids/issues/4360)|[FEA] Add explain api for Spark 2.X| -|[#3541](https://github.com/NVIDIA/spark-rapids/issues/3541)|[FEA] Support max on single-level struct in aggregation context| -|[#4238](https://github.com/NVIDIA/spark-rapids/issues/4238)|[FEA] Add a Spark 3.X Explain only mode to the plugin| -|[#3952](https://github.com/NVIDIA/spark-rapids/issues/3952)|[Audit] [FEA][SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1| -|[#4412](https://github.com/NVIDIA/spark-rapids/issues/4412)|[FEA] Improve support for \A, \Z, and \z in regular expressions| -|[#3979](https://github.com/NVIDIA/spark-rapids/issues/3979)|[FEA] Improvements for CPU(Row) based UDF| -|[#4467](https://github.com/NVIDIA/spark-rapids/issues/4467)|[FEA] Add support for regular expression with repeated digits (`\d+`, `\d*`, `\d?`)| -|[#4439](https://github.com/NVIDIA/spark-rapids/issues/4439)|[FEA] Enable GPU broadcast exchange reuse for DPP when AQE enabled| -|[#3512](https://github.com/NVIDIA/spark-rapids/issues/3512)|[FEA] Support org.apache.spark.sql.catalyst.expressions.Sequence| -|[#3475](https://github.com/NVIDIA/spark-rapids/issues/3475)|[FEA] Spark 3.2.0 reads Parquet unsigned int64(UINT64) as Decimal(20,0) but CUDF does not support it | -|[#4091](https://github.com/NVIDIA/spark-rapids/issues/4091)|[FEA] regexp_replace: Improve support for ^ and $| -|[#4104](https://github.com/NVIDIA/spark-rapids/issues/4104)|[FEA] Support org.apache.spark.sql.catalyst.expressions.ReplicateRows| -|[#4027](https://github.com/NVIDIA/spark-rapids/issues/4027)|[FEA] Support SubqueryBroadcast on GPU to enable exchange reuse during DPP| 
-|[#4284](https://github.com/NVIDIA/spark-rapids/issues/4284)|[FEA] Support idx = 0 in GpuRegExpExtract| -|[#4002](https://github.com/NVIDIA/spark-rapids/issues/4002)|[FEA] Implement regexp_extract on GPU| -|[#3221](https://github.com/NVIDIA/spark-rapids/issues/3221)|[FEA] Support GpuFirst and GpuLast on nested types under reduction aggregations| -|[#3944](https://github.com/NVIDIA/spark-rapids/issues/3944)|[FEA] Full support for sum with overflow on Decimal 128| -|[#4028](https://github.com/NVIDIA/spark-rapids/issues/4028)|[FEA] support GpuCast from non-nested ArrayType to StringType| -|[#3250](https://github.com/NVIDIA/spark-rapids/issues/3250)|[FEA] Make CreateMap duplicate key handling compatible with Spark and enable CreateMap by default| -|[#4170](https://github.com/NVIDIA/spark-rapids/issues/4170)|[FEA] Make regular expression behavior with `$` and `\r` consistent with CPU| -|[#4001](https://github.com/NVIDIA/spark-rapids/issues/4001)|[FEA] Add regexp support to regexp_replace| -|[#3962](https://github.com/NVIDIA/spark-rapids/issues/3962)|[FEA] Support null characters in regular expressions in RLIKE| -|[#3797](https://github.com/NVIDIA/spark-rapids/issues/3797)|[FEA] Make RLike support consistent with Apache Spark| - -### Performance -||| -|:---|:---| -|[#4392](https://github.com/NVIDIA/spark-rapids/issues/4392)|[FEA] could the parquet scan code avoid acquiring the semaphore for an empty batch?| -|[#679](https://github.com/NVIDIA/spark-rapids/issues/679)|[FEA] move some deserialization code out of the scope of the gpu-semaphore to increase cpu concurrent| -|[#4350](https://github.com/NVIDIA/spark-rapids/issues/4350)|[FEA] Optimize the all-true and all-false cases in GPU `If` and `CaseWhen` | -|[#4309](https://github.com/NVIDIA/spark-rapids/issues/4309)|[FEA] Leverage cudf conditional nested loop join to implement semi/anti hash join with condition| -|[#4395](https://github.com/NVIDIA/spark-rapids/issues/4395)|[FEA] acquire the semaphore after concatToHost in 
GpuShuffleCoalesceIterator| -|[#4134](https://github.com/NVIDIA/spark-rapids/issues/4134)|[FEA] Allow `EliminateJoinToEmptyRelation` in `GpuBroadcastExchangeExec` | -|[#4189](https://github.com/NVIDIA/spark-rapids/issues/4189)|[FEA] understand why between is so expensive| - -### Bugs Fixed -||| -|:---|:---| -|[#4316](https://github.com/NVIDIA/spark-rapids/issues/4316)|[BUG] Exception: Unable to find py4j, your SPARK_HOME may not be configured correctly intermittently| -|[#4725](https://github.com/NVIDIA/spark-rapids/issues/4725)|[DOC] Broken links in guide doc| -|[#4675](https://github.com/NVIDIA/spark-rapids/issues/4675)|[BUG] Jenkins integration build timed out at 10 hours| -|[#4665](https://github.com/NVIDIA/spark-rapids/issues/4665)|[BUG] Spark321Shims.getParquetFilters failed with NoSuchMethodError| -|[#4635](https://github.com/NVIDIA/spark-rapids/issues/4635)|[BUG] nvidia-smi wrapper script ignores ENABLE_NON_MIG_GPUS=1 on a heterogeneous multi-GPU machine| -|[#4500](https://github.com/NVIDIA/spark-rapids/issues/4500)|[BUG] Build failures against Spark 3.2.1 rc1 and make 3.2.1 non snapshot| -|[#4631](https://github.com/NVIDIA/spark-rapids/issues/4631)|[BUG] Release build with mvn option `-P source-javadoc` FAILED| -|[#4625](https://github.com/NVIDIA/spark-rapids/issues/4625)|[BUG] NDS query 5 fails with AdaptiveSparkPlanExec assertion| -|[#4632](https://github.com/NVIDIA/spark-rapids/issues/4632)|[BUG] Build failing for Spark 3.3.0 due to deprecated method warnings| -|[#4599](https://github.com/NVIDIA/spark-rapids/issues/4599)|[BUG] test_group_apply_udf and test_group_apply_udf_more_types hangs on Databricks 9.1| -|[#4600](https://github.com/NVIDIA/spark-rapids/issues/4600)|[BUG] crash if we have a decimal128 in a struct in an array | -|[#4581](https://github.com/NVIDIA/spark-rapids/issues/4581)|[BUG] Build error "GpuOverrides.scala:924: wrong number of arguments" on DB9.1.x spark-3.1.2 | -|[#4593](https://github.com/NVIDIA/spark-rapids/issues/4593)|[BUG] dup 
GpuHashJoin.diff case-folding issue| -|[#4559](https://github.com/NVIDIA/spark-rapids/issues/4559)|[BUG] regexp_replace with replacement string containing `\` can produce incorrect results| -|[#4503](https://github.com/NVIDIA/spark-rapids/issues/4503)|[BUG] regexp_replace with back references produces incorrect results on GPU| -|[#4567](https://github.com/NVIDIA/spark-rapids/issues/4567)|[BUG] Profile tool hangs in compare mode| -|[#4315](https://github.com/NVIDIA/spark-rapids/issues/4315)|[BUG] test_hash_reduction_decimal_overflow_sum[30] failed OOM in integration tests| -|[#4551](https://github.com/NVIDIA/spark-rapids/issues/4551)|[BUG] protobuf-java version changed to 3.x| -|[#4499](https://github.com/NVIDIA/spark-rapids/issues/4499)|[BUG]GpuSequence blows up when nulls exist in any of the inputs (start, stop, step)| -|[#4454](https://github.com/NVIDIA/spark-rapids/issues/4454)|[BUG] Shade warnings when building the tools artifact| -|[#4541](https://github.com/NVIDIA/spark-rapids/issues/4541)|[BUG] Column vector leak in conditionals_test.py| -|[#4514](https://github.com/NVIDIA/spark-rapids/issues/4514)|[BUG] test_hash_reduction_pivot_without_nans failed| -|[#4521](https://github.com/NVIDIA/spark-rapids/issues/4521)|[BUG] Inconsistencies in handling of newline characters and string and line anchors| -|[#4548](https://github.com/NVIDIA/spark-rapids/issues/4548)|[BUG] ai.rapids.cudf.CudaException: an illegal instruction was encountered in databricks 9.1| -|[#4475](https://github.com/NVIDIA/spark-rapids/issues/4475)|[BUG] `\D` and `\W` match newline in Spark but not in cuDF| -|[#1866](https://github.com/NVIDIA/spark-rapids/issues/1866)|[BUG] GpuFileFormatWriter does not close the data writer| -|[#4524](https://github.com/NVIDIA/spark-rapids/issues/4524)|[BUG] RegExp transpiler fails to detect some choice expressions that cuDF cannot compile| -|[#3226](https://github.com/NVIDIA/spark-rapids/issues/3226)|[BUG]OOM happened when do cube operations| 
-|[#2504](https://github.com/NVIDIA/spark-rapids/issues/2504)|[BUG] OOM when running NDS queries with UCX and GDS| -|[#4273](https://github.com/NVIDIA/spark-rapids/issues/4273)|[BUG] Rounding past the size that can be stored in a type produces incorrect results| -|[#4060](https://github.com/NVIDIA/spark-rapids/issues/4060)|[BUG] test_hash_groupby_approx_percentile_long_repeated_keys failed intermittently| -|[#4039](https://github.com/NVIDIA/spark-rapids/issues/4039)|[BUG] Spark 3.3.0 IT Array test failures| -|[#3849](https://github.com/NVIDIA/spark-rapids/issues/3849)|[BUG] In ANSI mode we can fail in cases Spark would not due to conditionals| -|[#4445](https://github.com/NVIDIA/spark-rapids/issues/4445)|[BUG] mvn clean prints an error message on a clean dir| -|[#4421](https://github.com/NVIDIA/spark-rapids/issues/4421)|[BUG] the driver is trying to load CUDA with latest 22.02 | -|[#4455](https://github.com/NVIDIA/spark-rapids/issues/4455)|[BUG] join_test.py::test_struct_self_join[IGNORE_ORDER({'local': True})] failed in spark330| -|[#4442](https://github.com/NVIDIA/spark-rapids/issues/4442)|[BUG] mvn build FAILED with option `-P noSnapshotsWithDatabricks`| -|[#4281](https://github.com/NVIDIA/spark-rapids/issues/4281)|[BUG] q9 regression between 21.10 and 21.12| -|[#4280](https://github.com/NVIDIA/spark-rapids/issues/4280)|[BUG] q88 regression between 21.10 and 21.12| -|[#4422](https://github.com/NVIDIA/spark-rapids/issues/4422)|[BUG] Host column vectors are being leaked during tests| -|[#4446](https://github.com/NVIDIA/spark-rapids/issues/4446)|[BUG] GpuCast crashes when casting from Array with unsupportable child type| -|[#4432](https://github.com/NVIDIA/spark-rapids/issues/4432)|[BUG] nightly build 3.3.0 failed: HashClusteredDistribution is not a member of org.apache.spark.sql.catalyst.plans.physical| -|[#4443](https://github.com/NVIDIA/spark-rapids/issues/4443)|[BUG] SPARK-37705 breaks parquet filters from Spark 3.3.0 and Spark 3.2.2 onwards| 
-|[#4378](https://github.com/NVIDIA/spark-rapids/issues/4378)|[BUG] udf_test udf_cudf_test failed require_minimum_pandas_version check in spark 320+| -|[#4423](https://github.com/NVIDIA/spark-rapids/issues/4423)|[BUG] Build is failing due to FileScanRDD changes in Spark 3.3.0-SNAPSHOT| -|[#4401](https://github.com/NVIDIA/spark-rapids/issues/4401)|[BUG]array_test.py::test_array_contains failures| -|[#4403](https://github.com/NVIDIA/spark-rapids/issues/4403)|[BUG] NDS query 72 logs codegen fallback exception and produces incorrect results| -|[#4386](https://github.com/NVIDIA/spark-rapids/issues/4386)|[BUG] conditionals_test.py FAILED with side_effects_cast[Integer/Long] on Databricks 9.1 Runtime| -|[#3934](https://github.com/NVIDIA/spark-rapids/issues/3934)|[BUG] Dependencies of published integration tests jar are missing| -|[#4341](https://github.com/NVIDIA/spark-rapids/issues/4341)|[BUG] GpuCast.scala:nnn warning: discarding unmoored doc comment| -|[#4356](https://github.com/NVIDIA/spark-rapids/issues/4356)|[BUG] nightly spark303 deploy pulling spark301 aggregator| -|[#4347](https://github.com/NVIDIA/spark-rapids/issues/4347)|[BUG] Dist jar pom lists aggregator jar as dependency| -|[#4176](https://github.com/NVIDIA/spark-rapids/issues/4176)|[BUG] ParseDateTimeSuite UT failed| -|[#4292](https://github.com/NVIDIA/spark-rapids/issues/4292)|[BUG] no meaningful message is surfaced to maven when binary-dedupe fails| -|[#4351](https://github.com/NVIDIA/spark-rapids/issues/4351)|[BUG] Tests FAILED On SPARK-3.2.0, com.nvidia.spark.rapids.SerializedTableColumn cannot be cast to com.nvidia.spark.rapids.GpuColumnVector| -|[#4346](https://github.com/NVIDIA/spark-rapids/issues/4346)|[BUG] q73 decimal was twice as slow in weekly results| -|[#4334](https://github.com/NVIDIA/spark-rapids/issues/4334)|[BUG] GpuColumnarToRowExec will always be tagged False for exportColumnarRdd after Spark311 | -|[#4339](https://github.com/NVIDIA/spark-rapids/issues/4339)|The parameter `dataType` is 
not necessary in `resolveColumnVector` method.| -|[#4275](https://github.com/NVIDIA/spark-rapids/issues/4275)|[BUG] Row-based Hive UDF will fail if arguments contain a foldable expression.| -|[#4229](https://github.com/NVIDIA/spark-rapids/issues/4229)|[BUG] regexp_replace `[^a]` has different behavior between CPU and GPU for multiline strings| -|[#4294](https://github.com/NVIDIA/spark-rapids/issues/4294)|[BUG] parquet_write_test.py::test_ts_write_fails_datetime_exception failed in spark 3.1.1 and 3.1.2| -|[#4205](https://github.com/NVIDIA/spark-rapids/issues/4205)|[BUG] Get different results when casting from timestamp to string| -|[#4277](https://github.com/NVIDIA/spark-rapids/issues/4277)|[BUG] cudf_udf nightly cudf import rmm failed| -|[#4246](https://github.com/NVIDIA/spark-rapids/issues/4246)|[BUG] Regression in CastOpSuite due to cuDF change in parsing NaN| -|[#4243](https://github.com/NVIDIA/spark-rapids/issues/4243)|[BUG] test_regexp_replace_null_pattern_fallback[ALLOW_NON_GPU(ProjectExec,RegExpReplace)] failed in databricks| -|[#4244](https://github.com/NVIDIA/spark-rapids/issues/4244)|[BUG] Cast from string to float using hand-picked values failed| -|[#4227](https://github.com/NVIDIA/spark-rapids/issues/4227)|[BUG] RAPIDS Shuffle Manager doesn't fallback given encryption settings| -|[#3374](https://github.com/NVIDIA/spark-rapids/issues/3374)|[BUG] minor deprecation warnings in a 3.2 shim build| -|[#3613](https://github.com/NVIDIA/spark-rapids/issues/3613)|[BUG] release312db profile pulls in 311until320-apache| -|[#4213](https://github.com/NVIDIA/spark-rapids/issues/4213)|[BUG] unused method with a misleading outdated comment in ShimLoader | -|[#3609](https://github.com/NVIDIA/spark-rapids/issues/3609)|[BUG] GpuShuffleExchangeExec in v2 shims has inconsistent packaging| -|[#4127](https://github.com/NVIDIA/spark-rapids/issues/4127)|[BUG] CUDF 22.02 nightly test failure| - -### PRs -||| -|:---|:---| 
-|[#4773](https://github.com/NVIDIA/spark-rapids/pull/4773)|Update 22.02 changelog to latest [skip ci]| -|[#4771](https://github.com/NVIDIA/spark-rapids/pull/4771)|revert cudf api links from legacy to stable[skip ci]| -|[#4767](https://github.com/NVIDIA/spark-rapids/pull/4767)|Update 22.02 changelog to latest [skip ci]| -|[#4750](https://github.com/NVIDIA/spark-rapids/pull/4750)|Updated doc for decimal support| -|[#4757](https://github.com/NVIDIA/spark-rapids/pull/4757)|Update qualification tool to remove DECIMAL 128 as potential problem| -|[#4755](https://github.com/NVIDIA/spark-rapids/pull/4755)|Fix databricks doc for limitations.[skip ci]| -|[#4751](https://github.com/NVIDIA/spark-rapids/pull/4751)|Fix broken hyperlinks in documentation [skip ci]| -|[#4706](https://github.com/NVIDIA/spark-rapids/pull/4706)|Update 22.02 changelog to latest [skip ci]| -|[#4700](https://github.com/NVIDIA/spark-rapids/pull/4700)|Update cudfjni version to released 22.02.0| -|[#4701](https://github.com/NVIDIA/spark-rapids/pull/4701)|Decrease nighlty tests upper limitation to 7 [skip ci]| -|[#4639](https://github.com/NVIDIA/spark-rapids/pull/4639)|Update changelog for 22.02 and archive info of some older releases [skip ci]| -|[#4572](https://github.com/NVIDIA/spark-rapids/pull/4572)|Add download page for 22.02 [skip ci]| -|[#4672](https://github.com/NVIDIA/spark-rapids/pull/4672)|Revert "Disable 311cdh build due to missing dependency (#4659)"| -|[#4662](https://github.com/NVIDIA/spark-rapids/pull/4662)|Update the deploy script [skip ci]| -|[#4657](https://github.com/NVIDIA/spark-rapids/pull/4657)|Upmerge spark2 directory to the latest 22.02 changes| -|[#4659](https://github.com/NVIDIA/spark-rapids/pull/4659)|Disable 311cdh build by default because of a missing dependency| -|[#4508](https://github.com/NVIDIA/spark-rapids/pull/4508)|Fix Spark 3.2.1 build failures and make it non-snapshot| -|[#4652](https://github.com/NVIDIA/spark-rapids/pull/4652)|Remove non-deterministic test order in 
nightly [skip ci]| -|[#4643](https://github.com/NVIDIA/spark-rapids/pull/4643)|Add profile release301 when mvn help:evaluate| -|[#4630](https://github.com/NVIDIA/spark-rapids/pull/4630)|Fix the incomplete capture of SubqueryBroadcast | -|[#4633](https://github.com/NVIDIA/spark-rapids/pull/4633)|Suppress newTaskTempFile method warnings for Spark 3.3.0 build| -|[#4618](https://github.com/NVIDIA/spark-rapids/pull/4618)|[DB31x] Pick the correct Python runner for flatmap-group Pandas UDF| -|[#4622](https://github.com/NVIDIA/spark-rapids/pull/4622)|Fallback to CPU when encoding is not supported for JSON reader| -|[#4470](https://github.com/NVIDIA/spark-rapids/pull/4470)|Add in HashPartitioning support for decimal 128| -|[#4535](https://github.com/NVIDIA/spark-rapids/pull/4535)|Revert "Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075 (#4471)"| -|[#4583](https://github.com/NVIDIA/spark-rapids/pull/4583)|Avoid unapply on PromotePrecision| -|[#4573](https://github.com/NVIDIA/spark-rapids/pull/4573)|Correct version from 21.12 to 22.02[skip ci]| -|[#4575](https://github.com/NVIDIA/spark-rapids/pull/4575)|Correct and update links in UDF doc[skip ci]| -|[#4501](https://github.com/NVIDIA/spark-rapids/pull/4501)|Switch and/or to use new cudf binops to improve performance| -|[#4594](https://github.com/NVIDIA/spark-rapids/pull/4594)|Resolve case-folding issue [skip ci]| -|[#4585](https://github.com/NVIDIA/spark-rapids/pull/4585)|Spark2 module upmerge, deploy script, and updates for Jenkins| -|[#4589](https://github.com/NVIDIA/spark-rapids/pull/4589)|Increase premerge databricks IDLE_TIMEOUT to 4 hours [skip ci]| -|[#4485](https://github.com/NVIDIA/spark-rapids/pull/4485)|Add json reader support| -|[#4556](https://github.com/NVIDIA/spark-rapids/pull/4556)|regexp_replace with back-references should fall back to CPU| -|[#4569](https://github.com/NVIDIA/spark-rapids/pull/4569)|Fix infinite loop with Profiling tool compare mode and app with no sql 
ids| -|[#4529](https://github.com/NVIDIA/spark-rapids/pull/4529)|Add support for Spark 2.x Explain Api| -|[#4577](https://github.com/NVIDIA/spark-rapids/pull/4577)|Revert "Fix CVE-2021-22569 (#4545)"| -|[#4520](https://github.com/NVIDIA/spark-rapids/pull/4520)|GpuSequence refactor| -|[#4570](https://github.com/NVIDIA/spark-rapids/pull/4570)|A few quick fixes to try to reduce max memory usage in the tests| -|[#4477](https://github.com/NVIDIA/spark-rapids/pull/4477)|Use libcudf mixed joins for conditional hash joins| -|[#4566](https://github.com/NVIDIA/spark-rapids/pull/4566)|remove scala-library from combined tools jar| -|[#4552](https://github.com/NVIDIA/spark-rapids/pull/4552)|Fix resource leak in GpuCaseWhen| -|[#4553](https://github.com/NVIDIA/spark-rapids/pull/4553)|Reenable test_hash_reduction_pivot_without_nans| -|[#4530](https://github.com/NVIDIA/spark-rapids/pull/4530)|Fix correctness issues in regexp and add `\r` and `\n` to fuzz tests| -|[#4549](https://github.com/NVIDIA/spark-rapids/pull/4549)|Fix typos in integration tests README [skip ci]| -|[#4545](https://github.com/NVIDIA/spark-rapids/pull/4545)|Fix CVE-2021-22569| -|[#4543](https://github.com/NVIDIA/spark-rapids/pull/4543)|Enable auto-merge from branch-22.02 to branch-22.04 [skip ci]| -|[#4540](https://github.com/NVIDIA/spark-rapids/pull/4540)|Remove user kuhushukla| -|[#4434](https://github.com/NVIDIA/spark-rapids/pull/4434)|Support max on single-level struct in aggregation context| -|[#4534](https://github.com/NVIDIA/spark-rapids/pull/4534)|Temporarily disable integration test - test_hash_reduction_pivot_without_nans| -|[#4322](https://github.com/NVIDIA/spark-rapids/pull/4322)|Add an explain only mode to the plugin| -|[#4497](https://github.com/NVIDIA/spark-rapids/pull/4497)|Make better use of pinned memory pool| -|[#4512](https://github.com/NVIDIA/spark-rapids/pull/4512)|remove hadoop version requirement[skip ci]| -|[#4527](https://github.com/NVIDIA/spark-rapids/pull/4527)|Fall back to CPU for 
regular expressions containing \D or \W| -|[#4525](https://github.com/NVIDIA/spark-rapids/pull/4525)|Properly close data writer in GpuFileFormatWriter| -|[#4502](https://github.com/NVIDIA/spark-rapids/pull/4502)|Removed the redundant test for element_at and fixed the failing one| -|[#4523](https://github.com/NVIDIA/spark-rapids/pull/4523)|Add more integration tests for decimal 128| -|[#3762](https://github.com/NVIDIA/spark-rapids/pull/3762)|Call the right method to convert table from row major <=> col major| -|[#4482](https://github.com/NVIDIA/spark-rapids/pull/4482)|Simplified the construction of zero scalar in GpuUnaryMinus| -|[#4510](https://github.com/NVIDIA/spark-rapids/pull/4510)|Update copyright in NOTICE [skip ci]| -|[#4484](https://github.com/NVIDIA/spark-rapids/pull/4484)|Update GpuFileFormatWriter to stay in sync with recent Spark changes, but still not support writing Hive bucketed table on GPU.| -|[#4492](https://github.com/NVIDIA/spark-rapids/pull/4492)|Fall back to CPU for regular expressions containing hex digits| -|[#4495](https://github.com/NVIDIA/spark-rapids/pull/4495)|Enable approx_percentile by default| -|[#4420](https://github.com/NVIDIA/spark-rapids/pull/4420)|Fix up incorrect results of rounding past the max digits of data type| -|[#4483](https://github.com/NVIDIA/spark-rapids/pull/4483)|Update test case of reading nested unsigned parquet file| -|[#4490](https://github.com/NVIDIA/spark-rapids/pull/4490)|Remove warning about RMM default allocator| -|[#4461](https://github.com/NVIDIA/spark-rapids/pull/4461)|[Audit] Add bucketed scan info in query plan of data source v1| -|[#4489](https://github.com/NVIDIA/spark-rapids/pull/4489)|Add arrays of decimal128 to join tests| -|[#4476](https://github.com/NVIDIA/spark-rapids/pull/4476)|Don't acquire the semaphore for empty input while scanning| -|[#4424](https://github.com/NVIDIA/spark-rapids/pull/4424)|Improve support for regular expression string anchors `\A`, `\Z`, and `\z`| 
-|[#4491](https://github.com/NVIDIA/spark-rapids/pull/4491)|Skip the test for spark versions 3.1.1, 3.1.2 and 3.2.0 only| -|[#4459](https://github.com/NVIDIA/spark-rapids/pull/4459)|Use merge sort for struct types in non-key columns| -|[#4494](https://github.com/NVIDIA/spark-rapids/pull/4494)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#4400](https://github.com/NVIDIA/spark-rapids/pull/4400)|Enable approx percentile tests| -|[#4471](https://github.com/NVIDIA/spark-rapids/pull/4471)|Disable orc write by default because of https://issues.apache.org/jira/browse/ORC-1075| -|[#4462](https://github.com/NVIDIA/spark-rapids/pull/4462)|Rename DECIMAL_128_FULL and rework usage of TypeSig.gpuNumeric| -|[#4479](https://github.com/NVIDIA/spark-rapids/pull/4479)|Change signoff check image to slim-buster [skip ci]| -|[#4464](https://github.com/NVIDIA/spark-rapids/pull/4464)|Throw SparkArrayIndexOutOfBoundsException for Spark 3.3.0+| -|[#4469](https://github.com/NVIDIA/spark-rapids/pull/4469)|Support repetition of \d and \D in regexp functions| -|[#4472](https://github.com/NVIDIA/spark-rapids/pull/4472)|Modify docs for 22.02 to address issue-4319[skip ci]| -|[#4440](https://github.com/NVIDIA/spark-rapids/pull/4440)|Enable GPU broadcast exchange reuse for DPP when AQE enabled| -|[#4376](https://github.com/NVIDIA/spark-rapids/pull/4376)|Add sequence support| -|[#4460](https://github.com/NVIDIA/spark-rapids/pull/4460)|Abstract the text based PartitionReader| -|[#4383](https://github.com/NVIDIA/spark-rapids/pull/4383)|Fix correctness issue with CASE WHEN with expressions that have side-effects| -|[#4465](https://github.com/NVIDIA/spark-rapids/pull/4465)|Refactor for shims 320+| -|[#4463](https://github.com/NVIDIA/spark-rapids/pull/4463)|Avoid replacing a hash join if build side is unsupported by the join type| -|[#4456](https://github.com/NVIDIA/spark-rapids/pull/4456)|Fix build issues: 1 clean non-exists target dirs; 2 remove duplicated plugin| 
-|[#4416](https://github.com/NVIDIA/spark-rapids/pull/4416)|Unshim join execs| -|[#4172](https://github.com/NVIDIA/spark-rapids/pull/4172)|Support String to Decimal 128| -|[#4458](https://github.com/NVIDIA/spark-rapids/pull/4458)|Exclude some metadata operators when checking GPU replacement| -|[#4451](https://github.com/NVIDIA/spark-rapids/pull/4451)|Some metrics improvements and timeline reporting| -|[#4435](https://github.com/NVIDIA/spark-rapids/pull/4435)|Disable add profile src execution by default to make the build log clean| -|[#4436](https://github.com/NVIDIA/spark-rapids/pull/4436)|Print error log to stderr output| -|[#4155](https://github.com/NVIDIA/spark-rapids/pull/4155)|Add partial support for line begin and end anchors in regexp_replace| -|[#4428](https://github.com/NVIDIA/spark-rapids/pull/4428)|Exhaustively iterate ColumnarToRow iterator to avoid leaks| -|[#4430](https://github.com/NVIDIA/spark-rapids/pull/4430)|update pca example link in ml-integration.md[skip ci]| -|[#4452](https://github.com/NVIDIA/spark-rapids/pull/4452)|Limit parallelism of nightly tests [skip ci]| -|[#4449](https://github.com/NVIDIA/spark-rapids/pull/4449)|Add recursive type checking and fallback tests for casting array with unsupported element types to string| -|[#4437](https://github.com/NVIDIA/spark-rapids/pull/4437)|Change logInfo to logWarning| -|[#4447](https://github.com/NVIDIA/spark-rapids/pull/4447)|Fix 330 build error and add 322 shims layer| -|[#4417](https://github.com/NVIDIA/spark-rapids/pull/4417)|Fix an Intellij debug issue| -|[#4431](https://github.com/NVIDIA/spark-rapids/pull/4431)|Add DateType support for AST expressions| -|[#4433](https://github.com/NVIDIA/spark-rapids/pull/4433)|Import the right pandas from conda [skip ci]| -|[#4419](https://github.com/NVIDIA/spark-rapids/pull/4419)|Import the right pandas from conda| -|[#4427](https://github.com/NVIDIA/spark-rapids/pull/4427)|Update getFileScanRDD shim for recent changes in Spark 3.3.0| 
-|[#4397](https://github.com/NVIDIA/spark-rapids/pull/4397)|Ignore cufile.log| -|[#4388](https://github.com/NVIDIA/spark-rapids/pull/4388)|Add support for ReplicateRows| -|[#4399](https://github.com/NVIDIA/spark-rapids/pull/4399)|Update docs for Profiling and Qualification tool to change wording| -|[#4407](https://github.com/NVIDIA/spark-rapids/pull/4407)|Fix GpuSubqueryBroadcast on multi-fields relation| -|[#4396](https://github.com/NVIDIA/spark-rapids/pull/4396)|GpuShuffleCoalesceIterator acquire semaphore after host concat| -|[#4361](https://github.com/NVIDIA/spark-rapids/pull/4361)|Accommodate altered semantics of `cudf::lists::contains()`| -|[#4394](https://github.com/NVIDIA/spark-rapids/pull/4394)|Use correct column name in GpuIf test| -|[#4385](https://github.com/NVIDIA/spark-rapids/pull/4385)|Add missing GpuSubqueryBroadcast replacement rule for spark31x | -|[#4387](https://github.com/NVIDIA/spark-rapids/pull/4387)|Fix auto merge conflict 4384[skip ci]| -|[#4374](https://github.com/NVIDIA/spark-rapids/pull/4374)|Fix the IT module depends on the tests module| -|[#4365](https://github.com/NVIDIA/spark-rapids/pull/4365)|Not publishing integration_tests jar to Maven Central [skip ci]| -|[#4358](https://github.com/NVIDIA/spark-rapids/pull/4358)|Update GpuIf to support expressions with side effects| -|[#4382](https://github.com/NVIDIA/spark-rapids/pull/4382)|Remove unused scallop dependency from integration_tests| -|[#4364](https://github.com/NVIDIA/spark-rapids/pull/4364)|Replace Scala document with Scala comment for inner functions| -|[#4373](https://github.com/NVIDIA/spark-rapids/pull/4373)|Add pytest tags for nightly test parallel run [skip ci]| -|[#4150](https://github.com/NVIDIA/spark-rapids/pull/4150)|Support GpuSubqueryBroadcast for DPP| -|[#4372](https://github.com/NVIDIA/spark-rapids/pull/4372)|Move casting to string tests from array_test.py and struct_test.py to cast_test.py| -|[#4371](https://github.com/NVIDIA/spark-rapids/pull/4371)|Fix typo in 
skipTestsFor330 calculation [skip ci]| -|[#4355](https://github.com/NVIDIA/spark-rapids/pull/4355)|Dedicated deploy-file with reduced pom in nightly build [skip ci]| -|[#4352](https://github.com/NVIDIA/spark-rapids/pull/4352)|Revert "Ignore failing string to timestamp tests temporarily (#4197)"| -|[#4359](https://github.com/NVIDIA/spark-rapids/pull/4359)|Audit - SPARK-37268 - Remove unused variable in GpuFileScanRDD [Databricks]| -|[#4327](https://github.com/NVIDIA/spark-rapids/pull/4327)|Print meaningful message when calling scripts in maven| -|[#4354](https://github.com/NVIDIA/spark-rapids/pull/4354)|Fix regression in AQE optimizations| -|[#4343](https://github.com/NVIDIA/spark-rapids/pull/4343)|Fix issue with binding to hash agg columns with computation| -|[#4285](https://github.com/NVIDIA/spark-rapids/pull/4285)|Add support for regexp_extract on the GPU| -|[#4349](https://github.com/NVIDIA/spark-rapids/pull/4349)|Fix PYTHONPATH in pre-merge| -|[#4269](https://github.com/NVIDIA/spark-rapids/pull/4269)|The option for the nightly script not deploying jars [skip ci]| -|[#4335](https://github.com/NVIDIA/spark-rapids/pull/4335)|Fix the issue of exporting Column RDD| -|[#4336](https://github.com/NVIDIA/spark-rapids/pull/4336)|Split expensive pytest files in cases level [skip ci]| -|[#4328](https://github.com/NVIDIA/spark-rapids/pull/4328)|Change the explanation of why the operator will not work on GPU| -|[#4338](https://github.com/NVIDIA/spark-rapids/pull/4338)|Use scala Int.box instead of Integer constructors | -|[#4340](https://github.com/NVIDIA/spark-rapids/pull/4340)|Remove the unnecessary parameter `dataType` in `resolveColumnVector` method| -|[#4256](https://github.com/NVIDIA/spark-rapids/pull/4256)|Allow returning an EmptyHashedRelation when a broadcast result is empty| -|[#4333](https://github.com/NVIDIA/spark-rapids/pull/4333)|Add tests about writing empty table to ORC/PAQUET| -|[#4337](https://github.com/NVIDIA/spark-rapids/pull/4337)|Support GpuFirst and 
GpuLast on nested types under reduction aggregations| -|[#4331](https://github.com/NVIDIA/spark-rapids/pull/4331)|Fix parquet options builder calls| -|[#4310](https://github.com/NVIDIA/spark-rapids/pull/4310)|Fix typo in shim class name| -|[#4326](https://github.com/NVIDIA/spark-rapids/pull/4326)|Fix 4315 decrease concurrentGpuTasks to avoid sum test OOM| -|[#4266](https://github.com/NVIDIA/spark-rapids/pull/4266)|Check revisions for all shim jars while build all| -|[#4282](https://github.com/NVIDIA/spark-rapids/pull/4282)|Use data type to create an inspector for a foldable GPU expression.| -|[#3144](https://github.com/NVIDIA/spark-rapids/pull/3144)|Optimize AQE with Spark 3.2+ to avoid redundant transitions| -|[#4317](https://github.com/NVIDIA/spark-rapids/pull/4317)|[BUG] Update nightly test script to dynamically set mem_fraction [skip ci]| -|[#4206](https://github.com/NVIDIA/spark-rapids/pull/4206)|Porting GpuRowToColumnar converters to InternalColumnarRDDConverter| -|[#4272](https://github.com/NVIDIA/spark-rapids/pull/4272)|Full support for SUM overflow detection on decimal| -|[#4255](https://github.com/NVIDIA/spark-rapids/pull/4255)|Make regexp pattern `[^a]` consistent with Spark for multiline strings| -|[#4306](https://github.com/NVIDIA/spark-rapids/pull/4306)|Revert commonizing the int96ParquetRebase* functions | -|[#4299](https://github.com/NVIDIA/spark-rapids/pull/4299)|Fix auto merge conflict 4298 [skip ci]| -|[#4159](https://github.com/NVIDIA/spark-rapids/pull/4159)|Optimize sample perf| -|[#4235](https://github.com/NVIDIA/spark-rapids/pull/4235)|Commonize v2 shim| -|[#4274](https://github.com/NVIDIA/spark-rapids/pull/4274)|Add tests for timestamps that overflowed before.| -|[#4271](https://github.com/NVIDIA/spark-rapids/pull/4271)|Skip test_regexp_replace_null_pattern_fallback on Spark 3.1.1 and later| -|[#4278](https://github.com/NVIDIA/spark-rapids/pull/4278)|Use mamba for cudf conda install [skip ci]| 
-|[#4270](https://github.com/NVIDIA/spark-rapids/pull/4270)|Document exponent differences when casting floating point to string [skip ci]| -|[#4268](https://github.com/NVIDIA/spark-rapids/pull/4268)|Fix merge conflict with branch-21.12| -|[#4093](https://github.com/NVIDIA/spark-rapids/pull/4093)|Add tests for regexp() and regexp_like()| -|[#4259](https://github.com/NVIDIA/spark-rapids/pull/4259)|fix regression in cast from string to float that caused signed NaN to be considered valid| -|[#4241](https://github.com/NVIDIA/spark-rapids/pull/4241)|fix bug in parsing regex character classes that start with `^` and contain an unescaped `]`| -|[#4224](https://github.com/NVIDIA/spark-rapids/pull/4224)|Support row-based Hive UDFs| -|[#4221](https://github.com/NVIDIA/spark-rapids/pull/4221)|GpuCast from ArrayType to StringType| -|[#4007](https://github.com/NVIDIA/spark-rapids/pull/4007)|Implement duplicate key handling for GpuCreateMap| -|[#4251](https://github.com/NVIDIA/spark-rapids/pull/4251)|Skip test_regexp_replace_null_pattern_fallback on Databricks| -|[#4247](https://github.com/NVIDIA/spark-rapids/pull/4247)|Disable failing CastOpSuite test| -|[#4239](https://github.com/NVIDIA/spark-rapids/pull/4239)|Make EOL anchor behavior match CPU for strings ending with newline| -|[#4153](https://github.com/NVIDIA/spark-rapids/pull/4153)|Regexp: Only transpile once per expression rather than once per batch| -|[#4230](https://github.com/NVIDIA/spark-rapids/pull/4230)|Change to build tools module with all the versions by default| -|[#4223](https://github.com/NVIDIA/spark-rapids/pull/4223)|Fixes a minor deprecation warning| -|[#4215](https://github.com/NVIDIA/spark-rapids/pull/4215)|Rebalance testing load| -|[#4214](https://github.com/NVIDIA/spark-rapids/pull/4214)|Fix pre_merge ci_2 [skip ci]| -|[#4212](https://github.com/NVIDIA/spark-rapids/pull/4212)|Remove an unused method with its outdated comment| -|[#4211](https://github.com/NVIDIA/spark-rapids/pull/4211)|Update 
test_floor_ceil_overflow to be more lenient on exception type| -|[#4203](https://github.com/NVIDIA/spark-rapids/pull/4203)|Move all the GpuShuffleExchangeExec shim v2 classes to org.apache.spark| -|[#4193](https://github.com/NVIDIA/spark-rapids/pull/4193)|Rename 311until320-apache to 311until320-noncdh| -|[#4197](https://github.com/NVIDIA/spark-rapids/pull/4197)|Ignore failing string to timestamp tests temporarily| -|[#4160](https://github.com/NVIDIA/spark-rapids/pull/4160)|Fix merge issues for branch 22.02| -|[#4081](https://github.com/NVIDIA/spark-rapids/pull/4081)|Convert String to DecimalType without casting to FloatType| -|[#4132](https://github.com/NVIDIA/spark-rapids/pull/4132)|Fix auto merge conflict 4131 [skip ci]| -|[#4099](https://github.com/NVIDIA/spark-rapids/pull/4099)|[REVIEW] Init version 22.02.0| -|[#4113](https://github.com/NVIDIA/spark-rapids/pull/4113)|Fix pre-merge CI 2 conditions [skip ci]| - -## Older Releases -Changelog of older releases can be found at [docs/archives](/docs/archives) diff --git a/docs/archives/CHANGELOG_23.02_to_23.12.md b/docs/archives/CHANGELOG_23.02_to_23.12.md deleted file mode 100644 index 485d394d2d7..00000000000 --- a/docs/archives/CHANGELOG_23.02_to_23.12.md +++ /dev/null @@ -1,1566 +0,0 @@ - -# Change log -Generated on 2024-04-10 -## Release 23.12 - -### Features -||| -|:---|:---| -|[#6832](https://github.com/NVIDIA/spark-rapids/issues/6832)|[FEA] Convert Timestamp/Timezone tests/checks to be per operator instead of generic | -|[#9805](https://github.com/NVIDIA/spark-rapids/issues/9805)|[FEA] Support ```current_date``` expression function with CST (UTC + 8) timezone support| -|[#9515](https://github.com/NVIDIA/spark-rapids/issues/9515)|[FEA] Support temporal types in to_json| -|[#9872](https://github.com/NVIDIA/spark-rapids/issues/9872)|[FEA][JSON] Support Decimal type in `to_json`| -|[#9802](https://github.com/NVIDIA/spark-rapids/issues/9802)|[FEA] Support FromUTCTimestamp on the GPU with a non-UTC time zone| 
-|[#6831](https://github.com/NVIDIA/spark-rapids/issues/6831)|[FEA] Support timestamp transitions to and from UTC for single time zones with no repeating rules| -|[#9590](https://github.com/NVIDIA/spark-rapids/issues/9590)|[FEA][JSON] Support temporal types in `from_json`| -|[#9804](https://github.com/NVIDIA/spark-rapids/issues/9804)|[FEA] Support CPU path for from_utc_timestamp function with timezone| -|[#9461](https://github.com/NVIDIA/spark-rapids/issues/9461)|[FEA] Validate nvcomp-3.0 with spark rapids plugin| -|[#8832](https://github.com/NVIDIA/spark-rapids/issues/8832)|[FEA] rewrite join conditions where only part of it can fit on the AST| -|[#9059](https://github.com/NVIDIA/spark-rapids/issues/9059)|[FEA] Support spark.sql.parquet.datetimeRebaseModeInRead=LEGACY| -|[#9037](https://github.com/NVIDIA/spark-rapids/issues/9037)|[FEA] Support spark.sql.parquet.int96RebaseModeInWrite= LEGACY| -|[#9632](https://github.com/NVIDIA/spark-rapids/issues/9632)|[FEA] Take into account `org.apache.spark.timeZone` in Parquet/Avro from Spark 3.2| -|[#8770](https://github.com/NVIDIA/spark-rapids/issues/8770)|[FEA] add more metrics to Eventlogs or Executor logs| -|[#9597](https://github.com/NVIDIA/spark-rapids/issues/9597)|[FEA][JSON] Support boolean type in `from_json`| -|[#9516](https://github.com/NVIDIA/spark-rapids/issues/9516)|[FEA] Add support for JSON data source option `ignoreNullFields=false` in `to_json`| -|[#9520](https://github.com/NVIDIA/spark-rapids/issues/9520)|[FEA] Add support for `LAST()` as running window function| -|[#9518](https://github.com/NVIDIA/spark-rapids/issues/9518)|[FEA] Add support for relevant JSON data source options in `to_json`| -|[#9218](https://github.com/NVIDIA/spark-rapids/issues/9218)|[FEA] Support stack function| -|[#9532](https://github.com/NVIDIA/spark-rapids/issues/9532)|[FEA] Support Delta Lake 2.3.0| -|[#1525](https://github.com/NVIDIA/spark-rapids/issues/1525)|[FEA] Support Scala 2.13| 
-|[#7279](https://github.com/NVIDIA/spark-rapids/issues/7279)|[FEA] Support OverwriteByExpressionExecV1 for Delta Lake| -|[#9326](https://github.com/NVIDIA/spark-rapids/issues/9326)|[FEA] Specify `recover_with_null` when reading JSON files| -|[#8780](https://github.com/NVIDIA/spark-rapids/issues/8780)|[FEA] Support to_json function| -|[#7278](https://github.com/NVIDIA/spark-rapids/issues/7278)|[FEA] Support AppendDataExecV1 for Delta Lake| -|[#6266](https://github.com/NVIDIA/spark-rapids/issues/6266)|[FEA] Support Percentile| -|[#7277](https://github.com/NVIDIA/spark-rapids/issues/7277)|[FEA] Support AtomicReplaceTableAsSelect for Delta Lake| -|[#7276](https://github.com/NVIDIA/spark-rapids/issues/7276)|[FEA] Support AtomicCreateTableAsSelect for Delta Lake| - -### Performance -||| -|:---|:---| -|[#8137](https://github.com/NVIDIA/spark-rapids/issues/8137)|[FEA] Upgrade to UCX 1.15| -|[#8157](https://github.com/NVIDIA/spark-rapids/issues/8157)|[FEA] Add string comparison to AST expressions| -|[#9398](https://github.com/NVIDIA/spark-rapids/issues/9398)|[FEA] Compress/encrypt spill to disk| - -### Bugs Fixed -||| -|:---|:---| -|[#9687](https://github.com/NVIDIA/spark-rapids/issues/9687)|[BUG] `test_in_set` fails when DATAGEN_SEED=1698940723| -|[#9659](https://github.com/NVIDIA/spark-rapids/issues/9659)|[BUG] executor crash intermittantly in scala2.13-built spark332 integration tests| -|[#9923](https://github.com/NVIDIA/spark-rapids/issues/9923)|[BUG] Failed case about ```test_timestamp_seconds_rounding_necessary[Decimal(20,7)][DATAGEN_SEED=1701412018] – src.main.python.date_time_test```| -|[#9982](https://github.com/NVIDIA/spark-rapids/issues/9982)|[BUG] test "convert large InternalRow iterator to cached batch single col" failed with arena pool| -|[#9683](https://github.com/NVIDIA/spark-rapids/issues/9683)|[BUG] test_map_scalars_supported_key_types fails with DATAGEN_SEED=1698940723| -|[#9976](https://github.com/NVIDIA/spark-rapids/issues/9976)|[BUG] 
test_part_write_round_trip[Float] Failed on -0.0 partition| -|[#9948](https://github.com/NVIDIA/spark-rapids/issues/9948)|[BUG] parquet reader data corruption in nested schema after https://github.com/rapidsai/cudf/pull/13302| -|[#9867](https://github.com/NVIDIA/spark-rapids/issues/9867)|[BUG] Unable to use Spark Rapids with Spark Thrift Server| -|[#9934](https://github.com/NVIDIA/spark-rapids/issues/9934)|[BUG] test_delta_multi_part_write_round_trip_unmanaged and test_delta_part_write_round_trip_unmanaged failed DATA_SEED=1701608331 | -|[#9933](https://github.com/NVIDIA/spark-rapids/issues/9933)|[BUG] collection_ops_test.py::test_sequence_too_long_sequence[Long(not_null)][DATAGEN_SEED=1701553915, INJECT_OOM]| -|[#9837](https://github.com/NVIDIA/spark-rapids/issues/9837)|[BUG] test_part_write_round_trip failed| -|[#9932](https://github.com/NVIDIA/spark-rapids/issues/9932)|[BUG] Failed test_multi_tier_ast[DATAGEN_SEED=1701445668] on CI| -|[#9829](https://github.com/NVIDIA/spark-rapids/issues/9829)|[BUG] Java OOM when testing non-UTC time zone with lots of cases fallback.| -|[#9403](https://github.com/NVIDIA/spark-rapids/issues/9403)|[BUG] test_cogroup_apply_udf[Short(not_null)] failed with pandas 2.1.X| -|[#9684](https://github.com/NVIDIA/spark-rapids/issues/9684)|[BUG] test_coalesce fails with DATAGEN_SEED=1698940723| -|[#9685](https://github.com/NVIDIA/spark-rapids/issues/9685)|[BUG] test_case_when fails with DATAGEN_SEED=1698940723| -|[#9776](https://github.com/NVIDIA/spark-rapids/issues/9776)|[BUG] fastparquet compatibility tests fail with data mismatch if TZ is not set and system timezone is not UTC| -|[#9733](https://github.com/NVIDIA/spark-rapids/issues/9733)|[BUG] Complex AST expressions can crash with non-matching operand type error| -|[#9877](https://github.com/NVIDIA/spark-rapids/issues/9877)|[BUG] Fix resource leak in to_json| -|[#9722](https://github.com/NVIDIA/spark-rapids/issues/9722)|[BUG] test_floor_scale_zero fails with DATAGEN_SEED=1700009407| 
-|[#9846](https://github.com/NVIDIA/spark-rapids/issues/9846)|[BUG] test_ceil_scale_zero may fail with different datagen_seed| -|[#9781](https://github.com/NVIDIA/spark-rapids/issues/9781)|[BUG] test_cast_string_date_valid_format fails on DATAGEN_SEED=1700250017| -|[#9714](https://github.com/NVIDIA/spark-rapids/issues/9714)|Scala Map class not found when executing the benchmark on Spark 3.5.0 with Scala 2.13| -|[#9856](https://github.com/NVIDIA/spark-rapids/issues/9856)|collection_ops_test.py failed on Dataproc-2.1 with: Column 'None' does not exist| -|[#9397](https://github.com/NVIDIA/spark-rapids/issues/9397)|[BUG] RapidsShuffleManager MULTITHREADED on Databricks, we see loss of executors due to Rpc issues| -|[#9738](https://github.com/NVIDIA/spark-rapids/issues/9738)|[BUG] `test_delta_part_write_round_trip_unmanaged` and `test_delta_multi_part_write_round_trip_unmanaged` fail with `DATAGEN_SEED=1700105176`| -|[#9771](https://github.com/NVIDIA/spark-rapids/issues/9771)|[BUG] ast_test.py::test_X[(String, True)][DATAGEN_SEED=1700205785] failed| -|[#9782](https://github.com/NVIDIA/spark-rapids/issues/9782)|[BUG] Error messages appear in a clean build| -|[#9798](https://github.com/NVIDIA/spark-rapids/issues/9798)|[BUG] GpuCheckOverflowInTableInsert should be added to databricks shim| -|[#9820](https://github.com/NVIDIA/spark-rapids/issues/9820)|[BUG] test_parquet_write_roundtrip_datetime_with_legacy_rebase fails with "year 0 is out of range"| -|[#9817](https://github.com/NVIDIA/spark-rapids/issues/9817)|[BUG] FAILED dpp_test.py::test_dpp_reuse_broadcast_exchange[false-0-parquet][DATAGEN_SEED=1700572856, IGNORE_ORDER]| -|[#9768](https://github.com/NVIDIA/spark-rapids/issues/9768)|[BUG] `cast decimal to string` ScalaTest relies on a side effects | -|[#9711](https://github.com/NVIDIA/spark-rapids/issues/9711)|[BUG] test_lte fails with DATAGEN_SEED=1699987762| -|[#9751](https://github.com/NVIDIA/spark-rapids/issues/9751)|[BUG] cmp_test test_gte failed with 
DATAGEN_SEED=1700149611| -|[#9469](https://github.com/NVIDIA/spark-rapids/issues/9469)|[BUG] [main] ERROR com.nvidia.spark.rapids.GpuOverrideUtil - Encountered an exception applying GPU overrides java.lang.IllegalStateException: the broadcast must be on the GPU too| -|[#9648](https://github.com/NVIDIA/spark-rapids/issues/9648)|[BUG] Existence default values in schema are not being honored| -|[#9676](https://github.com/NVIDIA/spark-rapids/issues/9676)|Fix Delta Lake Integration tests; `test_delta_atomic_create_table_as_select` and `test_delta_atomic_replace_table_as_select`| -|[#9701](https://github.com/NVIDIA/spark-rapids/issues/9701)|[BUG] test_ts_formats_round_trip and test_datetime_roundtrip_with_legacy_rebase fail with DATAGEN_SEED=1699915317| -|[#9691](https://github.com/NVIDIA/spark-rapids/issues/9691)|[BUG] Repeated Maven invocations w/o changes recompile too many Scala sources despite recompileMode=incremental | -|[#9547](https://github.com/NVIDIA/spark-rapids/issues/9547)|Update buildall and doc to generate bloop projects for test debugging| -|[#9697](https://github.com/NVIDIA/spark-rapids/issues/9697)|[BUG] Iceberg multiple file readers can not read files if the file paths contain encoded URL unsafe chars| -|[#9681](https://github.com/NVIDIA/spark-rapids/issues/9681)|Databricks Build Failing For 330db+| -|[#9521](https://github.com/NVIDIA/spark-rapids/issues/9521)|[BUG] Multi Threaded Shuffle Writer needs flow control| -|[#9675](https://github.com/NVIDIA/spark-rapids/issues/9675)|Failing Delta Lake Tests for Databricks 13.3 Due to WriteIntoDeltaCommand| -|[#9669](https://github.com/NVIDIA/spark-rapids/issues/9669)|[BUG] Rebase exception states not in UTC but timezone is Etc/UTC| -|[#7940](https://github.com/NVIDIA/spark-rapids/issues/7940)|[BUG] UCX peer connection issue in multi-nic single node cluster| -|[#9650](https://github.com/NVIDIA/spark-rapids/issues/9650)|[BUG] Github workflow for missing scala2.13 updates fails to detect when pom is new| 
-|[#9621](https://github.com/NVIDIA/spark-rapids/issues/9621)|[BUG] Scala 2.13 with-classifier profile is picking up Scala2.12 spark.version| -|[#9636](https://github.com/NVIDIA/spark-rapids/issues/9636)|[BUG] All parquet integration tests failed "Part of the plan is not columnar class" in databricks runtimes| -|[#9108](https://github.com/NVIDIA/spark-rapids/issues/9108)|[BUG] nullability on some decimal operations is wrong| -|[#9625](https://github.com/NVIDIA/spark-rapids/issues/9625)|[BUG] Typo in github Maven check install-modules | -|[#9603](https://github.com/NVIDIA/spark-rapids/issues/9603)|[BUG] fastparquet_compatibility_test fails on dataproc| -|[#8729](https://github.com/NVIDIA/spark-rapids/issues/8729)|[BUG] nightly integration test failed OOM kill in JDK11 ENV| -|[#9589](https://github.com/NVIDIA/spark-rapids/issues/9589)|[BUG] Scala 2.13 build hard-codes Java 8 target | -|[#9581](https://github.com/NVIDIA/spark-rapids/issues/9581)|Delta Lake 2.4 missing equals/hashCode override for file format and some metrics for merge| -|[#9507](https://github.com/NVIDIA/spark-rapids/issues/9507)|[BUG] Spark 3.2+/ParquetFilterSuite/Parquet filter pushdown - timestamp/ FAILED | -|[#9540](https://github.com/NVIDIA/spark-rapids/issues/9540)|[BUG] Job failed with SparkUpgradeException no matter which value are set for spark.sql.parquet.datetimeRebaseModeInRead| -|[#9545](https://github.com/NVIDIA/spark-rapids/issues/9545)|[BUG] Dataproc 2.0 test_reading_file_rewritten_with_fastparquet tests failing| -|[#9552](https://github.com/NVIDIA/spark-rapids/issues/9552)|[BUG] Inconsistent CDH dependency overrides across submodules| -|[#9571](https://github.com/NVIDIA/spark-rapids/issues/9571)|[BUG] non-deterministic compiled SQLExecPlugin.class with scala 2.13 deployment| -|[#9569](https://github.com/NVIDIA/spark-rapids/issues/9569)|[BUG] test_window_running failed in 3.1.2+3.1.3| -|[#9480](https://github.com/NVIDIA/spark-rapids/issues/9480)|[BUG] mapInPandas doesn't invoke udf on 
empty partitions| -|[#8644](https://github.com/NVIDIA/spark-rapids/issues/8644)|[BUG] Parquet file with malformed dictionary does not error when loaded| -|[#9310](https://github.com/NVIDIA/spark-rapids/issues/9310)|[BUG] Improve support for reading JSON files with malformed rows| -|[#9457](https://github.com/NVIDIA/spark-rapids/issues/9457)|[BUG] CDH 332 unit tests failing| -|[#9404](https://github.com/NVIDIA/spark-rapids/issues/9404)|[BUG] Spark reports a decimal error when creating a lit scalar while generating Decimal(34, -5) data.| -|[#9110](https://github.com/NVIDIA/spark-rapids/issues/9110)|[BUG] GPU Reader fails due to partition column creating column larger than cudf column size limit| -|[#8631](https://github.com/NVIDIA/spark-rapids/issues/8631)|[BUG] Parquet load failure on repeated_no_annotation.parquet| -|[#9364](https://github.com/NVIDIA/spark-rapids/issues/9364)|[BUG] CUDA illegal access error is triggering split and retry logic| - -### PRs -||| -|:---|:---| -|[#10384](https://github.com/NVIDIA/spark-rapids/pull/10384)|[DOC] Update docs for 23.12.2 release [skip ci] | -|[#10341](https://github.com/NVIDIA/spark-rapids/pull/10341)|Update changelog for v23.12.2 [skip ci]| -|[#10340](https://github.com/NVIDIA/spark-rapids/pull/10340)|Copyright to 2024 [skip ci]| -|[#10323](https://github.com/NVIDIA/spark-rapids/pull/10323)|Upgrade version to 23.12.2-SNAPSHOT| -|[#10329](https://github.com/NVIDIA/spark-rapids/pull/10329)|update download page for v23.12.2 release [skip ci]| -|[#10274](https://github.com/NVIDIA/spark-rapids/pull/10274)|PythonRunner Changes| -|[#10124](https://github.com/NVIDIA/spark-rapids/pull/10124)|Update changelog for v23.12.1 [skip ci]| -|[#10123](https://github.com/NVIDIA/spark-rapids/pull/10123)|Change version to v23.12.1 [skip ci]| -|[#10122](https://github.com/NVIDIA/spark-rapids/pull/10122)|Init changelog for v23.12.1 [skip ci]| -|[#10121](https://github.com/NVIDIA/spark-rapids/pull/10121)|[DOC] update download page for db hot fix [skip 
ci]| -|[#10116](https://github.com/NVIDIA/spark-rapids/pull/10116)|Upgrade to 23.12.1-SNAPSHOT| -|[#10069](https://github.com/NVIDIA/spark-rapids/pull/10069)|Revert "Support split broadcast join condition into ast and non-ast […| -|[#9470](https://github.com/NVIDIA/spark-rapids/pull/9470)|Use float to string kernel| -|[#9481](https://github.com/NVIDIA/spark-rapids/pull/9481)|Use parse_url kernel for PROTOCOL parsing| -|[#9935](https://github.com/NVIDIA/spark-rapids/pull/9935)|Init 23.12 changelog [skip ci]| -|[#9943](https://github.com/NVIDIA/spark-rapids/pull/9943)|[DOC] Update docs for 23.12.0 release [skip ci]| -|[#10014](https://github.com/NVIDIA/spark-rapids/pull/10014)|Add documentation for how to run tests with a fixed datagen seed [skip ci]| -|[#9954](https://github.com/NVIDIA/spark-rapids/pull/9954)|Update private and JNI version to released 23.12.0| -|[#10009](https://github.com/NVIDIA/spark-rapids/pull/10009)|Using fix seed to unblock 23.12 release; Move the blocked issues to 24.02| -|[#10007](https://github.com/NVIDIA/spark-rapids/pull/10007)|Fix Java OOM in non-UTC case with lots of xfail (#9944)| -|[#9985](https://github.com/NVIDIA/spark-rapids/pull/9985)|Avoid allocating GPU memory out of RMM managed pool in test| -|[#9970](https://github.com/NVIDIA/spark-rapids/pull/9970)|Avoid leading and trailing zeros in test_timestamp_seconds_rounding_necessary| -|[#9978](https://github.com/NVIDIA/spark-rapids/pull/9978)|Avoid using floating point values as partition values in tests| -|[#9979](https://github.com/NVIDIA/spark-rapids/pull/9979)|Add compatibility notes for writing ORC with lost Gregorian days [skip ci]| -|[#9949](https://github.com/NVIDIA/spark-rapids/pull/9949)|Override the seed for `test_map_scalars_supported_key_types ` for version of Spark before 3.4.0 [Databricks]| -|[#9961](https://github.com/NVIDIA/spark-rapids/pull/9961)|Avoid using floating point for partition values in Delta Lake tests| 
-|[#9960](https://github.com/NVIDIA/spark-rapids/pull/9960)|Fix LongGen accidentally using special cases when none are desired| -|[#9950](https://github.com/NVIDIA/spark-rapids/pull/9950)|Avoid generating NaNs as partition values in test_part_write_round_trip| -|[#9940](https://github.com/NVIDIA/spark-rapids/pull/9940)|Fix 'year 0 is out of range' by setting a fix seed| -|[#9946](https://github.com/NVIDIA/spark-rapids/pull/9946)|Fix test_multi_tier_ast to ignore ordering of output rows| -|[#9928](https://github.com/NVIDIA/spark-rapids/pull/9928)|Test `inset` with `NaN` only for Spark from 3.1.3| -|[#9906](https://github.com/NVIDIA/spark-rapids/pull/9906)|Fix test_initcap to use the intended limited character set| -|[#9831](https://github.com/NVIDIA/spark-rapids/pull/9831)|Skip fastparquet timestamp tests when plugin cannot read/write timestamps| -|[#9893](https://github.com/NVIDIA/spark-rapids/pull/9893)|Add multiple expression tier regression test for AST| -|[#9889](https://github.com/NVIDIA/spark-rapids/pull/9889)|Fix test_cast_string_ts_valid_format test| -|[#9833](https://github.com/NVIDIA/spark-rapids/pull/9833)|Fix a hang for Pandas UDFs on DB 13.3| -|[#9873](https://github.com/NVIDIA/spark-rapids/pull/9873)|Add support for decimal in `to_json`| -|[#9890](https://github.com/NVIDIA/spark-rapids/pull/9890)|Remove Databricks 13.3 from release 23.12| -|[#9874](https://github.com/NVIDIA/spark-rapids/pull/9874)|Fix zero-scale floor and ceil tests| -|[#9879](https://github.com/NVIDIA/spark-rapids/pull/9879)|Fix resource leak in to_json| -|[#9600](https://github.com/NVIDIA/spark-rapids/pull/9600)|Add date and timestamp support to to_json| -|[#9871](https://github.com/NVIDIA/spark-rapids/pull/9871)|Fix test_cast_string_date_valid_format generating year 0| -|[#9885](https://github.com/NVIDIA/spark-rapids/pull/9885)|Preparation for non-UTC nightly CI [skip ci]| -|[#9810](https://github.com/NVIDIA/spark-rapids/pull/9810)|Support from_utc_timestamp on the GPU for non-UTC 
timezones (non-DST)| -|[#9865](https://github.com/NVIDIA/spark-rapids/pull/9865)|Fix problems with nulls in sequence tests| -|[#9864](https://github.com/NVIDIA/spark-rapids/pull/9864)|Add compatibility documentation with respect to decimal overflow detection [skip ci]| -|[#9860](https://github.com/NVIDIA/spark-rapids/pull/9860)|Fixing FAQ deadlink in plugin code [skip ci]| -|[#9840](https://github.com/NVIDIA/spark-rapids/pull/9840)|Avoid using NaNs as Delta Lake partition values| -|[#9773](https://github.com/NVIDIA/spark-rapids/pull/9773)|xfail all the impacted cases when using non-UTC time zone| -|[#9849](https://github.com/NVIDIA/spark-rapids/pull/9849)|Instantly Delete pre-merge content of stage workspace if success| -|[#9848](https://github.com/NVIDIA/spark-rapids/pull/9848)|Force datagen_seed for test_ceil_scale_zero and test_decimal_round| -|[#9677](https://github.com/NVIDIA/spark-rapids/pull/9677)|Enable build for Databricks 13.3| -|[#9809](https://github.com/NVIDIA/spark-rapids/pull/9809)|Re-enable AST string integration cases| -|[#9835](https://github.com/NVIDIA/spark-rapids/pull/9835)|Avoid pre-Gregorian dates in schema_evolution_test| -|[#9786](https://github.com/NVIDIA/spark-rapids/pull/9786)|Check paths for existence to prevent ignorable error messages during build| -|[#9824](https://github.com/NVIDIA/spark-rapids/pull/9824)|UCX 1.15 upgrade| -|[#9800](https://github.com/NVIDIA/spark-rapids/pull/9800)|Add GpuCheckOverflowInTableInsert to Databricks 11.3+| -|[#9821](https://github.com/NVIDIA/spark-rapids/pull/9821)|Update timestamp gens to avoid "year 0 is out of range" errors| -|[#9826](https://github.com/NVIDIA/spark-rapids/pull/9826)|Set seed to 0 for test_hash_reduction_sum| -|[#9720](https://github.com/NVIDIA/spark-rapids/pull/9720)|Support timestamp in `from_json`| -|[#9818](https://github.com/NVIDIA/spark-rapids/pull/9818)|Specify nullable=False when generating filter values in dpp tests| 
-|[#9689](https://github.com/NVIDIA/spark-rapids/pull/9689)|Support CPU path for from_utc_timestamp function with timezone | -|[#9769](https://github.com/NVIDIA/spark-rapids/pull/9769)|Use withGpuSparkSession to customize SparkConf| -|[#9780](https://github.com/NVIDIA/spark-rapids/pull/9780)|Fix NaN handling in GpuLessThanOrEqual and GpuGreaterThanOrEqual| -|[#9795](https://github.com/NVIDIA/spark-rapids/pull/9795)|xfail AST string tests| -|[#9666](https://github.com/NVIDIA/spark-rapids/pull/9666)|Add support for parsing strings as dates in `from_json`| -|[#9673](https://github.com/NVIDIA/spark-rapids/pull/9673)|Fix the broadcast joins issues caused by InputFileBlockRule| -|[#9785](https://github.com/NVIDIA/spark-rapids/pull/9785)|Force datagen_seed for 9781 and 9784 [skip ci]| -|[#9765](https://github.com/NVIDIA/spark-rapids/pull/9765)|Let GPU scans fall back when default values exist in schema| -|[#9729](https://github.com/NVIDIA/spark-rapids/pull/9729)|Fix Delta Lake atomic table operations on spark341db| -|[#9770](https://github.com/NVIDIA/spark-rapids/pull/9770)|[BUG] Fix the doc for Maven and Scala 2.13 test example [skip ci]| -|[#9761](https://github.com/NVIDIA/spark-rapids/pull/9761)|Fix bug in tagging of JsonToStructs| -|[#9758](https://github.com/NVIDIA/spark-rapids/pull/9758)|Remove forced seed from Delta Lake part_write_round_trip_unmanaged tests| -|[#9652](https://github.com/NVIDIA/spark-rapids/pull/9652)|Add time zone config to set non-UTC| -|[#9736](https://github.com/NVIDIA/spark-rapids/pull/9736)|Fix `TimestampGen` to generate value not too close to the minimum allowed timestamp| -|[#9698](https://github.com/NVIDIA/spark-rapids/pull/9698)|Speed up build: unnecessary invalidation in the incremental recompile mode| -|[#9748](https://github.com/NVIDIA/spark-rapids/pull/9748)|Fix Delta Lake part_write_round_trip_unmanaged tests with floating point| -|[#9702](https://github.com/NVIDIA/spark-rapids/pull/9702)|Support split BroadcastNestedLoopJoin 
condition for AST and non-AST| -|[#9746](https://github.com/NVIDIA/spark-rapids/pull/9746)|Force test_hypot to be single seed for now| -|[#9745](https://github.com/NVIDIA/spark-rapids/pull/9745)|Avoid generating null filter values in test_delta_dfp_reuse_broadcast_exchange| -|[#9741](https://github.com/NVIDIA/spark-rapids/pull/9741)|Set seed=0 for the delta lake part roundtrip tests| -|[#9660](https://github.com/NVIDIA/spark-rapids/pull/9660)|Fully support date/time legacy rebase for nested input| -|[#9672](https://github.com/NVIDIA/spark-rapids/pull/9672)|Support String type for AST| -|[#9716](https://github.com/NVIDIA/spark-rapids/pull/9716)|Initiate project version 24.02.0-SNAPSHOT| -|[#9732](https://github.com/NVIDIA/spark-rapids/pull/9732)|Temporarily force `datagen_seed=0` for `test_re_replace_all` to unblock CI| -|[#9726](https://github.com/NVIDIA/spark-rapids/pull/9726)|Fix leak in BatchWithPartitionData| -|[#9717](https://github.com/NVIDIA/spark-rapids/pull/9717)|Encode the file path from Iceberg when converting to a PartitionedFile| -|[#9441](https://github.com/NVIDIA/spark-rapids/pull/9441)|Add a random seed specific to datagen cases| -|[#9649](https://github.com/NVIDIA/spark-rapids/pull/9649)|Support `spark.sql.parquet.datetimeRebaseModeInRead=LEGACY` and `spark.sql.parquet.int96RebaseModeInRead=LEGACY`| -|[#9612](https://github.com/NVIDIA/spark-rapids/pull/9612)|Escape quotes and newlines when converting strings to json format in to_json| -|[#9644](https://github.com/NVIDIA/spark-rapids/pull/9644)|Add Partial Delta Lake Support for Databricks 13.3| -|[#9690](https://github.com/NVIDIA/spark-rapids/pull/9690)|Changed `extractExecutedPlan` to consider ResultQueryStageExec for Databricks 13.3| -|[#9686](https://github.com/NVIDIA/spark-rapids/pull/9686)|Removed Maven Profiles From `tests/pom.xml`| -|[#9509](https://github.com/NVIDIA/spark-rapids/pull/9509)|Fine-grained spill metrics| -|[#9658](https://github.com/NVIDIA/spark-rapids/pull/9658)|Support 
`spark.sql.parquet.int96RebaseModeInWrite=LEGACY`| -|[#9695](https://github.com/NVIDIA/spark-rapids/pull/9695)|Revert "Support split non-AST-able join condition for BroadcastNested…| -|[#9693](https://github.com/NVIDIA/spark-rapids/pull/9693)|Enable automerge from 23.12 to 24.02 [skip ci]| -|[#9679](https://github.com/NVIDIA/spark-rapids/pull/9679)|[Doc] update the dead link in download page [skip ci]| -|[#9678](https://github.com/NVIDIA/spark-rapids/pull/9678)|Add flow control for multithreaded shuffle writer| -|[#9635](https://github.com/NVIDIA/spark-rapids/pull/9635)|Support split non-AST-able join condition for BroadcastNestedLoopJoin| -|[#9646](https://github.com/NVIDIA/spark-rapids/pull/9646)|Fix Integration Test Failures for Databricks 13.3 Support| -|[#9670](https://github.com/NVIDIA/spark-rapids/pull/9670)|Normalize file timezone and handle missing file timezone in datetimeRebaseUtils| -|[#9657](https://github.com/NVIDIA/spark-rapids/pull/9657)|Update verify check to handle new pom files [skip ci]| -|[#9663](https://github.com/NVIDIA/spark-rapids/pull/9663)|Making User Guide info in bold and adding it as top right link in github.io [skip ci]| -|[#9609](https://github.com/NVIDIA/spark-rapids/pull/9609)|Add valid retry solution to mvn-verify [skip ci]| -|[#9655](https://github.com/NVIDIA/spark-rapids/pull/9655)|Document problem with handling of invalid characters in CSV reader| -|[#9620](https://github.com/NVIDIA/spark-rapids/pull/9620)|Add support for parsing boolean values in `from_json`| -|[#9615](https://github.com/NVIDIA/spark-rapids/pull/9615)|Bloop updates - require JDK11 in buildall + docs, build bloop for all targets.| -|[#9631](https://github.com/NVIDIA/spark-rapids/pull/9631)|Refactor Parquet readers| -|[#9637](https://github.com/NVIDIA/spark-rapids/pull/9637)|Added Support For Various Execs for Databricks 13.3 | -|[#9640](https://github.com/NVIDIA/spark-rapids/pull/9640)|Add support for `ignoreNullFields=false` in `to_json`| 
-|[#9623](https://github.com/NVIDIA/spark-rapids/pull/9623)|Running window optimization for `LAST()`| -|[#9641](https://github.com/NVIDIA/spark-rapids/pull/9641)|Revert "Support rebase checking for nested dates and timestamps (#9617)"| -|[#9423](https://github.com/NVIDIA/spark-rapids/pull/9423)|Re-enable `from_json` / `JsonToStructs`| -|[#9624](https://github.com/NVIDIA/spark-rapids/pull/9624)|Add jenkins-level retry for pre-merge build in databricks runtimes| -|[#9608](https://github.com/NVIDIA/spark-rapids/pull/9608)|Fix nullability issues for some decimal operations| -|[#9617](https://github.com/NVIDIA/spark-rapids/pull/9617)|Support rebase checking for nested dates and timestamps| -|[#9611](https://github.com/NVIDIA/spark-rapids/pull/9611)|Move simple classes after refactoring to sql-plugin-api| -|[#9618](https://github.com/NVIDIA/spark-rapids/pull/9618)|Remove unused dataTypes argument from HostShuffleCoalesceIterator| -|[#9626](https://github.com/NVIDIA/spark-rapids/pull/9626)|Fix ENV typo in pre-merge github actions [skip ci]| -|[#9593](https://github.com/NVIDIA/spark-rapids/pull/9593)|PythonRunner and RapidsErrorUtils Changes For Databricks 13.3| -|[#9607](https://github.com/NVIDIA/spark-rapids/pull/9607)|Integration tests: Install specific fastparquet version.| -|[#9610](https://github.com/NVIDIA/spark-rapids/pull/9610)|Propagate local properties to broadcast execs| -|[#9544](https://github.com/NVIDIA/spark-rapids/pull/9544)|Support batching for `RANGE` running window aggregations. 
Including on| -|[#9601](https://github.com/NVIDIA/spark-rapids/pull/9601)|Remove usage of deprecated scala.Proxy| -|[#9591](https://github.com/NVIDIA/spark-rapids/pull/9591)|Enable implicit JDK profile activation| -|[#9586](https://github.com/NVIDIA/spark-rapids/pull/9586)|Merge metrics and file format fixes to Delta 2.4 support| -|[#9594](https://github.com/NVIDIA/spark-rapids/pull/9594)|Revert "Ignore failing Parquet filter test to unblock CI (#9519)"| -|[#9454](https://github.com/NVIDIA/spark-rapids/pull/9454)|Support encryption and compression in disk store| -|[#9439](https://github.com/NVIDIA/spark-rapids/pull/9439)|Support stack function| -|[#9583](https://github.com/NVIDIA/spark-rapids/pull/9583)|Fix fastparquet tests to work with HDFS| -|[#9508](https://github.com/NVIDIA/spark-rapids/pull/9508)|Consolidate deps switching in an intermediate pom| -|[#9562](https://github.com/NVIDIA/spark-rapids/pull/9562)|Delta Lake 2.3.0 support| -|[#9576](https://github.com/NVIDIA/spark-rapids/pull/9576)|Move Stack classes to wrapper classes to fix non-deterministic build issue| -|[#9572](https://github.com/NVIDIA/spark-rapids/pull/9572)|Add retry for CrossJoinIterator and ConditionalNestedLoopJoinIterator| -|[#9575](https://github.com/NVIDIA/spark-rapids/pull/9575)|Fix `test_window_running*()` for `NTH_VALUE IGNORE NULLS`.| -|[#9574](https://github.com/NVIDIA/spark-rapids/pull/9574)|Fix broken #endif scala comments [skip ci]| -|[#9568](https://github.com/NVIDIA/spark-rapids/pull/9568)|Enforce Apache 3.3.0+ for Scala 2.13| -|[#9557](https://github.com/NVIDIA/spark-rapids/pull/9557)|Support launching Map Pandas UDF on empty partitions| -|[#9489](https://github.com/NVIDIA/spark-rapids/pull/9489)|Batching support for ROW-based `FIRST()` window function| -|[#9510](https://github.com/NVIDIA/spark-rapids/pull/9510)|Add Databricks 13.3 shim boilerplate code and refactor Databricks 12.2 shim| -|[#9554](https://github.com/NVIDIA/spark-rapids/pull/9554)|Fix fastparquet installation 
for| -|[#9536](https://github.com/NVIDIA/spark-rapids/pull/9536)|Add CPU POC of TimeZoneDB; Test some time zones by comparing CPU POC and Spark| -|[#9558](https://github.com/NVIDIA/spark-rapids/pull/9558)|Support integration test against scala2.13 spark binaries[skip ci]| -|[#8592](https://github.com/NVIDIA/spark-rapids/pull/8592)|Scala 2.13 Support| -|[#9551](https://github.com/NVIDIA/spark-rapids/pull/9551)|Enable malformed Parquet failure test| -|[#9546](https://github.com/NVIDIA/spark-rapids/pull/9546)|Support OverwriteByExpressionExecV1 for Delta Lake tables| -|[#9527](https://github.com/NVIDIA/spark-rapids/pull/9527)|Support Split And Retry for GpuProjectAstExec| -|[#9541](https://github.com/NVIDIA/spark-rapids/pull/9541)|Move simple classes to API| -|[#9548](https://github.com/NVIDIA/spark-rapids/pull/9548)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#9418](https://github.com/NVIDIA/spark-rapids/pull/9418)|Fix STRUCT comparison between Pandas and Spark dataframes in fastparquet tests| -|[#9468](https://github.com/NVIDIA/spark-rapids/pull/9468)|Add SplitAndRetry to GpuRunningWindowIterator| -|[#9486](https://github.com/NVIDIA/spark-rapids/pull/9486)|Add partial support for `to_json`| -|[#9538](https://github.com/NVIDIA/spark-rapids/pull/9538)|Fix tiered project breaking higher order functions| -|[#9539](https://github.com/NVIDIA/spark-rapids/pull/9539)|Add delta-24x to delta-lake/README.md [skip ci]| -|[#9534](https://github.com/NVIDIA/spark-rapids/pull/9534)|Add pyarrow tests for Databricks runtime| -|[#9444](https://github.com/NVIDIA/spark-rapids/pull/9444)|Remove redundant pass-through shuffle manager classes| -|[#9531](https://github.com/NVIDIA/spark-rapids/pull/9531)|Fix relative path for spark-shell nightly test [skip ci]| -|[#9525](https://github.com/NVIDIA/spark-rapids/pull/9525)|Follow-up to dbdeps consolidation| -|[#9506](https://github.com/NVIDIA/spark-rapids/pull/9506)|Move ProxyShuffleInternalManagerBase to api| 
-|[#9504](https://github.com/NVIDIA/spark-rapids/pull/9504)|Add a spark-shell smoke test to premerge and nightly| -|[#9519](https://github.com/NVIDIA/spark-rapids/pull/9519)|Ignore failing Parquet filter test to unblock CI| -|[#9478](https://github.com/NVIDIA/spark-rapids/pull/9478)|Support AppendDataExecV1 for Delta Lake tables| -|[#9366](https://github.com/NVIDIA/spark-rapids/pull/9366)|Add tests to check compatibility with `fastparquet`| -|[#9419](https://github.com/NVIDIA/spark-rapids/pull/9419)|Add retry to RoundRobin Partitioner and Range Partitioner| -|[#9502](https://github.com/NVIDIA/spark-rapids/pull/9502)|Install Dependencies Needed For Databricks 13.3| -|[#9296](https://github.com/NVIDIA/spark-rapids/pull/9296)|Implement `percentile` aggregation| -|[#9488](https://github.com/NVIDIA/spark-rapids/pull/9488)|Add Shim JSON Headers for Databricks 13.3| -|[#9443](https://github.com/NVIDIA/spark-rapids/pull/9443)|Add AtomicReplaceTableAsSelectExec support for Delta Lake| -|[#9476](https://github.com/NVIDIA/spark-rapids/pull/9476)|Refactor common Delta Lake test code| -|[#9463](https://github.com/NVIDIA/spark-rapids/pull/9463)|Fix Cloudera 3.3.2 shim for handling CheckOverflowInTableInsert and orc zstd support| -|[#9460](https://github.com/NVIDIA/spark-rapids/pull/9460)|Update links in old release notes to new doc locations [skip ci]| -|[#9405](https://github.com/NVIDIA/spark-rapids/pull/9405)|Wrap scalar generation into spark session in integration test| -|[#9459](https://github.com/NVIDIA/spark-rapids/pull/9459)|Fix 332cdh build [skip ci]| -|[#9425](https://github.com/NVIDIA/spark-rapids/pull/9425)|Add support for AtomicCreateTableAsSelect with Delta Lake| -|[#9434](https://github.com/NVIDIA/spark-rapids/pull/9434)|Add retry support to `HostToGpuCoalesceIterator.concatAllAndPutOnGPU`| -|[#9453](https://github.com/NVIDIA/spark-rapids/pull/9453)|Update codeowner and blossom-ci ACL [skip ci]| -|[#9396](https://github.com/NVIDIA/spark-rapids/pull/9396)|Add 
support for Cloudera CDS-3.3.2| -|[#9380](https://github.com/NVIDIA/spark-rapids/pull/9380)|Fix parsing of Parquet legacy list-of-struct format| -|[#9438](https://github.com/NVIDIA/spark-rapids/pull/9438)|Fix auto merge conflict 9437 [skip ci]| -|[#9424](https://github.com/NVIDIA/spark-rapids/pull/9424)|Refactor aggregate functions| -|[#9414](https://github.com/NVIDIA/spark-rapids/pull/9414)|Add retry to GpuHashJoin.filterNulls| -|[#9388](https://github.com/NVIDIA/spark-rapids/pull/9388)|Add developer documentation about working with data sources [skip ci]| -|[#9369](https://github.com/NVIDIA/spark-rapids/pull/9369)|Improve JSON empty row fix to use less memory| -|[#9373](https://github.com/NVIDIA/spark-rapids/pull/9373)|Fix auto merge conflict 9372| -|[#9308](https://github.com/NVIDIA/spark-rapids/pull/9308)|Initiate arm64 CI support [skip ci]| -|[#9292](https://github.com/NVIDIA/spark-rapids/pull/9292)|Init project version 23.12.0-SNAPSHOT| - - -## Release 23.10 - -### Features -||| -|:---|:---| -|[#9220](https://github.com/NVIDIA/spark-rapids/issues/9220)|[FEA] Add GPU support for converting binary data to a hex string in REPL| -|[#9171](https://github.com/NVIDIA/spark-rapids/issues/9171)|[FEA] Add GPU version of ToPrettyString| -|[#5314](https://github.com/NVIDIA/spark-rapids/issues/5314)|[FEA] Support window.rowsBetween(Window.unboundedPreceding, -1) | -|[#9057](https://github.com/NVIDIA/spark-rapids/issues/9057)|[FEA] Add unbounded to unbounded fixers for min and max| -|[#8121](https://github.com/NVIDIA/spark-rapids/issues/8121)|[FEA] Add Spark 3.5.0 shim layer| -|[#9224](https://github.com/NVIDIA/spark-rapids/issues/9224)|[FEA] Allow } and }} to be transpiled to static strings| -|[#8596](https://github.com/NVIDIA/spark-rapids/issues/8596)|[FEA] Support spark.sql.legacy.parquet.datetimeRebaseModeInWrite=LEGACY| -|[#8767](https://github.com/NVIDIA/spark-rapids/issues/8767)|[AUDIT][SPARK-43302][SQL] Make Python UDAF an AggregateFunction| 
-|[#9055](https://github.com/NVIDIA/spark-rapids/issues/9055)|[FEA] Support Spark 3.3.3 official release| -|[#8672](https://github.com/NVIDIA/spark-rapids/issues/8672)|[FEA] Make GPU readers easier to debug on failure (any failure including OOM)| -|[#8965](https://github.com/NVIDIA/spark-rapids/issues/8965)|[FEA] Enable Bloom filter join acceleration by default| -|[#8625](https://github.com/NVIDIA/spark-rapids/issues/8625)|[FEA] Support outputTimestampType being INT96| - -### Performance -||| -|:---|:---| -|[#9512](https://github.com/NVIDIA/spark-rapids/issues/9512)|[DOC] Multi-Threaded shuffle documentation is not accurate on the read side| -|[#7803](https://github.com/NVIDIA/spark-rapids/issues/7803)|[FEA] Accelerate Bloom filtered joins | - -### Bugs Fixed -||| -|:---|:---| -|[#8662](https://github.com/NVIDIA/spark-rapids/issues/8662)|[BUG] Dataproc spark-rapids.sh fails due to cuda driver version issue| -|[#9428](https://github.com/NVIDIA/spark-rapids/issues/9428)|[Audit] SPARK-44448 Wrong results for dense_rank() <= k| -|[#9485](https://github.com/NVIDIA/spark-rapids/issues/9485)|[BUG] GpuSemaphore can deadlock if there are multiple threads per task| -|[#9498](https://github.com/NVIDIA/spark-rapids/issues/9498)|[BUG] spark 3.5.0 shim spark-shell is broken in spark-rapids 23.10 and 23.12| -|[#9060](https://github.com/NVIDIA/spark-rapids/issues/9060)|[BUG] OOM error in split and retry with multifile coalesce reader with parquet data| -|[#8916](https://github.com/NVIDIA/spark-rapids/issues/8916)|[BUG] Databricks - move init scripts off DBFS| -|[#9416](https://github.com/NVIDIA/spark-rapids/issues/9416)|[BUG] CDH build failed due to missing dependencies| -|[#9357](https://github.com/NVIDIA/spark-rapids/issues/9357)|[BUG] json_test failed on "NameError: name 'TimestampNTZType' is not defined"| -|[#9271](https://github.com/NVIDIA/spark-rapids/issues/9271)|[BUG] ThreadPool size is deduced incorrectly in MultiFileReaderThreadPool on YARN clusters| 
-|[#9309](https://github.com/NVIDIA/spark-rapids/issues/9309)|[BUG] bround and round do not return the correct result for some decimal values.| -|[#9153](https://github.com/NVIDIA/spark-rapids/issues/9153)|[BUG] netty OOM with MULTITHREADED shuffle| -|[#9311](https://github.com/NVIDIA/spark-rapids/issues/9311)|[BUG] test_hash_groupby_collect_list fails| -|[#9180](https://github.com/NVIDIA/spark-rapids/issues/9180)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| -|[#9290](https://github.com/NVIDIA/spark-rapids/issues/9290)|[BUG] delta_lake_test FAILED on "column mapping mode id is not supported for this Delta version"| -|[#9255](https://github.com/NVIDIA/spark-rapids/issues/9255)|[BUG] Unable to read DeltaTable with columnMapping.mode = name| -|[#9261](https://github.com/NVIDIA/spark-rapids/issues/9261)|[BUG] Leaks and Double Frees in Unit Tests| -|[#9246](https://github.com/NVIDIA/spark-rapids/issues/9246)|[BUG] `test_predefined_character_classes` failed with seed 4| -|[#9208](https://github.com/NVIDIA/spark-rapids/issues/9208)|[BUG] SplitAndRetryOOM query14_part1 at 100TB with spark.executor.cores=64| -|[#9106](https://github.com/NVIDIA/spark-rapids/issues/9106)|[BUG] Configuring GDS breaks new host spillable buffers and batches| -|[#9131](https://github.com/NVIDIA/spark-rapids/issues/9131)|[BUG] ConcurrentModificationException in ScalableTaskCompletion| -|[#9263](https://github.com/NVIDIA/spark-rapids/issues/9263)|[BUG] Unit test logging is not captured when running against Spark 3.5.0| -|[#9168](https://github.com/NVIDIA/spark-rapids/issues/9168)|[BUG] Calling RmmSpark.getAndResetNumRetryThrow from tests is not working| -|[#8776](https://github.com/NVIDIA/spark-rapids/issues/8776)|[BUG] FileCacheIntegrationSuite intermittent failure| -|[#9223](https://github.com/NVIDIA/spark-rapids/issues/9223)|[BUG] Failed to create memory map on query14_part1 at 100TB with spark.executor.cores=64| 
-|[#9116](https://github.com/NVIDIA/spark-rapids/issues/9116)|[BUG] spark350 shim build failed in mvn-verify github checks and nightly due to dependencies not released| -|[#8984](https://github.com/NVIDIA/spark-rapids/issues/8984)|[BUG] Check that keys are not null when creating a map| -|[#9233](https://github.com/NVIDIA/spark-rapids/issues/9233)|[BUG] test_parquet_testing_error_files - Failed: DID NOT RAISE in databricks runtime 12.2| -|[#9142](https://github.com/NVIDIA/spark-rapids/issues/9142)|[BUG] AWS EMR 6.12 NDS SF3k query9 Failure on g4dn.4xlarge| -|[#9214](https://github.com/NVIDIA/spark-rapids/issues/9214)|[BUG] mvn resolve dependencies failed missing rapids-4-spark-sql-plugin-api_2.12 of 311 shim| -|[#9204](https://github.com/NVIDIA/spark-rapids/issues/9204)|[BUG] SplitAndRetryOOM query78 at 100TB with spark.executor.cores=64| -|[#9213](https://github.com/NVIDIA/spark-rapids/issues/9213)|[BUG] Missing revision info in databricks shims failed nightly build| -|[#9206](https://github.com/NVIDIA/spark-rapids/issues/9206)|[BUG] test_datetime_roundtrip_with_legacy_rebase failed in databricks runtimes| -|[#9165](https://github.com/NVIDIA/spark-rapids/issues/9165)|[BUG] Data gen for key groups produces type-mismatch columns| -|[#9129](https://github.com/NVIDIA/spark-rapids/issues/9129)|[BUG] Writing Parquet map(map) column can not set the outer key as non-null.| -|[#9194](https://github.com/NVIDIA/spark-rapids/issues/9194)|[BUG] missing sql-plugin-api databricks artifacts in the nightly CI | -|[#9167](https://github.com/NVIDIA/spark-rapids/issues/9167)|[BUG] Ensure no udf-compiler internal nodes escape| -|[#9092](https://github.com/NVIDIA/spark-rapids/issues/9092)|[BUG] NDS query 64 falls back to CPU only for a shuffle| -|[#9071](https://github.com/NVIDIA/spark-rapids/issues/9071)|[BUG] `test_numeric_running_sum_window_no_part_unbounded` failed in MT tests| -|[#9154](https://github.com/NVIDIA/spark-rapids/issues/9154)|[BUG] Spark 3.5.0 nightly build failures 
(test_parquet_testing_error_files)| -|[#9149](https://github.com/NVIDIA/spark-rapids/issues/9149)|[BUG] compile failed in databricks runtimes due to newly added TestReport| -|[#9041](https://github.com/NVIDIA/spark-rapids/issues/9041)|[BUG] Fix regression in Python UDAF support when running against Spark 3.5.0| -|[#9064](https://github.com/NVIDIA/spark-rapids/issues/9064)|[BUG][Spark 3.5.0] Re-enable test_hive_empty_simple_udf when 3.5.0-rc2 is available| -|[#9065](https://github.com/NVIDIA/spark-rapids/issues/9065)|[BUG][Spark 3.5.0] Reinstate cast map/array to string tests when 3.5.0-rc2 is available| -|[#9119](https://github.com/NVIDIA/spark-rapids/issues/9119)|[BUG] Predicate pushdown doesn't work for parquet files written by GPU| -|[#9103](https://github.com/NVIDIA/spark-rapids/issues/9103)|[BUG] test_select_complex_field fails in MT tests| -|[#9086](https://github.com/NVIDIA/spark-rapids/issues/9086)|[BUG] GpuBroadcastNestedLoopJoinExec can assert in doUnconditionalJoin| -|[#8939](https://github.com/NVIDIA/spark-rapids/issues/8939)|[BUG] q95 odd task failure in query95 at 30TB| -|[#9082](https://github.com/NVIDIA/spark-rapids/issues/9082)|[BUG] Race condition while spilling and aliasing a RapidsBuffer (regression)| -|[#9069](https://github.com/NVIDIA/spark-rapids/issues/9069)|[BUG] ParquetFormatScanSuite does not pass locally| -|[#8980](https://github.com/NVIDIA/spark-rapids/issues/8980)|[BUG] invalid escape sequences in pytests| -|[#7807](https://github.com/NVIDIA/spark-rapids/issues/7807)|[BUG] Round robin partitioning sort check falls back to CPU for cases that can be supported| -|[#8482](https://github.com/NVIDIA/spark-rapids/issues/8482)|[BUG] Potential leak on SplitAndRetry when iterator not fully drained| -|[#8942](https://github.com/NVIDIA/spark-rapids/issues/8942)|[BUG] NDS query 14 parts 1 and 2 both fail at SF100K| -|[#8778](https://github.com/NVIDIA/spark-rapids/issues/8778)|[BUG] GPU Parquet output for TIMESTAMP_MICROS is misinterpreted by 
fastparquet as nanos| - -### PRs -||| -|:---|:---| -|[#9304](https://github.com/NVIDIA/spark-rapids/pull/9304)|Specify recoverWithNull when reading JSON files| -|[#9474](https://github.com/NVIDIA/spark-rapids/pull/9474)| Improve configuration handling in BatchWithPartitionData| -|[#9289](https://github.com/NVIDIA/spark-rapids/pull/9289)|Add tests to check compatibility with pyarrow| -|[#9522](https://github.com/NVIDIA/spark-rapids/pull/9522)|Update 23.10 changelog [skip ci]| -|[#9501](https://github.com/NVIDIA/spark-rapids/pull/9501)|Fix GpuSemaphore to support multiple threads per task| -|[#9500](https://github.com/NVIDIA/spark-rapids/pull/9500)|Fix Spark 3.5.0 shell classloader issue with the plugin| -|[#9230](https://github.com/NVIDIA/spark-rapids/pull/9230)|Fix reading partition value columns larger than cudf column size limit| -|[#9427](https://github.com/NVIDIA/spark-rapids/pull/9427)|[DOC] Update docs for 23.10.0 release [skip ci]| -|[#9421](https://github.com/NVIDIA/spark-rapids/pull/9421)|Init changelog of 23.10 [skip ci]| -|[#9445](https://github.com/NVIDIA/spark-rapids/pull/9445)|Only run test_csv_infer_schema_timestamp_ntz tests with PySpark >= 3.4.1| -|[#9420](https://github.com/NVIDIA/spark-rapids/pull/9420)|Update private and jni dep version to released 23.10.0| -|[#9415](https://github.com/NVIDIA/spark-rapids/pull/9415)|[BUG] fix docker modified check in premerge [skip ci]| -|[#9407](https://github.com/NVIDIA/spark-rapids/pull/9407)|[Doc]Update docs for 23.08.2 version[skip ci]| -|[#9392](https://github.com/NVIDIA/spark-rapids/pull/9392)|Only run test_json_ts_formats_round_trip_ntz tests with PySpark >= 3.4.1| -|[#9401](https://github.com/NVIDIA/spark-rapids/pull/9401)|Remove using mamba before they fix the incompatibility issue [skip ci]| -|[#9381](https://github.com/NVIDIA/spark-rapids/pull/9381)|Change the executor core calculation to take into account the cluster manager| -|[#9351](https://github.com/NVIDIA/spark-rapids/pull/9351)|Put back in 
full decimal support for format_number| -|[#9374](https://github.com/NVIDIA/spark-rapids/pull/9374)|GpuCoalesceBatches should throw SplitAndRetryOOM on GPU OOM error| -|[#9238](https://github.com/NVIDIA/spark-rapids/pull/9238)|Simplified handling of GPU core dumps| -|[#9362](https://github.com/NVIDIA/spark-rapids/pull/9362)|[DOC] Removing User Guide pages that will be source of truth on docs.nvidia…| -|[#9365](https://github.com/NVIDIA/spark-rapids/pull/9365)|Update DataWriteCommandExec docs to reflect ORC support for nested types| -|[#9277](https://github.com/NVIDIA/spark-rapids/pull/9277)|[Doc]Remove CUDA related requirement from download page.[Skip CI]| -|[#9352](https://github.com/NVIDIA/spark-rapids/pull/9352)|Refine rules for skipping `test_csv_infer_schema_timestamp_ntz_*` tests| -|[#9334](https://github.com/NVIDIA/spark-rapids/pull/9334)|Add NaNs to Data Generators In Floating-Point Testing| -|[#9344](https://github.com/NVIDIA/spark-rapids/pull/9344)|Update MULTITHREADED shuffle maxBytesInFlight default to 128MB| -|[#9330](https://github.com/NVIDIA/spark-rapids/pull/9330)|Add Hao to blossom-ci whitelist| -|[#9328](https://github.com/NVIDIA/spark-rapids/pull/9328)|Building different Cuda versions section profile does not take effect [skip ci]| -|[#9329](https://github.com/NVIDIA/spark-rapids/pull/9329)|Add kuhushukla to blossom ci yml| -|[#9281](https://github.com/NVIDIA/spark-rapids/pull/9281)|Support `format_number`| -|[#9335](https://github.com/NVIDIA/spark-rapids/pull/9335)|Temporarily skip failing tests test_csv_infer_schema_timestamp_ntz*| -|[#9318](https://github.com/NVIDIA/spark-rapids/pull/9318)|Update authorized user in blossom-ci whitelist [skip ci]| -|[#9221](https://github.com/NVIDIA/spark-rapids/pull/9221)|Add GPU version of ToPrettyString| -|[#9321](https://github.com/NVIDIA/spark-rapids/pull/9321)|[DOC] Fix some incorrect config links in doc [skip ci]| -|[#9314](https://github.com/NVIDIA/spark-rapids/pull/9314)|Fix RMM crash in 
FileCacheIntegrationSuite with ARENA memory allocator| -|[#9287](https://github.com/NVIDIA/spark-rapids/pull/9287)|Allow checkpoint and restore on non-deterministic expressions in GpuFilter and GpuProject| -|[#9146](https://github.com/NVIDIA/spark-rapids/pull/9146)|Improve some CSV integration tests| -|[#9159](https://github.com/NVIDIA/spark-rapids/pull/9159)|Update tests and documentation for `spark.sql.timestampType` when reading CSV/JSON| -|[#9313](https://github.com/NVIDIA/spark-rapids/pull/9313)|Sort results of collect_list test before comparing since it is not guaranteed| -|[#9286](https://github.com/NVIDIA/spark-rapids/pull/9286)|[FEA][AUDIT][SPARK-44641] Incorrect result in certain scenarios when SPJ is not triggered| -|[#9229](https://github.com/NVIDIA/spark-rapids/pull/9229)|Support negative preceding/following for ROW-based window functions| -|[#9297](https://github.com/NVIDIA/spark-rapids/pull/9297)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#9294](https://github.com/NVIDIA/spark-rapids/pull/9294)|Fix test_delta_read_column_mapping test failures on Spark 3.2.x and 3.3.x| -|[#9285](https://github.com/NVIDIA/spark-rapids/pull/9285)|Add CastOptions to make GpuCast extendible to handle more options| -|[#9279](https://github.com/NVIDIA/spark-rapids/pull/9279)|Fix file format checks to be exact and handle Delta Lake column mapping| -|[#9283](https://github.com/NVIDIA/spark-rapids/pull/9283)|Refactor ExternalSource to move some APIs to converted GPU format or scan| -|[#9264](https://github.com/NVIDIA/spark-rapids/pull/9264)|Fix leak in test and double free in corner case| -|[#9280](https://github.com/NVIDIA/spark-rapids/pull/9280)|Fix some issues found with different seeds in integration tests| -|[#9257](https://github.com/NVIDIA/spark-rapids/pull/9257)|Have host spill use the new HostAlloc API| -|[#9253](https://github.com/NVIDIA/spark-rapids/pull/9253)|Enforce Scala method syntax over deprecated procedure syntax| 
-|[#9273](https://github.com/NVIDIA/spark-rapids/pull/9273)|Add arm64 profile to build arm artifacts| -|[#9270](https://github.com/NVIDIA/spark-rapids/pull/9270)|Remove GDS spilling| -|[#9267](https://github.com/NVIDIA/spark-rapids/pull/9267)|Roll our own BufferedIterator so we can close cleanly| -|[#9266](https://github.com/NVIDIA/spark-rapids/pull/9266)|Specify correct dependency versions for 350 build| -|[#9262](https://github.com/NVIDIA/spark-rapids/pull/9262)|Add Delta Lake support for Spark 3.4.1 and Delta Lake tests on Spark 3.4.x| -|[#9256](https://github.com/NVIDIA/spark-rapids/pull/9256)|Test Parquet double column stat without NaN| -|[#9254](https://github.com/NVIDIA/spark-rapids/pull/9254)|[Doc]update the emr getting started doc for emr-6130 release[skip ci]| -|[#9228](https://github.com/NVIDIA/spark-rapids/pull/9228)|Add in unbounded to unbounded optimization for min/max| -|[#9252](https://github.com/NVIDIA/spark-rapids/pull/9252)|Add Spark 3.5.0 to list of supported Spark versions [skip ci]| -|[#9251](https://github.com/NVIDIA/spark-rapids/pull/9251)|Enable a couple of retry asserts in internal row to cudf row iterator suite| -|[#9239](https://github.com/NVIDIA/spark-rapids/pull/9239)|Handle escaping the dangling right ] and right } in the regexp transpiler| -|[#9090](https://github.com/NVIDIA/spark-rapids/pull/9090)|Add test cases for Parquet statistics| -|[#9240](https://github.com/NVIDIA/spark-rapids/pull/9240)|Fix flaky ORC filecache test| -|[#9053](https://github.com/NVIDIA/spark-rapids/pull/9053)|[DOC] update the tuning guide document issues [skip ci]| -|[#9211](https://github.com/NVIDIA/spark-rapids/pull/9211)|Allow skipping host spill for a direct device->disk spill| -|[#9234](https://github.com/NVIDIA/spark-rapids/pull/9234)|Enable Spark 350 builds| -|[#9237](https://github.com/NVIDIA/spark-rapids/pull/9237)|Check for null keys when creating map| -|[#9235](https://github.com/NVIDIA/spark-rapids/pull/9235)|xfail fixed_length_byte_array.parquet 
test due to rapidsai/cudf#14104| -|[#9231](https://github.com/NVIDIA/spark-rapids/pull/9231)|Use conda libmamba solver to resolve intermittent libarchive issue [skip ci]| -|[#8404](https://github.com/NVIDIA/spark-rapids/pull/8404)|Add in support for FIXED_LEN_BYTE_ARRAY as binary| -|[#9225](https://github.com/NVIDIA/spark-rapids/pull/9225)|Add in a HostAlloc API for high priority and add in spilling| -|[#9207](https://github.com/NVIDIA/spark-rapids/pull/9207)|Support SplitAndRetry for GpuRangeExec| -|[#9217](https://github.com/NVIDIA/spark-rapids/pull/9217)|Fix leak in aggregate when there are retries| -|[#9200](https://github.com/NVIDIA/spark-rapids/pull/9200)|Fix a few minor things with scale test| -|[#9222](https://github.com/NVIDIA/spark-rapids/pull/9222)|Deploy classified aggregator for Databricks [skip ci]| -|[#9209](https://github.com/NVIDIA/spark-rapids/pull/9209)|Fix tests for datetime rebase in Databricks| -|[#9181](https://github.com/NVIDIA/spark-rapids/pull/9181)|[DOC] address document issues [skip ci]| -|[#9132](https://github.com/NVIDIA/spark-rapids/pull/9132)|Support `spark.sql.parquet.datetimeRebaseModeInWrite=LEGACY`| -|[#9196](https://github.com/NVIDIA/spark-rapids/pull/9196)|Fix host memory leak for R2C| -|[#9192](https://github.com/NVIDIA/spark-rapids/pull/9192)|Throw overflow exception when interval seconds are outside of range [0, 59]| -|[#9150](https://github.com/NVIDIA/spark-rapids/pull/9150)|add error section in report and the rest queries| -|[#9189](https://github.com/NVIDIA/spark-rapids/pull/9189)|Expose host store spill| -|[#9147](https://github.com/NVIDIA/spark-rapids/pull/9147)|Make map column non-nullable when it's a key in another map.| -|[#9193](https://github.com/NVIDIA/spark-rapids/pull/9193)|Support Retry for GpuLocalLimitExec and GpuGlobalLimitExec| -|[#9183](https://github.com/NVIDIA/spark-rapids/pull/9183)|Add test to verify UDT fallback for parquet| -|[#9195](https://github.com/NVIDIA/spark-rapids/pull/9195)|Deploy 
sql-plugin-api artifact in DBR CI pipelines [skip ci]| -|[#9170](https://github.com/NVIDIA/spark-rapids/pull/9170)|Add in new HostAlloc API| -|[#9182](https://github.com/NVIDIA/spark-rapids/pull/9182)|Consolidate Spark vendor shim dependency management| -|[#9190](https://github.com/NVIDIA/spark-rapids/pull/9190)|Prevent returning internal compiler expressions when compiling UDFs| -|[#9164](https://github.com/NVIDIA/spark-rapids/pull/9164)|Support Retry for GpuTopN and GpuSortEachBatchIterator| -|[#9134](https://github.com/NVIDIA/spark-rapids/pull/9134)|Fix shuffle fallback due to AQE on AWS EMR| -|[#9188](https://github.com/NVIDIA/spark-rapids/pull/9188)|Fix flaky tests in FileCacheIntegrationSuite| -|[#9148](https://github.com/NVIDIA/spark-rapids/pull/9148)|Add minimum Maven module eventually containing all non-shimmable source code| -|[#9169](https://github.com/NVIDIA/spark-rapids/pull/9169)|Add retry-without-split in InternalRowToColumnarBatchIterator| -|[#9172](https://github.com/NVIDIA/spark-rapids/pull/9172)|Remove doSetSpillable in favor of setSpillable| -|[#9152](https://github.com/NVIDIA/spark-rapids/pull/9152)|Add test cases for testing Parquet compression types| -|[#9157](https://github.com/NVIDIA/spark-rapids/pull/9157)|XFAIL parquet `lz4_raw` tests for Spark 3.5.0 or later| -|[#9128](https://github.com/NVIDIA/spark-rapids/pull/9128)|Test parquet predicate pushdown for basic types and fields having dots in names| -|[#9158](https://github.com/NVIDIA/spark-rapids/pull/9158)|Add json4s dependencies for Databricks integration_tests build| -|[#9102](https://github.com/NVIDIA/spark-rapids/pull/9102)|Add retry support to GpuOutOfCoreSortIterator.mergeSortEnoughToOutput | -|[#9089](https://github.com/NVIDIA/spark-rapids/pull/9089)|Add application to run Scale Test| -|[#9143](https://github.com/NVIDIA/spark-rapids/pull/9143)|[DOC] update spark.rapids.sql.concurrentGpuTasks default value in tuning guide [skip ci]| 
-|[#8476](https://github.com/NVIDIA/spark-rapids/pull/8476)|Use retry with split in GpuCachedDoublePassWindowIterator| -|[#9141](https://github.com/NVIDIA/spark-rapids/pull/9141)|Removed resultDecimalType in GpuIntegralDecimalDivide| -|[#9099](https://github.com/NVIDIA/spark-rapids/pull/9099)|Spark 3.5.0 follow-on work (rc2 support + Python UDAF)| -|[#9140](https://github.com/NVIDIA/spark-rapids/pull/9140)|Bump Jython to 2.7.3| -|[#9136](https://github.com/NVIDIA/spark-rapids/pull/9136)|Moving row column conversion code from cudf to jni| -|[#9133](https://github.com/NVIDIA/spark-rapids/pull/9133)|Add 350 tag to InSubqueryShims| -|[#9124](https://github.com/NVIDIA/spark-rapids/pull/9124)|Import `scala.collection` instead of `collection`| -|[#9122](https://github.com/NVIDIA/spark-rapids/pull/9122)|Fall back to CPU if `spark.sql.execution.arrow.useLargeVarTypes` is true| -|[#9115](https://github.com/NVIDIA/spark-rapids/pull/9115)|[DOC] updates documentation related to java compatibility [skip ci]| -|[#9098](https://github.com/NVIDIA/spark-rapids/pull/9098)|Add SpillableHostColumnarBatch| -|[#9091](https://github.com/NVIDIA/spark-rapids/pull/9091)|GPU support for DynamicPruningExpression and InSubqueryExec| -|[#9117](https://github.com/NVIDIA/spark-rapids/pull/9117)|Temporarily disable spark 350 shim build in nightly [skip ci]| -|[#9113](https://github.com/NVIDIA/spark-rapids/pull/9113)|Instantiate execution plan capture callback via shim loader| -|[#8969](https://github.com/NVIDIA/spark-rapids/pull/8969)|Initial support for Spark 3.5.0-rc1| -|[#9100](https://github.com/NVIDIA/spark-rapids/pull/9100)|Support broadcast nested loop existence joins with no condition| -|[#8925](https://github.com/NVIDIA/spark-rapids/pull/8925)|Add GpuConv operator for the `conv` 10<->16 expression| -|[#9109](https://github.com/NVIDIA/spark-rapids/pull/9109)|[DOC] adding java 11 to download docs [skip ci]| -|[#9085](https://github.com/NVIDIA/spark-rapids/pull/9085)|Retry with smaller split on 
`CudfColumnSizeOverflowException`| -|[#8961](https://github.com/NVIDIA/spark-rapids/pull/8961)|Save Databricks init scripts in the workspace| -|[#9088](https://github.com/NVIDIA/spark-rapids/pull/9088)|Add retry and SplitAndRetry support to AcceleratedColumnarToRowIterator| -|[#9095](https://github.com/NVIDIA/spark-rapids/pull/9095)|Support released spark 3.3.3| -|[#9084](https://github.com/NVIDIA/spark-rapids/pull/9084)|Fix race when a rapids buffer is aliased while it is spilled| -|[#9093](https://github.com/NVIDIA/spark-rapids/pull/9093)|Update ParquetFormatScanSuite to not call CUDF directly| -|[#9068](https://github.com/NVIDIA/spark-rapids/pull/9068)|Test ORC predicate pushdown (PPD) with timestamps decimals booleans| -|[#9054](https://github.com/NVIDIA/spark-rapids/pull/9054)|Initial entry point to data generation for scale test| -|[#9070](https://github.com/NVIDIA/spark-rapids/pull/9070)|Spillable host buffer| -|[#9066](https://github.com/NVIDIA/spark-rapids/pull/9066)|Add retry support to RowToColumnarIterator| -|[#9073](https://github.com/NVIDIA/spark-rapids/pull/9073)|Stop using invalid escape sequences| -|[#9018](https://github.com/NVIDIA/spark-rapids/pull/9018)|Add test for selecting a single complex field array and its parent struct array| -|[#9067](https://github.com/NVIDIA/spark-rapids/pull/9067)|Add array support for round robin partition; Refactor pluginSupportedOrderableSig| -|[#9072](https://github.com/NVIDIA/spark-rapids/pull/9072)|Revert "Implement SumUnboundedToUnboundedFixer (#8934)"| -|[#9056](https://github.com/NVIDIA/spark-rapids/pull/9056)|Add in configs for host memory limits| -|[#9061](https://github.com/NVIDIA/spark-rapids/pull/9061)|Fix import order| -|[#8934](https://github.com/NVIDIA/spark-rapids/pull/8934)|Implement SumUnboundedToUnboundedFixer| -|[#9051](https://github.com/NVIDIA/spark-rapids/pull/9051)|Use number of threads on executor instead of driver to set core count| 
-|[#9040](https://github.com/NVIDIA/spark-rapids/pull/9040)|Fix issues from 23.08 merge in join_test| -|[#9045](https://github.com/NVIDIA/spark-rapids/pull/9045)|Fix auto merge conflict 9043 [skip ci]| -|[#9009](https://github.com/NVIDIA/spark-rapids/pull/9009)|Add in a layer of indirection for task completion callbacks| -|[#9013](https://github.com/NVIDIA/spark-rapids/pull/9013)|Create a two-shim jar by default on Databricks| -|[#8995](https://github.com/NVIDIA/spark-rapids/pull/8995)|Add test case for ORC statistics test| -|[#8970](https://github.com/NVIDIA/spark-rapids/pull/8970)|Add ability to debug dump input data only on errors| -|[#9003](https://github.com/NVIDIA/spark-rapids/pull/9003)|Fix auto merge conflict 9002 [skip ci]| -|[#8989](https://github.com/NVIDIA/spark-rapids/pull/8989)|Mark lazy spillables as allowSpillable in during gatherer construction| -|[#8988](https://github.com/NVIDIA/spark-rapids/pull/8988)|Move big data generator to a separate module| -|[#8987](https://github.com/NVIDIA/spark-rapids/pull/8987)|Fix host memory buffer leaks in SerializationSuite| -|[#8968](https://github.com/NVIDIA/spark-rapids/pull/8968)|Enable GPU acceleration of Bloom filter join expressions by default| -|[#8947](https://github.com/NVIDIA/spark-rapids/pull/8947)|Add ArrowUtilsShims in preparation for Spark 3.5.0| -|[#8946](https://github.com/NVIDIA/spark-rapids/pull/8946)|[Spark 3.5.0] Shim access to StructType.fromAttributes| -|[#8824](https://github.com/NVIDIA/spark-rapids/pull/8824)|Drop the in-range check at INT96 output path| -|[#8924](https://github.com/NVIDIA/spark-rapids/pull/8924)|Deprecate and delegate GpuCV.debug to cudf TableDebug| -|[#8915](https://github.com/NVIDIA/spark-rapids/pull/8915)|Move LegacyBehaviorPolicy references to shim layer| -|[#8918](https://github.com/NVIDIA/spark-rapids/pull/8918)|Output unified diff when GPU output deviates| -|[#8857](https://github.com/NVIDIA/spark-rapids/pull/8857)|Remove the pageable pool| 
-|[#8854](https://github.com/NVIDIA/spark-rapids/pull/8854)|Fix auto merge conflict 8853 [skip ci]| -|[#8805](https://github.com/NVIDIA/spark-rapids/pull/8805)|Bump up dep versions to 23.10.0-SNAPSHOT| -|[#8796](https://github.com/NVIDIA/spark-rapids/pull/8796)|Init version 23.10.0-SNAPSHOT| - -## Release 23.08 - -### Features -||| -|:---|:---| -|[#5509](https://github.com/NVIDIA/spark-rapids/issues/5509)|[FEA] Support order-by on Array| -|[#7876](https://github.com/NVIDIA/spark-rapids/issues/7876)|[FEA] Add initial support for Databricks 12.2 ML LTS| -|[#8547](https://github.com/NVIDIA/spark-rapids/issues/8547)|[FEA] Add support for Delta Lake 2.4 with Spark 3.4| -|[#8633](https://github.com/NVIDIA/spark-rapids/issues/8633)|[FEA] Add support for xxHash64 function| -|[#4929](https://github.com/NVIDIA/spark-rapids/issues/4929)|[FEA] Support min/max aggregation/reduction for arrays of structs and arrays of strings| -|[#8668](https://github.com/NVIDIA/spark-rapids/issues/8668)|[FEA] Support min and max for arrays| -|[#4887](https://github.com/NVIDIA/spark-rapids/issues/4887)|[FEA] Hash partitioning on ArrayType| -|[#6680](https://github.com/NVIDIA/spark-rapids/issues/6680)|[FEA] Support hashaggregate for Array[Any]| -|[#8085](https://github.com/NVIDIA/spark-rapids/issues/8085)|[FEA] Add support for MillisToTimestamp| -|[#7801](https://github.com/NVIDIA/spark-rapids/issues/7801)|[FEA] Window Expression orderBy column is not supported in a window range function, found DoubleType| -|[#8556](https://github.com/NVIDIA/spark-rapids/issues/8556)|[FEA] [Delta Lake] Add support for new metrics in MERGE| -|[#308](https://github.com/NVIDIA/spark-rapids/issues/308)|[FEA] Spark 3.1 adding support for TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS functions| -|[#8122](https://github.com/NVIDIA/spark-rapids/issues/8122)|[FEA] Add spark 3.4.1 snapshot shim| -|[#8525](https://github.com/NVIDIA/spark-rapids/issues/8525)|[FEA] Add support for 
org.apache.spark.sql.functions.flatten| -|[#8202](https://github.com/NVIDIA/spark-rapids/issues/8202)|[FEA] List supported Spark builds when the Shim is not found| - -### Performance -||| -|:---|:---| -|[#8231](https://github.com/NVIDIA/spark-rapids/issues/8231)|[FEA] Add filecache support to ORC scans| -|[#8141](https://github.com/NVIDIA/spark-rapids/issues/8141)|[FEA] Explore how to best deal with large numbers of aggregations in the short term| - -### Bugs Fixed -||| -|:---|:---| -|[#9034](https://github.com/NVIDIA/spark-rapids/issues/9034)|[BUG] java.lang.ClassCastException: com.nvidia.spark.rapids.RuleNotFoundExprMeta cannot be cast to com.nvidia.spark.rapids.GeneratorExprMeta| -|[#9032](https://github.com/NVIDIA/spark-rapids/issues/9032)|[BUG] Multiple NDS queries fail with Spark-3.4.1 with bloom filter exception| -|[#8962](https://github.com/NVIDIA/spark-rapids/issues/8962)|[BUG] Nightly build failed: ExecutionPlanCaptureCallback$.class is not bitwise-identical across shims| -|[#9021](https://github.com/NVIDIA/spark-rapids/issues/9021)|[BUG] test_map_scalars_supported_key_types failed in dataproc 2.1| -|[#9020](https://github.com/NVIDIA/spark-rapids/issues/9020)|[BUG] auto-disable snapshot shims test in github action for pre-release branch| -|[#9010](https://github.com/NVIDIA/spark-rapids/issues/9010)|[BUG] Customer failure 23.08: Cannot compute hash of a table with a LIST of STRUCT columns.| -|[#8922](https://github.com/NVIDIA/spark-rapids/issues/8922)|[BUG] integration map_test:test_map_scalars_supported_key_types failures| -|[#8982](https://github.com/NVIDIA/spark-rapids/issues/8982)|[BUG] Nightly prerelease failures - OrcSuite| -|[#8978](https://github.com/NVIDIA/spark-rapids/issues/8978)|[BUG] compiling error due to OrcSuite&OrcStatisticShim in databricks runtimes| -|[#8610](https://github.com/NVIDIA/spark-rapids/issues/8610)|[BUG] query 95 @ SF30K fails with OOM exception| -|[#8955](https://github.com/NVIDIA/spark-rapids/issues/8955)|[BUG] Bloom filter 
join tests can fail with multiple join columns| -|[#45](https://github.com/NVIDIA/spark-rapids/issues/45)|[BUG] very large shuffles can fail| -|[#8779](https://github.com/NVIDIA/spark-rapids/issues/8779)|[BUG] Put shared Databricks test script together for ease of maintenance| -|[#8930](https://github.com/NVIDIA/spark-rapids/issues/8930)|[BUG] checkoutSCM plugin is unstable for pre-merge CI, it is often unable to clone submodules| -|[#8923](https://github.com/NVIDIA/spark-rapids/issues/8923)|[BUG] Mortgage test failing with 'JavaPackage' error on AWS Databricks| -|[#8303](https://github.com/NVIDIA/spark-rapids/issues/8303)|[BUG] GpuExpression columnarEval can return scalars from subqueries that may be unhandled| -|[#8318](https://github.com/NVIDIA/spark-rapids/issues/8318)|[BUG][Databricks 12.2] GpuRowBasedHiveGenericUDF ClassCastException| -|[#8822](https://github.com/NVIDIA/spark-rapids/issues/8822)|[BUG] Early terminate CI if submodule init failed| -|[#8847](https://github.com/NVIDIA/spark-rapids/issues/8847)|[BUG] github actions CI messed up w/ JDK versions intermittently| -|[#8716](https://github.com/NVIDIA/spark-rapids/issues/8716)|[BUG] `test_hash_groupby_collect_set_on_nested_type` and `test_hash_reduction_collect_set_on_nested_type` failed| -|[#8827](https://github.com/NVIDIA/spark-rapids/issues/8827)|[BUG] databricks cudf_udf night build failing with pool size exceeded errors| -|[#8630](https://github.com/NVIDIA/spark-rapids/issues/8630)|[BUG] Parquet with RLE encoded booleans loads corrupted data| -|[#8735](https://github.com/NVIDIA/spark-rapids/issues/8735)|[BUG] test_orc_column_name_with_dots fails in nightly EGX tests| -|[#6980](https://github.com/NVIDIA/spark-rapids/issues/6980)|[BUG] Partitioned writes release GPU semaphore with unspillable GPU memory| -|[#8784](https://github.com/NVIDIA/spark-rapids/issues/8784)|[BUG] hash_aggregate_test.py::test_min_max_in_groupby_and_reduction failed on "TypeError: object of type 'NoneType' has no len()"| 
-|[#8756](https://github.com/NVIDIA/spark-rapids/issues/8756)|[BUG] [Databricks 12.2] RapidsDeltaWrite queries that reference internal metadata fail to run| -|[#8636](https://github.com/NVIDIA/spark-rapids/issues/8636)|[BUG] AWS Databricks 12.2 integration tests failed due to Iceberg check| -|[#8754](https://github.com/NVIDIA/spark-rapids/issues/8754)|[BUG] databricks build broke after adding bigDataGen| -|[#8726](https://github.com/NVIDIA/spark-rapids/issues/8726)|[BUG] Test "parquet_write_test.py::test_hive_timestamp_value[INJECT_OOM]" failed on Databricks | -|[#8690](https://github.com/NVIDIA/spark-rapids/issues/8690)|[BUG] buildall script does not support JDK11 profile| -|[#8702](https://github.com/NVIDIA/spark-rapids/issues/8702)|[BUG] test_min_max_for_single_level_struct failed| -|[#8727](https://github.com/NVIDIA/spark-rapids/issues/8727)|[BUG] test_column_add_after_partition failed in databricks 10.4 runtime| -|[#8669](https://github.com/NVIDIA/spark-rapids/issues/8669)|[BUG] SpillableColumnarBatch doesn't always take ownership| -|[#8655](https://github.com/NVIDIA/spark-rapids/issues/8655)|[BUG] There are some potential device memory leaks in `AbstractGpuCoalesceIterator`| -|[#8685](https://github.com/NVIDIA/spark-rapids/issues/8685)|[BUG] install build fails with Maven 3.9.3| -|[#8156](https://github.com/NVIDIA/spark-rapids/issues/8156)|[BUG] Install phase for modules with Spark build classifier fails for install plugin versions 3.0.0+| -|[#1130](https://github.com/NVIDIA/spark-rapids/issues/1130)|[BUG] TIMESTAMP_MILLIS not handled in isDateTimeRebaseNeeded| -|[#7676](https://github.com/NVIDIA/spark-rapids/issues/7676)|[BUG] SparkShimsImpl class initialization in SparkShimsSuite for 340 too eager| -|[#8278](https://github.com/NVIDIA/spark-rapids/issues/8278)|[BUG] NDS query 16 hangs at SF30K| -|[#8665](https://github.com/NVIDIA/spark-rapids/issues/8665)|[BUG] EGX nightly tests fail to detect Spark version on startup| 
-|[#8647](https://github.com/NVIDIA/spark-rapids/issues/8647)|[BUG] array_test.py::test_array_min_max[Float][INJECT_OOM] failed mismatched CPU and GPU output in nightly| -|[#8640](https://github.com/NVIDIA/spark-rapids/issues/8640)|[BUG] Optimize Databricks pre-merge scripts, move it out into a new CI file| -|[#8308](https://github.com/NVIDIA/spark-rapids/issues/8308)|[BUG] Device Memory leak seen in integration_tests when AssertEmptyNulls are enabled| -|[#8602](https://github.com/NVIDIA/spark-rapids/issues/8602)|[BUG] AutoCloseable Broadcast results are getting closed by Spark| -|[#8603](https://github.com/NVIDIA/spark-rapids/issues/8603)|[BUG] SerializeConcatHostBuffersDeserializeBatch.writeObject fails with ArrayIndexOutOfBoundsException on rows-only table| -|[#8615](https://github.com/NVIDIA/spark-rapids/issues/8615)|[BUG] RapidsShuffleThreadedWriterSuite temp shuffle file test failure| -|[#6872](https://github.com/NVIDIA/spark-rapids/issues/6872)|[BUG] awk: cmd. line:1: warning: regexp escape sequence `\ ' is not a known regexp operator| -|[#8588](https://github.com/NVIDIA/spark-rapids/issues/8588)|[BUG] Spark 3.3.x integration tests failed due to missing jars| -|[#7775](https://github.com/NVIDIA/spark-rapids/issues/7775)|[BUG] scala version hardcoded irrespective of Spark dependency| -|[#8548](https://github.com/NVIDIA/spark-rapids/issues/8548)|[BUG] cache_test:test_batch_no_cols test FAILED on spark-3.3.0+| -|[#8579](https://github.com/NVIDIA/spark-rapids/issues/8579)|[BUG] build failed on Databricks clusters "GpuDeleteCommand.scala:104: type mismatch" | -|[#8187](https://github.com/NVIDIA/spark-rapids/issues/8187)|[BUG] Integration test test_window_running_no_part can produce non-empty nulls (cudf scan)| -|[#8493](https://github.com/NVIDIA/spark-rapids/issues/8493)|[BUG] branch-23.08 fails to build on Databricks 12.2| - -### PRs -||| -|:---|:---| -|[#9407](https://github.com/NVIDIA/spark-rapids/pull/9407)|[Doc]Update docs for 23.08.2 version[skip ci]| 
-|[#9382](https://github.com/NVIDIA/spark-rapids/pull/9382)|Bump up project version to 23.08.2| -|[#8476](https://github.com/NVIDIA/spark-rapids/pull/8476)|Use retry with split in GpuCachedDoublePassWindowIterator| -|[#9048](https://github.com/NVIDIA/spark-rapids/pull/9048)|Update 23.08 changelog 23/08/15 [skip ci]| -|[#9044](https://github.com/NVIDIA/spark-rapids/pull/9044)|[DOC] update release version from v2308.0 to 2308.1 [skip ci]| -|[#9036](https://github.com/NVIDIA/spark-rapids/pull/9036)|Fix meta class cast exception when generator not supported| -|[#9042](https://github.com/NVIDIA/spark-rapids/pull/9042)|Bump up project version to 23.08.1-SNAPSHOT| -|[#9035](https://github.com/NVIDIA/spark-rapids/pull/9035)|Handle null values when merging Bloom filters| -|[#9029](https://github.com/NVIDIA/spark-rapids/pull/9029)|Update 23.08 changelog to latest [skip ci]| -|[#9023](https://github.com/NVIDIA/spark-rapids/pull/9023)|Allow WindowLocalExec to run on CPU for a map test.| -|[#9024](https://github.com/NVIDIA/spark-rapids/pull/9024)|Do not trigger snapshot spark version test in pre-release maven-verify checks [skip ci]| -|[#8975](https://github.com/NVIDIA/spark-rapids/pull/8975)|Init 23.08 changelog [skip ci]| -|[#9016](https://github.com/NVIDIA/spark-rapids/pull/9016)|Fix issue where murmur3 tried to work on array of structs| -|[#9014](https://github.com/NVIDIA/spark-rapids/pull/9014)|Updating link to download jar [skip ci]| -|[#9006](https://github.com/NVIDIA/spark-rapids/pull/9006)|Revert test changes to fix binary dedup error| -|[#9001](https://github.com/NVIDIA/spark-rapids/pull/9001)|[Doc]update the emr getting started doc for emr-6120 release[skip ci]| -|[#8949](https://github.com/NVIDIA/spark-rapids/pull/8949)|Update JNI and private version to released 23.08.0| -|[#8977](https://github.com/NVIDIA/spark-rapids/pull/8977)|Create an anonymous subclass of AdaptiveSparkPlanHelper in ExecutionPlanCaptureCallback.scala| 
-|[#8972](https://github.com/NVIDIA/spark-rapids/pull/8972)|[Doc]Add best practice doc[skip ci]| -|[#8948](https://github.com/NVIDIA/spark-rapids/pull/8948)|[Doc]update download docs for 2308 version[skip ci]| -|[#8971](https://github.com/NVIDIA/spark-rapids/pull/8971)|Fix test_map_scalars_supported_key_types| -|[#8990](https://github.com/NVIDIA/spark-rapids/pull/8990)|Remove doc references to 312db [skip ci]| -|[#8960](https://github.com/NVIDIA/spark-rapids/pull/8960)|[Doc] address profiling tool formatted issue [skip ci]| -|[#8983](https://github.com/NVIDIA/spark-rapids/pull/8983)|Revert OrcSuite to fix deployment build| -|[#8979](https://github.com/NVIDIA/spark-rapids/pull/8979)|Fix Databricks build error for new added ORC test cases| -|[#8920](https://github.com/NVIDIA/spark-rapids/pull/8920)|Add test case to test orc dictionary encoding with lots of rows for nested types| -|[#8940](https://github.com/NVIDIA/spark-rapids/pull/8940)|Add test case for ORC statistics test| -|[#8909](https://github.com/NVIDIA/spark-rapids/pull/8909)|Match Spark's NaN handling in collect_set| -|[#8892](https://github.com/NVIDIA/spark-rapids/pull/8892)|Experimental support for BloomFilterAggregate expression in a reduction context| -|[#8957](https://github.com/NVIDIA/spark-rapids/pull/8957)|Fix building dockerfile.cuda hanging at tzdata installation [skip ci]| -|[#8944](https://github.com/NVIDIA/spark-rapids/pull/8944)|Fix issues around bloom filter with multiple columns| -|[#8744](https://github.com/NVIDIA/spark-rapids/pull/8744)|Add test for selecting a single complex field array and its parent struct array| -|[#8936](https://github.com/NVIDIA/spark-rapids/pull/8936)|Device synchronize prior to freeing a set of RapidsBuffer| -|[#8935](https://github.com/NVIDIA/spark-rapids/pull/8935)|Don't go over shuffle limits on CPU| -|[#8927](https://github.com/NVIDIA/spark-rapids/pull/8927)|Skipping test_map_scalars_supported_key_types because of distributed …| 
-|[#8931](https://github.com/NVIDIA/spark-rapids/pull/8931)|Clone submodule using git command instead of checkoutSCM plugin| -|[#8917](https://github.com/NVIDIA/spark-rapids/pull/8917)|Databricks shim version for integration test| -|[#8775](https://github.com/NVIDIA/spark-rapids/pull/8775)|Support BloomFilterMightContain expression| -|[#8833](https://github.com/NVIDIA/spark-rapids/pull/8833)|Binary and ternary handling of scalar audit and some fixes| -|[#7233](https://github.com/NVIDIA/spark-rapids/pull/7233)|[FEA] Support `order by` on single-level array| -|[#8893](https://github.com/NVIDIA/spark-rapids/pull/8893)|Fix regression in Hive Generic UDF support on Databricks 12.2| -|[#8828](https://github.com/NVIDIA/spark-rapids/pull/8828)|Put shared part together for Databricks test scripts| -|[#8872](https://github.com/NVIDIA/spark-rapids/pull/8872)|Terminate CI if fail to clone submodule| -|[#8787](https://github.com/NVIDIA/spark-rapids/pull/8787)|Add in support for ExponentialDistribution| -|[#8868](https://github.com/NVIDIA/spark-rapids/pull/8868)|Add a test case for testing ORC version V_0_11 and V_0_12| -|[#8795](https://github.com/NVIDIA/spark-rapids/pull/8795)|Add ORC writing test cases for not implicitly lowercase columns| -|[#8871](https://github.com/NVIDIA/spark-rapids/pull/8871)|Adjust parallelism in spark-tests script to reduce memory footprint [skip ci]| -|[#8869](https://github.com/NVIDIA/spark-rapids/pull/8869)|Specify expected JAVA_HOME and bin for mvn-verify-check [skip ci]| -|[#8785](https://github.com/NVIDIA/spark-rapids/pull/8785)|Add test cases for ORC writing according to options orc.compress and compression| -|[#8810](https://github.com/NVIDIA/spark-rapids/pull/8810)|Fall back to CPU for deletion vectors writes on Databricks| -|[#8830](https://github.com/NVIDIA/spark-rapids/pull/8830)|Update documentation to add Databricks 12.2 as a supported platform [skip ci]| -|[#8799](https://github.com/NVIDIA/spark-rapids/pull/8799)|Add tests to cover some 
odd corner cases with nulls and empty arrays| -|[#8783](https://github.com/NVIDIA/spark-rapids/pull/8783)|Fix collect_set_on_nested_type tests failed| -|[#8855](https://github.com/NVIDIA/spark-rapids/pull/8855)|Fix bug: Check GPU file instead of CPU file [skip ci]| -|[#8852](https://github.com/NVIDIA/spark-rapids/pull/8852)|Update test scripts and dockerfiles to match cudf conda pkg change [skip ci]| -|[#8848](https://github.com/NVIDIA/spark-rapids/pull/8848)|Try mitigate mismatched JDK versions in mvn-verify checks [skip ci]| -|[#8825](https://github.com/NVIDIA/spark-rapids/pull/8825)|Add a case to test ORC writing/reading with lots of nulls| -|[#8802](https://github.com/NVIDIA/spark-rapids/pull/8802)|Treat unbounded windows as truly non-finite.| -|[#8798](https://github.com/NVIDIA/spark-rapids/pull/8798)|Add ORC writing test cases for dictionary compression| -|[#8829](https://github.com/NVIDIA/spark-rapids/pull/8829)|Enable rle_boolean_encoding.parquet test| -|[#8667](https://github.com/NVIDIA/spark-rapids/pull/8667)|Make state spillable in partitioned writer| -|[#8801](https://github.com/NVIDIA/spark-rapids/pull/8801)|Fix shuffling an empty Struct() column with UCX| -|[#8748](https://github.com/NVIDIA/spark-rapids/pull/8748)|Add driver log warning when GPU is limiting scheduling resource| -|[#8786](https://github.com/NVIDIA/spark-rapids/pull/8786)|Add support for row-based execution in RapidsDeltaWrite| -|[#8791](https://github.com/NVIDIA/spark-rapids/pull/8791)|Auto merge to branch-23.10 from branch-23.08[skip ci]| -|[#8790](https://github.com/NVIDIA/spark-rapids/pull/8790)|Update ubuntu dockerfiles default to 20.04 and deprecating centos one [skip ci]| -|[#8777](https://github.com/NVIDIA/spark-rapids/pull/8777)|Install python packages with shared scripts on Databricks| -|[#8772](https://github.com/NVIDIA/spark-rapids/pull/8772)|Test concurrent writer update file metrics| -|[#8646](https://github.com/NVIDIA/spark-rapids/pull/8646)|Add testing of Parquet files 
from apache/parquet-testing| -|[#8684](https://github.com/NVIDIA/spark-rapids/pull/8684)|Add 'submodule update --init' when build spark-rapids| -|[#8769](https://github.com/NVIDIA/spark-rapids/pull/8769)|Remove iceberg scripts from Databricks test scripts| -|[#8773](https://github.com/NVIDIA/spark-rapids/pull/8773)|Add a test case for reading/write null to ORC| -|[#8749](https://github.com/NVIDIA/spark-rapids/pull/8749)|Add test cases for read/write User Defined Type (UDT) to ORC| -|[#8768](https://github.com/NVIDIA/spark-rapids/pull/8768)|Add support for xxhash64| -|[#8751](https://github.com/NVIDIA/spark-rapids/pull/8751)|Ensure columnarEval always returns a GpuColumnVector| -|[#8765](https://github.com/NVIDIA/spark-rapids/pull/8765)|Add in support for maps to big data gen| -|[#8758](https://github.com/NVIDIA/spark-rapids/pull/8758)|Normal and Multi Distributions for BigDataGen| -|[#8755](https://github.com/NVIDIA/spark-rapids/pull/8755)|Add in dependency for databricks on integration tests| -|[#8737](https://github.com/NVIDIA/spark-rapids/pull/8737)|Fix parquet_write_test.py::test_hive_timestamp_value failure for Databricks| -|[#8745](https://github.com/NVIDIA/spark-rapids/pull/8745)|Conventional jar layout is not required for JDK9+| -|[#8706](https://github.com/NVIDIA/spark-rapids/pull/8706)|Add a tool to support generating large amounts of data| -|[#8747](https://github.com/NVIDIA/spark-rapids/pull/8747)|xfail hash_groupby_collect_set and hash_reduction_collect_set on nested type cases| -|[#8689](https://github.com/NVIDIA/spark-rapids/pull/8689)|Support nested arrays for `min`/`max` aggregations in groupby and reduction| -|[#8699](https://github.com/NVIDIA/spark-rapids/pull/8699)|Regression test for array of struct with a single field name "element" in Parquet| -|[#8733](https://github.com/NVIDIA/spark-rapids/pull/8733)|Avoid generating numeric null partition values on Databricks 10.4| -|[#8728](https://github.com/NVIDIA/spark-rapids/pull/8728)|Use specific 
mamba version and install libarchive explicitly [skip ci]| -|[#8594](https://github.com/NVIDIA/spark-rapids/pull/8594)|String generation from complex regex in integration tests| -|[#8700](https://github.com/NVIDIA/spark-rapids/pull/8700)|Add regression test to ensure Parquet doesn't interpret timestamp values differently from Hive 0.14.0+| -|[#8711](https://github.com/NVIDIA/spark-rapids/pull/8711)|Factor out modules shared among shim profiles| -|[#8697](https://github.com/NVIDIA/spark-rapids/pull/8697)|Spillable columnar batch takes ownership and improve code coverage| -|[#8705](https://github.com/NVIDIA/spark-rapids/pull/8705)|Add schema evolution integration tests for partitioned data| -|[#8673](https://github.com/NVIDIA/spark-rapids/pull/8673)|Fix some potential memory leaks| -|[#8707](https://github.com/NVIDIA/spark-rapids/pull/8707)|Update config docs for new filecache configs [skip ci]| -|[#8695](https://github.com/NVIDIA/spark-rapids/pull/8695)|Always create the main artifact along with a shim-classifier artifact| -|[#8704](https://github.com/NVIDIA/spark-rapids/pull/8704)|Add tests for column names with dots| -|[#8703](https://github.com/NVIDIA/spark-rapids/pull/8703)|Comment out min/max agg test for nested structs to unblock CI| -|[#8698](https://github.com/NVIDIA/spark-rapids/pull/8698)|Cache last ORC stripe footer to avoid redundant remote reads| -|[#8687](https://github.com/NVIDIA/spark-rapids/pull/8687)|Handle TIMESTAMP_MILLIS for rebase check| -|[#8688](https://github.com/NVIDIA/spark-rapids/pull/8688)|Enable the 340 shim test| -|[#8656](https://github.com/NVIDIA/spark-rapids/pull/8656)|Return result from filecache message instead of null| -|[#8659](https://github.com/NVIDIA/spark-rapids/pull/8659)|Filter out nulls for build batches when needed in hash joins| -|[#8682](https://github.com/NVIDIA/spark-rapids/pull/8682)|[DOC] Update CUDA requirements in documentation and Dockerfiles [skip ci]| 
-|[#8637](https://github.com/NVIDIA/spark-rapids/pull/8637)|Support Float order-by columns for RANGE window functions| -|[#8681](https://github.com/NVIDIA/spark-rapids/pull/8681)|changed container name to adapt to blossom-lib refactor [skip ci]| -|[#8573](https://github.com/NVIDIA/spark-rapids/pull/8573)|Add support for Delta Lake 2.4.0| -|[#8671](https://github.com/NVIDIA/spark-rapids/pull/8671)|Fix use-after-freed bug in `GpuFloatArrayMin`| -|[#8650](https://github.com/NVIDIA/spark-rapids/pull/8650)|Support TIMESTAMP_SECONDS, TIMESTAMP_MILLIS and TIMESTAMP_MICROS| -|[#8495](https://github.com/NVIDIA/spark-rapids/pull/8495)|Speed up PCBS CPU read path by not recalculating as much| -|[#8389](https://github.com/NVIDIA/spark-rapids/pull/8389)|Add filecache support for ORC| -|[#8658](https://github.com/NVIDIA/spark-rapids/pull/8658)|Check if need to run Databricks pre-merge| -|[#8649](https://github.com/NVIDIA/spark-rapids/pull/8649)|Add Spark 3.4.1 shim| -|[#8624](https://github.com/NVIDIA/spark-rapids/pull/8624)|Rename numBytesAdded/Removed metrics and add deletion vector metrics in Databricks 12.2 shims| -|[#8645](https://github.com/NVIDIA/spark-rapids/pull/8645)|Fix "PytestUnknownMarkWarning: Unknown pytest.mark.inject_oom" warning| -|[#8608](https://github.com/NVIDIA/spark-rapids/pull/8608)|Matrix stages to dynamically build Databricks shims| -|[#8517](https://github.com/NVIDIA/spark-rapids/pull/8517)|Revert "Disable asserts for non-empty nulls (#8183)"| -|[#8628](https://github.com/NVIDIA/spark-rapids/pull/8628)|Enable Delta Write fallback tests on Databricks 12.2| -|[#8632](https://github.com/NVIDIA/spark-rapids/pull/8632)|Fix GCP examples and getting started guide [skip ci]| -|[#8638](https://github.com/NVIDIA/spark-rapids/pull/8638)|Support nested structs for `min`/`max` aggregations in groupby and reduction| -|[#8639](https://github.com/NVIDIA/spark-rapids/pull/8639)|Add iceberg test for nightly DB12.2 IT pipeline[skip ci]| 
-|[#8618](https://github.com/NVIDIA/spark-rapids/pull/8618)|Heuristic to speed up partial aggregates that get larger| -|[#8605](https://github.com/NVIDIA/spark-rapids/pull/8605)|[Doc] Fix demo link in index.md [skip ci]| -|[#8619](https://github.com/NVIDIA/spark-rapids/pull/8619)|Enable output batches metric for GpuShuffleCoalesceExec by default| -|[#8617](https://github.com/NVIDIA/spark-rapids/pull/8617)|Fixes broadcast spill serialization/deserialization| -|[#8531](https://github.com/NVIDIA/spark-rapids/pull/8531)|filecache: Modify FileCacheLocalityManager.init to pass in Spark context| -|[#8613](https://github.com/NVIDIA/spark-rapids/pull/8613)|Try print JVM core dump files if any test failures in CI| -|[#8616](https://github.com/NVIDIA/spark-rapids/pull/8616)|Wait for futures in multi-threaded writers even on exception| -|[#8578](https://github.com/NVIDIA/spark-rapids/pull/8578)|Add in metric to see how much computation time is lost due to retry| -|[#8590](https://github.com/NVIDIA/spark-rapids/pull/8590)|Drop ".dev0" suffix from Spark SNAPSHOT distro builds| -|[#8604](https://github.com/NVIDIA/spark-rapids/pull/8604)|Upgrade scalatest version to 3.2.16| -|[#8555](https://github.com/NVIDIA/spark-rapids/pull/8555)|Support `flatten` SQL function| -|[#8599](https://github.com/NVIDIA/spark-rapids/pull/8599)|Fix broken links in advanced_configs.md| -|[#8589](https://github.com/NVIDIA/spark-rapids/pull/8589)|Revert to the JVM-based Spark version extraction in pytests| -|[#8582](https://github.com/NVIDIA/spark-rapids/pull/8582)|Fix databricks shims build errors caused by DB updates| -|[#8564](https://github.com/NVIDIA/spark-rapids/pull/8564)|Fold `verify-all-modules-with-headSparkVersion` into `verify-all-modules` [skip ci]| -|[#8553](https://github.com/NVIDIA/spark-rapids/pull/8553)|Handle empty batch in ParquetCachedBatchSerializer| -|[#8575](https://github.com/NVIDIA/spark-rapids/pull/8575)|Corrected typos in CONTRIBUTING.md [skip ci]| 
-|[#8574](https://github.com/NVIDIA/spark-rapids/pull/8574)|Remove maxTaskFailures=4 for pre-3.1.1 Spark| -|[#8503](https://github.com/NVIDIA/spark-rapids/pull/8503)|Remove hard-coded version numbers for dependencies when building on| -|[#8544](https://github.com/NVIDIA/spark-rapids/pull/8544)|Fix auto merge conflict 8543 [skip ci]| -|[#8521](https://github.com/NVIDIA/spark-rapids/pull/8521)|List supported Spark versions when no shim found| -|[#8520](https://github.com/NVIDIA/spark-rapids/pull/8520)|Add support for first, last, nth, and collect_list aggregations for BinaryType| -|[#8509](https://github.com/NVIDIA/spark-rapids/pull/8509)|Remove legacy spark version check| -|[#8494](https://github.com/NVIDIA/spark-rapids/pull/8494)|Fix 23.08 build on Databricks 12.2| -|[#8487](https://github.com/NVIDIA/spark-rapids/pull/8487)|Move MockTaskContext to tests project| -|[#8426](https://github.com/NVIDIA/spark-rapids/pull/8426)|Pre-merge CI to support Databricks 12.2| -|[#8282](https://github.com/NVIDIA/spark-rapids/pull/8282)|Databricks 12.2 Support| -|[#8407](https://github.com/NVIDIA/spark-rapids/pull/8407)|Bump up dep version to 23.08.0-SNAPSHOT| -|[#8359](https://github.com/NVIDIA/spark-rapids/pull/8359)|Init version 23.08.0-SNAPSHOT| - -## Release 23.06 - -### Features -||| -|:---|:---| -|[#6201](https://github.com/NVIDIA/spark-rapids/issues/6201)|[FEA] experiment with memoizing datagens in the integration_tests| -|[#8079](https://github.com/NVIDIA/spark-rapids/issues/8079)|[FEA] Release Spark 3.4 Support| -|[#7043](https://github.com/NVIDIA/spark-rapids/issues/7043)|[FEA] Support Empty2Null expression on Spark 3.4.0| -|[#8222](https://github.com/NVIDIA/spark-rapids/issues/8222)|[FEA] String Split Unsupported escaped character '.'| -|[#8211](https://github.com/NVIDIA/spark-rapids/issues/8211)|[FEA] Add tencent blob store uri to spark rapids cloudScheme defaults| -|[#4103](https://github.com/NVIDIA/spark-rapids/issues/4103)|[FEA] jdk17 support| 
-|[#7094](https://github.com/NVIDIA/spark-rapids/issues/7094)|[FEA] Add a shim layer for Spark 3.2.4| -|[#6202](https://github.com/NVIDIA/spark-rapids/issues/6202)|[SPARK-39528][SQL] Use V2 Filter in SupportsRuntimeFiltering| -|[#6034](https://github.com/NVIDIA/spark-rapids/issues/6034)|[FEA] Support `offset` parameter in `TakeOrderedAndProject`| -|[#8196](https://github.com/NVIDIA/spark-rapids/issues/8196)|[FEA] Add retry handling to GpuGenerateExec.fixedLenLazyArrayGenerate path| -|[#7891](https://github.com/NVIDIA/spark-rapids/issues/7891)|[FEA] Support StddevSamp with cast(col as double) for input| -|[#62](https://github.com/NVIDIA/spark-rapids/issues/62)|[FEA] stddevsamp function| -|[#7867](https://github.com/NVIDIA/spark-rapids/issues/7867)|[FEA] support json to struct function| -|[#7883](https://github.com/NVIDIA/spark-rapids/issues/7883)|[FEA] support order by string in windowing function| -|[#7882](https://github.com/NVIDIA/spark-rapids/issues/7882)|[FEA] support StringTranslate function| -|[#7843](https://github.com/NVIDIA/spark-rapids/issues/7843)|[FEA] build with CUDA 12| -|[#8045](https://github.com/NVIDIA/spark-rapids/issues/8045)|[FEA] Support repetition in choice on regular expressions| -|[#6882](https://github.com/NVIDIA/spark-rapids/issues/6882)|[FEA] Regular expressions - support line anchors in choice| -|[#7901](https://github.com/NVIDIA/spark-rapids/issues/7901)|[FEA] better rlike function supported| -|[#7784](https://github.com/NVIDIA/spark-rapids/issues/7784)|[FEA] Add Spark 3.3.3-SNAPSHOT to shims| -|[#7260](https://github.com/NVIDIA/spark-rapids/issues/7260)|[FEA] Create a new Expression execution framework| - -### Performance -||| -|:---|:---| -|[#7870](https://github.com/NVIDIA/spark-rapids/issues/7870)|[FEA] Turn on spark.rapids.sql.castDecimalToString.enabled by default| -|[#7321](https://github.com/NVIDIA/spark-rapids/issues/7321)|[FEA] Improve performance of small file ORC reads from blobstores| 
-|[#7672](https://github.com/NVIDIA/spark-rapids/issues/7672)|Make all buffers/columnar batches spillable by default| - -### Bugs Fixed -||| -|:---|:---| -|[#6339](https://github.com/NVIDIA/spark-rapids/issues/6339)|[BUG] 0 in some cases for decimal being cast to a string returns different results.| -|[#8522](https://github.com/NVIDIA/spark-rapids/issues/8522)|[BUG] `from_json` function failed testing with input column containing empty or null string| -|[#8483](https://github.com/NVIDIA/spark-rapids/issues/8483)|[BUG] `test_read_compressed_hive_text` fails on CDH| -|[#8330](https://github.com/NVIDIA/spark-rapids/issues/8330)|[BUG] Handle Decimal128 computation with overflow of Remainder on Spark 3.4| -|[#8448](https://github.com/NVIDIA/spark-rapids/issues/8448)|[BUG] GpuRegExpReplaceWithBackref with empty string input produces incorrect result on GPU in Spark 3.1.1| -|[#8323](https://github.com/NVIDIA/spark-rapids/issues/8323)|[BUG] regexp_replace hangs with specific inputs and patterns| -|[#8473](https://github.com/NVIDIA/spark-rapids/issues/8473)|[BUG] Complete aggregation with non-trivial grouping expression fails| -|[#8440](https://github.com/NVIDIA/spark-rapids/issues/8440)|[BUG] the jar with scaladoc overwrites the jar with javadoc | -|[#8469](https://github.com/NVIDIA/spark-rapids/issues/8469)|[BUG] Multi-threaded reader can't be toggled on/off| -|[#8460](https://github.com/NVIDIA/spark-rapids/issues/8460)|[BUG] Compile failure on Databricks 11.3 with GpuHiveTableScanExec.scala| -|[#8114](https://github.com/NVIDIA/spark-rapids/issues/8114)|[BUG] [AUDIT] [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory| -|[#6786](https://github.com/NVIDIA/spark-rapids/issues/6786)|[BUG] NDS q95 fails with OOM at 10TB| -|[#8419](https://github.com/NVIDIA/spark-rapids/issues/8419)|[BUG] Hive Text reader fails for GZIP compressed input| -|[#8409](https://github.com/NVIDIA/spark-rapids/issues/8409)|[BUG] JVM agent crashed 
SIGFPE cudf::detail::repeat in integration tests| -|[#8411](https://github.com/NVIDIA/spark-rapids/issues/8411)|[BUG] Close called too many times in Gpu json reader| -|[#8400](https://github.com/NVIDIA/spark-rapids/issues/8400)|[BUG] Cloudera IT test failures - test_timesub_from_subquery| -|[#8240](https://github.com/NVIDIA/spark-rapids/issues/8240)|[BUG] NDS power run hits GPU OOM on Databricks.| -|[#8375](https://github.com/NVIDIA/spark-rapids/issues/8375)|[BUG] test_empty_filter[>] failed in 23.06 nightly| -|[#8363](https://github.com/NVIDIA/spark-rapids/issues/8363)|[BUG] ORC reader NullPointerException| -|[#8281](https://github.com/NVIDIA/spark-rapids/issues/8281)|[BUG] ParquetCachedBatchSerializer is crashing on count| -|[#8331](https://github.com/NVIDIA/spark-rapids/issues/8331)|[BUG] Filter on dates with subquery results in ArrayIndexOutOfBoundsException| -|[#8293](https://github.com/NVIDIA/spark-rapids/issues/8293)|[BUG] GpuTimeAdd throws UnsupportedOperationException when taking a column and an interval as arguments| -|[#8161](https://github.com/NVIDIA/spark-rapids/issues/8161)|Add support for Remainder[DecimalType] for Spark 3.4 and DB 11.3| -|[#8321](https://github.com/NVIDIA/spark-rapids/issues/8321)|[BUG] `test_read_hive_fixed_length_char` integ test fails on Spark 3.4| -|[#8225](https://github.com/NVIDIA/spark-rapids/issues/8225)|[BUG] GpuGetArrayItem only supports ints as the ordinal.| -|[#8294](https://github.com/NVIDIA/spark-rapids/issues/8294)|[BUG] ORC `CHAR(N)` columns written from Hive unreadable with RAPIDS plugin| -|[#8186](https://github.com/NVIDIA/spark-rapids/issues/8186)|[BUG] integration test test_cast_nested can fail with non-empty nulls| -|[#6190](https://github.com/NVIDIA/spark-rapids/issues/6190)|[SPARK-39731][SQL] Fix issue in CSV data sources when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy| -|[#8185](https://github.com/NVIDIA/spark-rapids/issues/8185)|[BUG] Scala Test md5 can produce non-empty nulls (merge 
and set validity)| -|[#8235](https://github.com/NVIDIA/spark-rapids/issues/8235)|[BUG] Java agent crashed intermittently running integration tests| -|[#7485](https://github.com/NVIDIA/spark-rapids/issues/7485)|[BUG] stop using mergeAndSetValidity for any nested type| -|[#8263](https://github.com/NVIDIA/spark-rapids/issues/8263)|[BUG] Databricks 11.3 - Task failed while writing rows for Delta table - java.lang.Integer cannot be cast to java.lang.Long| -|[#7898](https://github.com/NVIDIA/spark-rapids/issues/7898)|Override `canonicalized` method to the Expressions| -|[#8254](https://github.com/NVIDIA/spark-rapids/issues/8254)|[BUG] Unable to determine Databricks version in azure Databricks instances| -|[#6967](https://github.com/NVIDIA/spark-rapids/issues/6967)|[BUG] Parquet List corner cases fail to be parsed| -|[#6991](https://github.com/NVIDIA/spark-rapids/issues/6991)|[BUG] Integration test failures in Spark - 3.4 SNAPSHOT build| -|[#7773](https://github.com/NVIDIA/spark-rapids/issues/7773)|[BUG] udf test failed cudf-py 23.04 ENV setup on databricks 11.3 runtime| -|[#7934](https://github.com/NVIDIA/spark-rapids/issues/7934)|[BUG] User app fails with OOM - GpuOutOfCoreSortIterator| -|[#8214](https://github.com/NVIDIA/spark-rapids/issues/8214)|[BUG] Exception when counting rows in an ORC file that has no column names| -|[#8160](https://github.com/NVIDIA/spark-rapids/issues/8160)|[BUG] Arithmetic_ops_test failure for Spark 3.4| -|[#7495](https://github.com/NVIDIA/spark-rapids/issues/7495)|Update GpuDataSource to match the change in Spark 3.4| -|[#8189](https://github.com/NVIDIA/spark-rapids/issues/8189)|[BUG] test_array_element_at_zero_index_fail test failures in Spark 3.4 | -|[#8043](https://github.com/NVIDIA/spark-rapids/issues/8043)|[BUG] Host memory leak in SerializedBatchIterator| -|[#8194](https://github.com/NVIDIA/spark-rapids/issues/8194)|[BUG] JVM agent crash intermittently in CI integration test | 
-|[#6182](https://github.com/NVIDIA/spark-rapids/issues/6182)|[SPARK-39319][CORE][SQL] Make query contexts as a part of `SparkThrowable`| -|[#7491](https://github.com/NVIDIA/spark-rapids/issues/7491)|[AUDIT][SPARK-41448][SQL] Make consistent MR job IDs in FileBatchWriter and FileFormatWriter| -|[#8149](https://github.com/NVIDIA/spark-rapids/issues/8149)|[BUG] dataproc init script does not fail clearly with newer versions of CUDA| -|[#7624](https://github.com/NVIDIA/spark-rapids/issues/7624)|[BUG] `test_parquet_write_encryption_option_fallback` failed| -|[#8019](https://github.com/NVIDIA/spark-rapids/issues/8019)|[BUG] Spark-3.4 - Integration test failures due to GpuCreateDataSourceTableAsSelectCommand| -|[#8017](https://github.com/NVIDIA/spark-rapids/issues/8017)|[BUG]Spark-3.4 Integration tests failure due to InsertIntoHadoopFsRelationCommand not running on GPU| -|[#7492](https://github.com/NVIDIA/spark-rapids/issues/7492)|[AUDIT][SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in EquivalentExpressions| -|[#6987](https://github.com/NVIDIA/spark-rapids/issues/6987)|[BUG] Unit Test failures in Spark-3.4 SNAPSHOT build| -|[#8171](https://github.com/NVIDIA/spark-rapids/issues/8171)|[BUG] ORC read failure when reading decimals with different precision/scale from write schema| -|[#7216](https://github.com/NVIDIA/spark-rapids/issues/7216)|[BUG] The PCBS tests fail on Spark 340| -|[#8016](https://github.com/NVIDIA/spark-rapids/issues/8016)|[BUG] Spark-3.4 - Integration tests failure due to missing InsertIntoHiveTable operator in GPU | -|[#8166](https://github.com/NVIDIA/spark-rapids/issues/8166)|Databricks Delta defaults to LEGACY for int96RebaseModeInWrite| -|[#8147](https://github.com/NVIDIA/spark-rapids/issues/8147)|[BUG] test_substring_column failed| -|[#8164](https://github.com/NVIDIA/spark-rapids/issues/8164)|[BUG] failed AnsiCastShim build in datasbricks 11.3 runtime| -|[#7757](https://github.com/NVIDIA/spark-rapids/issues/7757)|[BUG] Unit tests failure in 
AnsiCastOpSuite on Spark-3.4| -|[#7756](https://github.com/NVIDIA/spark-rapids/issues/7756)|[BUG] Unit test failure in AdaptiveQueryExecSuite on Spark-3.4| -|[#8153](https://github.com/NVIDIA/spark-rapids/issues/8153)|[BUG] `get-shim-versions-from-dist` workflow failing in CI| -|[#7961](https://github.com/NVIDIA/spark-rapids/issues/7961)|[BUG] understand why unspill can throw an OutOfMemoryError and not a RetryOOM| -|[#7755](https://github.com/NVIDIA/spark-rapids/issues/7755)|[BUG] Unit tests failures in WindowFunctionSuite and CostBasedOptimizerSuite on Spark-3.4| -|[#7752](https://github.com/NVIDIA/spark-rapids/issues/7752)|[BUG] Test in CastOpSuite fails on Spark-3.4| -|[#7754](https://github.com/NVIDIA/spark-rapids/issues/7754)|[BUG] unit test `nz timestamp` fails on Spark-3.4| -|[#7018](https://github.com/NVIDIA/spark-rapids/issues/7018)|[BUG] The unit test `sorted partitioned write` fails on Spark 3.4| -|[#8015](https://github.com/NVIDIA/spark-rapids/issues/8015)|[BUG] Spark 3.4 - Integration tests failure due to unsupported KnownNullable operator in Window| -|[#7751](https://github.com/NVIDIA/spark-rapids/issues/7751)|[BUG] Unit test `Write encrypted ORC fallback` fails on Spark-3.4| -|[#8117](https://github.com/NVIDIA/spark-rapids/issues/8117)|[BUG] Compile error in RapidsErrorUtils when building against Spark 3.4.0 release | -|[#5659](https://github.com/NVIDIA/spark-rapids/issues/5659)|[BUG] Minimize false positives when falling back to CPU for end of line/string anchors and newlines| -|[#8012](https://github.com/NVIDIA/spark-rapids/issues/8012)|[BUG] Integration tests failing due to CreateDataSourceTableAsSelectCommand in Spark-3.4| -|[#8061](https://github.com/NVIDIA/spark-rapids/issues/8061)|[BUG] join_test failed in integration tests| -|[#8018](https://github.com/NVIDIA/spark-rapids/issues/8018)|[BUG] Spark-3.4 - Integration test failures in window aggregations for decimal types| -|[#7581](https://github.com/NVIDIA/spark-rapids/issues/7581)|[BUG] INC 
AFTER CLOSE for ColumnVector during shutdown in the join code| - -### PRs -||| -|:---|:---| -|[#7465](https://github.com/NVIDIA/spark-rapids/pull/7465)|Add support for arrays in hashaggregate| -|[#8584](https://github.com/NVIDIA/spark-rapids/pull/8584)|Update 23.06 changelog 6/19 [skip ci]| -|[#8581](https://github.com/NVIDIA/spark-rapids/pull/8581)|Fix 321db 330db shims build errors caused by DB updates| -|[#8570](https://github.com/NVIDIA/spark-rapids/pull/8570)|Update changelog to latest [skip ci]| -|[#8567](https://github.com/NVIDIA/spark-rapids/pull/8567)|Fixed a link in config doc[skip ci]| -|[#8562](https://github.com/NVIDIA/spark-rapids/pull/8562)|Update changelog to latest 230612 [skip ci]| -|[#8560](https://github.com/NVIDIA/spark-rapids/pull/8560)|Fix relative path in config doc [skip ci]| -|[#8557](https://github.com/NVIDIA/spark-rapids/pull/8557)|Disable `JsonToStructs` for input schema other than `Map`| -|[#8549](https://github.com/NVIDIA/spark-rapids/pull/8549)|Revert "Handle caching empty batch (#8507)"| -|[#8507](https://github.com/NVIDIA/spark-rapids/pull/8507)|Handle caching empty batch| -|[#8528](https://github.com/NVIDIA/spark-rapids/pull/8528)|Update deps JNI and private version to 23.06.0| -|[#8492](https://github.com/NVIDIA/spark-rapids/pull/8492)|[Doc]update download docs for 2306 version[skip ci]| -|[#8510](https://github.com/NVIDIA/spark-rapids/pull/8510)|[Doc] address getting-started-on-prem document issues [skip ci]| -|[#8537](https://github.com/NVIDIA/spark-rapids/pull/8537)|Add limitation for the UCX shuffle keep_alive workaround [skip ci]| -|[#8526](https://github.com/NVIDIA/spark-rapids/pull/8526)|Fix `from_json` function failure when input contains empty or null strings| -|[#8529](https://github.com/NVIDIA/spark-rapids/pull/8529)|Init changelog 23.06 [skip ci]| -|[#8338](https://github.com/NVIDIA/spark-rapids/pull/8338)|Moved some configs to an advanced config page| 
-|[#8441](https://github.com/NVIDIA/spark-rapids/pull/8441)|Memoizing DataGens in integration tests| -|[#8516](https://github.com/NVIDIA/spark-rapids/pull/8516)|Avoid calling Table.merge with BinaryType columns| -|[#8515](https://github.com/NVIDIA/spark-rapids/pull/8515)|Fix warning about deprecated parquet config| -|[#8427](https://github.com/NVIDIA/spark-rapids/pull/8427)|[Doc] address Spark RAPIDS NVAIE VDR issues [skip ci]| -|[#8486](https://github.com/NVIDIA/spark-rapids/pull/8486)|Move task completion listener registration to after variables are initialized| -|[#8481](https://github.com/NVIDIA/spark-rapids/pull/8481)|Removed spark.rapids.sql.castDecimalToString.enabled and enabled GPU decimal to string by default| -|[#8485](https://github.com/NVIDIA/spark-rapids/pull/8485)|Disable `test_read_compressed_hive_text` on CDH.| -|[#8488](https://github.com/NVIDIA/spark-rapids/pull/8488)|Adds note on multi-threaded shuffle targetting <= 200 partitions and on TCP keep-alive for UCX [skip ci]| -|[#8414](https://github.com/NVIDIA/spark-rapids/pull/8414)|Add support for computing remainder with Decimal128 operands with more precision on Spark 3.4| -|[#8467](https://github.com/NVIDIA/spark-rapids/pull/8467)|Add retry support to GpuExpandExec| -|[#8433](https://github.com/NVIDIA/spark-rapids/pull/8433)|Add regression test for regexp_replace hanging with some inputs| -|[#8477](https://github.com/NVIDIA/spark-rapids/pull/8477)|Fix input binding of grouping expressions for complete aggregations| -|[#8464](https://github.com/NVIDIA/spark-rapids/pull/8464)|Remove NOP Maven javadoc plugin definition| -|[#8402](https://github.com/NVIDIA/spark-rapids/pull/8402)|Bring back UCX 1.14| -|[#8470](https://github.com/NVIDIA/spark-rapids/pull/8470)|Ensure the MT shuffle reader enables/disables with spark.rapids.shuff…| -|[#8462](https://github.com/NVIDIA/spark-rapids/pull/8462)|Fix compressed Hive text read on| -|[#8458](https://github.com/NVIDIA/spark-rapids/pull/8458)|Add check for 
negative id when creating new MR job id| -|[#8435](https://github.com/NVIDIA/spark-rapids/pull/8435)|Add in a few more retry improvements| -|[#8437](https://github.com/NVIDIA/spark-rapids/pull/8437)|Implement the bug fix for SPARK-41448 and shim it for Spark 3.2.4 and Spark 3.3.{2,3}| -|[#8420](https://github.com/NVIDIA/spark-rapids/pull/8420)|Fix reads for GZIP compressed Hive Text.| -|[#8445](https://github.com/NVIDIA/spark-rapids/pull/8445)|Document errors/warns in the logs during catalog shutdown [skip ci]| -|[#8438](https://github.com/NVIDIA/spark-rapids/pull/8438)|Revert "skip test_array_repeat_with_count_scalar for now (#8424)"| -|[#8385](https://github.com/NVIDIA/spark-rapids/pull/8385)|Reduce memory usage in GpuFileFormatDataWriter and GpuDynamicPartitionDataConcurrentWriter| -|[#8304](https://github.com/NVIDIA/spark-rapids/pull/8304)|Support combining small files for multi-threaded ORC reads| -|[#8413](https://github.com/NVIDIA/spark-rapids/pull/8413)|Stop double closing in json scan + skip test| -|[#8430](https://github.com/NVIDIA/spark-rapids/pull/8430)|Update docs for spark.rapids.filecache.checkStale default change [skip ci]| -|[#8424](https://github.com/NVIDIA/spark-rapids/pull/8424)|skip test_array_repeat_with_count_scalar to wait for fix #8409| -|[#8405](https://github.com/NVIDIA/spark-rapids/pull/8405)|Change TimeAdd/Sub subquery tests to use min/max| -|[#8408](https://github.com/NVIDIA/spark-rapids/pull/8408)|Document conventional dist jar layout for single-shim deployments [skip ci]| -|[#8394](https://github.com/NVIDIA/spark-rapids/pull/8394)|Removed "peak device memory" metric| -|[#8378](https://github.com/NVIDIA/spark-rapids/pull/8378)|Use spillable batch with retry in GpuCachedDoublePassWindowIterator| -|[#8392](https://github.com/NVIDIA/spark-rapids/pull/8392)|Update IDEA dev instructions [skip ci]| -|[#8387](https://github.com/NVIDIA/spark-rapids/pull/8387)|Rename inconsistent profiles in api_validation| 
-|[#8374](https://github.com/NVIDIA/spark-rapids/pull/8374)|Avoid processing empty batch in ParquetCachedBatchSerializer| -|[#8386](https://github.com/NVIDIA/spark-rapids/pull/8386)|Fix check to do positional indexing in ORC| -|[#8360](https://github.com/NVIDIA/spark-rapids/pull/8360)|use matrix to combine multiple jdk* jobs in maven-verify CI [skip ci]| -|[#8371](https://github.com/NVIDIA/spark-rapids/pull/8371)|Fix V1 column name match is case-sensitive when dropping partition by columns| -|[#8368](https://github.com/NVIDIA/spark-rapids/pull/8368)|Doc Update: Clarify both line anchors ^ and $ for regular expression compatibility [skip ci]| -|[#8377](https://github.com/NVIDIA/spark-rapids/pull/8377)|Avoid a possible race in test_empty_filter| -|[#8354](https://github.com/NVIDIA/spark-rapids/pull/8354)|[DOCS] Updating tools docs in spark-rapids [skip ci]| -|[#8341](https://github.com/NVIDIA/spark-rapids/pull/8341)|Enable CachedBatchWriterSuite.testCompressColBatch| -|[#8264](https://github.com/NVIDIA/spark-rapids/pull/8264)|Make tables spillable by default| -|[#8364](https://github.com/NVIDIA/spark-rapids/pull/8364)|Fix NullPointerException in ORC multithreaded reader where we access context that could be null| -|[#8322](https://github.com/NVIDIA/spark-rapids/pull/8322)|Avoid out of bounds on GpuInMemoryTableScan when reading no columns| -|[#8342](https://github.com/NVIDIA/spark-rapids/pull/8342)|Eliminate javac warnings| -|[#8334](https://github.com/NVIDIA/spark-rapids/pull/8334)|Add in support for filter on empty batch| -|[#8355](https://github.com/NVIDIA/spark-rapids/pull/8355)|Speed up github verify checks [skip ci]| -|[#8356](https://github.com/NVIDIA/spark-rapids/pull/8356)|Enable auto-merge from branch-23.06 to branch-23.08 [skip ci]| -|[#8339](https://github.com/NVIDIA/spark-rapids/pull/8339)|Fix withResource order in GpuGenerateExec| -|[#8340](https://github.com/NVIDIA/spark-rapids/pull/8340)|Stop calling contiguousSplit without splits from GpuSortExec| 
-|[#8333](https://github.com/NVIDIA/spark-rapids/pull/8333)|Fix GpuTimeAdd handling both input expressions being GpuScalar| -|[#8302](https://github.com/NVIDIA/spark-rapids/pull/8302)|Add support for DecimalType in Remainder for Spark 3.4 and DB 11.3| -|[#8325](https://github.com/NVIDIA/spark-rapids/pull/8325)|Disable `test_read_hive_fixed_length_char` on Spark 3.4+.| -|[#8327](https://github.com/NVIDIA/spark-rapids/pull/8327)|Enable spark.sql.legacy.parquet.nanosAsLong for Spark 3.2.4| -|[#8328](https://github.com/NVIDIA/spark-rapids/pull/8328)|Fix Hive text file write to deal with CUDF changes| -|[#8309](https://github.com/NVIDIA/spark-rapids/pull/8309)|Fix GpuTopN with offset for multiple batches| -|[#8306](https://github.com/NVIDIA/spark-rapids/pull/8306)|Update code to deal with new retry semantics| -|[#8307](https://github.com/NVIDIA/spark-rapids/pull/8307)|Full ordinal support in GetArrayItem| -|[#8243](https://github.com/NVIDIA/spark-rapids/pull/8243)|Enable retry for Parquet writes| -|[#8295](https://github.com/NVIDIA/spark-rapids/pull/8295)|Fix ORC reader for `CHAR(N)` columns written from Hive| -|[#8298](https://github.com/NVIDIA/spark-rapids/pull/8298)|Append new authorized user to blossom-ci whitelist [skip ci]| -|[#8276](https://github.com/NVIDIA/spark-rapids/pull/8276)|Fallback to CPU for `enableDateTimeParsingFallback` configuration| -|[#8296](https://github.com/NVIDIA/spark-rapids/pull/8296)|Fix Multithreaded Readers working with Unity Catalog on Databricks| -|[#8273](https://github.com/NVIDIA/spark-rapids/pull/8273)|Add support for escaped dot in character class in regexp parser| -|[#8266](https://github.com/NVIDIA/spark-rapids/pull/8266)|Add test to confirm correct behavior for decimal average in Spark 3.4| -|[#8291](https://github.com/NVIDIA/spark-rapids/pull/8291)|Fix delta stats tracker conf| -|[#8287](https://github.com/NVIDIA/spark-rapids/pull/8287)|Fix Delta write stats if data schema is missing columns relative to table schema| 
-|[#8286](https://github.com/NVIDIA/spark-rapids/pull/8286)|Add Tencent cosn:// to default cloud schemes|
-|[#8283](https://github.com/NVIDIA/spark-rapids/pull/8283)|Add split and retry support for filter|
-|[#8290](https://github.com/NVIDIA/spark-rapids/pull/8290)|Pre-merge docker build stage to support containerd runtime [skip ci]|
-|[#8257](https://github.com/NVIDIA/spark-rapids/pull/8257)|Support cuda12 jar's release [skip CI]|
-|[#8274](https://github.com/NVIDIA/spark-rapids/pull/8274)|Add a unit test for reordered canonicalized expressions in BinaryComparison|
-|[#8265](https://github.com/NVIDIA/spark-rapids/pull/8265)|Small code cleanup for pattern matching on Decimal type|
-|[#8255](https://github.com/NVIDIA/spark-rapids/pull/8255)|Enable locals,patvars,privates unused Scalac checks|
-|[#8234](https://github.com/NVIDIA/spark-rapids/pull/8234)|JDK17 build support in CI|
-|[#8256](https://github.com/NVIDIA/spark-rapids/pull/8256)|Use env var with version files as fallback for IT DBR version|
-|[#8239](https://github.com/NVIDIA/spark-rapids/pull/8239)|Add Spark 3.2.4 shim|
-|[#8221](https://github.com/NVIDIA/spark-rapids/pull/8221)|[Doc] update getting started guide based on latest databricks env [skip ci]|
-|[#8224](https://github.com/NVIDIA/spark-rapids/pull/8224)|Fix misinterpretation of Parquet's legacy ARRAY schemas.|
-|[#8241](https://github.com/NVIDIA/spark-rapids/pull/8241)|Update to filecache API changes|
-|[#8244](https://github.com/NVIDIA/spark-rapids/pull/8244)|Remove semicolon at the end of the package statement in Scala files|
-|[#8245](https://github.com/NVIDIA/spark-rapids/pull/8245)|Remove redundant open of ORC file|
-|[#8252](https://github.com/NVIDIA/spark-rapids/pull/8252)|Fix auto merge conflict 8250 [skip ci]|
-|[#8170](https://github.com/NVIDIA/spark-rapids/pull/8170)|Update GpuRunningWindowExec to use OOM retry framework|
-|[#8218](https://github.com/NVIDIA/spark-rapids/pull/8218)|Update to add 340 build and unit test in premerge and in JDK 11 build|
-|[#8232](https://github.com/NVIDIA/spark-rapids/pull/8232)|Add integration tests for inferred schema|
-|[#8223](https://github.com/NVIDIA/spark-rapids/pull/8223)|Use SupportsRuntimeV2Filtering in Spark 3.4.0|
-|[#8233](https://github.com/NVIDIA/spark-rapids/pull/8233)|cudf-udf integration test against python3.9 [skip ci]|
-|[#8226](https://github.com/NVIDIA/spark-rapids/pull/8226)|Offset support for TakeOrderedAndProject|
-|[#8237](https://github.com/NVIDIA/spark-rapids/pull/8237)|Use weak keys in executor broadcast plan cache|
-|[#8229](https://github.com/NVIDIA/spark-rapids/pull/8229)|Upgrade to jacoco 0.8.8 for JDK 17 support|
-|[#8216](https://github.com/NVIDIA/spark-rapids/pull/8216)|Add oom retry handling for GpuGenerate.fixedLenLazyArrayGenerate|
-|[#8191](https://github.com/NVIDIA/spark-rapids/pull/8191)|Add in retry-work to GPU OutOfCore Sort|
-|[#8228](https://github.com/NVIDIA/spark-rapids/pull/8228)|Partial JDK 17 support|
-|[#8227](https://github.com/NVIDIA/spark-rapids/pull/8227)|Adjust defaults for better performance out of the box|
-|[#8212](https://github.com/NVIDIA/spark-rapids/pull/8212)|Add file caching|
-|[#8179](https://github.com/NVIDIA/spark-rapids/pull/8179)|Fall back to CPU for try_cast in Spark 3.4.0|
-|[#8220](https://github.com/NVIDIA/spark-rapids/pull/8220)|Batch install-file executions in a single JVM|
-|[#8215](https://github.com/NVIDIA/spark-rapids/pull/8215)|Fix count from ORC files with no column names|
-|[#8192](https://github.com/NVIDIA/spark-rapids/pull/8192)|Handle PySparkException in case of literal expressions|
-|[#8190](https://github.com/NVIDIA/spark-rapids/pull/8190)|Fix element_at_index_zero integration test by using newer error message from Spark 3.4.0|
-|[#8203](https://github.com/NVIDIA/spark-rapids/pull/8203)|Clean up queued batches on task failures in RapidsShuffleThreadedBlockIterator|
-|[#8207](https://github.com/NVIDIA/spark-rapids/pull/8207)|Support `std` aggregation in reduction|
-|[#8174](https://github.com/NVIDIA/spark-rapids/pull/8174)|[FEA] support json to struct function |
-|[#8195](https://github.com/NVIDIA/spark-rapids/pull/8195)|Bump mockito to 3.12.4|
-|[#8193](https://github.com/NVIDIA/spark-rapids/pull/8193)|Increase databricks cluster autotermination to 6.5 hours [skip ci]|
-|[#8182](https://github.com/NVIDIA/spark-rapids/pull/8182)|Support STRING order-by columns for RANGE window functions|
-|[#8167](https://github.com/NVIDIA/spark-rapids/pull/8167)|Add oom retry handling to GpuGenerateExec.doGenerate path|
-|[#8183](https://github.com/NVIDIA/spark-rapids/pull/8183)|Disable asserts for non-empty nulls|
-|[#8177](https://github.com/NVIDIA/spark-rapids/pull/8177)|Fix 340 shim of GpuCreateDataSourceTableAsSelectCommand and shim GpuDataSource for 3.4.0|
-|[#8159](https://github.com/NVIDIA/spark-rapids/pull/8159)|Verify CPU fallback class when creating HIVE table [Databricks]|
-|[#8180](https://github.com/NVIDIA/spark-rapids/pull/8180)|Follow-up for ORC Decimal read failure (#8172)|
-|[#8172](https://github.com/NVIDIA/spark-rapids/pull/8172)|Fix ORC decimal read when precision/scale changes|
-|[#7227](https://github.com/NVIDIA/spark-rapids/pull/7227)|Fix PCBS integration tests for Spark-3.4|
-|[#8175](https://github.com/NVIDIA/spark-rapids/pull/8175)|Restore test_substring_column|
-|[#8162](https://github.com/NVIDIA/spark-rapids/pull/8162)|Support Java 17 for packaging|
-|[#8169](https://github.com/NVIDIA/spark-rapids/pull/8169)|Fix AnsiCastShim for 330db|
-|[#8168](https://github.com/NVIDIA/spark-rapids/pull/8168)|[DOC] Updating profiling/qualification docs for usability improvements [skip ci]|
-|[#8144](https://github.com/NVIDIA/spark-rapids/pull/8144)|Add 340 shim for GpuInsertIntoHiveTable|
-|[#8143](https://github.com/NVIDIA/spark-rapids/pull/8143)|Add handling for SplitAndRetryOOM in nextCbFromGatherer|
-|[#8102](https://github.com/NVIDIA/spark-rapids/pull/8102)|Rewrite two tests from AnsiCastOpSuite in Python and make compatible with Spark 3.4.0|
-|[#8152](https://github.com/NVIDIA/spark-rapids/pull/8152)|Fix Spark-3.4 test failure in AdaptiveQueryExecSuite|
-|[#8154](https://github.com/NVIDIA/spark-rapids/pull/8154)|Use repo1.maven.org/maven2 instead of default apache central url |
-|[#8150](https://github.com/NVIDIA/spark-rapids/pull/8150)|xfail test_substring_column|
-|[#8128](https://github.com/NVIDIA/spark-rapids/pull/8128)|Fix CastOpSuite failures with Spark 3.4|
-|[#8145](https://github.com/NVIDIA/spark-rapids/pull/8145)|Fix nz timestamp unit tests|
-|[#8146](https://github.com/NVIDIA/spark-rapids/pull/8146)|Set version of slf4j for Spark 3.4.0|
-|[#8058](https://github.com/NVIDIA/spark-rapids/pull/8058)|Add retry to BatchByKeyIterator|
-|[#8142](https://github.com/NVIDIA/spark-rapids/pull/8142)|Enable ParquetWriterSuite test 'sorted partitioned write' on Spark 3.4.0|
-|[#8035](https://github.com/NVIDIA/spark-rapids/pull/8035)|[FEA] support StringTranslate function|
-|[#8136](https://github.com/NVIDIA/spark-rapids/pull/8136)|Add GPU support for KnownNullable expression (Spark 3.4.0)|
-|[#8096](https://github.com/NVIDIA/spark-rapids/pull/8096)|Add OOM retry handling for existence joins|
-|[#8139](https://github.com/NVIDIA/spark-rapids/pull/8139)|Fix auto merge conflict 8138 [skip ci]|
-|[#8135](https://github.com/NVIDIA/spark-rapids/pull/8135)|Fix Orc writer test failure with Spark 3.4|
-|[#8129](https://github.com/NVIDIA/spark-rapids/pull/8129)|Fix compile error with Spark 3.4.0 release and bump to use 3.4.0 release JAR|
-|[#8093](https://github.com/NVIDIA/spark-rapids/pull/8093)|Add cuda12 build support [skip ci]|
-|[#8108](https://github.com/NVIDIA/spark-rapids/pull/8108)|Make Arm methods static|
-|[#8060](https://github.com/NVIDIA/spark-rapids/pull/8060)|Support repetitions in regexp choice expressions|
-|[#8081](https://github.com/NVIDIA/spark-rapids/pull/8081)|Re-enable empty repetition near end-of-line anchor for rlike, regexp_extract and regexp_replace|
-|[#8075](https://github.com/NVIDIA/spark-rapids/pull/8075)|Update some integration tests so that they are compatible with Spark 3.4.0|
-|[#8063](https://github.com/NVIDIA/spark-rapids/pull/8063)|Update docker to support integration tests against JDK17 [skip ci]|
-|[#8047](https://github.com/NVIDIA/spark-rapids/pull/8047)|Enable line/string anchors in choice|
-|[#7996](https://github.com/NVIDIA/spark-rapids/pull/7996)|Sub-partitioning supports repartitioning the input data multiple times|
-|[#8009](https://github.com/NVIDIA/spark-rapids/pull/8009)|Add in some more retry blocks|
-|[#8051](https://github.com/NVIDIA/spark-rapids/pull/8051)|MINOR: Improve assertion error in assert_py4j_exception|
-|[#8020](https://github.com/NVIDIA/spark-rapids/pull/8020)|[FEA] Add Spark 3.3.3-SNAPSHOT to shims|
-|[#8034](https://github.com/NVIDIA/spark-rapids/pull/8034)|Fix the check for dedicated per-shim files [skip ci]|
-|[#7978](https://github.com/NVIDIA/spark-rapids/pull/7978)|Update JNI and private deps version to 23.06.0-SNAPSHOT|
-|[#7965](https://github.com/NVIDIA/spark-rapids/pull/7965)|Remove stale references to the pre-shimplify dirs|
-|[#7948](https://github.com/NVIDIA/spark-rapids/pull/7948)|Init plugin version 23.06.0-SNAPSHOT|
-
-## Release 23.04
-
-### Features
-|||
-|:---|:---|
-|[#7992](https://github.com/NVIDIA/spark-rapids/issues/7992)|[Audit][SPARK-40819][SQL][3.3] Timestamp nanos behaviour regression (parquet reader)|
-|[#7985](https://github.com/NVIDIA/spark-rapids/issues/7985)|[FEA] Expose Alluxio master URL to support K8s Env|
-|[#7880](https://github.com/NVIDIA/spark-rapids/issues/7880)|[FEA] retry framework task level metrics|
-|[#7394](https://github.com/NVIDIA/spark-rapids/issues/7394)|[FEA] Support Delta Lake auto compaction|
-|[#7920](https://github.com/NVIDIA/spark-rapids/issues/7920)|[FEA] Remove SpillCallback and executor level spill metrics|
-|[#7463](https://github.com/NVIDIA/spark-rapids/issues/7463)|[FEA] Drop support for Databricks-9.1 ML LTS|
-|[#7253](https://github.com/NVIDIA/spark-rapids/issues/7253)|[FEA] Implement OOM retry framework|
-|[#7042](https://github.com/NVIDIA/spark-rapids/issues/7042)|[FEA] Add support in the tools event parsing for ML functions, libraries, and expressions|
-
-### Performance
-|||
-|:---|:---|
-|[#7907](https://github.com/NVIDIA/spark-rapids/issues/7907)|[FEA] Optimize regexp_replace in multi-replace scenarios|
-|[#7691](https://github.com/NVIDIA/spark-rapids/issues/7691)|[FEA] Upgrade and document UCX 1.14|
-|[#6516](https://github.com/NVIDIA/spark-rapids/issues/6516)|[FEA] Enable RAPIDS Shuffle Manager smoke testing for the databricks environment|
-|[#7695](https://github.com/NVIDIA/spark-rapids/issues/7695)|[FEA] Transpile regexp_extract expression to only have the single capture group that is needed|
-|[#7393](https://github.com/NVIDIA/spark-rapids/issues/7393)|[FEA] Support Delta Lake optimized write|
-|[#6561](https://github.com/NVIDIA/spark-rapids/issues/6561)|[FEA] Make SpillableColumnarBatch inform Spill code of actual usage of the batch|
-|[#6864](https://github.com/NVIDIA/spark-rapids/issues/6864)|[BUG] Spilling logic can spill data that cannot be freed|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#8111](https://github.com/NVIDIA/spark-rapids/issues/8111)|[BUG] test_delta_delete_entire_table failed in databricks 10.4 runtime|
-|[#8074](https://github.com/NVIDIA/spark-rapids/issues/8074)|[BUG] test_parquet_read_nano_as_longs_31x failed on Dataproc|
-|[#7997](https://github.com/NVIDIA/spark-rapids/issues/7997)|[BUG] executors died with too much off heap in yarn UCX CI `udf_test`|
-|[#8067](https://github.com/NVIDIA/spark-rapids/issues/8067)|[BUG] extras jar sometimes fails to load|
-|[#8038](https://github.com/NVIDIA/spark-rapids/issues/8038)|[BUG] vector leaked when running NDS 3TB with memory restricted|
-|[#8030](https://github.com/NVIDIA/spark-rapids/issues/8030)|[BUG] test_re_replace_no_unicode_fallback test fails on integration tests Yarn|
-|[#7971](https://github.com/NVIDIA/spark-rapids/issues/7971)|[BUG] withRestoreOnRetry should look at Throwable causes in addition to retry OOMs|
-|[#6990](https://github.com/NVIDIA/spark-rapids/issues/6990)|[BUG] Several integration test failures in Spark-3.4 SNAPSHOT build|
-|[#7924](https://github.com/NVIDIA/spark-rapids/issues/7924)|[BUG] Physical plan for regexp_extract does not escape newlines|
-|[#7341](https://github.com/NVIDIA/spark-rapids/issues/7341)|[BUG] Leverage OOM retry framework for ORC writes|
-|[#7921](https://github.com/NVIDIA/spark-rapids/issues/7921)|[BUG] ORC writes with bloom filters enabled do not fall back to the CPU|
-|[#7818](https://github.com/NVIDIA/spark-rapids/issues/7818)|[BUG] Reuse of broadcast exchange can lead to unnecessary CPU fallback|
-|[#7904](https://github.com/NVIDIA/spark-rapids/issues/7904)|[BUG] test_write_sql_save_table sporadically fails on Pascal|
-|[#7922](https://github.com/NVIDIA/spark-rapids/issues/7922)|[BUG] YARN IT test test_optimized_hive_ctas_basic failures|
-|[#7933](https://github.com/NVIDIA/spark-rapids/issues/7933)|[BUG] NDS running hits DPP error on Databricks 10.4 when enable Alluxio cache.|
-|[#7850](https://github.com/NVIDIA/spark-rapids/issues/7850)|[BUG] nvcomp usage for the UCX mode of the shuffle manager is broken|
-|[#7927](https://github.com/NVIDIA/spark-rapids/issues/7927)|[BUG] Shimplify adding new shim layer fails|
-|[#6138](https://github.com/NVIDIA/spark-rapids/issues/6138)|[BUG] cast timezone-awareness check positive for date/time-unrelated types|
-|[#7914](https://github.com/NVIDIA/spark-rapids/issues/7914)|[BUG] Parquet read with integer upcast crashes|
-|[#6961](https://github.com/NVIDIA/spark-rapids/issues/6961)|[BUG] Using `\d` (or others) inside a character class results in "Unsupported escape character" |
-|[#7908](https://github.com/NVIDIA/spark-rapids/issues/7908)|[BUG] Interpolate spark.version.classifier into scala:compile `secondaryCacheDir`|
-|[#7707](https://github.com/NVIDIA/spark-rapids/issues/7707)|[BUG] IndexOutOfBoundsException when joining on 2 integer columns with DPP|
-|[#7892](https://github.com/NVIDIA/spark-rapids/issues/7892)|[BUG] Invalid or unsupported escape character `t` when trying to use tab in regexp_replace|
-|[#7640](https://github.com/NVIDIA/spark-rapids/issues/7640)|[BUG] GPU OOM using GpuRegExpExtract|
-|[#7814](https://github.com/NVIDIA/spark-rapids/issues/7814)|[BUG] GPU's output differs from CPU's for big decimals when joining by sub-partitioning algorithm|
-|[#7796](https://github.com/NVIDIA/spark-rapids/issues/7796)|[BUG] Parquet chunked reader size of output exceeds column size limit|
-|[#7833](https://github.com/NVIDIA/spark-rapids/issues/7833)|[BUG] run_pyspark_from_build computes 5 MiB per runner instead of 5 GiB|
-|[#7855](https://github.com/NVIDIA/spark-rapids/issues/7855)|[BUG] shuffle_test test_hash_grpby_sum failed OOM in premerge CI|
-|[#7858](https://github.com/NVIDIA/spark-rapids/issues/7858)|[BUG] HostToGpuCoalesceIterator leaks all host batches|
-|[#7826](https://github.com/NVIDIA/spark-rapids/issues/7826)|[BUG] buildall dist jar contains aggregator dependency|
-|[#7729](https://github.com/NVIDIA/spark-rapids/issues/7729)|[BUG] Active GPU thread not holding the semaphore|
-|[#7820](https://github.com/NVIDIA/spark-rapids/issues/7820)|[BUG] Restore pandas require_minimum_pandas_version() check|
-|[#7829](https://github.com/NVIDIA/spark-rapids/issues/7829)|[BUG] Parquet buffer time not correct with multithreaded combining reader|
-|[#7819](https://github.com/NVIDIA/spark-rapids/issues/7819)|[BUG] GpuDeviceManager allows setting UVM regardless of other RMM configs|
-|[#7643](https://github.com/NVIDIA/spark-rapids/issues/7643)|[BUG] Databricks init scripts can fail silently|
-|[#7799](https://github.com/NVIDIA/spark-rapids/issues/7799)|[BUG] Cannot lexicographic compare a table with a LIST of STRUCT column at ai.rapids.cudf.Table.sortOrder|
-|[#7767](https://github.com/NVIDIA/spark-rapids/issues/7767)|[BUG] VS Code / Metals / Bloop integration fails with java.lang.RuntimeException: 'boom' |
-|[#6383](https://github.com/NVIDIA/spark-rapids/issues/6383)|[SPARK-40066][SQL] ANSI mode: always return null on invalid access to map column|
-|[#7093](https://github.com/NVIDIA/spark-rapids/issues/7093)|[BUG] Spark-3.4 - Integration test failures in map_test|
-|[#7779](https://github.com/NVIDIA/spark-rapids/issues/7779)|[BUG] AlluxioUtilsSuite uses illegal character underscore in URI scheme|
-|[#7725](https://github.com/NVIDIA/spark-rapids/issues/7725)|[BUG] cache_test failed w/ ParquetCachedBatchSerializer in spark 3.3.2-SNAPSHOT|
-|[#7639](https://github.com/NVIDIA/spark-rapids/issues/7639)|[BUG] Databricks premerge failing with cannot find pytest|
-|[#7694](https://github.com/NVIDIA/spark-rapids/issues/7694)|[BUG] Spark-3.4 build breaks due to removing InternalRowSet|
-|[#6598](https://github.com/NVIDIA/spark-rapids/issues/6598)|[BUG] CUDA error when casting large column vector from long to string|
-|[#7739](https://github.com/NVIDIA/spark-rapids/issues/7739)|[BUG] udf_test failed in databricks 11.3 ENV|
-|[#5748](https://github.com/NVIDIA/spark-rapids/issues/5748)|[BUG] 3 cast tests fails on Spark 3.4.0|
-|[#7688](https://github.com/NVIDIA/spark-rapids/issues/7688)|[BUG] GpuParquetScan fails with NullPointerException - Delta CDF query|
-|[#7648](https://github.com/NVIDIA/spark-rapids/issues/7648)|[BUG] java.lang.ClassCastException: SerializeConcatHostBuffersDeserializeBatch cannot be cast to.HashedRelation|
-|[#6988](https://github.com/NVIDIA/spark-rapids/issues/6988)|[BUG] Integration test failures with DecimalType on Spark-3.4 SNAPSHOT build|
-|[#7615](https://github.com/NVIDIA/spark-rapids/issues/7615)|[BUG] Build fails on Spark 3.4|
-|[#7557](https://github.com/NVIDIA/spark-rapids/issues/7557)|[AUDIT][SPARK-41970] Introduce SparkPath for typesafety|
-|[#7617](https://github.com/NVIDIA/spark-rapids/issues/7617)|[BUG] Build 340 failed due to miss shim code for GpuShuffleMeta|
-
-### PRs
-|||
-|:---|:---|
-|[#8251](https://github.com/NVIDIA/spark-rapids/pull/8251)|Update 23.04 changelog w/ hotfix [skip ci]|
-|[#8247](https://github.com/NVIDIA/spark-rapids/pull/8247)|Bump up plugin version to 23.04.1-SNAPSHOT|
-|[#8248](https://github.com/NVIDIA/spark-rapids/pull/8248)|[Doc] update versions for 2304 hot fix [skip ci]|
-|[#8246](https://github.com/NVIDIA/spark-rapids/pull/8246)|Cherry-pick hotfix: Use weak keys in executor broadcast plan cache|
-|[#8092](https://github.com/NVIDIA/spark-rapids/pull/8092)|Init changelog for 23.04 [skip ci]|
-|[#8109](https://github.com/NVIDIA/spark-rapids/pull/8109)|Bump up JNI and private version to released 23.04.0|
-|[#7939](https://github.com/NVIDIA/spark-rapids/pull/7939)|[Doc]update download docs for 2304 version[skip ci]|
-|[#8127](https://github.com/NVIDIA/spark-rapids/pull/8127)|Avoid SQL result check of Delta Lake full delete on Databricks|
-|[#8098](https://github.com/NVIDIA/spark-rapids/pull/8098)|Fix loading of ORC files with missing column names|
-|[#8110](https://github.com/NVIDIA/spark-rapids/pull/8110)|Update ML integration page docs page [skip ci]|
-|[#8103](https://github.com/NVIDIA/spark-rapids/pull/8103)|Add license of spark-rapids private in NOTICE-binary[skip ci]|
-|[#8100](https://github.com/NVIDIA/spark-rapids/pull/8100)|Update/improve EMR getting started documentation [skip ci]|
-|[#8101](https://github.com/NVIDIA/spark-rapids/pull/8101)|Improve OOM exception messages|
-|[#8087](https://github.com/NVIDIA/spark-rapids/pull/8087)|Add an FAQ entry on encryption support [skip ci]|
-|[#8076](https://github.com/NVIDIA/spark-rapids/pull/8076)|Add in docs about RetryOOM [skip ci]|
-|[#8077](https://github.com/NVIDIA/spark-rapids/pull/8077)|Temporarily skip `test_parquet_read_nano_as_longs_31x` on dataproc|
-|[#8071](https://github.com/NVIDIA/spark-rapids/pull/8071)|Fix error in deploy script [skip ci]|
-|[#8070](https://github.com/NVIDIA/spark-rapids/pull/8070)|Fixes closed RapidsShuffleHandleImpl leak in ShuffleBufferCatalog|
-|[#8069](https://github.com/NVIDIA/spark-rapids/pull/8069)|Fix loading extra jar|
-|[#8044](https://github.com/NVIDIA/spark-rapids/pull/8044)|Fall back to CPU if `spark.sql.legacy.parquet.nanosAsLong` is set|
-|[#8049](https://github.com/NVIDIA/spark-rapids/pull/8049)|[DOC] Adding user tool info to main qualification docs page [skip ci]|
-|[#8040](https://github.com/NVIDIA/spark-rapids/pull/8040)|Fix device vector leak in RmmRetryIterator.splitSpillableInHalfByRows|
-|[#8031](https://github.com/NVIDIA/spark-rapids/pull/8031)|Fix regexp_replace integration test that should fallback when unicode is disabled|
-|[#7828](https://github.com/NVIDIA/spark-rapids/pull/7828)|Fallback to arena allocator if RMM failed to initialize with async allocator|
-|[#8006](https://github.com/NVIDIA/spark-rapids/pull/8006)|Handle caused-by retry exceptions in withRestoreOnRetry|
-|[#8013](https://github.com/NVIDIA/spark-rapids/pull/8013)|[Doc] Adding user tools info into EMR getting started guide [skip ci]|
-|[#8007](https://github.com/NVIDIA/spark-rapids/pull/8007)|Fix leak where RapidsShuffleIterator for a completed task was kept alive|
-|[#8010](https://github.com/NVIDIA/spark-rapids/pull/8010)|Specify that UCX should be 1.12.1 only [skip ci]|
-|[#7967](https://github.com/NVIDIA/spark-rapids/pull/7967)|Transpile simple choice-type regular expressions into lists of choices to use with string replace multi|
-|[#7902](https://github.com/NVIDIA/spark-rapids/pull/7902)|Add oom retry handling for createGatherer in gpu hash joins|
-|[#7986](https://github.com/NVIDIA/spark-rapids/pull/7986)|Provides a config to expose Alluxio master URL to support K8s Env|
-|[#7936](https://github.com/NVIDIA/spark-rapids/pull/7936)|Stop showing internal details of ternary expressions in SparkPlan.toString|
-|[#7972](https://github.com/NVIDIA/spark-rapids/pull/7972)|Add in retry for ORC writes|
-|[#7975](https://github.com/NVIDIA/spark-rapids/pull/7975)|Publish documentation for private configs|
-|[#7976](https://github.com/NVIDIA/spark-rapids/pull/7976)|Disable GPU write for ORC and Parquet, if bloom-filters are enabled.|
-|[#7925](https://github.com/NVIDIA/spark-rapids/pull/7925)|Inject RetryOOM in CI where retry iterator is used|
-|[#7970](https://github.com/NVIDIA/spark-rapids/pull/7970)|[DOCS] Updating qual tool docs from latest in tools repo|
-|[#7952](https://github.com/NVIDIA/spark-rapids/pull/7952)|Add in minimal retry metrics|
-|[#7884](https://github.com/NVIDIA/spark-rapids/pull/7884)|Add Python requirements file for integration tests|
-|[#7958](https://github.com/NVIDIA/spark-rapids/pull/7958)|Add CheckpointRestore trait and withRestoreOnRetry|
-|[#7849](https://github.com/NVIDIA/spark-rapids/pull/7849)|Fix CPU broadcast exchanges being left unreplaced due to AQE and reuse|
-|[#7944](https://github.com/NVIDIA/spark-rapids/pull/7944)|Fix issue with dynamicpruning filters used in converted GPU scans when S3 paths are replaced with alluxio|
-|[#7949](https://github.com/NVIDIA/spark-rapids/pull/7949)|Lazily unspill the stream batches for joins by sub-partitioning|
-|[#7951](https://github.com/NVIDIA/spark-rapids/pull/7951)|Fix PMD docs URL [skip ci]|
-|[#7945](https://github.com/NVIDIA/spark-rapids/pull/7945)|Enable automerge from 2304 to 2306 [skip ci]|
-|[#7935](https://github.com/NVIDIA/spark-rapids/pull/7935)|Add GPU level task metrics|
-|[#7930](https://github.com/NVIDIA/spark-rapids/pull/7930)|Add OOM Retry handling for join gather next|
-|[#7942](https://github.com/NVIDIA/spark-rapids/pull/7942)|Revert "Upgrade to UCX 1.14.0 (#7877)"|
-|[#7889](https://github.com/NVIDIA/spark-rapids/pull/7889)|Support auto-compaction for Delta tables on|
-|[#7937](https://github.com/NVIDIA/spark-rapids/pull/7937)|Support hashing different types for sub-partitioning|
-|[#7877](https://github.com/NVIDIA/spark-rapids/pull/7877)|Upgrade to UCX 1.14.0|
-|[#7926](https://github.com/NVIDIA/spark-rapids/pull/7926)|Fixes issue where UCX compressed tables would be decompressed multiple times|
-|[#7928](https://github.com/NVIDIA/spark-rapids/pull/7928)|Adjust assert for SparkShims: no longer a per-shim file [skip ci]|
-|[#7895](https://github.com/NVIDIA/spark-rapids/pull/7895)|Some refactor of shuffled hash join|
-|[#7894](https://github.com/NVIDIA/spark-rapids/pull/7894)|Support tagging `Cast` for timezone conditionally|
-|[#7915](https://github.com/NVIDIA/spark-rapids/pull/7915)|Fix upcast of signed integral values when reading from Parquet|
-|[#7879](https://github.com/NVIDIA/spark-rapids/pull/7879)|Retry for file read operations|
-|[#7905](https://github.com/NVIDIA/spark-rapids/pull/7905)|[Doc] Fix some documentation issue based on VPR feedback on 23.04 branch (new PR) [skip CI] |
-|[#7912](https://github.com/NVIDIA/spark-rapids/pull/7912)|[Doc] Hotfix gh-pages for compatibility page format issue [skip ci]|
-|[#7913](https://github.com/NVIDIA/spark-rapids/pull/7913)|Fix resolution of GpuRapidsProcessDeltaMergeJoinExec expressions|
-|[#7916](https://github.com/NVIDIA/spark-rapids/pull/7916)|Add clarification for Delta Lake optimized write fallback due to sorting [skip ci]|
-|[#7906](https://github.com/NVIDIA/spark-rapids/pull/7906)|ColumnarToRowIterator should release the semaphore if parent is empty|
-|[#7909](https://github.com/NVIDIA/spark-rapids/pull/7909)|Interpolate buildver into secondaryCacheDir|
-|[#7844](https://github.com/NVIDIA/spark-rapids/pull/7844)|Update alluxio version to 2.9.0|
-|[#7896](https://github.com/NVIDIA/spark-rapids/pull/7896)|Update regular expression parser to handle escape character sequences|
-|[#7885](https://github.com/NVIDIA/spark-rapids/pull/7885)|Add Join Reordering Integration Test|
-|[#7862](https://github.com/NVIDIA/spark-rapids/pull/7862)|Reduce shimming of GpuFlatMapGroupsInPandasExec|
-|[#7859](https://github.com/NVIDIA/spark-rapids/pull/7859)|Remove 3.1.4-SNAPSHOT shim code|
-|[#7835](https://github.com/NVIDIA/spark-rapids/pull/7835)|Update to pull the rapids spark extra plugin jar|
-|[#7863](https://github.com/NVIDIA/spark-rapids/pull/7863)|[Doc] Address document issues [skip ci]|
-|[#7794](https://github.com/NVIDIA/spark-rapids/pull/7794)|Implement sub partitioning for large/skewed hash joins|
-|[#7864](https://github.com/NVIDIA/spark-rapids/pull/7864)|Add in basic support for OOM retry for project and filter|
-|[#7878](https://github.com/NVIDIA/spark-rapids/pull/7878)|Fixing host memory calculation to properly be 5GiB|
-|[#7860](https://github.com/NVIDIA/spark-rapids/pull/7860)|Enable manual copy-and-paste code detection [skip ci]|
-|[#7852](https://github.com/NVIDIA/spark-rapids/pull/7852)|Use withRetry in GpuCoalesceBatches|
-|[#7857](https://github.com/NVIDIA/spark-rapids/pull/7857)|Unshim getSparkShimVersion|
-|[#7854](https://github.com/NVIDIA/spark-rapids/pull/7854)|Optimize `regexp_extract*` by transpiling capture groups to non-capturing groups so that only the required capturing group is manifested|
-|[#7853](https://github.com/NVIDIA/spark-rapids/pull/7853)|Remove support for Databricks-9.1 ML LTS|
-|[#7856](https://github.com/NVIDIA/spark-rapids/pull/7856)|Update references to reduced dependencies pom [skip ci]|
-|[#7848](https://github.com/NVIDIA/spark-rapids/pull/7848)|Initialize only sql-plugin to prevent missing submodule artifacts in buildall [skip ci]|
-|[#7839](https://github.com/NVIDIA/spark-rapids/pull/7839)|Add reduced pom to dist jar in the packaging phase|
-|[#7822](https://github.com/NVIDIA/spark-rapids/pull/7822)|Add in support for OOM retry|
-|[#7846](https://github.com/NVIDIA/spark-rapids/pull/7846)|Stop releasing semaphore in GpuUserDefinedFunction|
-|[#7840](https://github.com/NVIDIA/spark-rapids/pull/7840)|Execute mvn initialize before parallel build [skip ci]|
-|[#7222](https://github.com/NVIDIA/spark-rapids/pull/7222)|Automatic conversion to shimplified directory structure|
-|[#7824](https://github.com/NVIDIA/spark-rapids/pull/7824)|Use withRetryNoSplit in BasicWindowCalc|
-|[#7842](https://github.com/NVIDIA/spark-rapids/pull/7842)|Try fix broken blackduck scan [skip ci]|
-|[#7841](https://github.com/NVIDIA/spark-rapids/pull/7841)|Hardcode scan projects [skip ci]|
-|[#7830](https://github.com/NVIDIA/spark-rapids/pull/7830)|Fix buffer and Filter time with Parquet multithreaded combine reader|
-|[#7678](https://github.com/NVIDIA/spark-rapids/pull/7678)|Premerge CI to drop support for Databricks-9.1 ML LTS|
-|[#7823](https://github.com/NVIDIA/spark-rapids/pull/7823)|[BUG] Enable managed memory only if async allocator is not used|
-|[#7821](https://github.com/NVIDIA/spark-rapids/pull/7821)|Restore pandas import check in db113 runtime|
-|[#7810](https://github.com/NVIDIA/spark-rapids/pull/7810)|UnXfail large decimal window range queries|
-|[#7771](https://github.com/NVIDIA/spark-rapids/pull/7771)|Add withRetry and withRetryNoSplit and PoC with hash aggregate|
-|[#7815](https://github.com/NVIDIA/spark-rapids/pull/7815)|Fix the hyperlink to shimplify.py [skip ci]|
-|[#7812](https://github.com/NVIDIA/spark-rapids/pull/7812)|Fallback Delta Lake optimized writes if GPU cannot support partitioning|
-|[#7791](https://github.com/NVIDIA/spark-rapids/pull/7791)|Doc changes for new nested JSON reader [skip ci]|
-|[#7797](https://github.com/NVIDIA/spark-rapids/pull/7797)|Add GPU support for EphemeralSubstring|
-|[#7561](https://github.com/NVIDIA/spark-rapids/pull/7561)|Ant task to automatically convert to a simple shim layout|
-|[#7789](https://github.com/NVIDIA/spark-rapids/pull/7789)|Update script for integration tests on Databricks|
-|[#7798](https://github.com/NVIDIA/spark-rapids/pull/7798)|Do not error out DB IT test script when pytest code 5 [skip ci]|
-|[#7787](https://github.com/NVIDIA/spark-rapids/pull/7787)|Document a workaround to RuntimeException 'boom' [skip ci]|
-|[#7786](https://github.com/NVIDIA/spark-rapids/pull/7786)|Fix nested loop joins when there's no build-side columns|
-|[#7730](https://github.com/NVIDIA/spark-rapids/pull/7730)|[FEA] Switch to `regex_program` APIs|
-|[#7788](https://github.com/NVIDIA/spark-rapids/pull/7788)|Support released spark 3.3.2|
-|[#7095](https://github.com/NVIDIA/spark-rapids/pull/7095)|Fix the failure in `map_test.py` on Spark 3.4|
-|[#7769](https://github.com/NVIDIA/spark-rapids/pull/7769)|Fix issue where GpuSemaphore can throw NPE when logDebug is on|
-|[#7780](https://github.com/NVIDIA/spark-rapids/pull/7780)|Make AlluxioUtilsSuite pass for 340|
-|[#7772](https://github.com/NVIDIA/spark-rapids/pull/7772)|Fix cache test for Spark 3.3.2|
-|[#7717](https://github.com/NVIDIA/spark-rapids/pull/7717)|Move Databricks variables into blossom-lib|
-|[#7749](https://github.com/NVIDIA/spark-rapids/pull/7749)|Support Delta Lake optimized write on Databricks|
-|[#7696](https://github.com/NVIDIA/spark-rapids/pull/7696)|Create new version of GpuBatchScanExec to fix Spark-3.4 build|
-|[#7747](https://github.com/NVIDIA/spark-rapids/pull/7747)|batched full join tracking batch does not need to be lazy|
-|[#7758](https://github.com/NVIDIA/spark-rapids/pull/7758)|Hardcode python 3.8 to be used in databricks runtime for cudf_udf ENV|
-|[#7716](https://github.com/NVIDIA/spark-rapids/pull/7716)|Clean the code of `GpuMetrics`|
-|[#7746](https://github.com/NVIDIA/spark-rapids/pull/7746)|Merge branch-23.02 into branch-23.04 [skip ci]|
-|[#7740](https://github.com/NVIDIA/spark-rapids/pull/7740)|Revert 7737 workaround for cudf setup in databricks 11.3 runtime [skip ci]|
-|[#7737](https://github.com/NVIDIA/spark-rapids/pull/7737)|Workaround for cudf setup in databricks 11.3 runtime|
-|[#7734](https://github.com/NVIDIA/spark-rapids/pull/7734)|Temporarily skip the test_parquet_read_ignore_missing on Databricks|
-|[#7728](https://github.com/NVIDIA/spark-rapids/pull/7728)|Fix estimatedNumBatches in case of OOM for Full Outer Join|
-|[#7718](https://github.com/NVIDIA/spark-rapids/pull/7718)|GpuParquetScan fails with NullPointerException during combining|
-|[#7712](https://github.com/NVIDIA/spark-rapids/pull/7712)|Enable Dynamic File Pruning on|
-|[#7702](https://github.com/NVIDIA/spark-rapids/pull/7702)|Merge 23.02 into 23.04|
-|[#7572](https://github.com/NVIDIA/spark-rapids/pull/7572)|Enables spillable/unspillable state for RapidsBuffer and allow buffer sharing|
-|[#7687](https://github.com/NVIDIA/spark-rapids/pull/7687)|Fix window tests for Spark-3.4|
-|[#7667](https://github.com/NVIDIA/spark-rapids/pull/7667)|Reenable tests originally bypassed for 3.4|
-|[#7542](https://github.com/NVIDIA/spark-rapids/pull/7542)|Support WriteFilesExec in Spark-3.4 to fix several tests|
-|[#7673](https://github.com/NVIDIA/spark-rapids/pull/7673)|Add missing spark shim test suites |
-|[#7655](https://github.com/NVIDIA/spark-rapids/pull/7655)|Fix Spark 3.4 build|
-|[#7621](https://github.com/NVIDIA/spark-rapids/pull/7621)|Document GNU sed for macOS auto-copyrighter users [skip ci]|
-|[#7618](https://github.com/NVIDIA/spark-rapids/pull/7618)|Update JNI to 23.04.0-SNAPSHOT and update new delta-stub ver to 23.04|
-|[#7541](https://github.com/NVIDIA/spark-rapids/pull/7541)|Init version 23.04.0-SNAPSHOT|
-
-## Release 23.02
-
-### Features
-|||
-|:---|:---|
-|[#6420](https://github.com/NVIDIA/spark-rapids/issues/6420)|[FEA]Support HiveTableScanExec to scan a Hive text table|
-|[#4897](https://github.com/NVIDIA/spark-rapids/issues/4897)|Profiling tool: create a section to focus on I/O metrics|
-|[#6419](https://github.com/NVIDIA/spark-rapids/issues/6419)|[FEA] Support write a Hive text table |
-|[#7280](https://github.com/NVIDIA/spark-rapids/issues/7280)|[FEA] Support UpdateCommand for Delta Lake|
-|[#7281](https://github.com/NVIDIA/spark-rapids/issues/7281)|[FEA] Support DeleteCommand for Delta Lake|
-|[#5272](https://github.com/NVIDIA/spark-rapids/issues/5272)|[FEA] Support from_json to get a MapType|
-|[#7007](https://github.com/NVIDIA/spark-rapids/issues/7007)|[FEA] Support Delta table MERGE INTO on Databricks.|
-|[#7521](https://github.com/NVIDIA/spark-rapids/issues/7521)|[FEA] Allow user to set concurrentGpuTasks after startup|
-|[#3300](https://github.com/NVIDIA/spark-rapids/issues/3300)|[FEA] Support batched full join|
-|[#6698](https://github.com/NVIDIA/spark-rapids/issues/6698)|[FEA] Support json_tuple|
-|[#6885](https://github.com/NVIDIA/spark-rapids/issues/6885)|[FEA] Support reverse|
-|[#6879](https://github.com/NVIDIA/spark-rapids/issues/6879)|[FEA] Support Databricks 11.3 ML LTS|
-
-### Performance
-|||
-|:---|:---|
-|[#7436](https://github.com/NVIDIA/spark-rapids/issues/7436)|[FEA] Pruning partition columns supports cases of GPU file scan with CPU project and filter|
-|[#7219](https://github.com/NVIDIA/spark-rapids/issues/7219)|Improve performance of small file parquet reads from blobstores|
-|[#6807](https://github.com/NVIDIA/spark-rapids/issues/6807)|Improve the current documentation on RapidsShuffleManager|
-|[#5039](https://github.com/NVIDIA/spark-rapids/issues/5039)|[FEA] Parallelize shuffle compress/decompress with opportunistic idle task threads|
-|[#7196](https://github.com/NVIDIA/spark-rapids/issues/7196)|RegExpExtract does both extract and contains_re which is inefficient|
-|[#6862](https://github.com/NVIDIA/spark-rapids/issues/6862)|[FEA] GpuRowToColumnarExec for Binary is really slow compared to string|
-
-### Bugs Fixed
-|||
-|:---|:---|
-|[#7069](https://github.com/NVIDIA/spark-rapids/issues/7069)|[BUG] GPU Hive Text Reader reads empty strings as null|
-|[#7068](https://github.com/NVIDIA/spark-rapids/issues/7068)|[BUG] GPU Hive Text Reader skips empty lines|
-|[#7448](https://github.com/NVIDIA/spark-rapids/issues/7448)|[BUG] GDS cufile test failed in elder cuda runtime|
-|[#7686](https://github.com/NVIDIA/spark-rapids/issues/7686)|[BUG] Large floating point values written as `Inf` not `Infinity` in Hive text writer|
-|[#7703](https://github.com/NVIDIA/spark-rapids/issues/7703)|[BUG] test_basic_hive_text_write fails|
-|[#7693](https://github.com/NVIDIA/spark-rapids/issues/7693)|[BUG] `test_partitioned_sql_parquet_write` fails on CDH|
-|[#7382](https://github.com/NVIDIA/spark-rapids/issues/7382)|[BUG] add dynamic partition overwrite tests for all formats|
-|[#7597](https://github.com/NVIDIA/spark-rapids/issues/7597)|[BUG] Incompatible timestamps in Hive delimited text writes|
-|[#7675](https://github.com/NVIDIA/spark-rapids/issues/7675)|[BUG] Multi-threaded shuffle bails with division-by-zero ArithmeticException|
-|[#7530](https://github.com/NVIDIA/spark-rapids/issues/7530)|[BUG] Add tests for RapidsShuffleManager|
-|[#7679](https://github.com/NVIDIA/spark-rapids/issues/7679)|[BUG] test_partitioned_parquet_write[PartitionWriteMode.Dynamic] failed in databricks runtimes|
-|[#7637](https://github.com/NVIDIA/spark-rapids/issues/7637)|[BUG] GpuBroadcastNestedLoopJoinExec: Close called too many times|
-|[#7595](https://github.com/NVIDIA/spark-rapids/issues/7595)|[BUG] test_mod_mixed decimal test fails on 330db (Databricks 11.3) and TBD 340|
-|[#7575](https://github.com/NVIDIA/spark-rapids/issues/7575)|[BUG] On Databricks 11.3, Executor broadcast shuffles should stay on GPU even without columnar children|
-|[#7607](https://github.com/NVIDIA/spark-rapids/issues/7607)|[BUG] RegularExpressionTranspilerSuite exhibits multiple failures|
-|[#7574](https://github.com/NVIDIA/spark-rapids/issues/7574)|[BUG] simple json_tuple test failing with cudf error|
-|[#7446](https://github.com/NVIDIA/spark-rapids/issues/7446)|[BUG] prune_partition_column_test fails for Json in UCX CI|
-|[#7603](https://github.com/NVIDIA/spark-rapids/issues/7603)|[BUG] GpuIntervalUtilsTest leaks memory|
-|[#7090](https://github.com/NVIDIA/spark-rapids/issues/7090)|[BUG] Refactor line terminator handling code|
-|[#7472](https://github.com/NVIDIA/spark-rapids/issues/7472)|[BUG] The parquet chunked reader can fail for certain list cases.|
-|[#7560](https://github.com/NVIDIA/spark-rapids/issues/7560)|[BUG] Outstanding allocations detected at shutdown in python integration tests|
-|[#7516](https://github.com/NVIDIA/spark-rapids/issues/7516)|[BUG] multiple join cases cpu and gpu outputs mismatched in yarn|
-|[#7537](https://github.com/NVIDIA/spark-rapids/issues/7537)|[AUDIT] [SPARK-42039][SQL] SPJ: Remove Option in KeyGroupedPartitioning#partitionValuesOpt|
-|[#7535](https://github.com/NVIDIA/spark-rapids/issues/7535)|[BUG] Timestamp test cases failed due to Python time zone is not UTC|
-|[#7432](https://github.com/NVIDIA/spark-rapids/issues/7432)|[BUG][SPARK-41713][SPARK-41726] Spark 3.4 build fails.|
-|[#7517](https://github.com/NVIDIA/spark-rapids/issues/7517)|[DOC] doc/source of spark330db Shuffle Manager for Databricks-11.3 shim|
-|[#7505](https://github.com/NVIDIA/spark-rapids/issues/7505)|[BUG] Delta Lake writes with AQE can have unnecessary row transitions|
-|[#7454](https://github.com/NVIDIA/spark-rapids/issues/7454)|[BUG] Investigate binaryop.cpp:205: Unsupported operator for these types on Databricks 11.3|
-|[#7469](https://github.com/NVIDIA/spark-rapids/issues/7469)|[BUG] test_arrays_zip failures on nightly|
-|[#6894](https://github.com/NVIDIA/spark-rapids/issues/6894)|[BUG] Multithreaded rapids shuffle metrics incorrect|
-|[#7325](https://github.com/NVIDIA/spark-rapids/issues/7325)|[BUG] Fix integration test failures on Databricks-11.3|
-|[#7348](https://github.com/NVIDIA/spark-rapids/issues/7348)|[BUG] Fix integration tests for decimal type on db330 shim|
-|[#6978](https://github.com/NVIDIA/spark-rapids/issues/6978)|[BUG] NDS query 16 fails on EMR 6.8.0 with AQE enabled|
-|[#7133](https://github.com/NVIDIA/spark-rapids/issues/7133)|[BUG] Followup to AQE issues with reused columnar broadcast exchanges|
-|[#7208](https://github.com/NVIDIA/spark-rapids/issues/7208)|[BUG] Explicitly check if the platform is supported|
-|[#7397](https://github.com/NVIDIA/spark-rapids/issues/7397)|[BUG] shuffle smoke test hash_aggregate_test.py::test_hash_grpby_sum failed OOM intermittently in premerge|
-|[#7443](https://github.com/NVIDIA/spark-rapids/issues/7443)|[BUG] prune_partition_column_test fails for Avro|
-|[#7415](https://github.com/NVIDIA/spark-rapids/issues/7415)|[BUG] regex integration test failures on Databricks-11.3|
-|[#7226](https://github.com/NVIDIA/spark-rapids/issues/7226)|[BUG] GpuFileSourceScanExec always generates all partition columns|
-|[#7402](https://github.com/NVIDIA/spark-rapids/issues/7402)|[BUG] array_test.py::test_array_intersect failing cloudera|
-|[#7324](https://github.com/NVIDIA/spark-rapids/issues/7324)|[BUG] Support DayTimeIntervalType in Databricks-11.3 to fix integration tests|
-|[#7426](https://github.com/NVIDIA/spark-rapids/issues/7426)|[BUG] bloop compile times out on Databricks 11.3|
-|[#7328](https://github.com/NVIDIA/spark-rapids/issues/7328)|[BUG] Databricks-11.3 integration test failing due to IllegalStateException: the broadcast must be on the GPU too|
-|[#7327](https://github.com/NVIDIA/spark-rapids/issues/7327)|[BUG] Update Exception to fix Assertion error in Databricks-11.3 integration tests.|
-|[#7403](https://github.com/NVIDIA/spark-rapids/issues/7403)|[BUG] test_dynamic_partition_write_round_trip failed in CDH tests|
-|[#7368](https://github.com/NVIDIA/spark-rapids/issues/7368)|[BUG] with_hidden_metadata_fallback test failures on Databricks 11.3|
-|[#7400](https://github.com/NVIDIA/spark-rapids/issues/7400)|[BUG] parquet multithreaded combine reader can calculate size wrong|
-|[#7383](https://github.com/NVIDIA/spark-rapids/issues/7383)|[BUG] hive text partitioned reads using the `GpuHiveTableScanExec` are broken|
-|[#7350](https://github.com/NVIDIA/spark-rapids/issues/7350)|[BUG] test_conditional_with_side_effects_case_when fails|
-|[#7373](https://github.com/NVIDIA/spark-rapids/issues/7373)|[BUG] RapidsUDF does not support UDFs with no inputs|
-|[#7213](https://github.com/NVIDIA/spark-rapids/issues/7213)|[BUG] Parquet unsigned int scan test failure|
-|[#7344](https://github.com/NVIDIA/spark-rapids/issues/7344)|[BUG] Add PythonMapInArrowExec in 330db shim to fix integration test.|
-|[#7367](https://github.com/NVIDIA/spark-rapids/issues/7367)|[BUG] test_re_replace_repetition failed|
-|[#7345](https://github.com/NVIDIA/spark-rapids/issues/7345)|[BUG] Integration tests failing on Databricks-11.3 due to mixing of aggregations in HashAggregateExec and SortAggregateExec|
-|[#7378](https://github.com/NVIDIA/spark-rapids/issues/7378)|[BUG][ORC] `GpuInsertIntoHadoopFsRelationCommand` should use staging directory for dynamic partition overwrite|
-|[#7374](https://github.com/NVIDIA/spark-rapids/issues/7374)|[BUG] Hive reader does not always fall back to CPU when table contains nested types|
-|[#7284](https://github.com/NVIDIA/spark-rapids/issues/7284)|[BUG] Generated supported data source file is inaccurate for write data formats|
-|[#7347](https://github.com/NVIDIA/spark-rapids/issues/7347)|[BUG] Fix integration tests in Databricks-11.3 runtime by removing config spark.sql.ansi.strictIndexOperator|
-|[#7326](https://github.com/NVIDIA/spark-rapids/issues/7326)|[BUG] Support RoundCeil and RoundFloor on Databricks-11.3 shim to fix integration tests.|
-|[#7352](https://github.com/NVIDIA/spark-rapids/issues/7352)|[BUG] Databricks-11.3 IT failures - IllegalArgumentException: requirement failed for complexTypeExtractors|
-|[#6285](https://github.com/NVIDIA/spark-rapids/issues/6285)|[BUG] Add null values back to test_array_intersect for Spark 3.3+ and Databricks 10.4+|
-|[#7329](https://github.com/NVIDIA/spark-rapids/issues/7329)|[BUG] intermittently could not find ExecutedCommandExec in the GPU plan in delta_lake_write test|
-|[#7184](https://github.com/NVIDIA/spark-rapids/issues/7184)|[BUG] Fix integration test failures on Databricks 11.3|
-|[#7225](https://github.com/NVIDIA/spark-rapids/issues/7225)|[BUG] Partition columns mishandled when Parquet read is coalesced and chunked|
-|[#7303](https://github.com/NVIDIA/spark-rapids/issues/7303)|[BUG] Fix CPU fallback for custom timestamp formats in Hive delimited text|
-|[#7086](https://github.com/NVIDIA/spark-rapids/issues/7086)|[BUG] GPU Hive delimited text reader is more permissive than `LazySimpleSerDe` for timestamps|
-|[#7089](https://github.com/NVIDIA/spark-rapids/issues/7089)|[BUG] Reading invalid `DATE` strings yields exceptions instead of nulls|
-|[#6047](https://github.com/NVIDIA/spark-rapids/issues/6047)|[BUG] Look into field IDs when reading parquet using the native footer parser|
-|[#6989](https://github.com/NVIDIA/spark-rapids/issues/6989)|[BUG] Spark-3.4 - Integration test failures in array_test|
-|[#7122](https://github.com/NVIDIA/spark-rapids/issues/7122)|[BUG] Some tests of `limit_test` fail on Spark 3.4|
-|[#7144](https://github.com/NVIDIA/spark-rapids/issues/7144)|[BUG] Spark-3.4 integration test failures due to code update in FileFormatWriter.|
-|[#6915](https://github.com/NVIDIA/spark-rapids/issues/6915)|[BUG] Parquet of a binary decimal is not supported|
-
-### PRs
-|||
-|:---|:---|
-|[#7763](https://github.com/NVIDIA/spark-rapids/pull/7763)|23.02 changelog update 2/14 [skip ci]|
-|[#7761](https://github.com/NVIDIA/spark-rapids/pull/7761)|[Doc] remove xgboost demo from aws-emr doc due to nccl issue [skip ci]|
-|[#7760](https://github.com/NVIDIA/spark-rapids/pull/7760)|Add notice in gds to install cuda 11.8 [skip ci]|
-|[#7570](https://github.com/NVIDIA/spark-rapids/pull/7570)|[Doc] 23.02 doc updates [skip ci]|
-|[#7735](https://github.com/NVIDIA/spark-rapids/pull/7735)|Update JNI version to released 23.02.0|
-|[#7721](https://github.com/NVIDIA/spark-rapids/pull/7721)|Fix issue where UCX mode was trying to use catalog from the driver|
-|[#7713](https://github.com/NVIDIA/spark-rapids/pull/7713)|Workaround for incompatible serialization for `Inf` floats in Hive text write|
-|[#7700](https://github.com/NVIDIA/spark-rapids/pull/7700)|Init 23.02 changelog and move 22 changelog to archives [skip ci]|
-|[#7708](https://github.com/NVIDIA/spark-rapids/pull/7708)|Set Hive write test to ignore order on verification.|
-|[#7697](https://github.com/NVIDIA/spark-rapids/pull/7697)|Disable Dynamic partitioning tests on CDH|
-|[#7556](https://github.com/NVIDIA/spark-rapids/pull/7556)|Write support for Hive delimited text tables|
-|[#7681](https://github.com/NVIDIA/spark-rapids/pull/7681)|Disable Hive delimited text reader tests for CDH.|
-|[#7610](https://github.com/NVIDIA/spark-rapids/pull/7610)|Add multithreaded Shuffle test|
-|[#7652](https://github.com/NVIDIA/spark-rapids/pull/7652)|Support Delta Lake UpdateCommand|
-|[#7680](https://github.com/NVIDIA/spark-rapids/pull/7680)|Fix Parquet dynamic partition test for|
-|[#7653](https://github.com/NVIDIA/spark-rapids/pull/7653)|Add dynamic partitioning test for Parquet writer.|
-|[#7576](https://github.com/NVIDIA/spark-rapids/pull/7576)|Support EXECUTOR_BROADCAST on Databricks 11.3 in BroadcastNestedLoopJoin|
-|[#7620](https://github.com/NVIDIA/spark-rapids/pull/7620)|Support Delta Lake DeleteCommand|
-|[#7638](https://github.com/NVIDIA/spark-rapids/pull/7638)|Update UCX docs to call out Ubuntu 22.04 is not supported yet [skip ci]|
-|[#7644](https://github.com/NVIDIA/spark-rapids/pull/7644)|Create a new ColumnarBatch from the broadcast when LazySpillable takes ownership|
-|[#7632](https://github.com/NVIDIA/spark-rapids/pull/7632)|Update UCX shuffle doc w/ memlock limit config [skip ci]|
-|[#7628](https://github.com/NVIDIA/spark-rapids/pull/7628)|Disable HiveTextReaders for CDH due to NVIDIA#7423|
-|[#7633](https://github.com/NVIDIA/spark-rapids/pull/7633)|Bump up add-to-project version to 0.4.0 [skip ci]|
-|[#7609](https://github.com/NVIDIA/spark-rapids/pull/7609)|Fallback to CPU for mod only for DB11.3 [Databricks]|
-|[#7591](https://github.com/NVIDIA/spark-rapids/pull/7591)|Prevent fixup of GpuShuffleExchangeExec when using EXECUTOR_BROADCAST|
-|[#7626](https://github.com/NVIDIA/spark-rapids/pull/7626)|Revert "Skip/xfail some regexp tests (#7608)"|
-|[#7598](https://github.com/NVIDIA/spark-rapids/pull/7598)|Disable decimal `pmod` due to #7553|
-|[#7571](https://github.com/NVIDIA/spark-rapids/pull/7571)|Refactor deploy script to support gpg and nvsec signature [skip ci]|
-|[#7590](https://github.com/NVIDIA/spark-rapids/pull/7590)|Fix `json_tuple` cudf error and java array out-of-bound issue|
-|[#7604](https://github.com/NVIDIA/spark-rapids/pull/7604)|Fix memory leak in GpuIntervalUtilsTest|
-|[#7600](https://github.com/NVIDIA/spark-rapids/pull/7600)|Centralize source-related properties in parent pom for consistent usage in submodules|
-|[#7608](https://github.com/NVIDIA/spark-rapids/pull/7608)|Skip/xfail some regexp tests|
-|[#7211](https://github.com/NVIDIA/spark-rapids/pull/7211)|Fix regressions related to cuDF changes in handling of end-of-line/string anchors|
-|[#7596](https://github.com/NVIDIA/spark-rapids/pull/7596)|Fix deprecation warnings|
-|[#7578](https://github.com/NVIDIA/spark-rapids/pull/7578)|Fix double close on exception in GpuCoalesceBatches|
-|[#7584](https://github.com/NVIDIA/spark-rapids/pull/7584)|Write test for multithreaded combine wrong buffer size bug|
-|[#7580](https://github.com/NVIDIA/spark-rapids/pull/7580)|Support Delta Lake MergeIntoCommand|
-|[#7554](https://github.com/NVIDIA/spark-rapids/pull/7554)|Add mixed decimal testing for binary arithmetic ops|
-|[#7567](https://github.com/NVIDIA/spark-rapids/pull/7567)|Fix a small leak in generate|
-|[#7563](https://github.com/NVIDIA/spark-rapids/pull/7563)|Add patched Hive Metastore Client jar to deps|
-|[#7533](https://github.com/NVIDIA/spark-rapids/pull/7533)|Support EXECUTOR_BROADCAST on Databricks 11.3 in BroadcastHashJoin|
-|[#7532](https://github.com/NVIDIA/spark-rapids/pull/7532)|Update Databricks docs to add a limitation against DB 11.3 [skip ci]|
-|[#7555](https://github.com/NVIDIA/spark-rapids/pull/7555)|Fix case where tracking batch is never updated in batched full join|
-|[#7547](https://github.com/NVIDIA/spark-rapids/pull/7547)|Refactor Delta Lake code to handle multiple versions per Spark version|
-|[#7513](https://github.com/NVIDIA/spark-rapids/pull/7513)|Remove usage to `ColumnView.repeatStringsSizes`|
-|[#7538](https://github.com/NVIDIA/spark-rapids/pull/7538)|Fix Spark 340 build error due to change in KeyGroupedPartitioning|
-|[#7549](https://github.com/NVIDIA/spark-rapids/pull/7549)|Remove unused Maven property shim.module.name|
-|[#7548](https://github.com/NVIDIA/spark-rapids/pull/7548)|Make RapidsBufferHandle AutoCloseable to prevent extra attempts to remove buffers|
-|[#7499](https://github.com/NVIDIA/spark-rapids/pull/7499)|README.md for auditing purposes [skip ci]|
-|[#7512](https://github.com/NVIDIA/spark-rapids/pull/7512)|Adds RapidsBufferHandle as an indirection layer to RapidsBufferId|
-|[#7527](https://github.com/NVIDIA/spark-rapids/pull/7527)|Allow concurrentGpuTasks to be set per job|
-|[#7539](https://github.com/NVIDIA/spark-rapids/pull/7539)|Enable automerge from 23.02 to 23.04 [skip ci]|
-|[#7536](https://github.com/NVIDIA/spark-rapids/pull/7536)|Fix datetime out-of-range error in pytest when timezone is not UTC|
-|[#7502](https://github.com/NVIDIA/spark-rapids/pull/7502)|Fix Spark 3.4 build errors|
-|[#7522](https://github.com/NVIDIA/spark-rapids/pull/7522)|Update docs and add Rapids shuffle manager for Databricks-11.3|
-|[#7507](https://github.com/NVIDIA/spark-rapids/pull/7507)|Fallback to CPU for unrecognized Distributions|
-|[#7515](https://github.com/NVIDIA/spark-rapids/pull/7515)|Fix leak in GpuHiveTableScanExec|
-|[#7504](https://github.com/NVIDIA/spark-rapids/pull/7504)|Align CI test scripts with new init scripts for Databricks [skip ci]|
-|[#7506](https://github.com/NVIDIA/spark-rapids/pull/7506)|Add RapidsDeltaWrite node to fix undesired transitions with AQE|
-|[#7414](https://github.com/NVIDIA/spark-rapids/pull/7414)|batched full hash join|
-|[#7509](https://github.com/NVIDIA/spark-rapids/pull/7509)|Fix json_test.py imports|
-|[#7269](https://github.com/NVIDIA/spark-rapids/pull/7269)|Add hadoop-def.sh to support multiple spark release tarballs|
-|[#7494](https://github.com/NVIDIA/spark-rapids/pull/7494)|Implement a simplified version of `from_json`|
-|[#7434](https://github.com/NVIDIA/spark-rapids/pull/7434)|Support `json_tuple`|
-|[#7489](https://github.com/NVIDIA/spark-rapids/pull/7489)|Remove the MIT license from tools jar [skip ci]|
-|[#7497](https://github.com/NVIDIA/spark-rapids/pull/7497)|Inserting multiple times to HashedPriorityQueue should not corrupt the heap|
-|[#7475](https://github.com/NVIDIA/spark-rapids/pull/7475)|Inject GpuCast for decimal AddSub when operands' precision/scale differ on 340, 330db|
-|[#7486](https://github.com/NVIDIA/spark-rapids/pull/7486)|Allow Shims to replace Hive execs|
-|[#7484](https://github.com/NVIDIA/spark-rapids/pull/7484)|Fix arrays_zip to not rely on broken segmented gather|
-|[#7460](https://github.com/NVIDIA/spark-rapids/pull/7460)|More cases support partition column pruning|
-|[#7462](https://github.com/NVIDIA/spark-rapids/pull/7462)|Changing metric component counters to avoid extraneous accruals|
-|[#7467](https://github.com/NVIDIA/spark-rapids/pull/7467)|Remove spark2-sql-plugin|
-|[#7474](https://github.com/NVIDIA/spark-rapids/pull/7474)|xfail array zip tests|
-|[#7464](https://github.com/NVIDIA/spark-rapids/pull/7464)|Enable integration tests against Databricks 11.3 in premerge|
-|[#7455](https://github.com/NVIDIA/spark-rapids/pull/7455)|Use GpuAlias when handling Empty2Null in GpuOptimisticTransaction|
-|[#7456](https://github.com/NVIDIA/spark-rapids/pull/7456)|Sort Delta log objects when comparing and avoid caching all logs|
-|[#7431](https://github.com/NVIDIA/spark-rapids/pull/7431)|Remove release script of spark-rapids/tools [skip ci]|
-|[#7421](https://github.com/NVIDIA/spark-rapids/pull/7421)|Remove spark-rapids/tools|
-|[#7418](https://github.com/NVIDIA/spark-rapids/pull/7418)|Fix for AQE+DPP issue on AWS EMR|
-|[#7444](https://github.com/NVIDIA/spark-rapids/pull/7444)|Explicitly check if the platform is supported|
-|[#7417](https://github.com/NVIDIA/spark-rapids/pull/7417)|Make cudf-udf tests runnable on Databricks 11.3|
-|[#7420](https://github.com/NVIDIA/spark-rapids/pull/7420)|Ubuntu build&test images default as 20.04|
-|[#7442](https://github.com/NVIDIA/spark-rapids/pull/7442)|Fix parsing of Delta Lake logs containing multi-line JSON records|
-|[#7438](https://github.com/NVIDIA/spark-rapids/pull/7438)|Add documentation for runnable command enable configs|
-|[#7439](https://github.com/NVIDIA/spark-rapids/pull/7439)|Suppress unknown RunnableCommand warnings by default|
-|[#7346](https://github.com/NVIDIA/spark-rapids/pull/7346)|[Doc] revert the changes in FAQ for deltalake table support [skip ci]|
-|[#7413](https://github.com/NVIDIA/spark-rapids/pull/7413)|[Doc] update getting started doc for EMR and databricks [skip ci]|
-|[#7445](https://github.com/NVIDIA/spark-rapids/pull/7445)|Support pruning partition columns for avro file scan|
-|[#7447](https://github.com/NVIDIA/spark-rapids/pull/7447)|Xfail the test of pruning partition column for json read|
-|[#7440](https://github.com/NVIDIA/spark-rapids/pull/7440)|Xfail largest decimals window aggregation|
-|[#7437](https://github.com/NVIDIA/spark-rapids/pull/7437)|Fix the regexp test failures on DB11.3|
-|[#7428](https://github.com/NVIDIA/spark-rapids/pull/7428)|Support pruning partition columns for GpuFileSourceScan|
-|[#7435](https://github.com/NVIDIA/spark-rapids/pull/7435)|Update IntelliJ IDEA doc [skip ci]|
-|[#7416](https://github.com/NVIDIA/spark-rapids/pull/7416)|Reorganize and shim ScanExecMeta overrides and fix interval file IO|
-|[#7427](https://github.com/NVIDIA/spark-rapids/pull/7427)|Install-file log4j-core on Databricks 11.3|
-|[#7424](https://github.com/NVIDIA/spark-rapids/pull/7424)|xfail Hive text tests failing on CDH|
-|[#7408](https://github.com/NVIDIA/spark-rapids/pull/7408)|Fallback to CPU for unrecognized ShuffleOrigin|
-|[#7422](https://github.com/NVIDIA/spark-rapids/pull/7422)|Skip test_dynamic_partition_write_round_trip in 321cdh|
-|[#7406](https://github.com/NVIDIA/spark-rapids/pull/7406)|Fix array_test.py::test_array_intersect for cloudera spark330|
-|[#6761](https://github.com/NVIDIA/spark-rapids/pull/6761)|Switch string to float casting to use new kernel|
-|[#7411](https://github.com/NVIDIA/spark-rapids/pull/7411)|Skip Int division test that causes scale less than precision|
-|[#7410](https://github.com/NVIDIA/spark-rapids/pull/7410)|Remove 314 from dist build list|
-|[#7412](https://github.com/NVIDIA/spark-rapids/pull/7412)|Fix the `with_hidden_metadata_fallback` test failures on DB11.3|
-|[#7362](https://github.com/NVIDIA/spark-rapids/pull/7362)|Fix multiplication and division test failures in 330db and 340 shim|
-|[#7405](https://github.com/NVIDIA/spark-rapids/pull/7405)|Fix multithreaded combine code initial size calculation|
-|[#7395](https://github.com/NVIDIA/spark-rapids/pull/7395)|Enable Delta Lake write acceleration by default|
-|[#7384](https://github.com/NVIDIA/spark-rapids/pull/7384)|Fix implementation of createReadRDDForDirectories to match DataSource…|
-|[#7391](https://github.com/NVIDIA/spark-rapids/pull/7391)|Exclude GDS test suite as default|
-|[#7390](https://github.com/NVIDIA/spark-rapids/pull/7390)|Aggregate Databricks 11.3 shim in the nightly dist jar|
-|[#7381](https://github.com/NVIDIA/spark-rapids/pull/7381)|Filter out some new timestamp-related Delta Lake tags when comparing logs|
-|[#7371](https://github.com/NVIDIA/spark-rapids/pull/7371)|Avoid shutting down RMM until all allocations have cleared|
-|[#7377](https://github.com/NVIDIA/spark-rapids/pull/7377)|Update RapidsUDF interface to support UDFs with no input parameters|
-|[#7388](https://github.com/NVIDIA/spark-rapids/pull/7388)|Fix `test_array_element_at_zero_index_fail` and `test_div_overflow_exception_when_ansi` DB 11.3 integration test failures|
-|[#7380](https://github.com/NVIDIA/spark-rapids/pull/7380)|[FEA] Support `reverse` for arrays|
-|[#7365](https://github.com/NVIDIA/spark-rapids/pull/7365)|Move PythonMapInArrowExec to shim for shared 330+ functionality (db11.3)|
-|[#7372](https://github.com/NVIDIA/spark-rapids/pull/7372)|Fix for incorrect nested-unsigned test|
-|[#7386](https://github.com/NVIDIA/spark-rapids/pull/7386)|Spark 3.1.4 snapshot fix setting of reproduceEmptyStringBug|
-|[#7385](https://github.com/NVIDIA/spark-rapids/pull/7385)|Fix Databricks version comparison in pytests|
-|[#7379](https://github.com/NVIDIA/spark-rapids/pull/7379)|Fixes bug where dynamic partition overwrite mode didn't work for ORC|
-|[#7370](https://github.com/NVIDIA/spark-rapids/pull/7370)|Fix non file read DayTimeInterval errors|
-|[#7375](https://github.com/NVIDIA/spark-rapids/pull/7375)|Fix CPU fallback for GpuHiveTableScanExec|
-|[#7355](https://github.com/NVIDIA/spark-rapids/pull/7355)|Fix Add and Subtract test failures in 330db and 340 shim|
-|[#7299](https://github.com/NVIDIA/spark-rapids/pull/7299)|Qualification tool: Update parsing of write data format|
-|[#7357](https://github.com/NVIDIA/spark-rapids/pull/7357)|Update the integration tests to fit the removed config `spark.sql.ansi.strictIndexOperator` in DB11.3 and Spark 3.4|
-|[#7369](https://github.com/NVIDIA/spark-rapids/pull/7369)|Re-enable some tests in ParseDateTimeSuite|
-|[#7366](https://github.com/NVIDIA/spark-rapids/pull/7366)|Support Databricks 11.3|
-|[#7354](https://github.com/NVIDIA/spark-rapids/pull/7354)|Fix arithmetic error messages in 330db, 340 shims|
-|[#7364](https://github.com/NVIDIA/spark-rapids/pull/7364)|Move Rounding ops to shim for 330+ (db11.3)|
-|[#7298](https://github.com/NVIDIA/spark-rapids/pull/7298)|Improve performance of small file parquet reads from blob stores|
-|[#7363](https://github.com/NVIDIA/spark-rapids/pull/7363)|Fix ExtractValue assertion with 330db shim|
-|[#7340](https://github.com/NVIDIA/spark-rapids/pull/7340)|Fix nested-unsigned test issues.|
-|[#7358](https://github.com/NVIDIA/spark-rapids/pull/7358)|Update Delta version to 1.1.0|
-|[#7301](https://github.com/NVIDIA/spark-rapids/pull/7301)|Add null values back to test_array_intersect for Spark 3.3.1+ and Databricks 10.4+|
-|[#7152](https://github.com/NVIDIA/spark-rapids/pull/7152)|Add a shim for Databricks 11.3 spark330db|
-|[#7333](https://github.com/NVIDIA/spark-rapids/pull/7333)|Avoid row number computation when the partition schema is empty|
-|[#7317](https://github.com/NVIDIA/spark-rapids/pull/7317)|Make multi-threaded shuffle not experimental and update docs|
-|[#7342](https://github.com/NVIDIA/spark-rapids/pull/7342)|Update plan capture listener to handle multiple plans per query.|
-|[#7338](https://github.com/NVIDIA/spark-rapids/pull/7338)|Fix auto merge conflict 7336 [skip ci]|
-|[#7335](https://github.com/NVIDIA/spark-rapids/pull/7335)|Fix auto merge conflict 7334 [skip ci]|
-|[#7312](https://github.com/NVIDIA/spark-rapids/pull/7312)|Fix documentation bug|
-|[#7315](https://github.com/NVIDIA/spark-rapids/pull/7315)|Fix merge conflict with branch-22.12|
-|[#7296](https://github.com/NVIDIA/spark-rapids/pull/7296)|Enable hive text reads by default|
-|[#7304](https://github.com/NVIDIA/spark-rapids/pull/7304)|Correct partition columns handling for coalesced and chunked read|
-|[#7305](https://github.com/NVIDIA/spark-rapids/pull/7305)|Fix CPU fallback for custom timestamp formats in Hive text tables|
-|[#7293](https://github.com/NVIDIA/spark-rapids/pull/7293)|Refine '$LOCAL_JAR_PATH' as optional for integration test on databricks [skip ci]|
-|[#7285](https://github.com/NVIDIA/spark-rapids/pull/7285)|[FEA] Support `reverse` for strings|
-|[#7297](https://github.com/NVIDIA/spark-rapids/pull/7297)|Remove Decimal Support Section from compatibility docs [skip ci]|
-|[#7291](https://github.com/NVIDIA/spark-rapids/pull/7291)|Improve IntelliJ IDEA doc and usability|
-|[#7245](https://github.com/NVIDIA/spark-rapids/pull/7245)|Fix boolean, int, and float parsing. Improve decimal parsing for hive|
-|[#7265](https://github.com/NVIDIA/spark-rapids/pull/7265)|Fix Hive Delimited Text timestamp parsing|
-|[#7221](https://github.com/NVIDIA/spark-rapids/pull/7221)|Fix date parsing in Hive Delimited Text reader|
-|[#7287](https://github.com/NVIDIA/spark-rapids/pull/7287)|Don't use native parquet footer parser if field ids for read are needed|
-|[#7262](https://github.com/NVIDIA/spark-rapids/pull/7262)|Moving generated files to standalone tools dir|
-|[#7268](https://github.com/NVIDIA/spark-rapids/pull/7268)|Remove deprecated compatibility support in premerge|
-|[#7248](https://github.com/NVIDIA/spark-rapids/pull/7248)|Fix AlluxioUtilsSuite build on Databricks|
-|[#7207](https://github.com/NVIDIA/spark-rapids/pull/7207)|Change the hive text file parser to not use CSV input format|
-|[#7209](https://github.com/NVIDIA/spark-rapids/pull/7209)|Change RegExpExtract to use isNull checks instead of contains_re|
-|[#7212](https://github.com/NVIDIA/spark-rapids/pull/7212)|Removing common module and adding common files to sql-plugin|
-|[#7202](https://github.com/NVIDIA/spark-rapids/pull/7202)|Avoid sort in v1 write for static columns|
-|[#7167](https://github.com/NVIDIA/spark-rapids/pull/7167)|Handle two changes related to `FileFormatWriter` since Spark 340|
-|[#7194](https://github.com/NVIDIA/spark-rapids/pull/7194)|Skip tests that fail due to recent cuDF changes related to end of string/line anchors|
-|[#7170](https://github.com/NVIDIA/spark-rapids/pull/7170)|Fix the `limit_test` failures on Spark 3.4|
-|[#7075](https://github.com/NVIDIA/spark-rapids/pull/7075)|Fix the failure of `test_array_element_at_zero_index_fail` on Spark 3.4|
-|[#7126](https://github.com/NVIDIA/spark-rapids/pull/7126)|Fix support for binary encoded decimal for parquet|
-|[#7113](https://github.com/NVIDIA/spark-rapids/pull/7113)|Use an improved API for appending binary to host vector|
-|[#7130](https://github.com/NVIDIA/spark-rapids/pull/7130)|Enable chunked parquet reads by default|
-|[#7074](https://github.com/NVIDIA/spark-rapids/pull/7074)|Update JNI and cudf-py version to 23.02|
-|[#7063](https://github.com/NVIDIA/spark-rapids/pull/7063)|Init version 23.02.0|
-
-## Older Releases
-Changelog of older releases can be found at [docs/archives](/docs/archives)