feat: Add Spark concat_ws function (#8854)
Summary:
Add the Spark concat_ws function, which returns the concatenation of its
inputs, separated by a separator (the first argument). It accepts a variable
number of VARCHAR or ARRAY<VARCHAR> arguments, and the two types can be
mixed in a single call.

This function is similar to [ConcatFunction](https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/StringFunctions.cpp#L140), except that `concat_ws`
requires a separator and supports ARRAY<VARCHAR> arguments as well as mixed argument types.

This PR is based on #6292 (author: unigof), with a few bug fixes
and improvements, plus some changes to align with Spark.

Doc [link](https://docs.databricks.com/en/sql/language-manual/functions/concat_ws.html).

Pull Request resolved: #8854

Reviewed By: kgpai

Differential Revision: D66898251

Pulled By: bikramSingh91

fbshipit-source-id: 1fcd193a245bea4062c4e20d1e1db9ad6cc3290b
PHILO-HE authored and facebook-github-bot committed Dec 13, 2024
1 parent aa59678 commit bddddf8
Showing 8 changed files with 771 additions and 3 deletions.
19 changes: 19 additions & 0 deletions velox/docs/functions/spark/string.rst
@@ -25,6 +25,25 @@ String Functions
If ``n < 0``, the result is an empty string.
If ``n >= 256``, the result is equivalent to chr(``n % 256``).

.. spark:function:: concat_ws(separator, [string/array<string>], ...) -> varchar
Returns the concatenation of ``string`` and all elements in ``array<string>``, separated
by ``separator``. The first argument is ``separator``, whose type is VARCHAR. The function
accepts a variable number of remaining arguments and allows mixed use of the ``string`` and
``array<string>`` types. NULL arguments and NULL array elements are skipped during concatenation. If
``separator`` is NULL, returns NULL regardless of the remaining inputs. For a non-NULL ``separator``,
if there are no remaining inputs or all remaining inputs are NULL, returns an empty string. ::

SELECT concat_ws('~', 'a', 'b', 'c'); -- 'a~b~c'
SELECT concat_ws('~', ['a', 'b', 'c'], ['d']); -- 'a~b~c~d'
SELECT concat_ws('~', 'a', ['b', 'c']); -- 'a~b~c'
SELECT concat_ws('~', '', [''], ['a', '']); -- '~~a~'
SELECT concat_ws(NULL, 'a'); -- NULL
SELECT concat_ws('~'); -- ''
SELECT concat_ws('~', NULL, [NULL], 'a', 'b'); -- 'a~b'
SELECT concat_ws('~', NULL, NULL); -- ''
SELECT concat_ws('~', [NULL]); -- ''
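
The NULL-handling rules documented above can be modeled in a few lines of standalone C++. This is only a sketch of the documented semantics, not the code added in ConcatWs.cpp: the names `MaybeString`, `StringArray`, `Arg`, and `concatWs` are hypothetical, `std::nullopt` stands in for SQL NULL, and a NULL array argument (as opposed to a NULL element) is not modeled.

```cpp
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical types for illustration only; these are not Velox types.
using MaybeString = std::optional<std::string>;     // std::nullopt models SQL NULL
using StringArray = std::vector<MaybeString>;       // ARRAY<VARCHAR> with nullable elements
using Arg = std::variant<MaybeString, StringArray>; // one VARCHAR or ARRAY<VARCHAR> argument

// Returns std::nullopt (NULL) when the separator is NULL. Otherwise joins every
// non-NULL string and every non-NULL array element with the separator.
MaybeString concatWs(const MaybeString& separator, const std::vector<Arg>& args) {
  if (!separator.has_value()) {
    return std::nullopt;
  }
  std::string result;
  bool first = true;
  auto append = [&](const MaybeString& value) {
    if (!value.has_value()) {
      return; // Skip NULL arguments and NULL array elements.
    }
    if (!first) {
      result += *separator;
    }
    result += *value;
    first = false;
  };
  for (const auto& arg : args) {
    if (const auto* str = std::get_if<MaybeString>(&arg)) {
      append(*str);
    } else {
      for (const auto& element : std::get<StringArray>(arg)) {
        append(element);
      }
    }
  }
  // Empty when there were no inputs or all inputs were NULL.
  return result;
}
```

Under this model, empty strings are kept as values while NULLs are dropped, which is why the documented example with `''`, `['']`, and `['a', '']` produces `'~~a~'` but `[NULL]` produces `''`.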

.. spark:function:: contains(left, right) -> boolean
Returns true if 'right' is found in 'left'. Otherwise, returns false. ::
18 changes: 15 additions & 3 deletions velox/expression/fuzzer/ExpressionFuzzer.cpp
@@ -354,11 +354,23 @@ static void appendSpecialForms(
},
{
"cast",
/// TODO: Add supported Cast signatures to CastTypedExpr and
/// expose
/// them to fuzzer instead of hard-coding signatures here.
// TODO: Add supported Cast signatures to CastTypedExpr and
// expose
// them to fuzzer instead of hard-coding signatures here.
getSignaturesForCast(),
},
{
// For Spark SQL only.
"concat_ws",
std::vector<facebook::velox::exec::FunctionSignaturePtr>{
// Signature: concat_ws (separator, input, ...) -> output:
// varchar, varchar, varchar, ... -> varchar
facebook::velox::exec::FunctionSignatureBuilder()
.argumentType("varchar")
.variableArity("varchar")
.returnType("varchar")
.build()},
},
};

auto specialFormNames = splitNames(specialForms);
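
Going by its own comment, the signature registered for the fuzzer above covers only `varchar, varchar, varchar, ... -> varchar`, so `array<varchar>` inputs appear to be outside what this signature describes. The sketch below illustrates that shape with a simplified, hypothetical model; `SimpleSignature` and `accepts` are not part of the Velox API, and it assumes the variadic argument may appear zero or more times.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a signature: fixed leading argument types
// followed by one repeating (variadic) type. This is not the Velox API.
struct SimpleSignature {
  std::vector<std::string> fixedArgs;
  std::string variadicArg;
  std::string returnType;
};

// Returns true if the call's argument types fit the signature.
// Assumption: the variadic argument may appear zero or more times.
bool accepts(
    const SimpleSignature& signature,
    const std::vector<std::string>& argTypes) {
  if (argTypes.size() < signature.fixedArgs.size()) {
    return false; // Missing a required leading argument, e.g. the separator.
  }
  for (std::size_t i = 0; i < argTypes.size(); ++i) {
    const std::string& expected = i < signature.fixedArgs.size()
        ? signature.fixedArgs[i]
        : signature.variadicArg;
    if (argTypes[i] != expected) {
      return false;
    }
  }
  return true;
}
```

With a signature of one fixed `varchar` followed by variadic `varchar`, a call with only a separator or with several strings fits this model, while a call that passes an array type does not.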
1 change: 1 addition & 0 deletions velox/functions/sparksql/CMakeLists.txt
@@ -18,6 +18,7 @@ velox_add_library(
ArrayGetFunction.cpp
ArraySort.cpp
Comparisons.cpp
ConcatWs.cpp
DecimalArithmetic.cpp
DecimalCompare.cpp
Hash.cpp