Skip to content

Commit

Permalink
array_intersect operand swapping (facebookincubator#10624)
Browse files Browse the repository at this point in the history
Summary:
Pull Request resolved: facebookincubator#10624

The order of elements in the result always depends on the order of the elements in the left operand. To save memory, Presto swaps the operands if the right operand is larger than the left since the right is made into a hash set. Presto does this by row level but we can only do it by the row vector level in Velox. This is to preserve the savings obtained by creating a constant set if one of the operands is constant.

A change will later be made to Presto by someone on their team to implement the same constant set logic.

Differential Revision: D60531033
  • Loading branch information
Daniel Hunte authored and facebook-github-bot committed Aug 1, 2024
1 parent 47bb048 commit 7166ba7
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 5 deletions.
29 changes: 29 additions & 0 deletions velox/functions/prestosql/ArrayIntersectExcept.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,11 @@ class ArrayIntersectExceptFunction : public exec::VectorFunction {
/// If the rhs values passed to either array_intersect() or array_except()
/// are constant (array literals) we create a set before instantiating the
/// object and pass as a constructor parameter (constantSet).
///
/// Smallest array optimization:
///
/// If the rhs values passed to array_intersect() are not constant we create
/// sets from whichever side has the smallest sum of lengths in the batch.

ArrayIntersectExceptFunction() = default;

Expand All @@ -134,6 +139,30 @@ class ArrayIntersectExceptFunction : public exec::VectorFunction {
BaseVector* right = args[1].get();

exec::LocalDecodedVector leftHolder(context, *left, rows);
exec::LocalDecodedVector rightHolder(context, *right, rows);

if constexpr (isIntersect) {
// Swap left and right if needed so that the right array has the smaller
// number of elements since the right will be made into a hash set.
vector_size_t leftSize = 0;
vector_size_t rightSize = 0;
const ArrayVector* leftArrayVector =
leftHolder.get()->base()->as<ArrayVector>();
const ArrayVector* rightArrayVector =
rightHolder.get()->base()->as<ArrayVector>();
rows.applyToSelected([&](vector_size_t row) {
vector_size_t leftidx = leftHolder.get()->index(row);
leftSize += leftArrayVector->sizeAt(leftidx);

vector_size_t rightidx = rightHolder.get()->index(row);
rightSize += rightArrayVector->sizeAt(rightidx);
});
if (leftSize < rightSize) {
std::swap(left, right);
std::swap(leftHolder, rightHolder);
}
}

auto decodedLeftArray = leftHolder.get();
auto baseLeftArray = decodedLeftArray->base()->as<ArrayVector>();

Expand Down
10 changes: 5 additions & 5 deletions velox/functions/prestosql/tests/ArrayIntersectTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -308,10 +308,10 @@ TEST_F(ArrayIntersectTest, deterministic) {
testExpr(expectedC0C1, "array_intersect(C0, C1)", {c0, c1});

// C1 then C0.
auto expectedC1C0 = makeNullableArrayVector<int32_t>({
{1, 4, -2},
{1, 4, -2},
});
// Since C1 has more elements, it should be swapped with C0 and
// the order of the result should be based on C1.
auto expectedC1C0 = expectedC0C1;

testExpr(expectedC1C0, "array_intersect(C1, C0)", {c0, c1});
testExpr(expectedC1C0, "array_intersect(ARRAY[1,4,-2], C0)", {c0});
}
Expand All @@ -326,6 +326,6 @@ TEST_F(ArrayIntersectTest, dictionaryEncodedElementsInConstant) {
auto expected = makeArrayVector<int32_t>({{1, 3}, {2}, {}});
testExpr(
expected,
"array_intersect(c0, testing_dictionary_array_elements(ARRAY [2, 2, 3, 1, 2, 2]))",
"array_intersect(testing_dictionary_array_elements(ARRAY [2, 2, 3, 1, 2, 2]), c0)",
{array});
}

0 comments on commit 7166ba7

Please sign in to comment.