Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding decimal support for min() and max() functions #9005

Conversation

minhancao
Copy link
Contributor

@minhancao minhancao commented Mar 7, 2024

Delivers #9004

This PR is for adding decimal support for min() and max() functions.

presto-cli output:

presto:tpch> SELECT MIN(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0       
------------------
 [0.8200, 2.3330] 
(1 row)
presto:tpch> SELECT MIN(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0           
--------------------------
 [0.8200, 2.3330, 3.1320] 
(1 row)
presto:tpch> SELECT MAX(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0       
------------------
 [4.3440, 3.1320] 
(1 row)
presto:tpch> SELECT MAX(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0           
--------------------------
 [4.3440, 3.1320, 2.3330] 
(1 row)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 7, 2024
Copy link

netlify bot commented Mar 7, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 8abad48
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66280a4fae26f100087498f1

@karteekmurthys
Copy link
Contributor

@minhancao The linux-adapters job failed. This is likely an alignment issue.

 int32_t accumulatorAlignmentSize() const override {
    return 1;
  }

Please override this method:

 int32_t accumulatorAlignmentSize() const override {
    return 1; --> must return size of int128_t in case of long decimals.
  }

@minhancao
Copy link
Contributor Author

minhancao commented Mar 8, 2024

@karteekmurthys I see this piece of code on line 99 of MinMaxAggregates.cpp, would this already have taken care of it?

/// Override 'accumulatorAlignmentSize' for UnscaledLongDecimal values as it
/// uses int128_t type. Some CPUs don't support misaligned access to int128_t
/// type.
template <>
inline int32_t MinMaxAggregate<int128_t>::accumulatorAlignmentSize() const {
  return static_cast<int32_t>(sizeof(int128_t));
}

@karteekmurthys
Copy link
Contributor

Also, the accumulator internally uses a heap for sorting. This heap allocator must be an aligned allocator:

template <typename T, typename Compare>
struct MinMaxNAccumulator {
  int64_t n{0};
  std::vector<T, StlAllocator<T>> heapValues;

  explicit MinMaxNAccumulator(HashStringAllocator* allocator)
      : heapValues{StlAllocator<T>(allocator)} {}

  int64_t getN() const {
    return n;
  }

  size_t size() const {
    return heapValues.size();
  }

Refer this:https://github.com/facebookincubator/velox/pull/8723/files#diff-5091b1dd8232c357067be47fb0e751772f91a0aa12dea5db835a2d4ae296a63cR93

@minhancao
Copy link
Contributor Author

Hi @karteekmurthys, I have made the change that you have requested, please review when you can. thanks!

@karteekmurthys karteekmurthys self-requested a review April 2, 2024 20:57
Copy link
Contributor

@karteekmurthys karteekmurthys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@aditi-pandit
Copy link
Collaborator

@minhancao : Looks good. Thanks for contributing !

@karteekmurthys
Copy link
Contributor

@Yuhta would you please review this?

@minhancao
Copy link
Contributor Author

Hi @Yuhta , can you review this PR when you can please? Thanks!

Copy link
Contributor

@Yuhta Yuhta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run aggregation fuzzer on the PR for 20 min?

if (inputType->isLongDecimal()) {
return std::make_unique<TNumericN<int128_t>>(resultType);
}
VELOX_NYI();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VELOX_UNREACHABLE()

AlignedStlAllocator<T, sizeof(int128_t)>,
StlAllocator<T>>;
using Heap = std::vector<T, Allocator>;
Heap heapValues;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: std::vector<T, Allocator> heapValues;

@minhancao
Copy link
Contributor Author

@Yuhta I have ran locally the velox aggregation fuzzer test:
min() function:
command ran:

_build/debug/velox/functions/prestosql/fuzzer/velox_aggregation_fuzzer_test --seed 123456 --duration_sec 1200 --logtostderr=1 --minloglevel=0 --only min --v=1 2>&1 | tee agg_fuzzer_test_min_20mins.out

output:

I20240422 15:00:08.498713 3874823 AggregationFuzzer.cpp:456] ==============================> Done with iteration 442
I20240422 15:00:08.498742 3874823 AggregationFuzzer.cpp:993] Total masked aggregations: 67 (15.12%)
I20240422 15:00:08.498764 3874823 AggregationFuzzer.cpp:995] Total global aggregations: 33 (7.45%)
I20240422 15:00:08.498781 3874823 AggregationFuzzer.cpp:997] Total group-by aggregations: 325 (73.36%)
I20240422 15:00:08.498798 3874823 AggregationFuzzer.cpp:999] Total distinct aggregations: 46 (10.38%)
I20240422 15:00:08.498817 3874823 AggregationFuzzer.cpp:1001] Total aggregations over distinct inputs: 69 (15.58%)
I20240422 15:00:08.498834 3874823 AggregationFuzzer.cpp:1003] Total window expressions: 39 (8.80%)
I20240422 15:00:08.498852 3874823 AggregationFuzzerBase.cpp:647] Total functions tested: 1
I20240422 15:00:08.498868 3874823 AggregationFuzzerBase.cpp:648] Total iterations requiring sorted inputs: 71 (16.03%)
I20240422 15:00:08.498883 3874823 AggregationFuzzerBase.cpp:650] Total iterations verified against reference DB: 41 (9.26%)
I20240422 15:00:08.498899 3874823 AggregationFuzzerBase.cpp:652] Total functions not verified (verification skipped / not supported by reference DB / reference DB failed): 39 (8.80%) / 209 (47.18%) / 125 (28.22%)
I20240422 15:00:08.498917 3874823 AggregationFuzzerBase.cpp:657] Total failed functions: 32 (7.22%)
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests.

max() function:
command ran:

_build/debug/velox/functions/prestosql/fuzzer/velox_aggregation_fuzzer_test --seed 123456 --duration_sec 1200 --logtostderr=1 --minloglevel=0 --only max --v=1 2>&1 | tee agg_fuzzer_test_max_20mins.out

output:

I20240422 15:59:21.033170 3979822 AggregationFuzzer.cpp:456] ==============================> Done with iteration 451
I20240422 15:59:21.033193 3979822 AggregationFuzzer.cpp:993] Total masked aggregations: 68 (15.04%)
I20240422 15:59:21.033223 3979822 AggregationFuzzer.cpp:995] Total global aggregations: 34 (7.52%)
I20240422 15:59:21.033241 3979822 AggregationFuzzer.cpp:997] Total group-by aggregations: 333 (73.67%)
I20240422 15:59:21.033258 3979822 AggregationFuzzer.cpp:999] Total distinct aggregations: 46 (10.18%)
I20240422 15:59:21.033274 3979822 AggregationFuzzer.cpp:1001] Total aggregations over distinct inputs: 70 (15.49%)
I20240422 15:59:21.033290 3979822 AggregationFuzzer.cpp:1003] Total window expressions: 39 (8.63%)
I20240422 15:59:21.033306 3979822 AggregationFuzzerBase.cpp:647] Total functions tested: 1
I20240422 15:59:21.033322 3979822 AggregationFuzzerBase.cpp:648] Total iterations requiring sorted inputs: 74 (16.37%)
I20240422 15:59:21.033339 3979822 AggregationFuzzerBase.cpp:650] Total iterations verified against reference DB: 41 (9.07%)
I20240422 15:59:21.033355 3979822 AggregationFuzzerBase.cpp:652] Total functions not verified (verification skipped / not supported by reference DB / reference DB failed): 39 (8.63%) / 217 (48.01%) / 126 (27.88%)
I20240422 15:59:21.033372 3979822 AggregationFuzzerBase.cpp:657] Total failed functions: 32 (7.08%)
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests.

@Yuhta
Copy link
Contributor

Yuhta commented Apr 23, 2024

The branch seems messed up

@minhancao minhancao force-pushed the add_decimal_support_for_min_max_oss branch from 3918677 to 8abad48 Compare April 23, 2024 19:21
@minhancao
Copy link
Contributor Author

@Yuhta Fixed the branch

@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@Yuhta merged this pull request in d97ddb4.

Copy link

Conbench analyzed the 1 benchmark run on commit d97ddb41.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…or#9005)

Summary:
Delivers facebookincubator#9004

This PR is for adding decimal support for min() and max() functions.

presto-cli output:
```
presto:tpch> SELECT MIN(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0
------------------
 [0.8200, 2.3330]
(1 row)
presto:tpch> SELECT MIN(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0
--------------------------
 [0.8200, 2.3330, 3.1320]
(1 row)
presto:tpch> SELECT MAX(Col,2) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
      _col0
------------------
 [4.3440, 3.1320]
(1 row)
presto:tpch> SELECT MAX(Col,3) FROM (VALUES cast(0.82 as decimal(5,4)), cast(2.333 as decimal(5,4)), cast(3.132 as decimal(5,4)), cast(4.344 as decimal(5,4))) AS X(Col);
          _col0
--------------------------
 [4.3440, 3.1320, 2.3330]
(1 row)

```

Pull Request resolved: facebookincubator#9005

Reviewed By: pedroerp

Differential Revision: D56487562

Pulled By: Yuhta

fbshipit-source-id: 283eb189a91840835784e565cd33fa713167d17d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants