-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3594] [VL] Allow users to set bloom filter configurations #3610
Conversation
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
@@ -1267,4 +1272,12 @@ object GlutenConfig { | |||
+ "partial aggregation may be early abandoned.") | |||
.intConf | |||
.createOptional | |||
|
|||
val COLUMNAR_VELOX_BLOOM_FILTER_MAX_NUM_BITS = | |||
buildConf("spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new config as spark.bloom_filter.max_num_bits
has different behavior from spark conf RUNTIME_BLOOM_FILTER_MAX_NUM_BITS
.
https://github.com/facebookincubator/velox/blob/73d4279a14744cf4d038d3a967a49dcddbad9d39/velox/core/QueryConfig.h#L632C5-L632C24
@@ -62,6 +62,9 @@ const std::string kAbandonPartialAggregationMinPct = | |||
"spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinPct"; | |||
const std::string kAbandonPartialAggregationMinRows = | |||
"spark.gluten.sql.columnar.backend.velox.abandonPartialAggregationMinRows"; | |||
const std::string kBloomFilterExpectedNumItems = "spark.sql.optimizer.runtime.bloomFilter.expectedNumItems"; | |||
const std::string kBloomFilterNumBits = "spark.sql.optimizer.runtime.bloomFilter.numBits"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about incorporating the two parameters mentioned above with the "spark.gluten" prefix? For instance, we could use "spark.gluten.sql.optimizer.runtime.bloomFilter.expectedNumItems" and "spark.gluten.sql.optimizer.runtime.bloomFilter.numBits".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok to me, let me add new config entries for them.
Run Gluten Clickhouse CI |
1 similar comment
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just one small question.
getConfigValue(confMap_, kBloomFilterExpectedNumItems, "1000000"); | ||
configs[velox::core::QueryConfig::kSparkBloomFilterNumBits] = | ||
getConfigValue(confMap_, kBloomFilterNumBits, "8388608"); | ||
configs[velox::core::QueryConfig::kSparkBloomFilterMaxNumBits] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the default value of kSparkBloomFilterMaxNumBits
in vanilla spark is 67108864L here . Why here is 4194304?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because velox has below requirement: https://github.com/facebookincubator/velox/blob/73d4279a14744cf4d038d3a967a49dcddbad9d39/velox/core/QueryConfig.h#L632C5-L632C24
Please also update the document, https://github.com/oap-project/gluten/blob/main/docs/Configuration.md |
Added, thanks. |
Run Gluten Clickhouse CI |
@zhli1142015 Can you help to rebase again? Thanks. |
8eae37e
to
ee2b7d6
Compare
Run Gluten Clickhouse CI |
Rebased, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
What changes were proposed in this pull request?
Fixes: #3594
How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)