[VL] Add config for the thread num of velox spill executor (#4794)
Allow specifying the number of threads of a dedicated spill executor to improve performance.
By default, the number of threads is 0 and multi-threaded spilling is disabled.
This feature is marked as experimental in the docs.
WangGuangxin authored Mar 25, 2024
1 parent dafb371 commit 74e91a0
Showing 2 changed files with 10 additions and 2 deletions.
10 changes: 9 additions & 1 deletion cpp/velox/compute/WholeStageResultIterator.cc
@@ -39,6 +39,8 @@ const std::string kHiveConnectorId = "test-hive";
 // memory
 const std::string kSpillStrategy = "spark.gluten.sql.columnar.backend.velox.spillStrategy";
 const std::string kSpillStrategyDefaultValue = "auto";
+const std::string kSpillThreadNum = "spark.gluten.sql.columnar.backend.velox.spillThreadNum";
+const uint32_t kSpillThreadNumDefaultValue = 0;
 const std::string kAggregationSpillEnabled = "spark.gluten.sql.columnar.backend.velox.aggregationSpillEnabled";
 const std::string kJoinSpillEnabled = "spark.gluten.sql.columnar.backend.velox.joinSpillEnabled";
 const std::string kOrderBySpillEnabled = "spark.gluten.sql.columnar.backend.velox.orderBySpillEnabled";
@@ -198,13 +200,19 @@ WholeStageResultIterator::WholeStageResultIterator(
 std::shared_ptr<velox::core::QueryCtx> WholeStageResultIterator::createNewVeloxQueryCtx() {
   std::unordered_map<std::string, std::shared_ptr<velox::Config>> connectorConfigs;
   connectorConfigs[kHiveConnectorId] = createConnectorConfig();
+
+  auto spillThreadNum = veloxCfg_->get<uint32_t>(kSpillThreadNum, kSpillThreadNumDefaultValue);
+  std::shared_ptr<folly::Executor> spillExecutor = nullptr;
+  if (spillThreadNum > 0) {
+    spillExecutor = std::make_shared<folly::CPUThreadPoolExecutor>(spillThreadNum);
+  }
   std::shared_ptr<velox::core::QueryCtx> ctx = std::make_shared<velox::core::QueryCtx>(
       nullptr,
       facebook::velox::core::QueryConfig{getQueryContextConf()},
       connectorConfigs,
       gluten::VeloxBackend::get()->getAsyncDataCache(),
       memoryManager_->getAggregateMemoryPool(),
-      nullptr,
+      std::move(spillExecutor),
       "");
   return ctx;
 }
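
The gating logic in this hunk can be sketched in isolation. The following is a minimal, standard-library-only illustration, not the Gluten implementation: `ConfigMap`, `SpillExecutorStub`, `readSpillThreadNum`, and `makeSpillExecutor` are hypothetical names standing in for `veloxCfg_`, `folly::CPUThreadPoolExecutor`, and the code above. Only the config key and the default value of 0 are taken from the commit.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the real config object used by Gluten.
using ConfigMap = std::unordered_map<std::string, std::string>;

// Key and default value as introduced by the commit.
const std::string kSpillThreadNum = "spark.gluten.sql.columnar.backend.velox.spillThreadNum";
constexpr uint32_t kSpillThreadNumDefaultValue = 0;

// Hypothetical placeholder for folly::CPUThreadPoolExecutor; it only records
// the requested thread count and starts no threads.
struct SpillExecutorStub {
  explicit SpillExecutorStub(uint32_t n) : numThreads(n) {}
  uint32_t numThreads;
};

// Returns the configured thread count, or the default (0) when the key is absent.
// Assumption: values are plain decimal strings; std::stoul throws on malformed input.
uint32_t readSpillThreadNum(const ConfigMap& conf) {
  auto it = conf.find(kSpillThreadNum);
  if (it == conf.end()) {
    return kSpillThreadNumDefaultValue;
  }
  return static_cast<uint32_t>(std::stoul(it->second));
}

// Mirrors the hunk above: no executor (nullptr) when the count is 0,
// otherwise a dedicated executor with that many threads.
std::shared_ptr<SpillExecutorStub> makeSpillExecutor(const ConfigMap& conf) {
  const uint32_t n = readSpillThreadNum(conf);
  if (n == 0) {
    return nullptr;
  }
  return std::make_shared<SpillExecutorStub>(n);
}
```

With an empty map or a value of `"0"`, `makeSpillExecutor` returns `nullptr`, which corresponds to multi-threaded spilling being disabled. A value such as `"4"` yields an executor sized to four threads.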
2 changes: 1 addition & 1 deletion docs/get-started/Velox.md
@@ -351,7 +351,7 @@ Using the following configuration options to customize spilling:
 | spark.gluten.sql.columnar.backend.velox.spillStartPartitionBit | 29 | The start partition bit which is used with 'spillPartitionBits' together to calculate the spilling partition number |
 | spark.gluten.sql.columnar.backend.velox.spillPartitionBits | 2 | The number of bits used to calculate the spilling partition number. The number of spilling partitions will be power of two |
 | spark.gluten.sql.columnar.backend.velox.spillableReservationGrowthPct | 25 | The spillable memory reservation growth percentage of the previous memory reservation size |
-
+| spark.gluten.sql.columnar.backend.velox.spillThreadNum | 0 | (Experimental) The thread num of a dedicated thread pool to do spill
 # Velox User-Defined Functions (UDF)
 
 ## Introduction