colexecdisk: create disk-backed operators lazily in diskSpiller
This commit makes the creation of the disk-backed operator lazy: it is now deferred until the diskSpiller spills to disk for the first time in its lifetime. This allows us to avoid allocations in the common case when we don't spill to disk.

This required adjusting some tests that verify that the expected closers are accumulated. In some cases we know exactly how many closers will be added if the operator tree is forced to spill to disk (external sort, external distinct, disk-backed hash group join), whereas in others (external hash join and external hash aggregator) this number is not easy to calculate. In order to have some coverage for the latter case, this commit introduces some sanity checks to ensure that, at least in some test cases, the number of added closers seems reasonable.

Note that this change is not without its risks. In particular, the disk-backed operator constructor must execute correctly when delayed. The constructor function captures references to miscellaneous state which, if mutated between planning and the first spill, can lead to problems. One such problem was the capture of `result.ColumnTypes` when creating the external distinct, and this commit fixes it by keeping a separate reference to the input schema. All constructor functions have been audited to ensure that no captured arguments are modified later. The constructors also capture some other functions (most commonly around the monitor registry, the closer registry, and the constructor for disk-backed sort), but those should be safe.

An additional complication is that the delayed constructor functions can now run concurrently (if we have concurrency within the flow, most likely due to the plan being distributed). To account for that, the monitor and the closer registries have been extended to support optional mutex protection (which is installed whenever we create the first diskSpiller). I chose to make it optional so that we don't incur the mutex access penalty when we have no disk-spilling operators in the plan (the conditional branch should be faster than an unconditional mutex lock and unlock).

Also, note that the disk-backed operator chain will now be excluded from EXPLAIN (VEC, VERBOSE). This seems ok.

Release note: None

WIP on concurrency-safety
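
To make the two mechanisms in the commit message concrete, here is a minimal, self-contained Go sketch of a lazily constructed disk-backed operator combined with a registry whose mutex is optional. It is an illustration under stated assumptions, not the code from this commit: `closerRegistry`, `lazySpiller`, `diskBackedOp`, `enableMutex`, and `runDemo` are hypothetical names, and the real colexecdisk types and signatures may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// closerRegistry is a hypothetical stand-in for the closer registry.
// The mutex pointer stays nil until protection is explicitly enabled, so
// plans without disk spillers only pay for a nil check.
type closerRegistry struct {
	optionalMu *sync.Mutex
	closers    []string
}

// enableMutex installs mutex protection. In this sketch it must be called
// during single-threaded setup, before any concurrent use of the registry.
func (r *closerRegistry) enableMutex() {
	if r.optionalMu == nil {
		r.optionalMu = &sync.Mutex{}
	}
}

// add registers a closer, locking only if protection has been enabled.
func (r *closerRegistry) add(name string) {
	if r.optionalMu != nil {
		r.optionalMu.Lock()
		defer r.optionalMu.Unlock()
	}
	r.closers = append(r.closers, name)
}

// numClosers reports how many closers have been registered so far.
func (r *closerRegistry) numClosers() int {
	if r.optionalMu != nil {
		r.optionalMu.Lock()
		defer r.optionalMu.Unlock()
	}
	return len(r.closers)
}

// diskBackedOp is a hypothetical disk-backed operator.
type diskBackedOp struct{ id int }

// lazySpiller holds a constructor instead of a ready-made operator.
type lazySpiller struct {
	constructor func() *diskBackedOp
	diskBacked  *diskBackedOp
}

// newLazySpiller captures id by value and reg by pointer. Anything the
// closure captures must not be mutated between planning and the first spill.
func newLazySpiller(reg *closerRegistry, id int) *lazySpiller {
	reg.enableMutex() // installed when the first spiller is created
	return &lazySpiller{constructor: func() *diskBackedOp {
		op := &diskBackedOp{id: id}
		reg.add(fmt.Sprintf("disk-backed-%d", id))
		return op
	}}
}

// spill builds the disk-backed operator on first use and reuses it afterwards.
func (s *lazySpiller) spill() *diskBackedOp {
	if s.diskBacked == nil {
		s.diskBacked = s.constructor()
		s.constructor = nil // release the captured state
	}
	return s.diskBacked
}

// runDemo creates four spillers, then forces all of them to spill
// concurrently, returning the closer counts before and after spilling.
func runDemo() (before, after int) {
	reg := &closerRegistry{}
	spillers := make([]*lazySpiller, 4)
	for i := range spillers {
		spillers[i] = newLazySpiller(reg, i)
	}
	before = reg.numClosers()
	var wg sync.WaitGroup
	for _, s := range spillers {
		wg.Add(1)
		go func(s *lazySpiller) {
			defer wg.Done()
			s.spill()
			s.spill() // the second call reuses the existing operator
		}(s)
	}
	wg.Wait()
	return before, reg.numClosers()
}

func main() {
	before, after := runDemo()
	fmt.Println(before, after) // prints "0 4"
}
```

In this sketch no closers are registered until a spill actually happens, which mirrors why the closer-counting tests needed adjusting. Each spiller is driven by a single goroutine, so only the shared registry needs protection.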
1 parent 0c4cd45, commit 316be7c
Showing 21 changed files with 355 additions and 166 deletions.