Hive splits and multithreading #5552

Answered by xiaoxmeng
saifmasood asked this question in Q&A
@saifmasood I think Velox does parallelize the table scan operation at the split level. The splits added through Task::addSplit() in the test are put into a shared groupSplitsStores in the task, which is indexed by table scan plan node id, so all the table scan operators can access the added splits. Each table scan operator fetches and processes one split at a time (the table scan operator's getOutput method calls Task::getSplitOrFuture() to do that). I am not sure how you determined that Velox processes all four as one. There might be a timing-related race in which one table scan operator runs very fast and processes all four splits before the second one even starts, but you can experiment …
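
To make the shared-queue idea concrete, here is a rough standalone C++ sketch of the pattern described above. It is not Velox code: `SplitStore`, `nextSplit` and `runScan` are hypothetical names made up for illustration, splits are plain integers, and the "future" half of Task::getSplitOrFuture() is left out by assuming every split is added before the drivers start.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a task-level split store (not a Velox class).
// Splits are queued per plan node id and shared by all scan drivers.
class SplitStore {
 public:
  void addSplit(const std::string& planNodeId, int split) {
    std::lock_guard<std::mutex> lock(mutex_);
    queues_[planNodeId].push_back(split);
  }

  // Hands out at most one split per call; std::nullopt means the queue is empty.
  std::optional<int> nextSplit(const std::string& planNodeId) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = queues_.find(planNodeId);
    if (it == queues_.end() || it->second.empty()) {
      return std::nullopt;
    }
    int split = it->second.front();
    it->second.pop_front();
    return split;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::deque<int>> queues_;
};

// Runs numDrivers threads that each pull one split at a time from the same
// queue. Returns the split ids each driver ended up processing.
std::vector<std::vector<int>> runScan(int numSplits, int numDrivers) {
  assert(numSplits >= 0 && numDrivers > 0);
  SplitStore store;
  const std::string scanNodeId = "0";  // assumed plan node id
  for (int i = 0; i < numSplits; ++i) {
    store.addSplit(scanNodeId, i);
  }

  std::vector<std::vector<int>> processed(numDrivers);
  std::vector<std::thread> drivers;
  for (int d = 0; d < numDrivers; ++d) {
    // Each thread writes only to its own slot in processed.
    drivers.emplace_back([&store, &processed, &scanNodeId, d] {
      while (auto split = store.nextSplit(scanNodeId)) {
        processed[d].push_back(*split);
      }
    });
  }
  for (auto& t : drivers) {
    t.join();
  }
  return processed;
}
```

Calling `runScan(4, 2)` always processes each of the four splits exactly once, but how they are divided between the two drivers varies from run to run. Because processing a split is nearly free in this sketch, the first thread can drain the whole queue before the second one starts, which is the kind of timing effect mentioned above.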

Answer selected by xiaoxmeng