Optimize context copies for parallel loops #579
Merged
When executing code in parallel, each thread needs its own copy of the `eval_context` object to avoid race conditions. For large programs, where the symbol table can have many thousands of entries, this added a lot of overhead.

This PR fixes a few performance issues related to this:

- Adds an `is_closure` flag to `let_stmt`, which allows us to communicate to `evaluate` exactly which symbols we need to copy to the thread-local context, instead of copying the entire thing.
- Adds `uninitialized_allocator` and uses it for the symbol table. This way, when we allocate memory for the context, we don't need to memset it via `std::vector`'s constructor.
- Moves part of `eval_context` to a separate object, `eval_config`, and only stores a pointer to that.

Here's an example of what programs look like now with closures: