Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize context copies for parallel loops #579

Merged
merged 9 commits into from
Jan 31, 2025
Merged

Optimize context copies for parallel loops #579

merged 9 commits into from
Jan 31, 2025

Conversation

dsharlet
Copy link
Owner

@dsharlet dsharlet commented Jan 31, 2025

When executing code in parallel, each thread needs its own copy of the eval_context object, to avoid race conditions. For large programs, where the symbol table can have many thousands of entries, this added a lot of overhead.

This PR fixes a few performance issues related to this:

  • Adds an is_closure flag to let_stmt, which allows us to communicate to evaluate exactly which symbols we need to copy to the thread local context, instead of copying the entire thing.
  • Adds uninitialized_allocator and uses it for the symbol table. This way, when we allocate memory for the context, we don't need to memset it via std::vector's constructor.
  • Move the user configurable functions from eval_context to a separate object eval_config, and only store a pointer to that.

Here's an example of what programs look like now with closures:

out.y = loop(parallel, [buffer_min(out, 1), buffer_max(out, 1)], 1) {
 closure {in, out, out.y} in {
  out.out.y = crop_dim(out, 1, [out.y, out.y]) {
   intm.out.y = crop_dim(out, 1, [out.y, out.y]) {
    out.x = loop(parallel, [buffer_min(out, 0), buffer_max(out, 0)], 1) {
     closure {in, out, out.x, out.out.y, intm.out.y} in {
      out.out.x = crop_dim(out.out.y, 0, [out.x, out.x]) {
       intm.out.x = crop_dim(intm.out.y, 0, [out.x, buffer_max(out.out.x, 0)]) {
        call(<anonymous target>, {in}, {intm.out.x})
       }
       call(<anonymous target>, {out}, {out.out.x})
      }
     }
    }
   }
  }
 }
}

@dsharlet dsharlet requested a review from vksnk January 31, 2025 06:01
Copy link
Collaborator

@vksnk vksnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very cool idea!

base/allocator.h Show resolved Hide resolved
builder/pipeline.cc Show resolved Hide resolved
builder/optimizations.cc Show resolved Hide resolved
runtime/evaluate.cc Show resolved Hide resolved
@dsharlet dsharlet merged commit 24a1076 into main Jan 31, 2025
1 check passed
@dsharlet dsharlet deleted the ds/closure branch January 31, 2025 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants