
merge Fourier method into main #23

Merged
Merged 10 commits into main on Aug 8, 2024

Conversation

MuellerSeb
Member

Since the 1.0 tag was created from this branch, it should be merged into main.

@MuellerSeb MuellerSeb added the enhancement New feature or request label Jul 20, 2024
@MuellerSeb MuellerSeb requested a review from LSchueler July 20, 2024 13:48
@@ -29,7 +29,7 @@ lto = true
 codegen-units = 1

 [dependencies]
-pyo3 = { version = "0.20", features = ["abi3-py38", "extension-module"] }
+pyo3 = { version = "0.20", features = ["abi3-py310", "extension-module"] }
Contributor

It is not trivial due to the new Bound-API, but you may want to consider upgrading to pyo3 0.22 as it significantly reduces Python-to-Rust call overheads.
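For reference, a minimal sketch of what a 0.22-style signature could look like (assuming the Bound API; sum_list is a made-up function, not code from this PR):

use pyo3::prelude::*;
use pyo3::types::PyList;

// Hypothetical example: arguments arrive as Bound<'py, T> smart pointers
// instead of the old GIL-ref &'py T, so the GIL lifetime is explicit.
#[pyfunction]
fn sum_list(values: &Bound<'_, PyList>) -> PyResult<f64> {
  let mut total = 0.0;
  for item in values.iter() {
    total += item.extract::<f64>()?;
  }
  Ok(total)
}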

Member

Hi Adam, it's great to hear from you! Yeah, I realized that updating the version created a few problems and I wanted to publish the new function as quickly as possible. But now, I should have some time to see how to upgrade pyo3.

Member

The benchmarks are inconclusive, but there might actually be a slight improvement 👍


rayon::ThreadPoolBuilder::new()
.num_threads(num_threads.unwrap_or(rayon::current_num_threads()))
.build()
Contributor

This will add a significant cost, since a new thread pool is initialized for each and every call, i.e. the thread start-up cost is paid every time.

Maybe this could be changed to use the global/default thread pool when num_threads is None? Especially since rayon::current_num_threads is not the default thread count (that would be RAYON_NUM_THREADS, or std::thread::available_parallelism otherwise) but the number of threads in the global thread pool, contrary to what the function documentation suggests. (So basically it will use the number of threads chosen when the global pool was initialized, which of course is the default only if that pool was initialized implicitly/automatically.)

Maybe something like

use rayon::ThreadPoolBuilder;

// Run `f` in a dedicated pool only when an explicit thread count is requested;
// otherwise fall back to the global rayon pool.
fn with_num_threads<F, R>(num_threads: Option<usize>, f: F) -> R
where F: FnOnce() -> R + Send, R: Send {
  if let Some(num_threads) = num_threads {
    ThreadPoolBuilder::new().num_threads(num_threads).build().unwrap().install(f)
  } else {
    f()
  }
}
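
With a helper like this, a call site would just wrap its existing body, roughly with_num_threads(num_threads, || { /* current parallel code */ }), keeping the public signature unchanged (illustrative only, not code from this PR).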

Member

Thanks for pointing that out and thanks for the suggestion. I'll take a closer look soon.

.and(spectrum_factor)
.and(z1)
.and(z2)
.fold(0.0, |sum, k, &spectrum_factor, &z1, &z2| {
Contributor

Since the accumulator is a scalar and hence cheap to copy/reduce, I wonder whether this would actually benefit from using a par_fold instead of the sequential fold?

Member

And thanks again :-) I'll run a few benchmarks soon to see if you are right.

Member

Interesting, the par_fold version is about 50% slower in my benchmarks.

Contributor

Quite possible that the overhead is unhelpful for the inner loop, which would otherwise benefit from auto-vectorization. Using with_min_len might be a nice compromise between granularity and overhead, but it does add another tunable.
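
For illustration, a generic rayon sketch (standalone, not this PR's code) of how with_min_len bounds how finely a parallel reduction gets split:

use rayon::prelude::*;

// Hypothetical example: a parallel sum of squares where with_min_len(1024)
// keeps each rayon task large enough that the inner loop can still
// auto-vectorize; 1024 is an arbitrary tunable.
fn par_sum_of_squares(values: &[f64]) -> f64 {
  values
    .par_iter()
    .with_min_len(1024)
    .map(|&v| v * v)
    .sum()
}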

@LSchueler
Member

Oh man, the weeks before my holidays were pretty stressful and I had to constantly jump between different things. But still, creating the tag from this branch?! Thanks for catching this!

@LSchueler LSchueler merged commit 0c43f81 into main Aug 8, 2024
18 checks passed