Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to not reorder loops? #459

Closed
johnbcoughlin opened this issue Jan 11, 2023 · 4 comments
Closed

Add option to not reorder loops? #459

johnbcoughlin opened this issue Jan 11, 2023 · 4 comments

Comments

@johnbcoughlin
Copy link
Contributor

As noted in #126 and in my own usage of the library, compile time is roughly factorial in the number of nested loops. For many applications, it is not too hard to guess a loop ordering that should be good enough. For example, consider this set of 6 nested loops:

    for λx in 1:Kx, λy in 1:Ky, λvx in 1:Kvx, λvy in 1:Kvy
        for αx in 1:Nx, αv in 1:Nv
            ax = as_x[λx]
            ay = as_y[λy]
            avx = as_vx[λvx]
            avy = as_vy[λvy]

            # Do some work
        end
    end

The indices λx, λy etc. are larger strides than the indices αx, αv. So we probably shouldn't even consider loop orders where the alphas are outside. In terms of ordering the outer loops, I think that a human should be able to do a pretty good job just based on the way the data is laid out in memory. It would be nice to be able to take advantage of the automatic unrolling and vectorization capabilities of LV without incurring a compile time blowup.

Is this a feasible suggestion based on how the library is structured right now?

@chriselrod
Copy link
Member

I think that a human should be able to do a pretty good job just based on the way the data is laid out in memory.

  1. That is a function of type. E.g., Adjointand PermutedDimsArray.
  2. People do often get it wrong. "Fastest dimensions on the inside" is not always fastest.

I agree. LoppModels (the rewrite) of avoiding the factorial behavior.

There is #398, but I didn't actually notice it helping for some reason. It's worth looking more into.

Is this a feasible suggestion based on how the library is structured right now?

#398 is simple enough, and avoids the factorial algorithm.
Avoiding reordering at all is more difficult, because LV forgets what the original order actually was.

@chriselrod
Copy link
Member

Maybe the logic to actually skip the factorial behavior is broken in #398.

@johnbcoughlin
Copy link
Contributor Author

I am doing some more investigation into my slow compile times, and I think that it may actually have more to do with LLVM being quite slow to deal with high-dimensional arrays than with LV--the compile time for my generated function is extremely slow even with no @turbo annotation.

@chriselrod
Copy link
Member

chriselrod commented Jan 12, 2023

I am doing some more investigation into my slow compile times, and I think that it may actually have more to do with LLVM being quite slow to deal with high-dimensional arrays than with LV--the compile time for my generated function is extremely slow even with no @turbo annotation.

Are you generating tons of unrolled code?
If you blame LLVM, I suggest starting Julia with

JULIA_LLVM_ARGS="-time-passes" julia

It may also be useful to look into the Julia compiler itself.
If you build Julia from source, you can enable timings:
https://github.com/JuliaLang/julia/blob/b07484ca39a963b49fe31b8d5d2ceee4864f1737/src/options.h#L82
Knowing more about where time is spent in the compiler might help figure out how how to reduce it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants