Implementation of irregular flattening #1740
base: master
Wow, this is exciting stuff! What are the prospects for success? |
There is no real risk of this not succeeding. Flattening a first-order monomorphic language is not particularly difficult, although implementing all the cases will be tedious. The main challenge is writing the code in a clean and maintainable way. Time is the main constraint. That's only for naive flattening, though, which is notoriously inefficient in practice. It's a more open question how efficient we can make it subsequently. I'm fairly confident we can do a good job, though. The only somewhat bothersome wrinkle is that we allow arbitrary (potentially irregular) parallelism in reduce/scan/histogram operators. Flattening doesn't have a solution for that. However, nontrivial parallelism in those operators is exceedingly rare, and in all cases these constructs can be turned into maps without any asymptotic overhead, so we can always do that for the really nasty cases. (Although so can the original programmer who writes the source code.) |
I like it!! Also the return of recursion... |
What does the type of an irregular nested array look like? Will there be any limitations on recursion? For example, I'd assume we still wouldn't have recursive data types. |
This is not a source language extension, so there will be no (source) irregular arrays. In the core language, they are represented as flat arrays with flag vectors. There will be no restrictions on recursive functions, but still no recursive data types (these would require a more substantial change to the internal value representation). |
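To make the flat representation concrete, here is a minimal sketch in Python (not the compiler's Haskell, and not its actual data types). The field names `segments`, `flags`, `offsets` and `elems` are borrowed from the commit messages in this PR; the `Irregular` class and both helper functions are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Irregular:
    """Illustrative flat encoding of an array of variable-length rows."""
    segments: list  # length of each row
    flags: list     # True at each position in `elems` where a row starts
    offsets: list   # start index of each row in `elems`
    elems: list     # all row elements, concatenated


def from_nested(nested):
    segments = [len(row) for row in nested]
    # Exclusive prefix sum of the row lengths gives the start offsets.
    offsets = list(accumulate(segments, initial=0))[:-1]
    elems = [x for row in nested for x in row]
    flags = [False] * len(elems)
    for off, n in zip(offsets, segments):
        if n > 0:  # an empty row owns no position, so it gets no flag
            flags[off] = True
    return Irregular(segments, flags, offsets, elems)


def to_nested(arr):
    return [arr.elems[o:o + n] for o, n in zip(arr.offsets, arr.segments)]


example = from_nested([[1, 2, 3], [], [4, 5]])
```

Note that the flag vector alone cannot record the empty middle row, which is one reason a sketch like this keeps the segment sizes and offsets alongside it.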
Compiles but does not work.
Fixing parts I think were wrong before.
* Start function flattening
* `cmp-bench-json.py` rewritten in Haskell (Issue #748) (#1860)
* Note in CHANGELOG.
* Use new tool.
* Remove cmp-bench-json.py.
* Fix #1863. (#1864)
* This is 0.23.1.
* Onwards!
* Fix typo.
* Remove copyCopyToCopy rule. (#1866)
  This is a very old (5+ years) rule that is much too naive in its handling of memory. We have better optimisations now, that aren't buggy.
* Remove SrcLoc from ImportName.
  Syntactic information does not belong in semantic objects.
* Use ImportName consistently. (#1869)
  Previously some parts of the compiler would use FilePaths directly, and it is ambiguous whether those refer to canonical import names. Now it should be clearer.
* futhark-benchmarks: bump
* Workaround for tiny /tmp on these servers.
* futhark-benchmarks: bump
* futhark-benchmarks: bump
* futhark-benchmarks: bump
* Workaround for temporary ghcup breakage.
* Switch to GHC 9.4 in Cabal CI. (#1871)
  If this does not fix Windows, then I will remove it (again).
* Plain values should never be Unique.
* No need for this.
* Also no setUniqueness here.
* futhark-benchmarks: bump
* Fix #1874.
* Avoid spurious space.
* Make consumption an effect on functions, rather than types. (#1873)
  This is a breaking change, because until now we allowed functions like
      def f (a: *[]i32, b: []i32) = ...
  where we could then pass in a tuple where in an application `f (x,y)` the value `x` would be consumed, but not `y`. However, this became increasingly difficult to support as the language grew (and frankly, it was always buggy). With this commit, the syntax above is still permitted, but it is interpreted as
      def f ((a,b): *([]i32, []i32)) = ...
  i.e. the single tuple argument is consumed *as a whole*. Long term we can also consider amending the syntax or warning about cases where it is misleading, but that is less urgent. I've wanted to make this simplification for a long time, but I always hit various snags. Today I managed to make it work, and the next step will be cleaning up the notion of "uniqueness" in return types as well (it should be the more general notion of "aliases").
* Forgot a test for #1874.
* Avoid warnings about "potentially uninitialized" variables.
  C compilers are (understandably) not smart enough to see that these are never actually used uninitialised.
* Make source language Apply AST node multi-argument. (#1875)
  This is a deviation from the concrete syntax, but humans tend to think of function calls having multiple arguments. Also, the AST had to keep a lot of useless metadata around to express the results of the intermediate applications. And again, it is related to making #1872 more feasible.
* Better constant folding for CmpOp PrimExps.
  This mostly has the effect of making generated code a little neater.
* futhark-benchmarks: bump
* Add some comments.
* More explicit.
* Fix #1878.
* Forbid access to interpreter.
* Ensure no apply-of-apply.
  The symptom of this being wrong is that defunctionalisation would create duplicate functions. No more!
* Handle array results.
* Flattening of Copy.
* Use Hendrix for CI. (#1862)
* First experiment at using Hendrix for CI.
* Maybe like this.
* Import everything locally.
* Try this.
* More systems.
* Also OpenCL.
* Also depend on these.
* More readable when split.
* Import new CI actions.
* Testing with slurm.
* Forgot to specify hendrix and the partition flag might also be needed.
* The wrong composite actions was included
* Trying cuda and opencl on hendrix
* Trying to use the composite test action for benchmarks.
* Wrong amount of indentation
* Forgot to add a |.
* Some small changes that will most likely not change things.
* trying to use sbatch
* switching to titanrtx and used the p flag wrong.
* Trailing whitespace purge.
* Skip these on TITAN X.
* Any GPU will work for these.
* Trying to run benchmarks without slurmbench.py
* Syntax errors
* Accidentally used old keyword test.
* found another syntax error i think
* I think the equality sign broke it
* maybe this will work
* Used gres wrongly.
* Do not use old futhark-benchmarks.
* Trying to use srun and cleaned up composite actions.
* Add some comments.
* More explicit.
* Fix #1878.
* Forbid access to interpreter.
* Ensure no apply-of-apply.
  The symptom of this being wrong is that defunctionalisation would create duplicate functions. No more!
* Revert "Trying to use srun and cleaned up composite actions."
  This reverts commit 6c4111f.
* using srun and fixing commit history hopefully?
* Adding an 8 hour time limit.
* Missing -.
* Newer version og futhark-benchmarks
* Trying to use `${{ always() }}`.
* Revert "Newer version og futhark-benchmarks" because of `${{ always() }}`
  This reverts commit 965e788.
* Hopefully this is the correct version of the futhark-benchmarks
* Remove always()
---------
Co-authored-by: due <[email protected]>
* Do not use hendrix except where needed.
* Cleanup whitespace.
* Matplotlib is handy.
* Add job names.
* Avoid unnecessary deallocation.
* These seem broken.
* Style fixes.
* Bump GHC.
* Not needed anymore.
* Seems to fix the nontermination.
* Support rev AD of scanomaps and scatters with non-identity lambdas. (#1880)
* Fix #1883.
* Loop over all dimensions here.
* Precompute more chunk counts.
  This is mostly to track the change in the parallelisation of Replicate in the preceding commit.
* Allow arbitrary expressions in size expressions.
  We still only permit elaboration of expressions that correspond to variables or integer constants. This is a step on the path to realising #1659.
* Always forget about the unit tests.
* Avoid extra braces when printing.
* Oops; fix copy/paste error.
* These brackets are necessary.
* Fix typo.
* A few other wording fixes.
* A few more text improvements.
* Fix error in manifest schema discovered by @Erk-.
* Newer action.
* Fix invalid link
  Thanks to @lkuty for noticing.
* Use explicit entry.
* Fix #1885.
* Better style.
* Plotting tool. (#1877)
  Closes #1861.
* Make executable.
* Remove trailing whitespace.
* Final status message.
* Use GitHub machines for Python tests.
* Generate tuning param definitions in GenericC. (#1890)
  This is a step towards #1884. Now that GenericC is responsible for all the work (and has all the information), it can generate new API functions.
* Record which tuning params are relevant to which entry points. (#1891)
  This involves extending the manifest and server protocol, and modifying 'futhark autotune' to use this new information. The main advantage (apart from general cleanup) is that we can now tune threshold parameters used in non-inlined functions.
* This is 0.24.1.
* Onwards!
* Fix #1895.
* Do not use interpreter.
* Incomplete work on nested maps.
* More work on nested maps.
* Fix #1896.
* This goes in tests.
* Use Hendrix for A100 jobs. (#1898)
* Fail early.
* All these SegOps should be virtualised.
* Start function flattening
* Incomplete work on function lifting
* Very rudimentary lifted function results
  Currently only handles lifting of functions whose return types are scalar typed variables i.e. no constants or arrays.
* Work on lifted function results
* Further work on lifted function results
* Change way return types are lifted
* Correctly return constants from lifted functions
* Existential size return for lifted functions
  Merge building of body statements and results for lifted functions. Will probably need to filter out existential size quantifiers before lifting results.
* Filter existential sizes from lifted functions
  Remove existential quantifiers from the return type and result of a function before lifting as I believe their lifted version aren't needed.
* Revert "Filter existential sizes from lifted functions"
  This reverts commit d04ecc5. It might be useful later but for now it complicates things.
* Application of lifted functions
* Do not lift entry points.
* Work in progress match-expression flattening
* Fix bug in lifting function parameters
  Lifting irregular parameters was (wrongly) in the order `[offsets, flags, segments, elements]`. When calling, the arguments were (rightly) given in the order `[segments, flags, offsets, elements]`.
* Fix bug in lifting of if-then-else
  Wrote too many elements in the final scatters.
* Make lifted if-then-else a little nicer
* Handle irregular inputs to if-expressions
* Handle irregular results of if-expressions
* Handle general irregular match-expressions
* Irregular match-expr: handle empty arrays
* Better error messages
* Handle free variables in `liftArg`
  `inputReps` now also gives type information, which is used by `liftArg` to determine if free variables are regular or irregular.
* Flatten builtins scans over multi-dim arrays
  Let scan functions (genScanomap, genScan, genExScan, ...) in the flatten builtins module operate on multi-dimensional arrays. Of note is that `exScanAndSum`, when given a single-dimensional array, will return the # of segments and sum of segment sizes as scalar values and when given a multi-dimensional array will return them as arrays.
  Also move `segMap` from Flatten.hs to Flatten.Builtins.hs
* Make sure flag and elems array have same size
  When passing flag and elems array to a function, or returning them from a function, resize them to please the type checker.
* Replicate free vars in result of lifted functions
* Handle free variables in match-expressions
  Move the common "if a subexp is a constant or free variable, replicate it, and otherwise do a lookup in dist inputs and dist env" code to a function `liftSubExp`. This is used in `liftArg`, `liftResult` and lifting match-expressions.
* Add tests for lifting functions
* Add tests for flattening match-expressions
---------
Co-authored-by: Troels Henriksen <[email protected]>
This reverts commit 647e4fe.
This highly WIP PR contains an implementation of full flattening as a transformation that goes from the SOACS representation to GPU. It is the result of a few hours of hacking and can handle the following program:
Flattened versions of all core Futhark constructs must be defined, and so far I have only done Iota and Reduce. The main design challenge was the representation of irregular arrays in the compiler, as well as the overall structure of the algorithm. It is currently based on (irregular) distribution, much like the moderate flattening algorithm. I think that is the best way to do it.
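As a rough illustration of what "flattened versions" of Iota and Reduce can mean, the Python sketch below shows a segmented iota and a segmented reduce over a flat element array. It is not the code in this PR: the function names `flat_iota` and `seg_reduce` are hypothetical, and the sequential loops stand in for the parallel scans and reductions a GPU backend would use.

```python
import operator
from itertools import accumulate


def flat_iota(ns):
    """Flat form of 'iota n for every n in ns': returns (flags, elems)."""
    offsets = list(accumulate(ns, initial=0))[:-1]  # exclusive scan of sizes
    flags = [False] * sum(ns)
    for off, n in zip(offsets, ns):
        if n > 0:
            flags[off] = True  # mark where each non-empty segment starts
    # Segmented counting scan: restart at 0 whenever a flag is set.
    elems, count = [], 0
    for f in flags:
        count = 0 if f else count + 1
        elems.append(count)
    return flags, elems


def seg_reduce(op, ne, segments, elems):
    """Reduce each segment of `elems` separately; empty segments yield `ne`."""
    out, pos = [], 0
    for n in segments:
        acc = ne
        for x in elems[pos:pos + n]:
            acc = op(acc, x)
        out.append(acc)
        pos += n
    return out


ns = [3, 0, 2]
flags, elems = flat_iota(ns)
sums = seg_reduce(operator.add, 0, ns, elems)
```

Composing the two corresponds to summing `iota n` for each `n` in `ns`, an irregular nested computation, without ever materialising an array of arrays.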
The ultimate goals are:
An explicit non-goal is adding support for irregular arrays in the source language.