rewrite evaluable loops using Loop base class #861

joostvanzwieten · 2024-03-08T12:17:30Z

This PR merges the common parts of the evaluable loops into a base class.

To distinguish inner subgraphs from outer subgraphs, this patch chooses a background color based on the number of parent subgraphs.

`Evaluable`s like `Eig` and `LoopConcatenateCombined` return tuples of arrays. Dependencies use `ArrayFromTuple` to fetch one item of the tuple. For readability `ArrayFromTuple` is hidden from the graph. Currently this is done by looking for a `_node_tuple` method, implemented by `Eig` and `LoopConcatenateCombined`, which returns a tuple of nodes. This patch introduces a `graph.TupleNode` with publicly accessible item nodes, which can be used to merge `Evaluable._node_tuple` into `Evaluable._node`.
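As a rough illustration of the idea, a tuple node can expose one graph node per item, so that a dependency links straight to the item it uses and no separate node for the fetch is drawn. The sketch below is hypothetical: `Node`, `ToyTupleNode`, `node_for_item` and the labels are made-up names, not the API of `graph.TupleNode`.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    # Hypothetical plain graph node.
    label: str

@dataclass(frozen=True)
class ToyTupleNode:
    # Hypothetical node for an evaluable that returns a tuple; the item
    # nodes are publicly accessible via `items`.
    label: str
    items: Tuple[Node, ...]

def node_for_item(tuple_node, index):
    # A dependency that fetches item `index` is drawn as the item node
    # itself, so the fetch does not appear as a node of its own.
    return tuple_node.items[index]

# Illustrative labels only.
eig = ToyTupleNode('Eig', (Node('item 0'), Node('item 1')))
second = node_for_item(eig, 1)
```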

This patch replaces `Evaluable._node_tuple` with `Evaluable._node` returning a `TupleNode`.

Computing the integer bounds can be expensive. For example, a scalar array that does not depend on arguments will be simplified and evaluated, regardless of how expensive this evaluation is and disregarding the `_intbounds_impl`. Currently the `ArrayFromTuple` class takes integer bounds as arguments. When combining loops, the integer bounds for the loops are evaluated, the loops are combined and the evaluated integer bounds are passed to `ArrayFromTuple`, even if we are not using the bounds. To prevent evaluating the bounds, this patch introduces the `_intbounds_tuple` property for `Evaluable`s that return tuples and replaces the integer bounds argument of `ArrayFromTuple` with an `_intbounds_impl` that queries the `_intbounds_tuple`.
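The effect of deferring the bounds can be sketched as follows. This is a minimal stand-in, assuming the bounds are computed on first access and then cached; the classes `ToyTupleEvaluable` and `ToyItemFromTuple` and the `ncalls` counter are hypothetical, and only the names `_intbounds_tuple` and `_intbounds_impl` are borrowed from the description.

```python
import functools

class ToyTupleEvaluable:
    # Hypothetical stand-in for an evaluable that returns a tuple of arrays.
    def __init__(self, compute_bounds):
        self._compute_bounds = compute_bounds
        self.ncalls = 0  # counts how often the expensive computation runs

    @functools.cached_property
    def _intbounds_tuple(self):
        # Runs only on first access; the result is cached afterwards.
        self.ncalls += 1
        return self._compute_bounds()

class ToyItemFromTuple:
    # Hypothetical stand-in for `ArrayFromTuple`: it receives no bounds at
    # construction and queries the source only when asked.
    def __init__(self, source, index):
        self.source = source
        self.index = index

    def _intbounds_impl(self):
        return self.source._intbounds_tuple[self.index]

source = ToyTupleEvaluable(lambda: ((0, 9), (0, 99)))
item = ToyItemFromTuple(source, 1)
calls_after_construction = source.ncalls
bounds = item._intbounds_impl()
```

Constructing the item does not trigger the computation; only the first query of the bounds does.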

In a follow-up commit the `LoopSum` will gain support for parallel evaluation, which makes the order of summation non-deterministic. This causes a unittest to fail. This patch lowers the tolerance for the test.
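The sensitivity to summation order comes from floating-point addition not being associative. A small, self-contained illustration (unrelated to the actual test in the repository):

```python
import math

terms = [0.1, 0.2, 0.3]

# The same three terms, grouped in two different ways.
left_to_right = (terms[0] + terms[1]) + terms[2]
right_to_left = terms[0] + (terms[1] + terms[2])

# The results differ in the last bits but agree within a tolerance.
differs = left_to_right != right_to_left
close = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)
```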


gertjanvanzwieten · 2024-03-26T14:53:27Z

nutils/evaluable.py

+                for i, loop in enumerate(loops):
+                    kwargs = {}
+                    replacements[loop] = ArrayFromTuple(combined, i, loop.shape, loop.dtype)
+                self = util.shallow_replace(replacements.get, self)


Can't we wait till the end to apply all replacements at once?

joostvanzwieten · 2024-03-26

No, I don't think so. `shallow_replace` stops at the shallowest replacement, so we'd miss replacements that are a dependency of the shallowest replacement.
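A toy model of this behaviour, using nested tuples as expressions. `toy_shallow_replace` is a simplified stand-in written for this illustration, not nutils's `util.shallow_replace`, and the scenario in which the replacement of one loop still refers to another loop is an assumption.

```python
def toy_shallow_replace(lookup, node):
    # A node for which `lookup` returns a replacement is swapped out and
    # NOT descended into; all other nodes are traversed recursively.
    new = lookup(node)
    if new is not None:
        return new
    if isinstance(node, tuple):
        return tuple(toy_shallow_replace(lookup, child) for child in node)
    return node

def contains(node, target):
    # True if `target` occurs anywhere in the nested tuple `node`.
    if node == target:
        return True
    return isinstance(node, tuple) and any(contains(c, target) for c in node)

b = ('loop', 'b')
a = ('loop', 'a', b)   # loop a depends on loop b
root = ('add', a, b)

# Assumed scenario: the replacement for `a` still refers to `b`.
replacements = {a: ('item', 1, b), b: ('item', 0)}

# One pass with all replacements: `b` survives inside the replacement of `a`.
one_pass = toy_shallow_replace(replacements.get, root)
stale = contains(one_pass, b)

# A further pass reaches the leftover occurrence.
second_pass = toy_shallow_replace(replacements.get, one_pass)
clean = not contains(second_pass, b)
```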

gertjanvanzwieten · 2024-03-26T14:56:03Z

nutils/evaluable.py

+                loop, = loops
+                combined = loop._combine_loops(loop.body_arg._loop_deps - loop._loop_deps)
+                if combined != loop:
+                    self = util.shallow_replace(lambda func: combined if func == loop else None, self)


Same here via replacements?

Currently there are 2.5 implementations for evaluable loops: `LoopSum`, `LoopConcatenate` and `LoopConcatenateCombined`. The first loop is the most common one (before sparsification). The second appears after sparsification. The last loop is created only during the 'optimize for numpy' stage and is a combination of several concatenates in a single loop for performance reasons. This patch introduces a base class for evaluable loops and rewrites `LoopSum` and `LoopConcatenate` as implementations of the base class. The base class requires two methods to be implemented: one for initializing the output value and one for updating the output value for each iteration. Due to the generic nature of the base class, the method for updating the output value is guarded with a multiprocessing lock if the loop is evaluated in parallel, even when this is not necessary, e.g. for `LoopConcatenate`. To minimize the impact of locking, the lock used to increment the loop index is reused for updating the output value. In addition, this patch replaces `LoopConcatenateCombined` with `_LoopTuple`, which supports any combination of loop evaluables, not just `LoopConcatenate`.
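The division of labour between the base class and its implementations might look roughly like the sketch below. It is not the code of this PR: all class and method names are hypothetical, and it uses threads with a `threading.Lock` instead of forked processes with a multiprocessing lock. It does mirror the one design point described above, namely that a single lock acquisition both merges the finished iteration into the output and hands out the next loop index.

```python
import threading
from abc import ABC, abstractmethod

class ToyLoop(ABC):
    # Hypothetical base class; subclasses implement the two hooks.
    def __init__(self, length, body):
        self.length = length
        self.body = body  # maps an iteration index to a value

    @abstractmethod
    def init_output(self):
        '''Return the mutable output container.'''

    @abstractmethod
    def update_output(self, output, index, value):
        '''Merge the value of iteration `index` into `output`.'''

    def evaluate(self, nthreads=1):
        output = self.init_output()
        lock = threading.Lock()
        indices = iter(range(self.length))

        def worker():
            with lock:
                index = next(indices, None)
            while index is not None:
                value = self.body(index)  # computed outside the lock
                with lock:  # one acquisition: update output, fetch next index
                    self.update_output(output, index, value)
                    index = next(indices, None)

        threads = [threading.Thread(target=worker) for _ in range(nthreads)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return output

class ToySum(ToyLoop):
    def init_output(self):
        return [0]

    def update_output(self, output, index, value):
        output[0] += value

class ToyConcatenate(ToyLoop):
    def __init__(self, length, body, chunk):
        super().__init__(length, body)
        self.chunk = chunk  # simplification: every iteration yields `chunk` items

    def init_output(self):
        return [None] * (self.length * self.chunk)

    def update_output(self, output, index, value):
        output[index * self.chunk:(index + 1) * self.chunk] = value

total = ToySum(5, lambda i: i * i).evaluate(nthreads=3)
joined = ToyConcatenate(3, lambda i: [i, i], chunk=2).evaluate(nthreads=2)
```

In `ToyConcatenate` every iteration writes to its own slice, so the lock around the update is not strictly needed there; that is the unnecessary locking the description accepts for the sake of a generic base class.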

The shape of `LoopConcatenate` is used in the loop initialization to allocate the output array. The length of the concatenation axis is the result of concatenating and summing (`_SizesToOffset`) the lengths of the same axis of the concatenation value. Under certain conditions the concatenation length can be simplified. For this to work, the concatenation length must be a constructor argument of `LoopConcatenate`, otherwise the length is not picked up by `deep_replace_property`. For the same reason the entire shape of `LoopConcatenate` was passed as a constructor argument. Since the shape of an array is very likely already optimized (the same is true for `LoopConcatenate.shape`), this patch removes the shape from the constructor of `LoopConcatenate`, except for the length of the concatenation axis.
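How per-iteration lengths translate into offsets and a total length can be shown with plain Python. The helper `sizes_to_offsets` is a hypothetical counterpart written for this note, not nutils's `_SizesToOffset`, and the chunk data is made up.

```python
from itertools import accumulate

def sizes_to_offsets(sizes):
    # Start offset of every chunk, followed by the total length.
    return list(accumulate(sizes, initial=0))

# Length of the concatenation axis contributed by each iteration.
sizes = [2, 3, 1]
offsets = sizes_to_offsets(sizes)

# The last offset is the concatenation length used to allocate the output.
output = [None] * offsets[-1]

# Each iteration writes its chunk between consecutive offsets.
chunks = [['a', 'a'], ['b', 'b', 'b'], ['c']]
for start, stop, chunk in zip(offsets, offsets[1:], chunks):
    output[start:stop] = chunk
```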

joostvanzwieten requested a review from gertjanvanzwieten March 8, 2024 12:17

joostvanzwieten force-pushed the generalize-loop branch 4 times, most recently from 989e798 to c9623e6 Compare March 14, 2024 11:42

joostvanzwieten force-pushed the generalize-loop branch 4 times, most recently from 81dc9c5 to b07f60d Compare March 21, 2024 11:34

joostvanzwieten marked this pull request as ready for review March 21, 2024 11:37

joostvanzwieten force-pushed the generalize-loop branch 2 times, most recently from 70e7795 to aaf54c1 Compare March 21, 2024 15:31

joostvanzwieten added 3 commits March 22, 2024 19:56

fill nested subgraphs with distinct bg colors

a5cbef9

To distinguish inner subgraphs from outer subgraphs, this patch chooses a background color based on the number of parent subgraphs.

replace Evaluable._node_tuple with TupleNode

0380125

This patch replaces `Evaluable._node_tuple` with `Evaluable._node` returning a `TupleNode`.

joostvanzwieten force-pushed the generalize-loop branch from 154b968 to 007ecc3 Compare March 25, 2024 22:29

joostvanzwieten added 3 commits March 26, 2024 12:22

let parallel.fork test false when not forking

bc80de9

return IDSetView in Evaluable._loop_concat_deps

18be7c8

joostvanzwieten force-pushed the generalize-loop branch from 007ecc3 to 9f18db3 Compare March 26, 2024 12:05

lower tolerance of unittest

4f467dc

In a follow-up commit the `LoopSum` will gain support for parallel evaluation, which makes the order of summation non-deterministic. This causes a unittest to fail. This patch lowers the tolerance for the test.

joostvanzwieten force-pushed the generalize-loop branch from 9f18db3 to d93e6a4 Compare March 26, 2024 14:45

gertjanvanzwieten requested changes Mar 26, 2024


joostvanzwieten force-pushed the generalize-loop branch from d93e6a4 to 9adefa7 Compare March 26, 2024 15:49

joostvanzwieten and others added 3 commits March 26, 2024 16:49

make _dependencies_sans_invariants non-recursive

de07c59

joostvanzwieten force-pushed the generalize-loop branch from 9adefa7 to de07c59 Compare March 26, 2024 15:49

gertjanvanzwieten approved these changes Mar 26, 2024


joostvanzwieten merged commit 0286339 into master Mar 26, 2024
23 checks passed

joostvanzwieten deleted the generalize-loop branch March 26, 2024 16:26

joostvanzwieten mentioned this pull request Oct 23, 2024

merge common functionality of LoopSum and LoopConcatenateCombined #737

Closed
