Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inc. ranges with start == end and negative step #2194

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

sortraev
Copy link
Collaborator

@sortraev sortraev commented Nov 21, 2024

There is a discrepancy in certain cases of inclusive ranges between the interpreter and the compiled backends. I do not know which of the two is intended, but from a quick glance I assume the interpreter has intended behavior.

So:

This changes behavior of inclusive ranges for the case where start == end and start != second start > second, for compiled programs only. In this case the result is now [start] rather than a runtime error.

Example: 1..0...1 now produces [1] rather than an "invalid range" runtime error.

This makes behavior for compiled programs similar to that of the Futhark interpreter (and e.g. Haskell ranges), for at least this particular case (I have not looked for other discrepancies).

This changes behavior of inclusive ranges for the case where
`start == end` and `start != second`, for compiled programs only.
In this case the result is now `[start]` rather than a runtime error.

Example: `1..0...1` now produces `[1]` rather than an "invalid range"
runtime error.

This makes behavior for compiled programs similar to that of the Futhark
interpreter (and e.g. Haskell ranges), for at least this particular
case (I have not looked for other discrepancies).
@sortraev sortraev requested a review from athas November 21, 2024 11:20
@sortraev
Copy link
Collaborator Author

@sortraev sortraev changed the title Fix inc. ranges with start == end and non-zero step Fix inc. ranges with start == end and ~non-zero~ negative step Nov 21, 2024
@sortraev sortraev changed the title Fix inc. ranges with start == end and ~non-zero~ negative step Fix inc. ranges with start == end and negative step Nov 21, 2024
@athas
Copy link
Member

athas commented Nov 21, 2024

Do we have a good argument for why the interpreter behaviour is the right one? What are the underlying principles/rules?

@athas
Copy link
Member

athas commented Nov 21, 2024

Looking at the fix, it's perhaps plausible that this was a typo in the first place.

@sortraev
Copy link
Collaborator Author

sortraev commented Nov 22, 2024

Do we have a good argument for why the interpreter behaviour is the right one? What are the underlying principles/rules?

I thought I remembered reading Futhark (inclusive) ranges were based on Haskell ranges (ignoring those cases were the Haskell range would be empty or infinite), but I can't find any note of this so perhaps I made it up or heard it from someone who did. I then chose to conform to the interpreter since it seems to most closely mimic Haskell ranges.

In case anybody not familiar is reading along:
For a Haskell range [first,second..last] we start with first and yield elements as long as the current element is less or equal to last (if ascending) or greater or equal to last (if descending):

def haskell_like_range(first, second, last):
    step = second - first
    is_descending = step < 0
    is_ascending  = not is_descending
    tmp = first
    while (is_ascending  and tmp <= last) or \
          (is_descending and tmp >= last):
        yield tmp
        tmp += step

In Futhark we for obvious reasons do not want infinite ranges, so we throw a runtime error when first = second. Runtime error is also thrown on empty range, although I'm not sure about the motivation here

@athas
Copy link
Member

athas commented Nov 22, 2024

The general principle is that an error is thrown when first+x*second < last for all x (ignoring overflow), as these are the cases that result in infinite ranges.

The form x...y is syntactic sugar for x..x+1...y. This explains why 1...0 is an error and not an empty range, although I suppose it could have been defined differently. Empty ranges are however possible with the x..<y notation, which I suppose could be argued is an inconsistency.

@sortraev
Copy link
Collaborator Author

The general principle is that an error is thrown when first+x*second < last for all x (ignoring overflow), as these are the cases that result in infinite ranges.

I assume you mean throwing error when first+x*step < last for all x (and, for descending ranges, error on first+x*step > last for all x) -- but is this not equivalent to throwing an error on first = second?

Empty ranges are however possible with the x..<y notation, which I suppose could be argued is an inconsistency.

Agree -- if I didn't know better I think I'd intuitively expect e.g. x..y..<z <=> x..y...(z-1) (and similarly for descending ranges). Is there a motivation for erring on empty inclusive ranges?

@athas
Copy link
Member

athas commented Nov 23, 2024

Alright, I'm pretty much convinced. Will you fix the compile error and add modify one of the range tests?

@sortraev
Copy link
Collaborator Author

sortraev commented Nov 23, 2024

Have fixed the unused variable warning (I assume you meant this), and will look at a test later when I have more time.

But I am unsure of exactly how much I have convinced you of, hehe. Just the change proposed in the original commit? Or should we allow certain empty inclusive ranges, to address the inconsistency between e.g. 1..<1 and 1..2...0?

I am also a little unsure of the current reasoning behind generating empty exclusive ranges. For example, as discussed 1..<1 is valid and generates an empty range, but e.g. 1..<0 and 1..<-42 produce errors. How about allowing empty ranges (both inclusive and exclusive) in general, and only erring on step = 0?

@athas
Copy link
Member

athas commented Nov 23, 2024

The reason is that we would like the property that x..<y produces an array with y-x elements (or an error). This is in fact the type of such a range (expressed in the size-dependent type system), and so if y-x<0 then that should definitely be an error.

@athas
Copy link
Member

athas commented Nov 23, 2024

Note that more complicated ranges (such as I think everything with an explicit step), the size in the type will be a freshly generated unknown type, with no promises about what it might be, so we are at greater liberty to play games.

@sortraev
Copy link
Collaborator Author

sortraev commented Nov 25, 2024

The reason is that we would like the property that x..<y produces an array with y-x elements (or an error). This is in fact the type of such a range (expressed in the size-dependent type system), and so if y-x<0 then that should definitely be an error.

This makes a lot of sense! I had not thought of this since I am not big into type theory. It also made me think that perhaps the validity of a range is more easily and concisely expressed in terms of its size -- e.g.: "a range is valid if its size is non-negative". But I'm having a hard time figuring out what the "size" of a range is in general -- the formula ceil( [abs(last - first) + (if inclusive then 1 else 0)] / abs(step)), as is used currently, seems reasonable for the valid ranges I've tested (including those which are not currently considered valid, such as 1..2...0). However, it breaks for invalid ranges such as 7..-4...8, which by this formula has size 1 -- obviously wrong. What exactly is the size of a general range?

I'm not looking to implement, just curious.

For conciseness, but also to better comprehend which (and where) changes
need to be made.
* split step and step_zero computations.
* no need to compute sign.
* inline some definitions which were used only once.
@sortraev
Copy link
Collaborator Author

Actually, after spending some time on thinking of a way to handle 1..2...0 and not being able to figure out how to handle boundaries concisely (i.e. without a lot of branching logic), I think it might help to have a general formula for the size of a range (if it is even defined for invalid ranges, that is).

The problem is essentially:
for the exclusive range 1..<1, we have (first, last) = (1, 1), and hence bounds_invalid_upwards = last < first = false (i.e. not invalid), whereas for the equivalent inclusive range 1..2...0, we have (first, last) = (1, 0), and hence bounds_invalid_upwards = last < first = true. It looks like a simple off-by-one, but there are multiple cases to handle (4, I think, since we can have both ascending and descending ranges with either first >= last or last =< first).

@athas
Copy link
Member

athas commented Nov 25, 2024

Branching logic is not by itself a problem. There's lots of it already. (In retrospect, I think having range expressions may have been a mistake - they are far too subtle compared to their merits.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants