-
Notifications
You must be signed in to change notification settings - Fork 126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise on out of bounds range access #1061
base: main
Are you sure you want to change the base?
Conversation
If `step > 1`, it's possible for `range.last` to be pastthe bounds but for the `range` to still be a subset of the columns.
The bounds check on ranges was a little fiddly since |
[slice_min, slice_pseudo_max] = | ||
[first, last] | ||
|> Enum.map(&if(&1 < 0, do: n_cols + &1, else: &1)) | ||
|> Enum.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because ranges can be reverse ordered, I am not sure how much we want to rely on sort. My suggestion would be to validate first and last independently:
for pos <- [first, last] do
cond do
pos >= 0 and pos < n_cols -> :ok
pos < 0 and pos >= -n_cols -> :ok
true -> raise "..."
end
end
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enum.slice/2
says it doesn't support reverse ordered ranges yet. Do you want to support them here in anticipation of their future support in Elixir 2.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. If the code above works (big if), then it should be simpler and more future proof. Worth giving it a try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Unfortunately though, I think your algorithm doesn't work for the following situation:
df = DF.new(a: [1, 2, 3], b: ["a", "b", "c"], c: [4.0, 5.1, 6.2])
n_cols = DF.n_columns(df) #=> 3
slice = 1..10//10
assert slice.last > n_cols
assert DF.to_columns(df[slice]) == %{"b" => ["a", "b", "c"]}
Here slice
is a valid subset of the columns because even though slice.last > n_cols
, Enum.to_list(slice) == [1]
. This only happens when step > 1
of course.
We can still try and work out a future proof approach though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can adjust the last for a range start..finish//step
, the last element can be calculated using this formula:
start + (div(finish - start, step) * step)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I wish there was a way we had to implement less of the existing range machinery. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we deal with empty ranges correctly? 10..0//1 is empty and there should not fail, even in a DF with 2 columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there was a way we had to implement less of the existing range machinery. :(
Ha yeah I was thinking Range
could use some helpers. I started to design some in my head before realizing I'd spend all day on that. 😅
Do we deal with empty ranges correctly?
Blerg. So apparently not:
iex(4)> df[10..0//1]
** (ArgumentError) range 10..0//1 is out of bounds for a dataframe with 3 column(s)
(explorer 0.11.0-dev) lib/explorer/shared.ex:223: Explorer.Shared.to_existing_columns/3
(explorer 0.11.0-dev) lib/explorer/data_frame.ex:387: Explorer.DataFrame.fetch/2
(elixir 1.18.1) lib/access.ex:322: Access.get/3
iex:4: (file)
iex(4)> df[..]
#Explorer.DataFrame<
Polars[3 x 3]
a s64 [1, 2, 3]
b string ["a", "b", "c"]
c f64 [4.0, 5.1, 6.2]
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er, wait. Is that actually the behavior we want? Even though both 0..-1//1
and 10..0//1
are empty, Enum.slice/2
treats them differently:
iex(7)> Enum.slice(df.names, 10..0//1)
[]
iex(8)> Enum.slice(df.names, 0..-1//1)
["a", "b", "c"]
I think we actually want the current behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Negative indexes are treated as last. So it is an empty range but it gets normalized. The other one is an empty range altogether. And then there are ranges that raise when out of index.
it may be best to leave this to until we add a slice! to Elixir? 🤔
This won't be supported until Elixir 2.0. But we can make it future proof now.
Fixes: #1005
In the example from that issue we now get: