Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise on out of bounds range access #1061

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

billylanchantin
Copy link
Member

Fixes: #1005

In the example from that issue we now get:

iex> df = Explorer.DataFrame.new(a: [1, 2, 3])
iex> df[1..1]
# ** (ArgumentError) range 1..1 is out of bounds for a dataframe with 1 column(s)
#     (explorer 0.11.0-dev) lib/explorer/shared.ex:221: Explorer.Shared.to_existing_columns/3
#     (explorer 0.11.0-dev) lib/explorer/data_frame.ex:387: Explorer.DataFrame.fetch/2
#     (elixir 1.18.1) lib/access.ex:322: Access.get/3
#     iex:3: (file)

@billylanchantin
Copy link
Member Author

The bounds check on ranges was a little fiddly since Enum.slice/2 accepts negative bounds. I think I got it but I'm not 100% sure.

[slice_min, slice_pseudo_max] =
[first, last]
|> Enum.map(&if(&1 < 0, do: n_cols + &1, else: &1))
|> Enum.sort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because ranges can be reverse ordered, I am not sure how much we want to rely on sort. My suggestion would be to validate first and last independently:

for pos <- [first, last] do
  cond do
    pos >= 0 and pos < n_cols -> :ok
    pos < 0 and pos >= -n_cols -> :ok
    true -> raise "..."
  end
end

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum.slice/2 says it doesn't support reverse ordered ranges yet. Do you want to support them here in anticipation of their future support in Elixir 2.0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. If the code above works (big if), then it should be simpler and more future proof. Worth giving it a try?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Unfortunately though, I think your algorithm doesn't work for the following situation:

df = DF.new(a: [1, 2, 3], b: ["a", "b", "c"], c: [4.0, 5.1, 6.2])

n_cols = DF.n_columns(df) #=> 3
slice = 1..10//10

assert slice.last > n_cols
assert DF.to_columns(df[slice]) == %{"b" => ["a", "b", "c"]}

Here slice is a valid subset of the columns because even though slice.last > n_cols, Enum.to_list(slice) == [1]. This only happens when step > 1 of course.

We can still try and work out a future proof approach though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can adjust the last for a range start..finish//step, the last element can be calculated using this formula:
start + (div(finish - start, step) * step)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I wish there was a way we had to implement less of the existing range machinery. :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we deal with empty ranges correctly? 10..0//1 is empty and there should not fail, even in a DF with 2 columns.

Copy link
Member Author

@billylanchantin billylanchantin Feb 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was a way we had to implement less of the existing range machinery. :(

Ha yeah I was thinking Range could use some helpers. I started to design some in my head before realizing I'd spend all day on that. 😅

Do we deal with empty ranges correctly?

Blerg. So apparently not:

iex(4)> df[10..0//1]
** (ArgumentError) range 10..0//1 is out of bounds for a dataframe with 3 column(s)
    (explorer 0.11.0-dev) lib/explorer/shared.ex:223: Explorer.Shared.to_existing_columns/3
    (explorer 0.11.0-dev) lib/explorer/data_frame.ex:387: Explorer.DataFrame.fetch/2
    (elixir 1.18.1) lib/access.ex:322: Access.get/3
    iex:4: (file)
iex(4)> df[..]
#Explorer.DataFrame<
  Polars[3 x 3]
  a s64 [1, 2, 3]
  b string ["a", "b", "c"]
  c f64 [4.0, 5.1, 6.2]
>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, wait. Is that actually the behavior we want? Even though both 0..-1//1 and 10..0//1 are empty, Enum.slice/2 treats them differently:

iex(7)> Enum.slice(df.names, 10..0//1)
[]
iex(8)> Enum.slice(df.names, 0..-1//1)
["a", "b", "c"]

I think we actually want the current behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Negative indexes are treated as last. So it is an empty range but it gets normalized. The other one is an empty range altogether. And then there are ranges that raise when out of index.

it may be best to leave this to until we add a slice! to Elixir? 🤔

This won't be supported until Elixir 2.0. But
we can make it future proof now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing a DataFrame with an out of bounds range should raise
2 participants