Raise ValueError if DataFrame column length does not match data #14135

Closed
10 changes: 9 additions & 1 deletion python/cudf/cudf/core/dataframe.py
@@ -645,7 +645,15 @@ def __init__(
        elif isinstance(data, (cudf.Series, pd.Series)):
            if isinstance(data, pd.Series):
                data = cudf.Series.from_pandas(data, nan_as_null=nan_as_null)

            if (
Contributor

I guess we shouldn't be raising here, no? pandas accepts all of these:

In [1]: import pandas as pd

In [2]: s = pd.Series([1, 2, 3], name=2)

In [3]: pd.DataFrame(s)
Out[3]: 
   2
0  1
1  2
2  3

In [4]: pd.DataFrame(s, columns=[])
Out[4]: 
Empty DataFrame
Columns: []
Index: []

In [5]: pd.DataFrame(s, columns=[2])
Out[5]: 
   2
0  1
1  2
2  3

In [6]: pd.DataFrame(s, columns=[2, 1])
Out[6]: 
   2    1
0  1  NaN
1  2  NaN
2  3  NaN

In [7]: pd.DataFrame(s, columns=[1])
Out[7]: 
Empty DataFrame
Columns: [1]
Index: []

Contributor Author

Yeah, I guess only this very narrow use case should raise. (I'm not sure of the history behind why the other cases are allowed.)

                columns is not None
                and data.name not in columns
                and len(data) != len(columns)
            ):
                raise ValueError(
                    f"Length of values ({len(data)}) does not "
                    f"match length of columns ({len(columns)})"
                )
            name = data.name or 0
            self._init_from_dict_like(
                {name: data},
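
To see how the condition in this hunk lines up with the pandas session quoted in the review thread, here is a minimal sketch that pulls the three checks out into a standalone helper. The helper name `would_raise` and its arguments are hypothetical and exist only for illustration; it takes the Series name and length directly so it can be evaluated without cudf or a GPU, and it assumes the condition is exactly the one shown in the diff.

```python
def would_raise(name, n_values, columns):
    """Hypothetical mirror of the condition in the diff.

    Returns True when the ValueError branch would be taken.
    """
    return (
        columns is not None
        and name not in columns
        and n_values != len(columns)
    )


# The reviewer's pandas examples: a Series of length 3 named 2.
cases = {
    "columns=None": would_raise(2, 3, None),
    "columns=[]": would_raise(2, 3, []),
    "columns=[2]": would_raise(2, 3, [2]),
    "columns=[2, 1]": would_raise(2, 3, [2, 1]),
    "columns=[1]": would_raise(2, 3, [1]),
}

# The case covered by the new test: an unnamed Series of length 5.
cases["unnamed, columns=[1, 2]"] = would_raise(None, 5, [1, 2])

for label, result in cases.items():
    print(f"{label}: {result}")
```

Under this reading, the check takes the raising branch for the unnamed Series in the new test, but also for `columns=[]` and `columns=[1]` on the named Series, which the pandas session above returns as empty frames. If only the unnamed case is meant to raise, the condition may need to be narrower than what is shown here.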
6 changes: 6 additions & 0 deletions python/cudf/cudf/tests/test_dataframe.py
@@ -10366,6 +10366,12 @@ def test_dataframe_nlargest_nsmallest_str_error(attr):
)


def test_dictlike_data_column_length_mismatch():
    ser = cudf.Series(range(5))
    with pytest.raises(ValueError):
        cudf.DataFrame(ser, columns=[1, 2])


@pytest.mark.parametrize("digits", [0, 1, 3, 4, 10])
def test_dataframe_round_builtin(digits):
    pdf = pd.DataFrame(