[Enh]: Enable validation #1337

Open
MarcSkovMadsen opened this issue Nov 9, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@MarcSkovMadsen

MarcSkovMadsen commented Nov 9, 2024

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

Please describe the purpose of the new feature or describe the problem to solve.

I would like to add support for general DataFrames in the HoloViz ecosystem.

The starting point is that most things are "stored" as parameters of a param.Parameterized class:

import param
import polars as pl

class MyClass(param.Parameterized):
    value = param.DataFrame()

MyClass(value=pl.DataFrame())

When MyClass is instantiated, the value is validated. I was hoping that Narwhals would provide a way to check whether the supplied value is acceptable, for example whether it is an IntoDataFrame.

Suggest a solution if possible.

I was hoping it was possible to do something like isinstance(value, IntoDataFrame):

But when running

from narwhals.typing import IntoFrame, IntoDataFrame
import pytest
import pandas as pd
import polars as pl

TYPES = [
    IntoFrame, IntoDataFrame
]

DFs = [pd.DataFrame(), pl.DataFrame()]

@pytest.mark.parametrize("tp", TYPES)
@pytest.mark.parametrize("df", DFs)
def test_validation(df, tp):
    assert isinstance(df, tp)

I get

FAILED script.py::test_validation[df0-Union0] - TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
FAILED script.py::test_validation[df0-Union1] - TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
FAILED script.py::test_validation[df1-Union0] - TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
FAILED script.py::test_validation[df1-Union1] - TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union
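A likely reason for this error (an assumption, not confirmed in this thread) is that the aliases are `typing.Union` objects whose members are not all runtime classes, for example unresolved forward references. The sketch below reproduces that behaviour with the standard library only; `IntoNumber` and `describe_isinstance` are hypothetical names and are not part of Narwhals.

```python
from typing import Union

# Hypothetical alias for illustration only; this is not the Narwhals definition.
# Its members are strings, i.e. forward references that are never resolved.
IntoNumber = Union["Fraction", "Decimal"]


def describe_isinstance(obj: object, tp: object) -> str:
    """Run isinstance() and report the outcome instead of letting it raise."""
    try:
        return "match" if isinstance(obj, tp) else "no match"
    except TypeError as exc:
        return f"TypeError: {exc}"


print(describe_isinstance(1.5, float))       # a real class can be checked
print(describe_isinstance(1.5, IntoNumber))  # a Union of forward references cannot
```

Type aliases of this kind are meant for static type checkers, so a runtime check needs a separate predicate function.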

If you have tried alternatives, please describe them below.

I can of course write a validation function myself. But I was hoping not to have to maintain functionality to handle different dataframe libraries.

Additional information that may help us understand your needs.

No response

@FBruzzesi
Member

FBruzzesi commented Nov 9, 2024

Hey @MarcSkovMadsen, thanks for reporting. We currently provide an is_into_dataframe function to validate whether the object is convertible into an eager narwhals DataFrame:

from narwhals.dependencies import is_into_dataframe
import pandas as pd
import polars as pl
import numpy as np

df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df_pl = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
np_arr = np.array([[1, 4], [2, 5], [3, 6]])

print(is_into_dataframe(df_pd))   # True
print(is_into_dataframe(df_pl))   # True
print(is_into_dataframe(np_arr))  # False

As of now, we don't have an equivalent for lazy dataframes, though. You could combine other functions, such as is_dask_dataframe and is_polars_lazyframe.
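One way such predicates could be combined and plugged into attribute validation is sketched below in plain Python. Everything here is hypothetical: `any_of`, `Validated`, `is_eager_like` and `is_lazy_like` are illustrative names, the descriptor is not how param implements validation, and the stand-in predicates only exist so the sketch runs without third-party packages.

```python
from typing import Any, Callable

Predicate = Callable[[Any], bool]


def any_of(*predicates: Predicate) -> Predicate:
    """Build one predicate that passes if any of the given predicates passes."""
    def combined(obj: Any) -> bool:
        return any(predicate(obj) for predicate in predicates)
    return combined


class Validated:
    """Hypothetical descriptor that validates a value on assignment."""

    def __init__(self, predicate: Predicate) -> None:
        self.predicate = predicate

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name

    def __get__(self, instance: Any, owner: Any = None) -> Any:
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance: Any, value: Any) -> None:
        if not self.predicate(value):
            raise ValueError(f"{self.name}: unsupported type {type(value).__name__}")
        instance.__dict__[self.name] = value


# Stand-in predicates (assumptions) so the sketch has no external dependencies.
def is_eager_like(obj: Any) -> bool:
    return isinstance(obj, list)


def is_lazy_like(obj: Any) -> bool:
    return isinstance(obj, range)


class MyClass:
    value = Validated(any_of(is_eager_like, is_lazy_like))

    def __init__(self, value: Any) -> None:
        self.value = value  # validated by the descriptor on assignment
```

With Narwhals installed, the stand-ins could presumably be swapped for is_into_dataframe, is_dask_dataframe and is_polars_lazyframe from narwhals.dependencies, so that the per-library checks stay in Narwhals rather than in the calling project.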

@MarcSkovMadsen
Author

Thanks, that should be OK for me for now.

I was thinking that you needed to change your typing implementation to something like below:

image

@MarcSkovMadsen
Author

MarcSkovMadsen commented Nov 9, 2024

I read the tutorial and assumed I would learn what was necessary from that to get started. But I should also learn to study the API docs. Just did not cross my mind 😄 Thx.

@FBruzzesi
Member

Any feedback on how to improve the documentation is welcome! We recently did some development to move forward with the integration with plotly and left the docs a little behind 🙈

@MarcSkovMadsen
Author

MarcSkovMadsen commented Nov 9, 2024

In addition to pandas and polars dataframes, we also have a database SELECT query that I thought could have lightweight interchange support. See #1289 for more context.

From

image

I would expect the below to work:

from narwhals.dependencies import is_into_dataframe

class DatabaseConnector():
    def __dataframe__(self):
        raise NotImplementedError()

connector = DatabaseConnector()

assert is_into_dataframe(connector)

But it does not.

script.py:9: in <module>
    assert is_into_dataframe(connector)
E   assert False
E    +  where False = <function is_into_dataframe at 0x7f0093107ce0>(<script.DatabaseConnector object at 0x7f00930bcf10>)
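Until such objects are recognised, a project could fall back on a plain duck-typing check for the interchange method. The helper below is a hypothetical sketch (`has_interchange_method` is not a Narwhals function): it only tests that a callable `__dataframe__` attribute exists and says nothing about whether calling it returns a valid interchange object.

```python
from typing import Any


def has_interchange_method(obj: Any) -> bool:
    """Hypothetical duck-typing check: does obj expose a callable __dataframe__?"""
    return callable(getattr(obj, "__dataframe__", None))


class DatabaseConnector:
    def __dataframe__(self):
        raise NotImplementedError()


print(has_interchange_method(DatabaseConnector()))  # True
print(has_interchange_method(object()))             # False
```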

@FBruzzesi
Member

FBruzzesi commented Nov 9, 2024

I think the best workaround for now then would be something along the following lines:

import narwhals as nw

# df_native is the object to validate (for example, a pandas or Polars frame)
if isinstance(
    df := nw.from_native(df_native, eager_or_interchange_only=True, pass_through=True),
    nw.DataFrame,
):
    # work with narwhals DataFrame, using eager_or_interchange_only=True enables interchange support
    ...
elif isinstance(
    df := nw.from_native(df_native, pass_through=True),
    nw.LazyFrame,
):
    # work with narwhals LazyFrame
    ...
else:
    # df_native is unchanged
    ...

With pass_through=True, the original object will be returned if unable to convert to a Narwhals DataFrame/LazyFrame, making the first two conditions False.

@MarcoGorelli
Member

Thanks for the request!

Agree with Francesco's suggestion, we do something similar in the Plotly PR

Just one clarification: hopefully, "interchange" level will just be temporary, and in 2025 we can end up with:

  • full api support
  • lazy-only support

And we can end up with a well-defined spec which different dataframe / database libraries can implement so that we're not the single point of failure. It's very difficult to do that properly without ending up as in https://xkcd.com/927/, but long-term, it might be something we can aim for
Short-term, we're more focused on concrete use cases - we ain't gonna have any hope of standardising anything if we can't first get adoption

@MarcSkovMadsen
Author

MarcSkovMadsen commented Nov 9, 2024

Thanks for all the help. I managed to get a first version of support for Narwhals in panel-extensions/panel-graphic-walker#22. I hope this will inspire support in param, Panel, hvPlot, HoloViews and Bokeh. At least the concept is in that PR.

@MarcoGorelli MarcoGorelli added the enhancement New feature or request label Nov 13, 2024