Configurable plotting backends #20885

Open
MarcoGorelli opened this issue Jan 24, 2025 · 9 comments
Labels
A-api Area: changes to the public API python Related to Python Polars

Comments

@MarcoGorelli
Collaborator

MarcoGorelli commented Jan 24, 2025

Polars has a plot namespace which makes it convenient to create simple plots: https://docs.pola.rs/api/python/dev/reference/dataframe/plot.html. It currently uses Altair.

Several users have said that, for some use cases, they prefer other libraries; Plotly in particular comes up quite a lot. The good news is that when Plotly v6.0.0 comes out, it will have native support for Polars, meaning that plotly.express will be able to plot Polars dataframes without converting to other dataframe libraries and without other dataframe libraries being required. The performance gain is often ~3x, and even >10x for some plots, especially those which involve grouping over multiple dimensions; see https://plotly.com/blog/chart-smarter-not-harder-universal-dataframe-support/ for some results.

It could therefore be nice to make .plot configurable, so that users can do:

df.plot(backend='plotly').line(x='a', y='b')
df.plot(backend='altair').line(x='a', y='b')
df.plot.line(x='a', y='b')  # defaults to Altair
pl.Config.set_plotting_backend('plotly')
df.plot.line(x='a', y='b')  # now, it uses Plotly (only supported for plotly v6.0.0+)

A few principles to abide by, to make sure this doesn't get out of hand:

  • This shouldn't do anything clever, and should just be an entrypoint to libraries which specialise in plotting
  • The arguments in polars.DataFrame.plot.line should be a few simple dimensions which all plotting backends can respect (e.g. x, y, color, ...). Anything else should be backend-specific
  • Configuration is left to the plotting backends. In Altair this can often be done with .properties or .configure_*; in Plotly there are various update_* methods. Links to their respective docs can be provided, but Polars should make no attempt to standardise on these
  • Any library which is added as a plotting backend should support Polars directly, rather than "if we receive a dataframe which isn't pandas then we convert to pandas and do all transformations in pandas". In 2025 at least, I think it's OK to set that as the minimum bar 😄

I don't have a tonne of time now, unfortunately, but if anyone wanted to implement this I'd make it a priority to review it. Otherwise, I will get to it, just not in the immediate future.

@MarcoGorelli MarcoGorelli added the A-api Area: changes to the public API label Jan 24, 2025
@nameexhaustion nameexhaustion added the python Related to Python Polars label Jan 24, 2025
@deanm0000
Collaborator

I think static typing will be difficult, if not impossible, unless, instead of all of them using plot, they each get their own namespace, like df.px.line or df.alt.line. It doesn't matter if you just want the graph immediately, but if you ever need to chain some library-specific method then the static type checker needs to know the return type.

@deanm0000
Collaborator

I guess you'd have to put the backend argument in the final method rather than in the namespace. Then you could use overloads, but that is suboptimal, since you're just going to want to set it in the config rather than type it out everywhere.

@MarcoGorelli
Collaborator Author

🤔 that's a good point: typing the return type may be problematic in cases where the backend is set only in pl.Config (as opposed to being passed as an argument to .plot)

> they each get their own namespace, like df.px.line or df.alt.line

🤔 not sure about putting all these on polars.DataFrame, but I guess df.plot.px.line could work, if it's not deemed too long?

@etrotta
Contributor

etrotta commented Jan 24, 2025

> I think static typing will be difficult, if not impossible, unless, instead of all of them using plot, they each get their own namespace, like df.px.line or df.alt.line. It doesn't matter if you just want the graph immediately, but if you ever need to chain some library-specific method then the static type checker needs to know the return type.

You could add @overloads for each backend, setting a string literal for that parameter, and maybe even use a TypedDict for the kwargs relevant to that backend, although that'll require specifying it for each plot() call instead of relying on the global default (...which feels reasonable if you want to enforce typing in the first place).

Example to demonstrate how typing could work for the interface

import typing

import matplotlib.pyplot as plt
import plotly.express as px
from matplotlib.figure import Figure as MatplotlibFigure
from plotly.graph_objects import Figure as PlotlyFigure

# typing.Unpack was added in Python 3.11 and its use for **kwargs with a
# TypedDict (PEP 692) in 3.12; on older versions use typing_extensions.Unpack.

class MatplotlibArguments(typing.TypedDict, total=False):
    alpha: float

class PlotlyArguments(typing.TypedDict, total=False):
    log_x: bool
    log_y: bool

@typing.overload
def plot(backend: typing.Literal["matplotlib"], x: list[int], y: list[int], **kwargs: typing.Unpack[MatplotlibArguments]) -> MatplotlibFigure:
    ...

@typing.overload
def plot(backend: typing.Literal["plotly"], x: list[int], y: list[int], **kwargs: typing.Unpack[PlotlyArguments]) -> PlotlyFigure:
    ...

def plot(backend: str, x: list[int], y: list[int], **kwargs: typing.Any) -> PlotlyFigure | MatplotlibFigure:
    if backend == "matplotlib":
        # plt.scatter has no `ax` argument; draw on an explicit Axes instead
        fig, ax = plt.subplots()
        ax.scatter(x, y, **kwargs)
        return fig
    elif backend == "plotly":
        return px.scatter(x=x, y=y, **kwargs)
    else:
        raise ValueError(f"unsupported backend: {backend!r}")

plot("matplotlib", x=[1, 2], y=[3, 4], alpha=0.5)
plot("plotly", x=[1, 2], y=[30, 400], log_y=True)

@MarcoGorelli
Collaborator Author

thanks @etrotta

> although that'll require specifying it for each plot() call instead of relying on the global default (...which feels reasonable if you want to enforce typing in the first place).

Not totally sure about this: it would slightly detract from ergonomics if people had to write df.plot.px.line each time. Though arguably one compromise could be:

  • df.plot.line is typed to return Any - this is shorter to type, and maybe more useful for EDA
  • df.plot.px.line is typed, and useful for IDE work where people are more likely to tolerate 3 extra keystrokes in exchange for better typing
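The compromise above could look roughly like this: a generic `line` typed as `Any`, plus per-backend sub-namespaces (`px`, `alt`) with precise return types. The class names are hypothetical; the `TYPE_CHECKING` guard keeps Altair and Plotly out of the import path until a method is actually called, and plotly>=6 is assumed for native Polars input.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # only the type checker needs these imports
    import altair as alt
    import plotly.graph_objects as go


class PlotlyNamespace:
    def __init__(self, data: Any) -> None:
        self._data = data

    def line(self, x: str, y: str, color: str | None = None, **kwargs: Any) -> go.Figure:
        import plotly.express as px  # lazy import at call time

        return px.line(self._data, x=x, y=y, color=color, **kwargs)


class AltairNamespace:
    def __init__(self, data: Any) -> None:
        self._data = data

    def line(self, x: str, y: str, color: str | None = None, **kwargs: Any) -> alt.Chart:
        import altair as alt  # lazy import at call time

        encodings = {"x": x, "y": y, **kwargs}
        if color is not None:
            encodings["color"] = color
        return alt.Chart(self._data).mark_line().encode(**encodings)


class PlotNamespace:
    def __init__(self, data: Any) -> None:
        self._data = data

    @property
    def px(self) -> PlotlyNamespace:  # df.plot.px.line(...) is precisely typed
        return PlotlyNamespace(self._data)

    @property
    def alt(self) -> AltairNamespace:  # df.plot.alt.line(...) is precisely typed
        return AltairNamespace(self._data)

    def line(self, x: str, y: str, color: str | None = None, **kwargs: Any) -> Any:
        # Short EDA entry point: delegates to the default backend (Altair here)
        # and is typed as Any because the default could change at runtime.
        return self.alt.line(x, y, color, **kwargs)
```

Chaining something backend-specific such as `.encode(...)` is then checked on `df.plot.alt.line(...)` but not on `df.plot.line(...)`.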

@deanm0000
Collaborator

As long as you're not chaining extra methods, df.plot.line -> Any should be fine. It's only if you're doing something like

df.plot.line(...).encode(...) that the typing is important, since that's specific to Altair.

I really don't like typing the backend as a string for the overload; it just feels weird and is more typing than df.plot.px.line.

It could be that df.plot.line returns Any, but if you need static typing (for any reason, but in particular for chaining methods) you could also do df.plot.XX.line, where XX is shorthand for the backend.

@kszlim
Contributor

kszlim commented Jan 24, 2025

A potentially controversial take, but maybe exposing the plotting backends at a top level namespace could be fine.

df.altair.line(...).encode(...)
df.px.line(...)

@deanm0000
Collaborator

> A potentially controversial take, but maybe exposing the plotting backends at a top level namespace could be fine.
>
> df.altair.line(...).encode(...)
> df.px.line(...)

That would be my preference too, but I can see how it'd get too cluttered between Plotly, Altair, matplotlib, etc., so I assume that's a non-starter.

@deanm0000
Collaborator

This is more of a brainstorm than a commitment to follow through, but I did this: #20904

5 participants