[WIP] Density geometry revamp #1157

Open. tlnagy wants to merge 14 commits into base: master


Conversation

tlnagy
Member

@tlnagy tlnagy commented May 28, 2018

Fixes #1152

Contributor checklist:

  • I've updated the documentation to reflect these changes
  • I've added an entry to NEWS.md
  • I've added and/or updated the unit tests
  • I've run the regression tests
  • I've squash'ed or fixup'ed junk commits with git-rebase
  • I've built the docs and confirmed these changes don't cause new errors

Implemented changes

  • This PR introduces several new features for Geom.density: custom kernels, custom scaling of densities, stacking, trimmed distributions, adjustment of the computed bandwidth, and vertical and horizontal orientations.
  • Geom.violin gets all of these features, plus the ability to create split violins and to group by color (EDIT: this is bigger than this PR).
  • Geom.violin and Geom.density now share the same DensityStatistic. There is still some ugliness here because I infer whether the user wants a density or violin plot based on which variables are initialized in the aesthetic. I'm not wedded to this approach. EDIT: I now rely on a manual setting for orientation.
  • This PR also introduces a general way of grouping by color and by category. I intend to leverage it for Geom.boxplot and maybe even Geom.bar.
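To make the bandwidth adjustment, custom kernel, and trimming options concrete, here is a minimal standalone sketch in plain Julia. It is not the code in this PR: `kde_sketch`, its keyword names, the normal-reference bandwidth rule, and the three-bandwidth padding used when `trim=false` are all assumptions chosen for illustration.

```julia
using Statistics

# Standard normal kernel; any K(u) that integrates to 1 can be passed instead.
gaussian(u) = exp(-u^2 / 2) / sqrt(2π)

# Hypothetical helper, not Gadfly's implementation.
function kde_sketch(xs; kernel=gaussian, adjust=1.0, trim=true, n=256)
    # Normal-reference bandwidth (an assumption), scaled by `adjust`.
    h = adjust * 1.06 * std(xs) * length(xs)^(-1/5)
    lo, hi = extrema(xs)
    if !trim
        # Untrimmed: pad the evaluation grid by three bandwidths on each side.
        lo -= 3h
        hi += 3h
    end
    grid = collect(range(lo, stop=hi, length=n))
    # Average of kernels centred on each observation, evaluated on the grid.
    dens = [sum(kernel((g - x) / h) for x in xs) / (length(xs) * h) for g in grid]
    return grid, dens
end

xs = [0.1, 0.2, 0.25, 0.4, 0.7]
grid, dens = kde_sketch(xs; adjust=0.5, trim=false)
trimmed_grid, _ = kde_sketch(xs)   # grid limited to the range of the data
```

With `trim=true` the estimate is only evaluated between the smallest and largest observation, so the curve does not extend past the data.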

Here's a little taste of what's going to be possible soon:

using Distributions, DataFrames, Gadfly
srand(123)  # Julia 0.6; on 0.7 and later use Random.seed!(123)
data = vcat(rand(Beta(2,5), 10), 1 .- rand(Beta(2,5), 10), 0.25 .+ rand(Beta(2,5), 10))
types = repeat([:one, :two], outer=15)
group = vcat(fill(1, 10), fill(0.5, 10), fill(2, 10))
df = DataFrame(data=data, group=group)
p = plot(df, x=:group, y=:data, color=types, Geom.violin(split=true, trim=false))

image

TODO

  • Rebuild render() for Geom.density
  • Flesh out edge cases for Geom.violin render()
  • Add support for conditional density estimates

@tlnagy
Member Author

tlnagy commented Jun 1, 2018

We now have the ability to generate conditional density distributions!

using RDatasets, Gadfly
df = dataset("ggplot2", "diamonds")
plot(df, x=:Carat, color=:Cut, Geom.density(scale=:count, position=:fill))

image
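For readers unfamiliar with conditional density plots, the idea behind a "fill" position can be sketched in a few lines of plain Julia: at every grid point the per-group, count-scaled densities are divided by their total, and the resulting shares are stacked so the bands always reach 1. The function name `fill_position` and the matrix layout are assumptions for illustration, not the code in this PR.

```julia
# dens[i, g] is the count-scaled density of group g at grid point i.
# Hypothetical helper, not Gadfly's implementation.
function fill_position(dens::AbstractMatrix)
    totals = sum(dens, dims=2)
    # Share of each group at each grid point; 0 where no group has any density.
    shares = ifelse.(totals .> 0, dens ./ totals, 0.0)
    upper = cumsum(shares, dims=2)   # top edge of each stacked band
    lower = upper .- shares          # bottom edge of each stacked band
    return lower, upper
end

dens = [1.0 3.0;
        2.0 2.0]
lower, upper = fill_position(dens)
```

Each group would then be drawn as the band between its `lower` and `upper` columns.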

@Mattriks and @bjarthur do you all know how I can adjust the x and y extents from an apply_statistics function? I would like to trim off the excess space in the above plot somehow.

@bjarthur
Member

bjarthur commented Jun 2, 2018

#781 is related

tlnagy added 7 commits June 3, 2018 13:54
  • working on #1152. Note: this is a WIP; it currently completely breaks `Geom.density`, and `Geom.violin` has several regressions.
  • this is necessary for allowing user control over ordering in stacked density plots
  • My original implementation was too clever in that it figured out the orientation of density and violin plots automatically. The logic ended up being quite convoluted, so I switched back to using the standard `orientation` logic and an explicit flag for whether a density plot is a violin or not. This commit also adds the ability to stack raw densities and to create conditional density distributions.
tlnagy force-pushed the tn/density-revamp branch from fe4a3a1 to 084bb29 on June 3, 2018 20:58
tlnagy added 2 commits June 3, 2018 15:07
  • There is no set of defaults that applies to both density and violin geometries, so it's better if each has its own defaults.
  • Simple KDEs should now be working.
@tlnagy
Member Author

tlnagy commented Jun 3, 2018

There's a bug in KernelDensity.jl that will sometimes cause this PR to fail. It would probably be best if my fix (JuliaStats/KernelDensity.jl#52) is merged and tagged prior to merging this PR.

tlnagy added 5 commits June 3, 2018 16:29
- adds support for horizontal violins
- removes manual control over splitting (temp until position code is
rewritten)
[ci skip]
Estimate the density of `x` at `n` points, and put the result in `x` and `y`.
Smoothing is controlled by `bandwidth`. Used by [`Geom.density`](@ref Gadfly.Geom.density).
"""
const density = DensityStatistic
Member

to be consistent with the rest of the API, i'd suggest keeping const density = DensityStatistic and moving the docstring for struct DensityStatistic to the retained Stat.density.

Member Author

Will do

A general statistic for density plots (e.g. KDE plots and violin plots).
See [`Geom.density`](@ref Gadfly.Geom.density) or [`Geom.violin`](@ref
Gadfly.Geom.violin) for more details.
"""
Member

in #1116 i purposely tried to teach new users the GoG way of thinking by making the Geom docstrings explicitly defer explanation of details to the corresponding Stat docstrings for "derived" geometries, not the other way around as you've done here. i feel this is important as it really is a different way of doing things that many find hard to grasp. does that make sense?

also, it would be good to be explicit about which aesthetics this statistic transforms, again so that the flow of data into making a graph is transparent. so in this case x is transformed into x and y and is grouped by color. in #1116 i defined aes2str, to which one can input the output of {input,output}_aesthetics(), to help ensure that docs don't get out of sync with code. might not be useful in this case because of the grouping aesthetic in :color.
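The data flow described here (x goes in, x and y come out, grouped by color) can be illustrated with a small standalone sketch. `grouped_stat` is a hypothetical name, and an empirical CDF stands in for the density estimate so that the example has no dependencies; it is not how Stat.density is implemented.

```julia
# Apply `stat` separately to the x values of each color group and
# concatenate the results. Hypothetical helper for illustration only.
function grouped_stat(stat, xs, colors)
    out_x, out_y, out_color = Float64[], Float64[], eltype(colors)[]
    for c in unique(colors)
        gx, gy = stat(xs[colors .== c])   # x in, (x, y) out for this group
        append!(out_x, gx)
        append!(out_y, gy)
        append!(out_color, fill(c, length(gx)))
    end
    return out_x, out_y, out_color
end

# Stand-in statistic: empirical CDF of the group's values.
ecdf_stat(v) = (sort(v), collect(1:length(v)) ./ length(v))

x, y, color = grouped_stat(ecdf_stat, [3.0, 1.0, 2.0, 5.0], [:a, :b, :a, :b])
```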

Member Author

Makes sense. I'll correct this.


```@example
using Gadfly, RDatasets
plot(dataset("ggplot2", "diamonds"), x=:Carat, color=:Cut, Geom.density(position=:fill), Guide.title("Conditional density estimate"), Coord.cartesian(ymax=1.0, xmax=5))
Member

this long line will likely create a horizontal slider in the generated doc html. a hard line break would be nice.

Member Author

Good catch, I had already corrected this locally.

Geom.violin[(; bandwidth, adjust, kernel, trim, order)]

Draws a violin plot which is a combination of [`Geom.density`](@ref) and
[`Geom.boxplot`](@ref). This plot type is useful for comparing differences in
Member

i don't see a call to Geom.boxplot in the Geom.violin code. is this docstring correct in this regard?

Member Author

I meant it stylistically, not technically, but that could change (see #1157 (comment))

nothing # hide
```
![](diamonds_violin1.svg)
"""
Member

no other Geom docstrings currently have examples. i'd suggest moving this to the gallery.

Member Author

I think this would be nice to change, but it depends on JuliaDocs/Documenter.jl#736 anyway so I'll remove this for now.

"""
Geom.density(; bandwidth, adjust, kernel, trim, scale, position, orientation, order)

Draws a kernel density estimate. This is a cousin of [`Geom.histogram`](@ref)
Member

it would be nice to document how Geom.density(Stat.identity) behaves. in other words, which aesthetics does Geom.density use directly?

Member Author

Good point, I'll add this.

"""
const density = DensityGeometry

element_aesthetics(::DensityGeometry) = Symbol[]
Member

element_aesthetics should contain :x, :y, and :color, no?

Member Author

If I don't leave this blank, they are filled with autogenerated values so it's impossible to give useful error messages using Gadfly.assert_aesthetics_defined. I wasn't sure how to get around this so I leave this blank and figure out errors later: https://github.com/GiovineItalia/Gadfly.jl/pull/1157/files#diff-9ec506bf78232ae17d082c22c2e66449R616

@bjarthur
Member

bjarthur commented Jun 5, 2018

looking forward to having this functionality! in general it looks great.

my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions, or whether Stat.density could be combined with Geom.line, and Geom.density and Geom.violin could simply be aliases.

sorry for all the hassle resolving the conflicts with #1116.

@bjarthur
Member

bjarthur commented Jun 5, 2018

also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin? if not, depwarns are warranted. and we should think about whether to merge it before or after fixing #1130

@tlnagy
Member Author

tlnagy commented Jun 7, 2018

Thanks for the review. I won't have time to fix this for a bit and I want to make sure we get it right before merging.

also, does this break anything? is it backwards compatible with the former Geom.density and Geom.violin?

AFAIK, there shouldn't be any regressions or changes to the API. There are, however, some changes to defaults that I think are better than the current ones. The primary change is that density plots now clip to the range of the data by default (this is the same as ggplot2), since it doesn't make sense to predict outside that range.

my only big question is whether separate Geoms are necessary (density and violin) with their own typedefs and render functions

This might be achievable. The primary difficulty is that these two geoms are grouped in very different ways.

  • Density plots are generally overlaid or stacked on top of each other (varying color), only x and y aesthetics are used.
  • Violin plots are generally dodged (or split) according to group and color, x, y, and width aesthetics are used.

They are both essentially Compose.polygons, but the offsets and spacing are different enough that it gets a little hairy. If anything, Geom.boxplot and Geom.violin are probably the most likely to share code.

I think in general we should think about separating out the positioning code into structs because a lot of it is boilerplate. The same positioning code should work for bars, boxplots, violins, etc. This is what ggplot2 does. See http://ggplot2.tidyverse.org/reference/index.html#section-layer-position-adjustment for details
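As a rough illustration of what separating positioning into structs could look like, here is a minimal sketch that uses dispatch on small position types. The type names `Stack` and `Dodge` and the functions `baseline` and `xoffsets` are hypothetical; nothing here is existing Gadfly API.

```julia
# Hypothetical position types; not part of Gadfly.
abstract type Position end
struct Stack <: Position end
struct Dodge <: Position
    width::Float64   # total width shared by all groups at one x position
end

# Stack: heights[i, g] is the height of group g at position i.
# Returns the baseline each group should be drawn from.
baseline(::Stack, heights::AbstractMatrix) = cumsum(heights, dims=2) .- heights

# Dodge: horizontal offsets that centre k groups side by side within `width`.
function xoffsets(p::Dodge, k::Integer)
    slot = p.width / k
    return [(j - (k + 1) / 2) * slot for j in 1:k]
end

stacked = baseline(Stack(), [1.0 2.0 3.0])
dodged = xoffsets(Dodge(0.9), 2)
```

A bar, boxplot, or violin geometry could then call the same `baseline` or `xoffsets` method instead of carrying its own copy of the offset arithmetic.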

@bjarthur
Member

do you want to merge this before or after we support 0.7? there will likely be merge conflicts if we wait.

@tlnagy
Member Author

tlnagy commented Jul 1, 2018

I don't have time right now to revamp this PR, so I'm okay with letting it get a bit stale and finalizing it a bit later. There are enough small edge cases that I need to handle that it's better to sit on this until it's ready.

@wizofe

wizofe commented Feb 6, 2022

The split violin plot is a very useful feature, and it seems it's still not part of Gadfly. Is there any chance of revamping this PR? Let me know if I can help somehow.

@bjarthur
Member

bjarthur commented Feb 6, 2022

agreed! @tlnagy ??


Successfully merging this pull request may close these issues.

Density geometries revamp
3 participants