This repository has been archived by the owner on Jun 29, 2021. It is now read-only.

[WIP] Add dotplot #107

Closed
wants to merge 54 commits into from


@sethaxen (Member) commented Mar 5, 2020

This PR will implement a dot plot à la geom_dotplot in ggplot2. It currently implements only the "dot density" (Wilkinson) plot type, but will (probably) eventually use Analysis to support different binnings of dots, including "histodot".
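For readers unfamiliar with the "dot density" method, here is a minimal sketch of Wilkinson-style binning, modeled on how ggplot2 describes it rather than on this PR's code. The name `dotdensity_bins` and its keywords are hypothetical. It scans the sorted data, starts a bin at the first unbinned value, collects every value within one bin width of that start, and centers the bin at the midpoint of its smallest and largest members. The default bin width assumes ggplot2's default of 1/30 of the data range.

```julia
# Hypothetical helper (not StatsMakie API): Wilkinson-style "dot density" binning.
# Scans the sorted data left to right; a bin starts at the first unbinned value and
# collects every value in [start, start + binwidth). Each bin is centered at the
# midpoint of its smallest and largest member, so bin positions follow the data.
function dotdensity_bins(x::AbstractVector{<:Real}; binwidth = nothing, nbins::Integer = 30)
    xs = sort!(float.(collect(x)))
    isempty(xs) && return (Float64[], Int[])
    # Assumed default, mirroring ggplot2: the data range divided by 30.
    bw = binwidth === nothing ? (xs[end] - xs[1]) / nbins : float(binwidth)
    if bw <= 0
        bw = 1.0  # all values identical: fall back to a unit bin
    end
    centers = Float64[]
    counts = Int[]
    binstart = lo = hi = xs[1]
    n = 0
    for xi in xs
        if xi >= binstart + bw  # past the end of the current bin: close it, open a new one
            push!(centers, (lo + hi) / 2)
            push!(counts, n)
            binstart = lo = xi
            n = 0
        end
        hi = xi
        n += 1
    end
    push!(centers, (lo + hi) / 2)  # close the last bin
    push!(counts, n)
    return centers, counts
end
```

For instance, `dotdensity_bins([1.0, 1.2, 1.9, 3.0, 3.1, 5.0]; binwidth = 1.0)` gives centers ≈ `[1.45, 3.05, 5.0]` with counts `[3, 2, 1]`. Because bin positions depend on the data, different groups binned this way generally do not share bin edges.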

Some examples so far:

julia> using Makie, StatsMakie, RDatasets;

julia> mtcars = RDatasets.dataset("datasets", "mtcars");

julia> dotplot(Data(mtcars), zero(mtcars.MPG), :MPG; stackdir = :up, orientation = :horizontal, binwidth = 1.5)

[Figure: output of the example above]

julia> dotplot(Position.dodge, Data(mtcars), Group(color=:Cyl), :VS, :MPG; strokewidth=2)

[Figure: output of the example above]

julia> dotplot(Data(mtcars), :Cyl, :MPG; stackdir = :centerwhole)

[Figure: output of the example above]
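The `stackdir` values used in these examples (`:up`, `:centerwhole`) follow ggplot2's naming. As a rough sketch of what each option means, assuming ggplot2's semantics rather than this PR's implementation, the hypothetical helper `stack_offsets` returns the offset of each dot's center from the baseline, in units of dot diameter, for a stack of `n` dots.

```julia
# Hypothetical helper (not StatsMakie API): offsets of dot centers within one stack,
# in units of dot diameter, following ggplot2's stackdir options.
function stack_offsets(n::Integer, stackdir::Symbol)
    a = 1:n  # 1-based position of each dot within the stack
    if stackdir === :up
        return a .- 0.5                     # first dot sits on the baseline, stack grows up
    elseif stackdir === :down
        return -(a .- 0.5)                  # mirror image of :up
    elseif stackdir === :center
        return (a .- 1) .- (n - 1) / 2      # stack centered on the baseline
    elseif stackdir === :centerwhole
        return (a .- 1) .- fld(n - 1, 2)    # centered, but offsets stay whole numbers
    else
        throw(ArgumentError("unknown stackdir: $stackdir"))
    end
end
```

The difference between `:center` and `:centerwhole` shows up for even counts: four dots get offsets `-1.5, -0.5, 0.5, 1.5` with `:center` but `-1, 0, 1, 2` with `:centerwhole`, so dots in neighboring stacks stay aligned in rows.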

Some to-dos/questions I'd appreciate input on:

  • Support passing only a single argument (see the first example above); this is also relevant for violin and boxplot (relates to #60, "BoundsError with violin and boxplot using Data")
  • Figure out how to get the plot area in pixels (this works fine in MakieLayout, but in Makie the dots partially overlap, probably because we get the whole scene in pixels rather than just the plot area)
  • Be sure changing the stroke width doesn't change the flushness or radius of the markers
  • Make sure zooming works correctly (markersize should scale so dots still stack)
  • Figure out why MakieLayout's initial layout is bad (it gets better upon resize)
  • Figure out why, in MakieLayout, the dots fly off the plot when the width gets small
  • Figure out how to detect when only a single category (x-position) is being plotted (in this case the units of the x-axis depend on themselves, so we need to break the dependency or everything collapses to 0)
  • Figure out how to change theme defaults based on type (e.g. if there is a single x-position, default to horizontal and non-centered)
  • Refactor to only handle binned dots in dotplot and provide Analysis types dotdensity and histodot for doing the binning
    • Make not providing an Analysis default to dotdensity mode somehow
  • Add histodot method
  • Support smoothed bins (from the original paper)
  • Support undirected binning (from this paper)
  • Support stacking by groups
  • Support binning all groups together
  • Add a quantiles Analysis type for quantile dotplots. It would just compute the ECDF of the data, use it to sample dots at evenly spaced quantiles, and pass those to dotplot.
  • Work out better default bin width. Currently, it uses ggplot2's default of a maximum of 30 bins over the data range. Ideally though, a heuristic would find a number of bins that approximately fits the dot heights within the width in data units, so they take up roughly the same space as e.g. a violin plot.
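The quantile dotplot item above can be sketched in a few lines. This assumes one common reading of "sample dots on the quantiles": take `k` dots at the midpoint probabilities `(i - 0.5)/k` and map each through the inverse of the empirical CDF (no interpolation). The name `quantile_dots` is hypothetical; its output would then be binned and stacked like any other data.

```julia
# Hypothetical helper (not StatsMakie API): positions of k dots for a quantile dotplot.
# Each dot sits at the inverse ECDF of x evaluated at probability (i - 0.5)/k.
function quantile_dots(x::AbstractVector{<:Real}, k::Integer)
    k > 0 || throw(ArgumentError("k must be positive"))
    xs = sort(x)
    n = length(xs)
    n > 0 || throw(ArgumentError("x must be non-empty"))
    ps = ((1:k) .- 0.5) ./ k  # evenly spaced midpoint probabilities
    # inverse ECDF: the smallest sorted value xs[j] with j/n >= p
    return [xs[clamp(ceil(Int, p * n), 1, n)] for p in ps]
end
```

For example, `quantile_dots(collect(1:10), 4)` returns `[2, 4, 7, 9]`. Choosing `k` independently of the sample size is what lets a quantile dotplot show a large sample (or a posterior) with a fixed, readable number of dots.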

@codecov-io commented Mar 5, 2020

Codecov Report

Merging #107 into master will decrease coverage by 30.82%.
The diff coverage is 4.01%.


@@             Coverage Diff             @@
##           master     #107       +/-   ##
===========================================
- Coverage    76.8%   45.98%   -30.83%     
===========================================
  Files          16       21        +5     
  Lines         444      759      +315     
===========================================
+ Hits          341      349        +8     
- Misses        103      410      +307

Impacted Files                 Coverage Δ
src/typerecipes/histogram.jl   72.22% <ø> (+2.22%)     ⬆️
src/StatsMakie.jl              100% <ø> (ø)            ⬆️
src/typerecipes/tallies.jl     0% <0%> (ø)
src/recipes/stacks.jl          0% <0%> (ø)
src/recipes/dotplot.jl         0% <0%> (ø)
src/typerecipes/quantiles.jl   0% <0%> (ø)
src/recipes/boxplot.jl         83.33% <100%> (+3.62%)  ⬆️
src/group/dodge.jl             94.91% <100%> (-0.09%)  ⬇️
src/recipes/conversions.jl     92.3% <100%> (+1.39%)   ⬆️
src/utils.jl                   32% <32%> (ø)
... and 5 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0d52ea...8517b88.

@sethaxen (Member Author) commented Mar 8, 2020

@piever or someone else, perhaps you can advise on a good way to go about making a stacked plot like the one below from ggplot2:

ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_dotplot(stackgroups = TRUE, binwidth = 1, binpositions = "all")

[Figure: output of the ggplot2 example above]

In this case, the grouping is done by the categorical variable cyl. However, instead of producing multiple series plotted separately, the grouping is handled internally. Currently, we can get close by using Style instead of Group (ignore the difference in color distribution for now):

julia> using Makie, StatsMakie, RDatasets
julia> dotplot(Data(mtcars), Style(color=:Cyl), :MPG; orientation = :horizontal, stackdir = :up)

[Figure: output of the example above]

But this is a hack, since Style is meant for continuous variables and Group for discrete ones.
Moreover, the intention for Group seems to be that it splits the data into multiple series that are plotted independently, with Position.stack handling these individual series. In this case, they would need to be plotted together.

Which of these three syntaxes is most consistent with StatsMakie's grammar?

julia> dotplot(Position.stack, Data(mtcars), Group(color=:Cyl), :MPG; orientation = :horizontal, stackdir = :up)
julia> dotplot(Data(mtcars), Group(color=:Cyl, stack=:Cyl), :MPG; orientation = :horizontal, stackdir = :up) # redundant
julia> dotplot(Data(mtcars), Group(color=:Cyl), :MPG; orientation = :horizontal, stackdir = :up, stackgroups = true) # similar to ggplot2

Another complication is that we have different stages where we might want to bin (via a parameter like binby):

  1. all: bins are determined before groups are considered, and then counts are divided among groups.
  2. group: bins are determined for each group, and the counts are divided among x positions.
  3. position?: bins are determined at each x position, considering all groups (this or all is needed for stacking by groups).
  4. automatic: bins are determined separately for each x position and group.

Options 2 and 4 can happen within the recipe, but I don't know how to do 1 and 3 yet, since they'll need to happen outside the recipe.
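To make the difference between the "all" and "automatic" stages concrete, here is a sketch assuming fixed-width (histodot-style) bins for brevity; `grouped_counts` and its `binby` keyword are hypothetical names. With `binby = :all`, one bin grid is anchored at the minimum of all the data and counts are then split by group, so every group's bins share edges and can be stacked on top of each other. With `binby = :automatic`, each group gets its own grid, so edges generally do not line up across groups.

```julia
# Hypothetical sketch (not StatsMakie API) of two binning stages with fixed-width bins.
# Returns Dict(group => Dict(left bin edge => count)).
#   binby = :all       -> one grid anchored at the minimum of ALL data; counts split by group
#   binby = :automatic -> a separate grid per group, anchored at that group's minimum
function grouped_counts(x::AbstractVector{<:Real}, g::AbstractVector, bw::Real; binby::Symbol = :all)
    binby in (:all, :automatic) || throw(ArgumentError("unsupported binby: $binby"))
    result = Dict{eltype(g), Dict{Float64, Int}}()
    for grp in unique(g)
        xg = x[findall(==(grp), g)]            # values belonging to this group
        origin = binby === :all ? minimum(x) : minimum(xg)
        counts = Dict{Float64, Int}()
        for xi in xg
            b = floor(Int, (xi - origin) / bw)  # 0-based bin index on this grid
            edge = origin + b * bw              # left edge of that bin
            counts[edge] = get(counts, edge, 0) + 1
        end
        result[grp] = counts
    end
    return result
end
```

With `x = [1.0, 1.5, 2.2, 2.6, 3.1]` and groups `["a", "a", "b", "b", "b"]` at bin width 1, `:all` puts group b in bins starting at 2.0 and 3.0 (aligned with group a's bin at 1.0), while `:automatic` puts all of group b into a single bin starting at 2.2. The `:all` variant needs `minimum(x)` over every group, which is exactly the information a per-trace recipe does not see.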

@mkborregaard (Member)

I've assigned this PR to the Vizcon project so we can discuss it there.

@sethaxen (Member Author) commented Mar 9, 2020

Awesome!

@piever (Member) commented Mar 11, 2020

This seems very similar to the histogram recipe, so I guess for now we may want to be consistent with it and use the dotplot(Position.stack, ...) syntax. I see the appeal of, say, Group(stack = :Cyl), but that can be changed uniformly throughout StatsMakie.

On the technical side, I think you are bringing up a tricky issue, as the current infrastructure with Group sends each "trace" separately to the analysis and to the plotting. For the first issue, see adjust_globally here, which you can use to compute some keyword arguments of the analysis using all of the data. It is called here (I've just done a round of clean-up, so the source code on latest master may be clearer than before).

What I'm wondering is whether the "grouping framework" should also put the data back together, with the correct attributes passed column-wise. In that case you would do something like dotplot(Group(dodge = :Cyl), ...), which would then go to a call equivalent to dotplot(x, y, dodge = rank(mtcars.Cyl)). Or maybe we could create a data structure where both representations exist (some lazy concatenation of vectors). I'm still making up my mind here; I should probably start experimenting in a separate, smaller package (this machinery is plot-agnostic and can live on its own).
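One reading of `dodge = rank(mtcars.Cyl)` in the comment above is a dense rank: each row gets the position of its group among the sorted distinct values, so a single column carries the grouping and the data never has to be split into traces. The sketch below assumes that reading; `denserank` and `dodge_positions` are hypothetical helpers, and the dodge rule (evenly spaced slots within a total `width` centered on each x) is an assumption, not StatsMakie's `Position.dodge` implementation.

```julia
# Hypothetical helper: dense rank of each value among the sorted distinct values,
# e.g. cylinder counts 4/6/8 -> 1/2/3.
function denserank(v::AbstractVector)
    levels = sort(unique(v))
    return [searchsortedfirst(levels, x) for x in v]
end

# Hypothetical dodge rule: split a total `width` around each x into k equal slots
# (k = number of groups) and move each point to the center of its group's slot.
function dodge_positions(x::AbstractVector{<:Real}, ranks::AbstractVector{<:Integer}; width::Real = 0.8)
    k = maximum(ranks)
    return x .+ width .* ((ranks .- 0.5) ./ k .- 0.5)
end
```

For example, `denserank([6, 4, 6, 8, 4])` gives `[2, 1, 2, 3, 1]`, and `dodge_positions([1.0, 1.0], [1, 2])` moves two points at x = 1 to 0.8 and 1.2.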

Alternatively, note that you could always overload this specifically for dotplot, rather than for a generic PlotFunc, to have it do what you want for your use case.

@sethaxen (Member Author)

I'm working on a refactor. A "dot plot" has two main parts. The first is a set of algorithms for creating what I'm calling Tallies. Tallies are like histograms, but with the constraint that each bin holds counts (not densities), and the edges of different bins have no relationship to each other, e.g., bins can overlap. I'm thinking a tally would also contain a geometric object that defines its "limits", defaulting to a Point for categorical data. The "histodot" tallying method just makes a histogram, while the "dotdensity" tallying method makes a series of tallies. These would fall into the category of analyses.
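A minimal sketch of the Tally idea, under stated assumptions: the struct `Tally` and the function `histodot_tally` are hypothetical, and the geometric "limits" object is left out. Each bin stores its own lower and upper edge plus an integer count, so nothing forces bins to be contiguous or disjoint. The "histodot" method then reduces to an ordinary fixed-width histogram anchored at the data minimum.

```julia
# Hypothetical data structure: every bin carries its own edges and an integer count.
# Edges of different bins are independent, so bins may overlap or leave gaps.
struct Tally{T<:Real}
    lo::Vector{T}        # lower edge of each bin
    hi::Vector{T}        # upper edge of each bin
    counts::Vector{Int}  # number of observations (not a density) in each bin
end

# "histodot"-style tallying: contiguous fixed-width bins [lo, lo + bw) starting at minimum(x).
function histodot_tally(x::AbstractVector{<:Real}, bw::Real)
    bw > 0 || throw(ArgumentError("bin width must be positive"))
    xs = float.(x)
    origin = minimum(xs)
    nbins = floor(Int, (maximum(xs) - origin) / bw) + 1
    counts = zeros(Int, nbins)
    for xi in xs
        counts[floor(Int, (xi - origin) / bw) + 1] += 1
    end
    lo = [origin + (i - 1) * bw for i in 1:nbins]
    return Tally(lo, lo .+ bw, counts)
end
```

Here `histodot_tally([0.0, 0.4, 1.1, 2.5, 2.9], 1.0)` yields bins [0, 1), [1, 2), [2, 3) with counts `[2, 1, 2]`. A dotdensity tally could fill the same struct with data-dependent edges such as the ones produced by Wilkinson binning.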

The second part is a Stacks plot. Stacks turns base points and counts into stacks of just-touching markers, using any provided marker. Its main benefit is in 2D, where it constrains the aspect ratio of the marker (this already seems to be the case in 3D). Stacks makes it easy to build other representations of discrete or discretized distributions, such as waffle plots, and it also lets us build 3D dot plots (from this paper). It might make sense to move it to AbstractPlotting.
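The aspect-ratio constraint is the core of the Stacks idea in 2D, and it is also why the to-do list mentions pixels and zooming. A sketch, assuming the dot diameter `d` is given in data units along the binning (x) axis and that the current pixels-per-data-unit scale of each axis is known; `stack_centers` and `px_per_unit` are hypothetical names, not AbstractPlotting API. Since a round marker has the same pixel diameter along both axes, its diameter in y data units is `d * px_per_unit[1] / px_per_unit[2]`, and spacing the dots by that amount makes them just touch.

```julia
# Hypothetical sketch (not StatsMakie/AbstractPlotting API) of the core of a 2D stacks
# plot: turn base points and counts into centers of just-touching round markers.
#   bases       : vector of (x, y) base points; x is the binning axis, y the baseline
#   counts      : number of dots in each stack
#   d           : dot diameter in data units along x (e.g. the bin width)
#   px_per_unit : (pixels per data unit along x, pixels per data unit along y)
function stack_centers(bases, counts, d::Real, px_per_unit::NTuple{2,Real})
    # Same diameter in pixels on both axes => diameter in y data units:
    dy = d * px_per_unit[1] / px_per_unit[2]
    centers = Tuple{Float64, Float64}[]
    for ((bx, by), n) in zip(bases, counts)
        for i in 1:n
            push!(centers, (bx, by + (i - 0.5) * dy))  # stack upward from the baseline
        end
    end
    markersize_px = d * px_per_unit[1]  # marker diameter in pixels
    return centers, markersize_px
end
```

With bases `[(1.0, 0.0), (2.0, 0.0)]`, counts `[2, 1]`, `d = 0.5` and scales `(100.0, 50.0)`, the centers are `(1.0, 0.5)`, `(1.0, 1.5)` and `(2.0, 0.5)`, and the markers are 50 px wide. Whenever the axis limits or the plot area change (for example on zoom or resize), `px_per_unit` changes, so both the y spacing and the marker size must be recomputed for the dots to keep stacking.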

With a default conversion of Tallies that selects an appropriate default marker based on the geometric object stored in Tallies, we can then do

plot(tally, x) # Make 2D dot plot with :dotdensity method and stacks
plot(tally(method = :histodot), x) # make :histodot plot with stacks
stacks(histogram, x) # same as above
barplot(tally, x) # plot tally similarly to histogram

Finally, to get the convenient centering, stacking, and dodging behavior of the familiar dot plot, a dotplot function would internally call these.

@sethaxen (Member Author) commented Jan 5, 2021

This got way too general when I tried to support 3D dot plots. It's now stale, and I don't intend to revisit it any time soon, so I'm closing it.

@sethaxen closed this on Jan 5, 2021
@SimonDanisch (Member)

Ah, pity! I hope we can revive the code later! :)
