WIP: Beeswarm plot #61

Open
wants to merge 11 commits into
base: master
9 changes: 8 additions & 1 deletion README.md
@@ -14,6 +14,7 @@ This package contains many statistical recipes for concepts and types introduced
- histogram/histogram2d
- boxplot
- violin
- beeswarm
- marginalhist
- corrplot/cornerplot

@@ -25,18 +26,24 @@ using StatPlots
gr(size=(400,300))
```

The `DataFrames` support allows passing `DataFrame` columns as symbols. Operations on DataFrame column can be specified using quoted expressions, e.g.
The `DataFrames` support allows passing `DataFrame` columns as
Review comment (Member):

Thanks for also taking the time to clean up files etc., but I'd like to keep changes like this separate from changes that add new functionality. Can you cherry-pick this and the other cleanup changes (deleted/inserted lines) into a separate PR and keep this PR focused on the beeswarm recipe?

Review comment (Member):

This bit of the readme has been changed, so this can be scrapped when you rebase.

symbols. Operations on `DataFrame` column can be specified using quoted
expressions, e.g.

```julia
using DataFrames
df = DataFrame(a = 1:10, b = 10*rand(10), c = 10 * rand(10))
plot(df, :a, [:b :c])
scatter(df, :a, :b, markersize = :(4 * log(:c + 0.1)))
```

If you find an operation not supported by DataFrames, please open an issue. An alternative approach to the `StatPlots` syntax is to use the [DataFramesMeta](https://github.com/JuliaStats/DataFramesMeta.jl) macro `@with`. Symbols not referring to DataFrame columns must be escaped by `^()` e.g.

```julia
using DataFramesMeta
@with(df, plot(:a, [:b :c], colour = ^([:red :blue])))
```

---

## marginalhist with DataFrames
4 changes: 1 addition & 3 deletions src/StatPlots.jl
@@ -1,4 +1,3 @@

module StatPlots

using Reexport
@@ -24,12 +23,11 @@ include("cornerplot.jl")
include("distributions.jl")
include("boxplot.jl")
include("violin.jl")
include("beeswarm.jl")
include("hist.jl")
include("marginalhist.jl")
include("bar.jl")
include("shadederror.jl")
include("groupederror.jl")



end # module
69 changes: 69 additions & 0 deletions src/beeswarm.jl
@@ -0,0 +1,69 @@
@shorthands beeswarm

# ---------------------------------------------------------------------------
# Beeswarm plot
@recipe function f(::Type{Val{:beeswarm}}, x, y, z; trim=false, side=:both)

side = check_side(side)

xp, yp = Float64[], Float64[]
glabels = sort(collect(unique(x)))
bw = d[:bar_width]
bw == nothing && (bw = 0.8)

for (i,glabel) in enumerate(glabels)
# We get the values for this label
lab_y = y[filter(i -> _cycle(x,i) == glabel, 1:length(y))]
lab_x = zeros(lab_y)

# Number of bins (defaults to sturges)
binning_mode = d[:bins]
if binning_mode == :auto
binning_mode = :sturges
end
n = Plots._auto_binning_nbins(tuple(lab_y), 1, mode=binning_mode)

# Get the widths and the coordinates
widths, centers = StatPlots.violin_coords(lab_y, trim=trim, n=n)
isempty(widths) && continue

# normalize
hw = 0.5Plots._cycle(bw, i)
widths = hw * widths / Plots.ignorenan_maximum(widths)

# make the violin
xcenter = Plots.discrete_value!(d[:subplot][:xaxis], glabel)[1]

for i in 2:length(centers)
Review comment (Member):

Using `i` here is what throws off your width calculations: `i` is also the index in the outer loop. Change it to `j`.
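The renaming suggested above can be sketched in isolation. This is a minimal illustration, not the PR's code: `count_per_bin` is a hypothetical helper, and the bin edges are made up. It only shows the pattern of keeping `i` for the group (outer loop) and `j` for the bin (inner loop), so neither index can be mistaken for the other regardless of the Julia version's scoping rules.

```julia
# Hypothetical helper (not part of the PR): count how many points of each
# group fall into each bin, with distinct loop indices.
function count_per_bin(groups::Vector{Vector{Float64}}, centers::Vector{Float64})
    counts = zeros(Int, length(groups), length(centers) - 1)
    for (i, ys) in enumerate(groups)      # i: group index (outer loop)
        for j in 2:length(centers)        # j: bin index (inner loop)
            inside = Bool[centers[j-1] < u <= centers[j] for u in ys]
            counts[i, j-1] = sum(inside)
        end
    end
    return counts
end

counts = count_per_bin([[0.5, 1.5, 1.7], [2.5]], [0.0, 1.0, 2.0, 3.0])
```

With these inputs the first group has one point in the first bin and two in the second, and the second group has a single point in the third bin.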

inside = Bool[centers[i-1] < u <= centers[i] for u in lab_y]
if sum(inside) > 1
Review comment (Member, @mkborregaard, Sep 12, 2017):

You need

      if sum(inside) == 1
        lab_x[inside] .+= xcenter
      elseif sum(inside) > 1

for when there's only a single point.
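A sketch of how the single-point branch fits with the spreading logic, under some assumptions: `bin_offsets` is a hypothetical helper that does not exist in the PR, and `range(...; stop, length)` is used in place of `linspace`, which is no longer available in Julia 1.x. A lone point is placed on the group's centre; two or more points are spread evenly across the allowed width.

```julia
# Hypothetical helper (not part of the PR): x positions for the n points
# that fall into one bin of a group centred at `xcenter`.
function bin_offsets(n::Int, halfwidth::Float64, xcenter::Float64; side::Symbol = :both)
    n == 0 && return Float64[]
    n == 1 && return [xcenter]            # single point: no spreading needed
    start = side == :right ? 0.0 : -halfwidth
    stop  = side == :left  ? 0.0 :  halfwidth
    return collect(range(start, stop = stop, length = n)) .+ xcenter
end

single = bin_offsets(1, 0.5, 2.0)
both   = bin_offsets(3, 0.5, 2.0)
right  = bin_offsets(3, 0.5, 2.0, side = :right)
```

Handling `n == 1` separately also avoids asking for a one-element range whose endpoints differ.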

if (side==:right)
start = 0.0
stop = widths[i]
elseif (side==:left)
start = -widths[i]
stop = 0.0
elseif (side == :both)
start = -widths[i]
stop = widths[i]
end
lab_x[inside] = lab_x[inside] .+ linspace(start, stop, sum(inside)) .+ xcenter
end
end

append!(xp, lab_x)
append!(yp, lab_y)

end

x := xp
y := yp
seriestype := :scatter
Review comment (Member):

Here, add

  if get!(d, :markershape, :circle) == :none
          d[:markershape] = :circle
  end

Review comment (Member, @mkborregaard, Jul 27, 2017):

cf. JuliaPlots/Plots.jl#989, a PR to fix this behaviour in Plots so that this code won't be necessary.

Review comment (Member):

This was merged so disregard this comment.
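For readers unfamiliar with the idiom in the snippet discussed in this thread, here is a minimal sketch of what `get!` does. It assumes the recipe's `d` behaves like an ordinary `Dict{Symbol,Any}`; `ensure_marker!` is a hypothetical wrapper used only for illustration.

```julia
# `d` is a plain Dict standing in for the recipe's attribute dictionary
# (an assumption). `get!` returns the stored value, or stores and returns
# the default when the key is missing.
function ensure_marker!(d::Dict{Symbol,Any})
    if get!(d, :markershape, :circle) == :none
        d[:markershape] = :circle         # an explicit :none is overridden too
    end
    return d
end

unset    = ensure_marker!(Dict{Symbol,Any}())
disabled = ensure_marker!(Dict{Symbol,Any}(:markershape => :none))
explicit = ensure_marker!(Dict{Symbol,Any}(:markershape => :star5))
```

A missing or `:none` marker shape ends up as `:circle`, while any other user-supplied shape is left untouched.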

if get!(d, :markershape, :circle) == :none
d[:markershape] = :circle
end
()

end

Plots.@deps beeswarm scatter
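
The recipe above notes that the number of bins defaults to Sturges' rule. As a rough illustration, the textbook form of that rule is shown below; `sturges_nbins` is a hypothetical name, and the internal `Plots._auto_binning_nbins` may differ in detail.

```julia
# Sturges' rule in its textbook form: ceil(log2(n)) + 1 bins for n observations.
# Hypothetical helper, not the Plots implementation.
sturges_nbins(n::Integer) = ceil(Int, log2(n)) + 1

nbins_100 = sturges_nbins(100)
nbins_8   = sturges_nbins(8)
```

Because the bin count grows only logarithmically with the sample size, small groups get few, wide bins, which affects how many points share a row in the swarm.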
5 changes: 2 additions & 3 deletions src/hist.jl
@@ -18,9 +18,8 @@ Plots.@deps density path
# ---------------------------------------------------------------------------
# cumulative density

@recipe function f(::Type{Val{:cdensity}}, x, y, z; trim=false,
npoints = 200)
BeastyBlacksmith marked this conversation as resolved.
newx, newy = violin_coords(y, trim=trim)
@recipe function f(::Type{Val{:cdensity}}, x, y, z; trim::Bool=false, npoints=200)
newx, newy = violin_coords(y, trim=trim, n=npoints)

if Plots.isvertical(d)
newx, newy = newy, newx
40 changes: 35 additions & 5 deletions src/violin.jl
@@ -1,11 +1,20 @@

# ---------------------------------------------------------------------------
# Violin Plot
BeastyBlacksmith marked this conversation as resolved.
# Utility functions

const _violin_warned = [false]

function violin_coords(y; trim::Bool=false)
kd = KernelDensity.kde(y, npoints = 200)
"""
**Use kde to return an envelope for the violin and beeswarm plots**

violin_coords(y; trim::Bool=false, n::Int64=200)

- `y`: points to estimate the distribution from
- `trim`: whether to remove the extreme values
- `n`: number of points to use in kde (defaults to 200)

"""
function violin_coords(y; trim::Bool=false, n::Int64=200)
kd = KernelDensity.kde(y, npoints = n)
if trim
xmin, xmax = Plots.ignorenan_extrema(y)
inside = Bool[ xmin <= x <= xmax for x in kd.x]
@@ -14,8 +23,29 @@ function violin_coords(y; trim::Bool=false)
kd.density, kd.x
end

"""
**Check that the side is correct**

check_side(side::Symbol)

`side` can be `:both`, `:left`, or `:right`. Any other value will default to
`:both`.
"""
function check_side(side::Symbol)
if !(side in [:both, :left, :right])
warn("side (you gave :$side) must be one of :both, :left, or :right")
side = :both
info("side set to :$side")
end
return side
end
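
The `warn` and `info` functions used in `check_side` were removed in Julia 1.0. A sketch of the same validation for Julia 1.x, assuming the logging macros `@warn` and `@info` are an acceptable substitute, might look like this:

```julia
# Sketch for Julia 1.x: same behaviour as `check_side` in the diff, with the
# logging macros replacing the removed `warn`/`info` functions.
function check_side(side::Symbol)
    if !(side in (:both, :left, :right))
        @warn "side (you gave :$side) must be one of :both, :left, or :right"
        side = :both
        @info "side set to :$side"
    end
    return side
end

kept     = check_side(:left)
fallback = check_side(:up)
```

Valid values pass through unchanged; anything else logs a warning and falls back to `:both`.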

# ---------------------------------------------------------------------------
# Violin plot
@recipe function f(::Type{Val{:violin}}, x, y, z; trim=false, side=:both)

side = check_side(side)

@recipe function f(::Type{Val{:violin}}, x, y, z; trim=true, side=:both)
xsegs, ysegs = Segments(), Segments()
glabels = sort(collect(unique(x)))
bw = d[:bar_width]