WIP: Beeswarm plot #61

Open
wants to merge 11 commits into base: master
9 changes: 8 additions & 1 deletion README.md
@@ -14,6 +14,7 @@ This package contains many statistical recipes for concepts and types introduced
- histogram/histogram2d
- boxplot
- violin
- beeswarm
- marginalhist
- corrplot/cornerplot

@@ -25,18 +26,24 @@ using StatPlots
gr(size=(400,300))
```

The `DataFrames` support allows passing `DataFrame` columns as symbols. Operations on DataFrame columns can be specified using quoted expressions, e.g.
The `DataFrames` support allows passing `DataFrame` columns as
Member

Thanks for also taking the time to clean up files etc. However, I'd like to keep changes like this separate from changes that add new functionality. Could you cherry-pick this and the other cleanup changes (deleted/inserted lines) into a separate PR and keep this PR focused on the beeswarm recipe?

Member

This part of the README has since changed, so this change can be dropped when you rebase.

symbols. Operations on `DataFrame` columns can be specified using quoted
expressions, e.g.

```julia
using DataFrames
df = DataFrame(a = 1:10, b = 10 * rand(10), c = 10 * rand(10))
plot(df, :a, [:b :c])
scatter(df, :a, :b, markersize = :(4 * log(:c + 0.1)))
```

If you find an operation that is not supported by DataFrames, please open an issue. An alternative to the `StatPlots` syntax is the [DataFramesMeta](https://github.com/JuliaStats/DataFramesMeta.jl) macro `@with`. Symbols that do not refer to DataFrame columns must be escaped with `^()`, e.g.

```julia
using DataFramesMeta
@with(df, plot(:a, [:b :c], colour = ^([:red :blue])))
```

---

## marginalhist with DataFrames
4 changes: 1 addition & 3 deletions src/StatPlots.jl
@@ -1,4 +1,3 @@

module StatPlots

using Reexport
@@ -24,12 +23,11 @@ include("cornerplot.jl")
include("distributions.jl")
include("boxplot.jl")
include("violin.jl")
include("beeswarm.jl")
include("hist.jl")
include("marginalhist.jl")
include("bar.jl")
include("shadederror.jl")
include("groupederror.jl")



end # module
50 changes: 50 additions & 0 deletions src/beeswarm.jl
@@ -0,0 +1,50 @@
# ---------------------------------------------------------------------------
# Beeswarm plot
@recipe function f(::Type{Val{:beeswarm}}, x, y, z; trim::Bool=false, side::Symbol=:both)
if !(side in [:both :left :right])
warn("side (you gave :$side) must be one of :both, :left, or :right")
side = :both
info("side set to :$side")
end
xsegs, ysegs = Segments(), Segments()
glabels = sort(collect(unique(x)))
Member

Part of the code here is identical across beeswarm, violin and boxplot. I haven't checked carefully, but would it be possible to extract a shared function?

Author

Not sure yet. I'll try to get the beeswarm recipe working first, then see whether some parts can be extracted.

bw = d[:bar_width]
bw == nothing && (bw = 0.8)
for (i,glabel) in enumerate(glabels)

# We get the values for this label
lab_y = y[filter(i -> _cycle(x,i) == glabel, 1:length(y))]
lab_x = zeros(lab_y)

# Then we apply Sturges' rule to get the number of bins
n = convert(Int64, ceil(1+log2(length(lab_y))))

# Get the widths and the coordinates
widths, centers = violin_coords(lab_y, trim=trim, n=n)
isempty(widths) && continue

# normalize
hw = 0.5 * _cycle(bw, i)
widths = hw * widths / Plots.ignorenan_maximum(widths)

# build the violin-shaped envelope around the group's x position
xcenter = Plots.discrete_value!(d[:subplot][:xaxis], glabel)[1]
if (side==:right)
xcoords = vcat(widths, zeros(length(widths))) + xcenter
elseif (side==:left)
xcoords = vcat(zeros(length(widths)), -reverse(widths)) + xcenter
else
xcoords = vcat(widths, -reverse(widths)) + xcenter
end
ycoords = vcat(centers, reverse(centers))

push!(xsegs, xcoords)
push!(ysegs, ycoords)
end

seriestype := :scatter
x := xsegs.pts
y := ysegs.pts
()
end
Plots.@deps beeswarm scatter
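As written, the recipe scatters the points of the kde envelope rather than the observations themselves, so it does not yet draw one marker per data point. One simple way to get a beeswarm-style layout is sketched below as a hypothetical helper, `swarm_offsets`, which is not part of StatPlots or this PR. It reuses the recipe's Sturges' rule binning, then spreads the points that share a y-bin horizontally, with fuller bins spread wider. This is a simplified binned variant under those assumptions, not a collision-avoiding swarm algorithm.

```julia
# Hypothetical helper (not part of StatPlots): a simplified, histogram-binned
# beeswarm layout. Points whose y-values share a bin are spread horizontally,
# and each bin's spread is proportional to how many points it holds.
function swarm_offsets(y::AbstractVector{<:Real}; halfwidth::Real = 0.4)
    n = length(y)
    n == 0 && return Float64[]
    nbins = ceil(Int, 1 + log2(n))            # Sturges' rule, as in the recipe
    lo, hi = extrema(y)
    binwidth = hi > lo ? (hi - lo) / nbins : 1.0
    # Bin index of each point; clamp so that the maximum lands in the last bin
    bins = [min(nbins, floor(Int, (v - lo) / binwidth) + 1) for v in y]
    counts = zeros(Int, nbins)
    for b in bins
        counts[b] += 1
    end
    maxcount = maximum(counts)
    offsets = zeros(Float64, n)
    seen = zeros(Int, nbins)                  # points placed so far, per bin
    for (k, b) in enumerate(bins)
        c = counts[b]
        c == 1 && continue                    # a lone point stays on the center line
        seen[b] += 1
        w = halfwidth * c / maxcount          # fuller bins spread wider
        # place the c points of this bin evenly between -w and +w
        offsets[k] = -w + 2w * (seen[b] - 1) / (c - 1)
    end
    return offsets
end
```

Inside the recipe, one would then scatter `xcenter .+ swarm_offsets(lab_y)` against `lab_y` for each group. Because points are only spread within bins, markers in neighbouring bins can still overlap; a true beeswarm places points one by one to avoid that.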
6 changes: 3 additions & 3 deletions src/hist.jl
@@ -18,9 +18,9 @@ Plots.@deps density path
# ---------------------------------------------------------------------------
# cumulative density

@recipe function f(::Type{Val{:cdensity}}, x, y, z; trim=false,
npoints = 200)
newx, newy = violin_coords(y, trim=trim)
@recipe function f(::Type{Val{:cdensity}}, x, y, z; trim::Bool=false,
n::Int64=200)
newx, newy = violin_coords(y, trim=trim, n=n)

if Plots.isvertical(d)
newx, newy = newy, newx
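The rest of the `cdensity` recipe is not shown in this hunk. As a hedged illustration of the step a cumulative-density recipe needs, the hypothetical helper below (not StatPlots code) accumulates a density sampled on an increasing grid using the trapezoidal rule and rescales the result so the curve ends at 1.

```julia
# Hypothetical helper (not part of StatPlots): cumulative distribution from a
# density evaluated on an increasing grid, using the trapezoidal rule.
function cumulative_density(xgrid::AbstractVector{<:Real}, dens::AbstractVector{<:Real})
    length(xgrid) == length(dens) || throw(ArgumentError("grid and density lengths differ"))
    isempty(dens) && return Float64[]
    cdf = zeros(Float64, length(dens))
    for i in 2:length(dens)
        # add the area of the trapezoid between grid points i-1 and i
        cdf[i] = cdf[i-1] + 0.5 * (dens[i] + dens[i-1]) * (xgrid[i] - xgrid[i-1])
    end
    total = cdf[end]
    total > 0 && (cdf ./= total)              # normalise so the curve ends at 1
    return cdf
end
```

Normalising by the accumulated area, rather than assuming the kde integrates to exactly 1 on its grid, keeps the curve ending at 1 even when the grid truncates the tails.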
24 changes: 19 additions & 5 deletions src/violin.jl
@@ -1,11 +1,22 @@

# ---------------------------------------------------------------------------
# Violin Plot
# Violin plot utility functions

const _violin_warned = [false]

function violin_coords(y; trim::Bool=false)
kd = KernelDensity.kde(y, npoints = 200)
"""
**Use a kernel density estimate (kde) to compute the envelope for the violin and beeswarm plots**

~~~
violin_coords(y; trim::Bool=false, n::Int64=200)
~~~

- `y`: points to estimate the distribution from
- `trim`: whether to keep only the kde points that lie within the range of `y`
- `n`: number of points at which the kde is evaluated (defaults to 200)

"""
function violin_coords(y; trim::Bool=false, n::Int64=200)
kd = KernelDensity.kde(y, npoints = n)
if trim
xmin, xmax = Plots.ignorenan_extrema(y)
inside = Bool[ xmin <= x <= xmax for x in kd.x]
@@ -15,7 +26,9 @@ function violin_coords(y; trim::Bool=false)
end


@recipe function f(::Type{Val{:violin}}, x, y, z; trim=true, side=:both)
# ---------------------------------------------------------------------------
# Violin plot recipe
@recipe function f(::Type{Val{:violin}}, x, y, z; trim=false, side=:both)
xsegs, ysegs = Segments(), Segments()
glabels = sort(collect(unique(x)))
bw = d[:bar_width]
@@ -49,3 +62,4 @@ end
()
end
Plots.@deps violin shape
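`violin_coords` delegates the density estimate to `KernelDensity.kde(y, npoints = n)` and, with `trim = true`, keeps only the grid points inside the range of the data. To illustrate what `n` and `trim` control without depending on KernelDensity, the sketch below swaps in a naive Gaussian kde. The function name `naive_violin_coords`, the normal-reference bandwidth `1.06 * std(y) * length(y)^(-1/5)` and the grid padding of three bandwidths are all assumptions, not KernelDensity's exact behaviour. It returns `(density, grid)`, the order in which the beeswarm recipe unpacks `widths, centers = violin_coords(...)`.

```julia
using Statistics: std

# Hypothetical stand-in for violin_coords: a naive Gaussian kde instead of
# KernelDensity.kde. Returns (density, grid) like violin_coords.
function naive_violin_coords(y::AbstractVector{<:Real}; trim::Bool = false, n::Int = 200)
    m = length(y)
    m >= 1 || throw(ArgumentError("need at least one observation"))
    n >= 2 || throw(ArgumentError("need at least two grid points"))
    s = m > 1 ? std(y) : 0.0
    h = s > 0 ? 1.06 * s * m^(-1/5) : 1.0     # normal-reference bandwidth (assumed)
    lo, hi = extrema(y)
    # evaluation grid of n points, padded by 3 bandwidths on each side (assumed)
    a, b = lo - 3h, hi + 3h
    grid = [a + (b - a) * (i - 1) / (n - 1) for i in 1:n]
    # average of Gaussian kernels centred on the observations
    dens = [sum(exp(-0.5 * ((g - v) / h)^2) for v in y) / (m * h * sqrt(2π)) for g in grid]
    if trim
        # as in violin_coords: keep only grid points within the observed range
        inside = [lo <= g <= hi for g in grid]
        return dens[inside], grid[inside]
    end
    return dens, grid
end
```

With `trim = false` the envelope tapers off beyond the data, while `trim = true` cuts it at the smallest and largest observation, which is why the violin recipe's default of `trim` matters for how the plot ends.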