Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quantile bars #1521

Merged
merged 12 commits into from
Mar 16, 2021
Merged

Quantile bars #1521

merged 12 commits into from
Mar 16, 2021

Conversation

rikhuijzer
Copy link
Contributor

@rikhuijzer rikhuijzer commented Feb 27, 2021

This adds a new statistic to Gadfly, namely for showing quantiles for distributions. In the docs, this currently looks as follows:

Stat.quantile_bars

using DataFrames, Gadfly, Distributions
set_default_plot_size(14cm, 8cm)
n = 400
group = [i ≤ n/2 ? -1 : 1 for i in 1:n]
x = randn(n) .+ group

plot(x=x, color=string.(group), Geom.density, Guide.colorkey(title="", pos=[3.5,0.7]),
    Guide.title("Density with bars showing the central 90% CI"),
    Guide.ylabel("Density"), Coord.cartesian(xmin=-4, xmax=4),
    layer(Stat.density, Geom.polygon(fill=true, preserve_order=true), Theme(alphas=[0.4])),
    layer(Stat.quantile_bars(quantiles=[0.05, 0.95], bar_width=0.05), Geom.bar)
)

image

Contributor checklist:

  • I've updated the documentation to reflect these changes
  • I've added an entry to NEWS.md
  • I've added and/or updated the unit tests
  • I've run the regression tests
  • I've squash'ed or fixup'ed junk commits with git-rebase
  • I've built the docs and confirmed these changes don't cause new errors

@Mattriks
Copy link
Member

Two design issues here are:

  1. For a manual bandwidth, bandwidth needs to be specified in both Geom.density and Stat.quantile_bars.
    Note that layers can have more than 1 Geom, if a Stat outputs the aesthetics for those Geoms (see the subsection on layer inputs in Layers). But in this case something like:
   layer(Stat.density(quantiles=[0.05, 0.95]), Geom.line, Geom.bar)

won't work because both Geom.line and Geom.bar want to use the y aesthetic. I don't see a way around this atm.

  1. Should Stat.quantile_bars be used with Geom.segment or Geom.bar? For Geom.bar you have to define bar_width,
    but with Geom.segment width can be changed with Theme(line_width), and in the future linewidth should be developed into a proper aesthetic.
    Making Stat.quantile_bar to be used with Geom.segment seems a better way forward.

Also (from point 1) your example can be tidied up using 2 geoms in the Stat.density layer:

n = 400
group = repeat([-1, 1], inner=200)
x = randn(n) .+ group

plot(x=x, color=string.(group), Guide.colorkey(title="", pos=[3.5,0.7]),
  layer(Stat.density, Geom.line, Geom.polygon(fill=true, preserve_order=true), alpha=[0.4]),
    layer(Stat.quantile_bars(quantiles=[0.05, 0.95], bar_width=0.05), Geom.bar)
    Guide.title("Density with bars showing the central 90% CI"),
    Guide.ylabel("Density"), Coord.cartesian(xmin=-4, xmax=4)
)

Note for polygons, alpha=[0.4] was enabled in #1511.
I'll make some other comments on the code.

@codecov-io
Copy link

codecov-io commented Mar 1, 2021

Codecov Report

Merging #1521 (12c8ada) into master (04d931f) will decrease coverage by 0.02%.
The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1521      +/-   ##
==========================================
- Coverage   89.74%   89.72%   -0.03%     
==========================================
  Files          39       39              
  Lines        4633     4672      +39     
==========================================
+ Hits         4158     4192      +34     
- Misses        475      480       +5     
Impacted Files Coverage Δ
src/statistics.jl 95.69% <50.00%> (-0.77%) ⬇️
src/scale.jl 95.63% <0.00%> (+0.03%) ⬆️
src/varset.jl 97.87% <0.00%> (+0.04%) ⬆️
src/Gadfly.jl 76.44% <0.00%> (+0.23%) ⬆️
src/misc.jl 65.00% <0.00%> (+0.40%) ⬆️
src/theme.jl 73.61% <0.00%> (+3.74%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04d931f...12c8ada. Read the comment docs.

@rikhuijzer
Copy link
Contributor Author

It seems that something upstream is broken. Probably, it is related to the new PooledArrays 1.2.0 release of 6 hours ago.

@rikhuijzer
Copy link
Contributor Author

Apart from that, thank you for the feedback @Mattriks! I have now implemented your comments. Let me know if you have more comments.

@rikhuijzer
Copy link
Contributor Author

rikhuijzer commented Mar 1, 2021

Thanks to a PooledArrays update by Bogumił Kamiński, the tests passed again in 0aa5feb.

@rikhuijzer
Copy link
Contributor Author

@Mattriks Is there anything I can do to move this PR forwards?

@Mattriks
Copy link
Member

Mattriks commented Mar 4, 2021

Be patient 😃

@rikhuijzer
Copy link
Contributor Author

Friendly reminder.

By the way, I understand that reviewing code takes time and is not the greatest job in the world, but it's also quite demotivating to see PRs being stalled. I have many ideas for, hopefully, good PRs for Gadfly. However, with each day of delay on the review, the new PRs become less important for me.

@Mattriks
Copy link
Member

Mattriks commented Mar 9, 2021

Sorry for the delay! Super busy with other stuff, I will attempt to get back to this PR this week. More PRs for Gadfly are very welcome! To discuss your new PRs/ideas opening a new issue is good, or message me directly on Julia discourse!

@rikhuijzer
Copy link
Contributor Author

Okay 👍 👍 Thanks for letting me know!

src/statistics.jl Show resolved Hide resolved
src/statistics.jl Outdated Show resolved Hide resolved
src/statistics.jl Outdated Show resolved Hide resolved
@Mattriks
Copy link
Member

PR looks good! You should also add a test to /test/testscripts: call it stat_quantile_bars.jl, and don't use random data.
Also run the Regression Tests.

@rikhuijzer
Copy link
Contributor Author

PR looks good! You should also add a test to /test/testscripts: call it stat_quantile_bars.jl, and don't use random data.
Also run the Regression Tests.

Thanks! I've implemented your suggestions and also ran the regression tests. Of course, there isn't something to check against, but the image looks as expected:

image

If this PR is merged, would it then also be possible to register a new version in the JuliaRegistries? That would be nice, because I then don't have to use Gadfly.jl#master in an upcoming paper of mine 😄

docs/Project.toml Show resolved Hide resolved
test/testscripts/stat_quantile_bars.jl Outdated Show resolved Hide resolved
@rikhuijzer
Copy link
Contributor Author

Good idea of the hstack, much clearer now:

image

My bad about the randomness. I copied it from the density example.

@Mattriks Mattriks merged commit 189bd36 into GiovineItalia:master Mar 16, 2021
@rikhuijzer rikhuijzer deleted the quantile-bars branch March 16, 2021 09:11
@rikhuijzer rikhuijzer mentioned this pull request Mar 19, 2021
@Mattriks Mattriks mentioned this pull request Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants