Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add and subtract kerneldensities #93

Open
mkborregaard opened this issue Jan 29, 2021 · 3 comments
Open

Add and subtract kerneldensities #93

mkborregaard opened this issue Jan 29, 2021 · 3 comments

Comments

@mkborregaard
Copy link

Would it make sense to be able to do arithmetic on the kde objects, e.g. to add and subtract them, for comparing distributions?

@tpapp
Copy link
Collaborator

tpapp commented Jan 29, 2021

I am not quite sure it would make sense directly (with + and -). I am assuming that the underlying operation op you are thinking of is something like

kde(vcat(x, y)) ≈ op(kde(x), kde(y))

but even if we ignore technical complications (bandwidth selection etc) information is lost with the lengths of x and y (ie weights would be needed).

(Sorry if I misunderstood the question.)

@mkborregaard
Copy link
Author

Not quite sure - to be exact this is what I wanted (I'll post my code, as it's essentially an MWE):
I have the density of some variables (individual species plotted in a two-dimensional trait space after ordination) in two situations - before and after human disturbance of the ecosystem.

using CSV, Plots, DataFrames, StatsPlots, KernelDensity

after = CSV.read("Pcoa_AFTER_27jan21.txt", DataFrame)
before = CSV.read("Pcoa_BEFORE_27jan21.txt", DataFrame)

function pplot(obj)
    x, y = obj[!,"Axis.1"], obj[!,"Axis.2"]
    kd = kde((x,y))
    p = plot(kd, st = :contourf, color = :viridis, lc = :white, lw = 0.2)
    scatter!(p, x, y, msw = 0, color = :lightgrey, ms = 2, label = "")
    p
end

plot(
    pplot(before),
    pplot(after),
    size = (1000, 400), clim = (0, 60)
)

kdes
What I'm interested in here is the difference surface between the two - what areas have denser distributions and what have less dense? So the result is not really a kde (as it doesn't sum to 1). Here's an example from ggplot2 of + instead of -.
https://files.slack.com/files-pri/T68168MUP-F01KW2657PH/image.png

I'm not sure it makes 100% sense statistically, sorry - there was some discussion on Slack https://julialang.slack.com/archives/C6821M4KE/p1611916388016600

@mkborregaard
Copy link
Author

But I guess the conclusion is that + and - does not make sense for this package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants