-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Ratcliff Diffusion Model pdf and simulators #90
base: master
Are you sure you want to change the base?
Conversation
Could use some feedback. I might have botched the rejection sampling translation. Lastly, I'm not sure what we want to do about the CDF. I am aware of some implementations, but not sure which we might want to consider modifying using HCubature. |
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
Sounds good. I'll see if I can help. Let me verify whether I understand the basic approach. We want to integrate over nu via |
Here is my plan. I will start by testing |
Before digging into the minutia of the code, I wanted to investigate the feasibility of differentiating through a numeric integator. So far I have not had success. I'm not sure if the devs you talked to at JuliaCon might have some insights for making it work. Here is my example: Code
Error
|
I think the source of the problem can be traced to this line in |
yes that is correct. |
Sounds good. I have discussed this exact issue regarding hcubature with @gdalle. I will bring up the MWE and touch base tomorrow. It does indeed have to do with the interaction with AD from my understanding, and he discussed some solutions. |
I can get it to work with a version hard-coded. It is something about the type I will keep looking into. using Distributions
import Distributions: insupport
import Distributions: logpdf
import Distributions: loglikelihood
import Distributions: maximum
import Distributions: minimum
import Distributions: rand
import Distributions: pdf
using StaticArrays: SVector
using LinearAlgebra: norm
using HCubature
using ForwardDiff
using Optim
using Turing
struct GaussKronrod{T<:Real}
x::Vector{T}
w::Vector{T}
wg::Vector{T}
end
# Hardcoded Gauss-Kronrod rule for n=7 (15-point rule)
function GaussKronrod(::Type{T}) where T<:Real
x = T[
0.0,
0.2077849550078985,
0.4058451513773972,
0.5860872354676911,
0.7415311855993944,
0.8648644233597691,
0.9491079123427585,
0.9914553711208126,
]
w = T[
0.2094821410847278,
0.2044329400752989,
0.1903505780647854,
0.1690047266392679,
0.1406532597155259,
0.1047900103222502,
0.06309209262997855,
0.02293532201052922,
]
wg = T[
0.4179591836734694,
0.3818300505051189,
0.3818300505051189,
0.2797053914892767,
0.2797053914892767,
0.1294849661688697,
0.1294849661688697,
]
return GaussKronrod{T}(x, w, wg)
end
const gk_float64 = GaussKronrod(Float64)
GaussKronrod(::Type{Float64}) = gk_float64
countevals(g::GaussKronrod) = 15
function (g::GaussKronrod{T})(f::F, a_::SVector{1}, b_::SVector{1}, norm=norm) where {F,T}
a = a_[1]
b = b_[1]
c = (a + b) * T(0.5)
Δ = (b - a) * T(0.5)
fx⁰ = f(SVector(c)) # f(0)
I = fx⁰ * g.w[1]
I′ = fx⁰ * g.wg[1]
@inbounds for i = 2:length(g.x)
Δx = Δ * g.x[i]
fx = f(SVector(c + Δx)) + f(SVector(c - Δx))
I += fx * g.w[i]
if i <= length(g.wg)
I′ += fx * g.wg[i]
end
end
I *= Δ
I′ *= Δ
return I, norm(I - I′), 1
end
# Function to perform the integration
function custom_hcubature(f, a, b; norm=norm)
gk = GaussKronrod(eltype(a))
I, E, _ = gk(f, SVector(a...), SVector(b...), norm)
return I, E
end
struct NormalMixture{T<:Real} <: ContinuousUnivariateDistribution
μ::T
σmin::T
σmax::T
end
function NormalMixture(μ, σmin, σmax)
return NormalMixture(promote(μ, σmin, σmax)...)
end
NormalMixture(; μ, σmin = 0, σmax= 2) = NormalMixture(μ, σmin, σmax)
minimum(d::NormalMixture) = -Inf
maximum(d::NormalMixture) = Inf
insupport(d::NormalMixture, x::Real) = true
Base.broadcastable(d::NormalMixture) = Ref(d)
function rand(dist::NormalMixture, n::Int)
x = fill(0.0, n)
for i ∈ 1:n
x[i] = rand(dist)
end
return x
end
function rand(dist::NormalMixture)
(; μ, σmin, σmax) = dist
σ = rand(Uniform(σmin, σmax))
return rand(Normal(μ, σ))
end
function pdf(dist::NormalMixture, x::Real)
(; μ, σmin, σmax) = dist
Δ = σmax - σmin
integral, _ = custom_hcubature(σ -> pdf(Normal(μ, σ[1]), x), [σmin], [σmax])
return integral / Δ
end
logpdf(dist::NormalMixture, x::Real) = log(pdf(dist, x))
loglikelihood(dist::NormalMixture, x::Real) = logpdf(dist, x)
data = rand(NormalMixture(μ = 0.0), 100)
#try a few different test cases
## example 1
function test_normalmixture(params)
d = NormalMixture(μ = params[1], σmin = params[2], σmax = params[3])
return logpdf(d, 0.5)
end
ForwardDiff.gradient(test_normalmixture, [0.0, 1.0, 2.0])
## example 2
function normalmixture_vector(params)
d = NormalMixture(μ = params[1], σmin = params[2], σmax = params[3])
return [logpdf(d, 0.5), pdf(d, 0.5)]
end
ForwardDiff.jacobian(normalmixture_vector, [0.0, 1.0, 2.0])
## example 3
function log_likelihood(μ, data)
return sum(logpdf(NormalMixture(μ = μ[1]), x) for x in data)
end
neg_log_likelihood(μ) = -log_likelihood(μ, data)
# Use ForwardDiff to compute the gradient
function grad_neg_log_likelihood!(storage, μ)
ForwardDiff.gradient!(storage, neg_log_likelihood, μ)
end
## example 4
function test_gradient()
# Choose a point to evaluate the gradient
test_point = [1.0]
# Create a storage vector for the gradient
gradient = similar(test_point)
# Compute the gradient
grad_neg_log_likelihood!(gradient, test_point)
end
test_gradient()
## example 5
# Perform optimization
optimize(neg_log_likelihood, grad_neg_log_likelihood!, [0.0], BFGS())
## example 6
@model function model(data)
μ ~ Normal(0, 10)
data ~ NormalMixture(; μ)
end
chains = sample(model(data), NUTS(;adtype = AutoForwardDiff()), 1000) |
Yeah it is indeed about the type.The key change is the extension of the kronrod function to work with Dual numbers, which allows HCubature to use these types internally. In other words, we have the "let it breath" : see talk on AD function QuadGK.kronrod(::Type{ForwardDiff.Dual{T,V,N}}, n::Integer) where {T,V,N}
x, w, wg = QuadGK.kronrod(V, n)
return map(y -> ForwardDiff.Dual{T,V,N}(y), x),
map(y -> ForwardDiff.Dual{T,V,N}(y), w),
map(y -> ForwardDiff.Dual{T,V,N}(y), wg)
end |
Awesome. This is helpful. I have a few questions. First, is this something that should be changed at Also, I didn't know Julia highlighting works on Github. Update This might be type piracy. So it might be something that needs to be changed at Second Update Unfortunately, this confirms my concern about numeric integration killing performance. Integration is not slow in an absolute sense, but the relative speeds are vastly different: julia> @btime logpdf($Normal(0, 1), $2)
150.229 ns (1 allocation: 16 bytes)
-2.9189385332046727
julia> @btime logpdf($NormalMixture(0, 0, 2), $2)
4.140 μs (30 allocations: 1.55 KiB)
-2.885455766369351 Or about 26 times slower. It seems like the performance hit is larger than expected, unless the gap for the gradients is even bigger. Maybe there is a way to improve performance. |
(Just to say, though you probably agree with me, that at this stage (esp for complicated models) performance should probably not be a priority that stops us from implementing any models. Even if unusably slow, I think having a working and correct implementation in Julia can be then used as a target to test against and set the stage for future improvements and the creation of interesting optimization challenges |
Just popping here to say that I'm happy to help for any AD-related bugs, as long as you can provide an MWE! Had a few chats with @kiante-fernandez at JuliaCon but I can always clarify some stuff |
@gdalle, thank you for your willingness to help with AD related issues. I really appreciate it. As some brief background, Kiante has encountered difficulty implementing the PDF of a model with several parameters that are integrated out. His first attempts were to use analytic solutions provided by others, but that has proved to be challenging. Now we are exploring the feasibility of brute forcing it with using Distributions
using ForwardDiff
using HCubature
f(Θ) = hcubature(σ -> pdf(Normal(Θ[1], σ[1]), 1), [Θ[2]], [Θ[3]])[1]
ForwardDiff.gradient(f, [0, 0, 2]) Our understanding is that either this method needs to relax its parametric restrictions, or we need a new method akin to Kiante's solution: function QuadGK.kronrod(::Type{ForwardDiff.Dual{T,V,N}}, n::Integer) where {T,V,N}
x, w, wg = QuadGK.kronrod(V, n)
return map(y -> ForwardDiff.Dual{T,V,N}(y), x),
map(y -> ForwardDiff.Dual{T,V,N}(y), w),
map(y -> ForwardDiff.Dual{T,V,N}(y), wg)
end One potential issue is that of type piracy. I don't know much about AD, but maybe a solution needs to go into ChainRules or QuadGK. Another issue is performance. Estimating mu for a simple normal model requires about .90 seconds, but the required time for the normal mixture is estimated to be about 20 minutes with Kiantes solution. I'm not sure whether it is inherently this slow, or perhaps the problem is that AD is propagating into the into the integration algorithm without needing to. Thanks again for your help. |
Given that we have limited developer hours, I recommend approaching this strategically. My rationale for developing the simple normal mixture model was to guage the performance implications of using numeric integration so we can potentially avoid wasting time on a non-viable approach. At this point, adding numeric integration to a simple model increased the execution time by more than three orders of magnitude, which makes think twice about proceeding. This will only be worse for a more complex model with two integrals. If there is not a way to significantly improve performance, I don't think it is worth pursuing. On a related note, I also think it might be worth revisiting the status of the analytic approach Kiante was working on. If I remember correctly, I added tests for numerous sub-functions of the pdf which produced the same results as existing implementations. Typically, the bugs can be found by systematically testing the components and their integration, which suggests we might have been close. At this point, I'm not suggesting that we should go in one direction or the other, but I think we need more information from gdalle. |
Your diagnosis is correct, something would probably need to be fixed in GaussKronrod to accomodate |
Thanks for your replies. I used a normal distribution as a toy example to guage feasibility of the approach. The decision making model is a type of 1D stochastic differential equation which evolves until it hits a threshold for the first time. So the PDF is much more complex. In case it is useful, equation 5 shows the pdfs and the integrals. Maybe there is a clever mathematical trick with change of variables, but I'm not knowledable enough to know. |
Yeah my bad, I think even with the MWE it doesn't work because you're integrating by varying I coded a small example with Integrals.jl. With Zygote.jl I obtain a gradient but the slowdown is still two orders of magnitude. using BenchmarkTools
using Distributions
using Integrals
using Zygote
integrand(x::T1, p::T2) where {T1<:Real,T2<:Real} = pdf(Normal(p, x), one(promote_type(T1, T2)))
function f(Θ)
domain = (Θ[2], Θ[3])
p = Θ[1]
prob = IntegralProblem(integrand, domain, p)
sol = solve(prob, QuadGKJL(); reltol=1e-3, abstol=1e-3)
return sol.u
end
Θ = [0.0, 0.0, 2.0]
f(Θ)
Zygote.gradient(f, Θ)[1]
@btime f($Θ);
@btime Zygote.gradient($f, $Θ); With ForwardDiff.jl the exact same error pops up, which is weird cause Integrals.jl should not ever try to differentiate through the solver. I suggest you open an issue in Integrals.jl with the same MWE and ask them. My best guess is that rules for differentiation with respect to integral bounds are still missing: |
Actually @kiante-fernandez that would be a very meaningful and useful contribution to the ecosystem as a whole. Wanna wrestle with autodiff for real? Try contributing the differentiation rules with respect to bounds! |
Thanks for the example above. I replaced Zygote with ForwardDiff, but it worked for me. Here is the version I am using.
Were you using a different version by chance? |
That's very weird, I have the same version. How did you do the replacement? What happens when you run the following code in a fresh REPL (regardless of the initial environment) using Pkg
Pkg.activate(temp=true)
Pkg.add(["BenchmarkTools", "Distributions", "ForwardDiff", "Integrals", "Zygote"])
using BenchmarkTools
using Distributions
using ForwardDiff
using Integrals
using Zygote
integrand(x::T1, p::T2) where {T1<:Real,T2<:Real} = pdf(Normal(p, x), one(promote_type(T1, T2)))
function f(Θ)
domain = (Θ[2], Θ[3])
p = Θ[1]
prob = IntegralProblem(integrand, domain, p)
sol = solve(prob, QuadGKJL(); reltol=1e-3, abstol=1e-3)
return sol.u
end
Θ = [0.0, 0.0, 2.0]
f(Θ)
Zygote.gradient(f, Θ)[1]
ForwardDiff.gradient(f, Θ)
@btime f($Θ);
@btime Zygote.gradient($f, $Θ);
@btime ForwardDiff.gradient($f, $Θ); |
My appologies. I was just about to ammend my comment. I accidently ran Kiante's meothod for |
My pleasure! And feel free to tag the maintainers if your issue doesn't get any attention within a few days. It's completely okay in the open source community, as long as you're respectful and considerate. |
For our record keeping, the issue at Integrals.jl can be found here. |
I did a preliminary investigation this morning. One thing I learned was that the gradient works with the numeric pdf of the DDM. I was surprised by that. I'm not sure whether ForwardDiff did not go through hcubature, but it did not error. The preliminary benchmarks are 17.70ms (numeric) vs 1.33ms (analytic) for I also discovered that the pdf of the DDM without any across-trial variability ( |
Thanks for the updated code, Kiante. The pdf looks better, but something still is not right either with the pdf or the simulated data. using Plots
using SequentialSamplingModels
fixed_parms = (
ν = 1.00,
α = 0.80,
τ = 0.30,
z = 0.25,
η = 0,
sz = 0,
st = 0,
σ = 1.0
)
dist = DDM(; fixed_parms...)
choice, rt = rand(dist, 100_000)
pr_1 = mean(choice .== 1)
p1 = histogram(rt[choice .== 1], norm = true)
p1[1][1][:y] .*= pr_1
x = range(.25, 1.25, length = 200)
dens1 = pdf.(dist, 1, x)
plot!(x, dens1, leg=false) |
I think it might be the sampling method. Also something I wanted to note about one of our current tests in function p=wfpt(t,v,a,z,err)
tt=t/(a^2); % use normalized time
w=z/a; % convert to relative start point
% calculate number of terms needed for large t
if pi*tt*err<1 % if error threshold is set low enough
kl=sqrt(-2*log(pi*tt*err)./(pi^2*tt)); % bound
kl=max(kl,1/(pi*sqrt(tt))); % ensure boundary conditions met
else % if error threshold set too high
kl=1/(pi*sqrt(tt)); % set to boundary condition
end
% calculate number of terms needed for small t
if 2*sqrt(2*pi*tt)*err<1 % if error threshold is set low enough
ks=2+sqrt(-2*tt.*log(2*sqrt(2*pi*tt)*err)); % bound
ks=max(ks,sqrt(tt)+1); % ensure boundary conditions are met
else % if error threshold was set too high
ks=2; % minimal kappa for that case
end
% compute f(tt|0,1,w)
p=0; %initialize density
if ks<kl % if small t is better...
K=ceil(ks); % round to smallest integer meeting error
for k=-floor((K-1)/2):ceil((K-1)/2) % loop over k
p=p+(w+2*k)*exp(-((w+2*k)^2)/2/tt); % increment sum
end
p=p/sqrt(2*pi*tt^3); % add constant term
else % if large t is better...
K=ceil(kl); % round to smallest integer meeting error
for k=1:K
p=p+k*exp(-(k^2)*(pi^2)*tt/2)*sin(k*pi*w); % increment sum
end
p=p*pi; % add constant term
end
% convert to f(t|v,a,w)
p=p*exp(-v*a*w -(v^2)*t/2)/(a^2); Here is RT dist rtdists::ddiffusion(0.35, "upper", a = 0.50, v = 0.80, t0 = 0.2, z = 0.3 * 0.50, sz = 0, sv = 0, st0 = 0.00, s = 1)
#0.6635714
rtdists::ddiffusion(0.35, "lower", a = 0.50, v = 0.80, t0 = 0.2, z = 0.3 * 0.50, sz = 0, sv = 0, st0 = 0.00, s = 1)
#0.4450956 Running the code above (as well as our current implementation) wfpt(.35 - 0.2, -0.8, 0.5,1 - 0.3, 1.0e-12)
%-1.0323
wfpt(.35 - 0.2, 0.8, 0.5, 0.3, 1.0e-12)
%0.4638 I wonder if I am missing something. Note that our current implementation would give something closer to the |
Another example to consider is the following which would test off a popular python implementation (HDDM) uses the following repo import hddm_wfpt
import numpy as np
def ddm_loglik(data, v, a, z, t, err=1e-12):
data = data[:, 0] * data[:, 1]
# print(v.shape)
# print(v)
# print(a.shape)
# print(a)
# print(z)
# print(t)
# print(data)
# print(data.shape[0])
v_ = np.zeros(data.shape[0])
v_[:] = np.float64(v)
a_ = np.zeros(data.shape[0])
a_[:] = 2 * np.float64(a)
z_ = np.zeros(data.shape[0])
z_[:] = np.float64(z)
t_ = np.zeros(data.shape[0])
t_[:] = np.float64(t)
# Our function expects inputs as float64, but they are not guaranteed to
# come in as such --> we type convert
out_logp = hddm_wfpt.wfpt.wiener_logp_array(np.float64(data),
v_, # v
np.zeros(data.shape[0]), # sv
a_, # a
z_, # z
np.zeros(data.shape[0]), # sz
t_, # t
np.zeros(data.shape[0]), # st
err,
)
# print(out_logp)
return out_logp
# Example usage
if __name__ == "__main__":
# Create sample data
# Assuming data is a 2D array where each row is [reaction_time, choice]
# reaction_time is in seconds, choice is 1 or -1
data = np.array([
# [0.71, 1],
# [0.71, -1],
[0.35, 1],
[0.35, -1]
])
# Set parameters
v = 0.80 # drift rate
a = 0.50 # boundary separation
z = 0.30 # starting point
t = 0.20 # non-decision time
# Call the function
log_likelihoods = ddm_loglik(data, v, a, z, t)
# Print results
print("Log-likelihoods:")
print((log_likelihoods))
#> Log-likelihoods:
#> [0.41412645 0.13426766] |
Hmmm. That is really strange the results differ so much. One possibility is that they have different parameterizations, but I'm guessing you have accounted for that already. The diffusion coefficient is another possible point of difference. Some use .1 and other use 1. One way to adjudicate between the three versions is to figure out which overlays correctly with data simulated from the SDE. The implementation of the SDE is straight forward, which would reduce the chance of errors. Edit One thing I realized is that the |
As a test for the new ChatGPT version that's supposedly amazing with code and complex problems, I asked for insights on this issue, it said: Unfortunately, I can't say if it's pure nonsense or not 😅 |
Haha. That is the fundamental problem with the current AI approach. In the case of # in any order and any length
(;ν, α, z, τ, η, σ) = d vs. # in a fixed order and size
(ν, α, z, τ, η, _, _, σ) = params(d) GPT and other LLMs "hallucinate" because each token is generated sequentially by samping from a distribution. So you need to be careful when working using code generated by LLMs. It could easily swap the order of parameters in (v_mean, α, z, τ, η, _, _, σ) = params(d) This code is still correct, but will introduce inconsistency in your code base. If you used the built-in Julia unpacking functionality, it would introduce an error because |
at least i learned something :) |
I think AI will be very helpful for code review once the field decides to incorporate more symbolic methods rather than continuuing with the cash-grab, hype train of scaling data and hardware. LLMs are fundamentally flawed. In the meantime, its just useful enough to wreak havoc. I'm glad I don't have to grade papers =) |
The last PR was quite stale and did not keep up with the new type interface. Here is the new WIP for the Ratcliff DDM