Scalar indexing on GPU arrays causes runtime error #4
Labels: question (Further information is requested)
Comments
I resolved the issue by introducing a dedicated initialization kernel instead of scalar indexing. Here is a working example using CUDA, Enzyme, and Test:

using CUDA
using Enzyme
using Test

const C = 0.2f0
function upwind_kernel!(du, u, v, numerical_flux)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(u)
        numerical_flux[i] = u[i] * v
    end
    sync_threads()
    if i <= length(u)
        if i > 1
            du[i] = -C * (numerical_flux[i] - numerical_flux[i-1])
        else # i == 1, periodic boundary
            du[i] = -C * (numerical_flux[1] - numerical_flux[end])
        end
    end
    return nothing
end
function grad_upwind_kernel!(du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    autodiff_deferred(Forward,
        Const(upwind_kernel!),
        Const,
        Duplicated(du, du_shadow),
        Duplicated(u, u_shadow),
        Const(v),
        Duplicated(numerical_flux, numerical_flux_shadow))
    return nothing
end
# Set arr[1] on the device without scalar indexing from the host
function init_first_element_kernel!(arr)
    if threadIdx().x == 1 && blockIdx().x == 1
        arr[1] = 1.0f0
    end
    return nothing
end
function test_cuda_upwind()
    n = 201
    nthreads = 256
    nblocks = ceil(Int, n / nthreads)
    # Allocate GPU memory
    u = CUDA.zeros(Float32, n)
    du = CUDA.zeros(Float32, n)
    numerical_flux = CUDA.zeros(Float32, n)
    @cuda threads=1 blocks=1 init_first_element_kernel!(u)
    # Shadow variables for forward mode
    u_shadow = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    numerical_flux_shadow = CUDA.zeros(Float32, n)
    @cuda threads=1 blocks=1 init_first_element_kernel!(u_shadow)
    v = 1.0f0
    @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
        du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    CUDA.synchronize()
    return Array(du_shadow)
end
# Test suite
@testset "CUDA Upwind Forward Mode" begin
    result = test_cuda_upwind()
    @test length(result) == 201
    # Add more specific test assertions here
end
This only computes a single directional derivative (one Jacobian-vector product). To obtain the full Jacobian, I still need to devise an efficient method that fully utilizes the GPU to avoid performance issues.
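One straightforward (though not maximally efficient) way to get the full Jacobian is to repeat the forward-mode pass once per input, seeding the shadow with a basis vector each time; column j of the Jacobian then appears in du_shadow. A minimal sketch, assuming the kernels above are in scope; `dense_jacobian` is a hypothetical helper name, and this costs O(n) kernel launches:

```julia
using CUDA

# Build the Jacobian column-by-column via repeated forward-mode passes.
# Each pass seeds u_shadow with the basis vector e_j; after the launch,
# du_shadow holds column j of d(du)/d(u).
function dense_jacobian(n, v; nthreads = 256)
    nblocks = cld(n, nthreads)
    u = CUDA.zeros(Float32, n)
    du = CUDA.zeros(Float32, n)
    numerical_flux = CUDA.zeros(Float32, n)
    u_shadow = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    numerical_flux_shadow = CUDA.zeros(Float32, n)
    J = zeros(Float32, n, n)
    for j in 1:n
        fill!(du_shadow, 0f0)
        fill!(numerical_flux_shadow, 0f0)
        # Seed e_j without scalar indexing: upload a host-side basis vector
        copyto!(u_shadow, Float32.(1:n .== j))
        @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
            du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
        CUDA.synchronize()
        J[:, j] = Array(du_shadow)
    end
    return J
end
```

Since the upwind stencil only couples neighboring cells, the Jacobian is sparse (two nonzeros per row), so a batched or coloring-based seeding scheme would recover it in a constant number of passes instead of n.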
I hit an error when trying
u[1] = 1.0f0
on my CUDA array. The error message says "Scalar indexing is disallowed". Not sure why I can't just set array elements the way I do with normal arrays?
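For anyone else hitting this: GPUArrays.jl disallows scalar indexing by default because each `u[i]` access on a `CuArray` from the host triggers its own device round-trip, which is extremely slow. Besides writing a dedicated kernel as in the resolution above, a minimal sketch of the two usual workarounds (both are real CUDA.jl features):

```julia
using CUDA

u = CUDA.zeros(Float32, 8)

# (1) Explicitly allow a one-off scalar write. Still slow (one
#     host<->device transaction), but fine for setup code:
CUDA.@allowscalar u[1] = 1.0f0

# (2) Stay vectorized: broadcasting into a one-element view
#     runs as a kernel on the device, no scalar indexing needed:
u[1:1] .= 1.0f0
```

Option (2) generalizes to any slice, so it is usually the better habit for initialization code.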