
Scalar indexing on GPU arrays causes runtime error #4

Open
junyixu opened this issue Jan 12, 2025 · 2 comments
Labels
question Further information is requested

Comments

junyixu commented Jan 12, 2025

I hit an error when trying u[1] = 1.0f0 on my CUDA array. The error message says "Scalar indexing is disallowed". Why can't I set array elements the way I do with normal arrays?
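For anyone reproducing this: CUDA.jl deliberately disallows host-side scalar indexing because each element access forces a host↔device round trip. A minimal sketch of the error and two standard workarounds (array names here are illustrative):

```julia
using CUDA

u = CUDA.zeros(Float32, 8)

# This raises "Scalar indexing is disallowed":
# u[1] = 1.0f0

# Workaround 1: explicitly allow it (fine for debugging, slow in hot loops)
CUDA.@allowscalar u[1] = 1.0f0

# Workaround 2: stay vectorized by broadcasting into a one-element view
u[1:1] .= 2.0f0
```

The kernel-based fix below avoids the host round trip entirely, which is why it is the preferred approach.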

junyixu commented Jan 12, 2025

I resolved the issue by introducing init_first_element_kernel!, which performs the assignment inside a GPU kernel instead of via scalar indexing from the host.

The full example for upwind!:

using CUDA
using Enzyme
using Test

const C = 0.2f0

# Periodic upwind update in two stages. Note: sync_threads() only
# synchronizes threads within a block, so this kernel assumes u fits
# in a single block (true below, where n = 201 <= 256 threads).
function upwind_kernel!(du, u, v, numerical_flux)
    # Global 1-based thread index
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    
    # Stage 1: compute the numerical flux at every cell
    if i <= length(u)
        numerical_flux[i] = u[i] * v
    end
    
    # Make the full flux array visible before reading neighbor values
    sync_threads()
    
    # Stage 2: backward difference, with periodic wrap-around at i == 1
    if i <= length(u)
        if i > 1
            du[i] = -C * (numerical_flux[i] - numerical_flux[i-1])
        else  # i == 1
            du[i] = -C * (numerical_flux[1] - numerical_flux[end])
        end
    end
    
    return nothing
end

# Device-side wrapper that differentiates upwind_kernel! in forward mode.
# Duplicated pairs each array with its tangent ("shadow") buffer;
# autodiff_deferred is required when calling Enzyme inside a GPU kernel.
function grad_upwind_kernel!(du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    autodiff_deferred(Forward, 
                     Const(upwind_kernel!),
                     Const,
                     Duplicated(du, du_shadow),
                     Duplicated(u, u_shadow),
                     Const(v),
                     Duplicated(numerical_flux, numerical_flux_shadow))
    return nothing
end

# Set arr[1] from a single device thread, avoiding the disallowed
# host-side scalar indexing.
function init_first_element_kernel!(arr)
    if threadIdx().x == 1 && blockIdx().x == 1
        arr[1] = 1.0f0
    end
    return nothing
end

function test_cuda_upwind()
    n = 201
    nthreads = 256
    nblocks = ceil(Int, n/nthreads)
    
    # Allocate GPU memory
    u = CUDA.zeros(Float32, n)
    du = CUDA.zeros(Float32, n)
    numerical_flux = CUDA.zeros(Float32, n)
    
    # Initial condition: u[1] = 1.0f0, set on the device
    @cuda threads=1 blocks=1 init_first_element_kernel!(u)
    
    # Shadow variables for forward mode
    u_shadow = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    numerical_flux_shadow = CUDA.zeros(Float32, n)
    
    # Seed the tangent direction e₁: the result du_shadow is ∂du/∂u[1]
    @cuda threads=1 blocks=1 init_first_element_kernel!(u_shadow)
    
    v = 1.0f0
    
    @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
        du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    
    CUDA.synchronize()
    
    return Array(du_shadow)
end

# Test suite
@testset "CUDA Upwind Forward Mode" begin
    result = test_cuda_upwind()
    @test length(result) == 201
    # Add more specific test assertions here
end

junyixu commented Jan 12, 2025

This only computes a single directional derivative (one Jacobian-vector product). To compute the full Jacobian, I still need to devise an efficient method that fully utilizes the GPU to avoid performance issues.
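For reference, one straightforward (if not yet GPU-optimal) approach is to seed u_shadow with each Cartesian basis vector in turn and launch the forward-mode kernel once per column. This sketch reuses grad_upwind_kernel! from the comment above; the function name jacobian_upwind and its buffer layout are illustrative, not part of the package:

```julia
# Sketch: dense Jacobian, column j = ∂du/∂u[j], via n forward-mode passes.
# For large n, Enzyme's BatchDuplicated (several tangents per launch) or
# exploiting the tridiagonal-with-wraparound sparsity would amortize the
# kernel-launch cost.
function jacobian_upwind(u, v; nthreads=256)
    n = length(u)
    nblocks = cld(n, nthreads)
    J = CUDA.zeros(Float32, n, n)
    du = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    flux = CUDA.zeros(Float32, n)
    flux_shadow = CUDA.zeros(Float32, n)
    u_shadow = CUDA.zeros(Float32, n)
    for j in 1:n
        fill!(u_shadow, 0.0f0)
        u_shadow[j:j] .= 1.0f0   # seed e_j without scalar indexing
        fill!(du_shadow, 0.0f0)
        @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
            du, du_shadow, u, u_shadow, v, flux, flux_shadow)
        J[:, j] .= du_shadow
    end
    CUDA.synchronize()
    return J
end
```

Since the upwind stencil only couples cells i-1 and i (plus the periodic wrap), the Jacobian has at most two nonzeros per row, so two well-chosen seed vectors (graph coloring) would in principle recover all columns.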
