
Scalar indexing on GPU arrays causes runtime error #4

Open
junyixu opened this issue Jan 12, 2025 · 2 comments
Labels
question Further information is requested

Comments

junyixu commented Jan 12, 2025

I hit an error when trying u[1] = 1.0f0 on my CUDA array. The error message says "Scalar indexing is disallowed". Why can't I set array elements the way I do with normal arrays?
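For anyone reproducing this: CUDA.jl deliberately disallows host-side scalar indexing because each element access forces a host↔device round trip. A minimal sketch of the error and two standard workarounds (array names here are illustrative):

```julia
using CUDA

u = CUDA.zeros(Float32, 8)

# This raises "Scalar indexing is disallowed":
# u[1] = 1.0f0

# Workaround 1: explicitly allow it (fine for debugging, slow in hot loops)
CUDA.@allowscalar u[1] = 1.0f0

# Workaround 2: stay vectorized by broadcasting into a one-element view
u[1:1] .= 2.0f0
```

The kernel-based fix below avoids the host round trip entirely, which is why it is the preferred approach.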

junyixu commented Jan 12, 2025

I resolved the issue by introducing init_first_element_kernel!, which performs the assignment inside a GPU kernel instead of via scalar indexing from the host.

The full example for upwind!:

using CUDA
using Enzyme
using Test

const C = 0.2f0

# Periodic upwind update in two stages. Note: sync_threads() only
# synchronizes threads within a block, so this kernel assumes u fits
# in a single block (true below, where n = 201 <= 256 threads).
function upwind_kernel!(du, u, v, numerical_flux)
    # Global 1-based thread index
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    
    # Stage 1: compute the numerical flux at every cell
    if i <= length(u)
        numerical_flux[i] = u[i] * v
    end
    
    # Make the full flux array visible before reading neighbor values
    sync_threads()
    
    # Stage 2: backward difference, with periodic wrap-around at i == 1
    if i <= length(u)
        if i > 1
            du[i] = -C * (numerical_flux[i] - numerical_flux[i-1])
        else  # i == 1
            du[i] = -C * (numerical_flux[1] - numerical_flux[end])
        end
    end
    
    return nothing
end

# Device-side wrapper that differentiates upwind_kernel! in forward mode.
# Duplicated pairs each array with its tangent ("shadow") buffer;
# autodiff_deferred is required when calling Enzyme inside a GPU kernel.
function grad_upwind_kernel!(du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    autodiff_deferred(Forward, 
                     Const(upwind_kernel!),
                     Const,
                     Duplicated(du, du_shadow),
                     Duplicated(u, u_shadow),
                     Const(v),
                     Duplicated(numerical_flux, numerical_flux_shadow))
    return nothing
end

# Set arr[1] from a single device thread, avoiding the disallowed
# host-side scalar indexing.
function init_first_element_kernel!(arr)
    if threadIdx().x == 1 && blockIdx().x == 1
        arr[1] = 1.0f0
    end
    return nothing
end

function test_cuda_upwind()
    n = 201
    nthreads = 256
    nblocks = ceil(Int, n/nthreads)
    
    # Allocate GPU memory
    u = CUDA.zeros(Float32, n)
    du = CUDA.zeros(Float32, n)
    numerical_flux = CUDA.zeros(Float32, n)
    
    # Initial condition: u[1] = 1.0f0, set on the device
    @cuda threads=1 blocks=1 init_first_element_kernel!(u)
    
    # Shadow variables for forward mode
    u_shadow = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    numerical_flux_shadow = CUDA.zeros(Float32, n)
    
    # Seed the tangent direction e₁: the result du_shadow is ∂du/∂u[1]
    @cuda threads=1 blocks=1 init_first_element_kernel!(u_shadow)
    
    v = 1.0f0
    
    @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
        du, du_shadow, u, u_shadow, v, numerical_flux, numerical_flux_shadow)
    
    CUDA.synchronize()
    
    return Array(du_shadow)
end

# Test suite
@testset "CUDA Upwind Forward Mode" begin
    result = test_cuda_upwind()
    @test length(result) == 201
    # Add more specific test assertions here
end

junyixu commented Jan 12, 2025

This only computes a single directional derivative (one Jacobian-vector product). To compute the full Jacobian, I still need to devise an efficient method that fully utilizes the GPU to avoid performance issues.
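For reference, one straightforward (if not yet GPU-optimal) approach is to seed u_shadow with each Cartesian basis vector in turn and launch the forward-mode kernel once per column. This sketch reuses grad_upwind_kernel! from the comment above; the function name jacobian_upwind and its buffer layout are illustrative, not part of the package:

```julia
# Sketch: dense Jacobian, column j = ∂du/∂u[j], via n forward-mode passes.
# For large n, Enzyme's BatchDuplicated (several tangents per launch) or
# exploiting the tridiagonal-with-wraparound sparsity would amortize the
# kernel-launch cost.
function jacobian_upwind(u, v; nthreads=256)
    n = length(u)
    nblocks = cld(n, nthreads)
    J = CUDA.zeros(Float32, n, n)
    du = CUDA.zeros(Float32, n)
    du_shadow = CUDA.zeros(Float32, n)
    flux = CUDA.zeros(Float32, n)
    flux_shadow = CUDA.zeros(Float32, n)
    u_shadow = CUDA.zeros(Float32, n)
    for j in 1:n
        fill!(u_shadow, 0.0f0)
        u_shadow[j:j] .= 1.0f0   # seed e_j without scalar indexing
        fill!(du_shadow, 0.0f0)
        @cuda threads=nthreads blocks=nblocks grad_upwind_kernel!(
            du, du_shadow, u, u_shadow, v, flux, flux_shadow)
        J[:, j] .= du_shadow
    end
    CUDA.synchronize()
    return J
end
```

Since the upwind stencil only couples cells i-1 and i (plus the periodic wrap), the Jacobian has at most two nonzeros per row, so two well-chosen seed vectors (graph coloring) would in principle recover all columns.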
