[OMPD] The LLVM OMPD ompd_get_task_function() entry_point address return value for NVIDIA GPUs does not seem to correspond to a valid code address on the GPU #68

jdelsign opened this issue Feb 27, 2019

@jdelsign

The LLVM OMPD ompd_get_task_function() entry_point address return value for NVIDIA GPUs does not seem to correspond to a valid code address on the GPU.

The TotalView OMPD support is instrumented to print the calls it makes to the OMPD library. Here are some of the calls made when trying to fetch the task function for a GPU thread that has stopped in OpenMP code on the GPU (a minimal C sketch of the same call sequence follows the list):

  • Get the process handle:
ompd_process_initialize(context=0x553bf80)->rc_ok: handle=0x6024440
  • Get the GPU device handle:
ompd_device_initialize(process_handle=0x6024440,device_context=0x581bae8,kind=2,sizeof_id=8,*(long*)id=0x40914544)->rc_ok: device_handle=0x60251a0
  • Get the GPU thread handle:
ompd_get_thread_handle(handle=0x60251a0,kind=3,sizeof_thread_id=136,*(long*)thread_id=0)->rc_ok: thread_handle=0x6030990
  • Get the current task for the GPU thread:
ompd_get_curr_task_handle(thread_handle=0x6030990)->rc_ok: task_handle=0x6018eb0
  • Get the task function:
ompd_get_task_function(task_handle=0x6018eb0)->rc_ok: entry_point={segment=0,address=0x7f8034961a80}
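
For reference, the trace above corresponds roughly to the tool-side code below. This is a minimal sketch against the OMPD API as specified for OpenMP 5.0, not TotalView's actual implementation: it assumes ompd_initialize() has already been called with the tool's callback table, and fetch_gpu_task_function, proc_ctx, dev_ctx, cuda_ctx_id, tid, and tid_size are hypothetical names for inputs the debugger would supply from its own bookkeeping. The device-kind and thread-id-kind values (2 and 3) are taken straight from the trace.

#include <stdint.h>
#include "ompd.h" /* OMPD tool-side header; the name varies across runtime versions */

/* Sketch of the call sequence from the trace above; error handling beyond
   result-code checks is elided. */
static ompd_rc_t fetch_gpu_task_function(ompd_address_space_context_t *proc_ctx,
                                         ompd_address_space_context_t *dev_ctx,
                                         uint64_t cuda_ctx_id,
                                         void *tid, ompd_size_t tid_size,
                                         ompd_address_t *entry_point) {
  ompd_address_space_handle_t *proc = NULL, *dev = NULL;
  ompd_thread_handle_t *thread = NULL;
  ompd_task_handle_t *task = NULL;
  ompd_rc_t rc;

  rc = ompd_process_initialize(proc_ctx, &proc);             /* -> handle 0x6024440 */
  if (rc != ompd_rc_ok) return rc;
  rc = ompd_device_initialize(proc, dev_ctx, /* kind = */ 2, /* CUDA, per the trace */
                              sizeof cuda_ctx_id, &cuda_ctx_id, &dev);
  if (rc != ompd_rc_ok) return rc;
  rc = ompd_get_thread_handle(dev, /* kind = */ 3,           /* cudalogical, per the trace */
                              tid_size, tid, &thread);
  if (rc != ompd_rc_ok) return rc;
  rc = ompd_get_curr_task_handle(thread, &task);
  if (rc != ompd_rc_ok) return rc;
  /* Returns ompd_rc_ok, but entry_point->address (0x7f8034961a80 above)
     does not fall inside any code section of the CUDA ELF images. */
  return ompd_get_task_function(task, entry_point);
}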

The problem is that the result code indicates success, but the entry_point address does not correspond to any program section in the CUDA ELF images. TotalView prints the following error message:

    ERROR: Address 0x7f8034963a80 isn't mapped in address_space 200500.-1

This means that TotalView searched its address section table for an address range containing the address and did not find one. The address does exist in the GPU address space, because I can dereference it as an address in the global segment:

    d1.<> f t1.-1 p {*(int @global *)0x7f8034963a80}
     *(int @global *)0x7f8034963a80 = 0xfffff389 (-3191)
    d1.<> f t1.-1 p {*(long @global *)0x7f8034963a80}
     *(long @global *)0x7f8034963a80 = 0x000000fffffff389 (1099511624585)
    d1.<> 

I'm guessing that this is some sort of heap memory on the GPU, but I don't really know.

@jprotze

jprotze commented Mar 26, 2019

This is the LLVM IR generated for a random target region:

; Function Attrs: noinline norecurse nounwind
define internal void @__omp_offloading_38_50a3934e_vec_mult_l15_worker() #1 {
  %1 = alloca i8*, align 8
  %2 = alloca i8, align 1
  %3 = call i32 bitcast (i32 (i8*)* @__kmpc_global_thread_num to i32 (%struct.ident_t*)*)(%struct.ident_t* @1)
  store i8* null, i8** %1, align 8
  store i8 0, i8* %2, align 1
  br label %4

; <label>:4:                                      ; preds = %19, %0
  call void @llvm.nvvm.barrier0()
  %5 = call i1 @__kmpc_kernel_parallel(i8** %1, i16 1)
  %6 = zext i1 %5 to i8
  store i8 %6, i8* %2, align 1
  %7 = load i8*, i8** %1, align 8
  %8 = icmp eq i8* %7, null
  br i1 %8, label %20, label %9

; <label>:9:                                      ; preds = %4
  %10 = load i8, i8* %2, align 1
  %11 = icmp ne i8 %10, 0
  br i1 %11, label %12, label %19

; <label>:12:                                     ; preds = %9
  %13 = load i8*, i8** %1, align 8
  %14 = icmp eq i8* %13, bitcast (void (i16, i32)* @__omp_outlined___wrapper to i8*)
  br i1 %14, label %15, label %16

; <label>:15:                                     ; preds = %12
  call void @__omp_outlined___wrapper(i16 0, i32 %3) #11
  br label %18

; <label>:16:                                     ; preds = %12
  %17 = bitcast i8* %7 to void (i16, i32)*
  call void %17(i16 0, i32 %3)
  br label %18

; <label>:18:                                     ; preds = %16, %15
  call void @__kmpc_kernel_end_parallel()
  br label %19

; <label>:19:                                     ; preds = %18, %9
  call void @llvm.nvvm.barrier0()
  br label %4

; <label>:20:                                     ; preds = %4
  ret void
}

The comparison at label 12 suggests that the address in %13 should be the address of __omp_outlined___wrapper. This is the address returned by ompd_get_task_function().
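
For readers who don't parse LLVM IR every day, the worker function above is roughly the following C. This is a hand-translation for illustration only: the control flow is faithful to the IR, but worker, work_fn, is_active, and barrier0 are invented names, while the __kmpc_* entry points and __omp_outlined___wrapper come from the IR itself.

/* Hand-translation of the worker IR above. Only the __kmpc_* and
   __omp_outlined___wrapper symbols are real; everything else is named
   for readability. */
#include <stdbool.h>
#include <stddef.h>

extern int  __kmpc_global_thread_num(void *loc);  /* takes an ident_t* in the runtime */
extern bool __kmpc_kernel_parallel(void **work_fn, short rt_initialized);
extern void __kmpc_kernel_end_parallel(void);
extern void __omp_outlined___wrapper(short, int);
extern void barrier0(void);                       /* stands in for llvm.nvvm.barrier0 */

static void worker(void) {
  void *work_fn = NULL;                       /* %1 */
  bool is_active = false;                     /* %2 */
  int gtid = __kmpc_global_thread_num(NULL);  /* %3 */
  for (;;) {
    barrier0();                               /* label 4 */
    is_active = __kmpc_kernel_parallel(&work_fn, 1);
    if (work_fn == NULL)                      /* label 20: kernel is done */
      return;
    if (is_active) {                          /* labels 9 and 12 */
      if (work_fn == (void *)&__omp_outlined___wrapper)
        __omp_outlined___wrapper(0, gtid);    /* label 15: direct-call fast path */
      else
        ((void (*)(short, int))work_fn)(0, gtid); /* label 16: indirect call */
      __kmpc_kernel_end_parallel();           /* label 18 */
    }
    barrier0();                               /* label 19, then back to label 4 */
  }
}

So the value OMPD reports is the work-function pointer that the master publishes into %1/work_fn and that the workers compare against __omp_outlined___wrapper.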
