
MIOpen Error: hipoc_kernel.cpp:106: Failed to launch kernel: invalid configuration argument #3492

Open
GeezYuven opened this issue Feb 9, 2025 · 1 comment


GeezYuven commented Feb 9, 2025

  1. When I run the t5layernorm command MIOpenDriver t5layernorm --input 128x256x512x11 -F 1 -m 0 -t 1 -i 1, I get the error shown in the title. The full log is as follows:

MIOpenDriver t5layernorm --input 128x256x512x11 -F 1 -m 0 -t 1 -i 1
MIOpen(HIP): Info2 [CheckHipnnVersion] MIOpen and HIPNN Version matching was successful
MIOpen(HIP): Info [get_device_name] Raw device name: gfx906:sramecc-:xnack-
MIOpen(HIP): Info [AmdRocmMetadataVersionDetect] ROCm MD version AMDHSA_COv3, HIP version 5.1.24472, MIOpen version 3.2.0.0b0f297-dirty
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Target: x86_64-unknown-linux-gnu
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl] Thread model: posix
MIOpen(HIP): Info2 [ValidateGcnAssemblerImpl]
MIOpen(HIP): Info2 [GetPerfDbPathFile] inexact perf database search
MIOpen(HIP): Info [Measure] RamDb::Prefetch time: 1.02762 ms
MIOpen(HIP): Info [Handle] stream: 0x7221ba0, device_id: 0
MIOpen(HIP): Info2 [GetWorkspaceSizes] T5LayernormBackward: 180224
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad4ec200000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 44 at 0x2ad483c00000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad518400000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 67108864 at 0x2ad498c00000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad544600000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 738197504 at 0x2ad570800000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 44 at 0x2ad483c01000 Ok
MIOpen(HIP): MIOpenDriver Info2 [GPUMem] hipMalloc 180224 at 0x2ad483c02000 Ok
PRNG seed: 12345678
MIOpen(HIP): Command [LogCmdT5LayerNorm] ./bin/MIOpenDriver t5layernormfp32 -n 128 -c 256 -H 512 -W 11 -F 1 -m 0
MIOpen(HIP): Info2 [GetInvoker] Returning an invoker for problem dtype1normalized_dim0outer_size1inner_size184549376 and algorithm T5LayerNormForward
MIOpen(HIP): Info2 [GetFound1_0] No invokers found for dtype1normalized_dim0outer_size1inner_size184549376
MIOpen(HIP): Info [FindSolutionImpl] T5LayernormForward (not searchable)
MIOpen(HIP): Info2 [SearchForSolutions] T5LayernormForward: Success.
MIOpen(HIP): Info2 [PrepareInvoker] Preparing kernel: T5LayernormFwdContiguous
MIOpen(HIP): Info2 [SQLiteBase] Initializing system database file ""
MIOpen(HIP): Info [KernDb] database not present
MIOpen(HIP): Info2 [SQLiteBase] Initializing user database file "./db3/conv2d-fp16/kdb/gfx906_64.ukdb"
MIOpen(HIP): Trace [Exec] 47072929652800:PRAGMA journal_mode=WAL;
MIOpen(HIP): Trace [Exec] 47072929652800:CREATE TABLE IF NOT EXISTS kern_db (id INTEGER PRIMARY KEY ASC,kernel_name TEXT NOT NULL,kernel_args TEXT NOT NULL,kernel_blob BLOB NOT NULL,kernel_hash TEXT NOT NULL,uncompressed_size INT NOT NULL);CREATE UNIQUE INDEX IF NOT EXISTS idx_kern_db ON kern_db(kernel_name, kernel_args);
MIOpen(HIP): Info2 [KernDb] Database created successfully
MIOpen(HIP): Trace [Exec] 47072929652800:PRAGMA table_info(kern_db);
MIOpen(HIP): Info2 [LoadBinary] Loading binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Info2 [Prepare] SELECT kernel_blob, kernel_hash, uncompressed_size FROM kern_db WHERE (kernel_name = 'MIOpenLayerNorm.cpp.o') AND (kernel_args = '-DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906');
MIOpen(HIP): Info2 [Measure] Db::FindRecord time: 0.709541 ms
MIOpen(HIP): Info2 [LoadBinary] Unable to load binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Trace [LoadProgram] HIPOCProgram MIOpenLayerNorm.cpp
MIOpen(HIP): Info2 [SaveBinary] Saving binary for: "MIOpenLayerNorm.cpp.o"; args: -DMIOPEN_USE_FP16=0 -DMIOPEN_USE_FP32=1 -DMIOPEN_USE_BFP16=0 -DINPUT_TYPE=float -DOUTPUT_TYPE=float -DLOCAL_SIZE=256 -DMIOPEN_ELEMENTWISE_AFFINE=0 -DMIOPEN_WEIGHT_BIAS=1 -DMIOPEN_ELEMENTWISE_AFFINE_FUSED_ADD=2 -DMIOPEN_WEIGHT_BIAS_FUSED_ADD=3 -DMIOPEN_ELEMENTWISE_AFFINE_T5=4 -DMIOPEN_WEIGHT_BIAS_T5=5 -mcpu=gfx906
MIOpen(HIP): Info2 [Prepare] INSERT OR REPLACE INTO kern_db(kernel_name, kernel_args, kernel_blob, kernel_hash, uncompressed_size) VALUES(?, ?, ?, ?, ?);
MIOpen(HIP): Info2 [Measure] Db::StoreRecord time: 11.9996 ms
MIOpen(HIP): Info2 [Register] Invoker registered for algorithm dtype1normalized_dim0outer_size1inner_size184549376 and solver T5LayernormForward
MIOpen(HIP): Info2 [SetAsFound1_0] Solver T5LayernormForward registered as find 1.0 best for T5LayerNormForward in dtype1normalized_dim0outer_size1inner_size184549376
MIOpen(HIP): Info2 [run] kernel_name = T5LayernormFwdContiguous, global_work_dim = { 4294967296, 1, 1 }, local_work_dim = { 256, 1, 1 }
MIOpen Error: hipoc_kernel.cpp:106: Failed to launch kernel: invalid configuration argument
GPU Kernel Time Forward T5LayerNorm Elapsed: 0 ms
Forward T5LayerNorm FAILED: 0.321133 > 1.5e-06

@GeezYuven
Author

This seems to happen because the amount of work to launch exceeds the maximum grid size per dimension: the computed global work size is 4294967296 (see global_work_dim in the log above), with a local work size of 256.
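
To make the arithmetic concrete, here is a minimal sketch of the check that appears to fail. It assumes, based on my reading of the HIP kernel launch documentation, that gridDim * blockDim in each dimension must stay below 2^32. `LaunchCheck` and `check_launch_1d` are hypothetical names used only for illustration; they are not MIOpen or HIP APIs.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not part of MIOpen or HIP).
struct LaunchCheck {
    std::uint64_t blocks;  // number of workgroups: global / local, rounded up
    bool fits_uint32;      // true if blocks * local fits in an unsigned 32-bit value
};

// Assumes local_work > 0 and a 1-D launch.
LaunchCheck check_launch_1d(std::uint64_t global_work, std::uint64_t local_work) {
    LaunchCheck r{};
    r.blocks = (global_work + local_work - 1) / local_work;
    // Assumed limit: total work-items per dimension must be < 2^32.
    r.fits_uint32 = (r.blocks * local_work) <= UINT32_MAX;
    return r;
}
```

With the values from the log (global 4294967296, local 256) this gives 16777216 workgroups and exactly 2^32 work-items in total, which no longer fits in 32 bits. A global size of 4294967040 (one workgroup fewer) would still fit under this assumption.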


But when I use another set of parameters, where the computed work size is still greater than the limit, the error does not appear. Why is that?

