Describe the Bug
The latest master branch fails to build with several ROCm versions, including 5.4, 5.5, and 5.6.
Rolling back to the commit made on June 20 (git checkout 10c7482) allows ROCm-5.4 to build. The build still fails for 5.5 and 5.6 but with a different error.
Minimal Steps/Code to Reproduce the Bug
For ROCm-5.4.3, I use the following to build:
The build fails when compiling csrc/mlp_hip.hip with errors like the following:
csrc/mlp_hip.hip:65:53: error: unknown type name 'hipblasOperation_t'; did you mean 'hipsparseOperation_t'?
static rocblas_operation hipOperationToRocOperation(hipblasOperation_t op)
^~~~~~~~~~~~~~~~~~
hipsparseOperation_t
/opt/rocm-5.4.3/include/hipsparse/hipsparse.h:317:3: note: 'hipsparseOperation_t' declared here
} hipsparseOperation_t;
^
csrc/mlp_hip.hip:69:10: error: use of undeclared identifier 'HIPBLAS_OP_N'
case HIPBLAS_OP_N:
^
csrc/mlp_hip.hip:71:10: error: use of undeclared identifier 'HIPBLAS_OP_T'
case HIPBLAS_OP_T:
^
csrc/mlp_hip.hip:73:10: error: use of undeclared identifier 'HIPBLAS_OP_C'
case HIPBLAS_OP_C:
^
csrc/mlp_hip.hip:79:8: error: unknown type name 'hipblasStatus_t'; did you mean 'hipsparseStatus_t'?
static hipblasStatus_t rocBLASStatusToHIPStatus(rocblas_status error)
^~~~~~~~~~~~~~~
hipsparseStatus_t
/opt/rocm-5.4.3/include/hipsparse/hipsparse.h:188:3: note: 'hipsparseStatus_t' declared here
} hipsparseStatus_t;
^
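For triage, it can help to reduce a log like the one above to the list of names the compiler could not resolve. The following is a minimal sketch, assuming the clang diagnostic wording shown in this report; the helper name `missing_names` and the `sample_log` excerpt are illustrative and not part of apex.

```python
import re

# Matches the two clang diagnostics seen in this report, for example:
#   error: use of undeclared identifier 'HIPBLAS_OP_N'
#   error: unknown type name 'hipblasOperation_t'; did you mean ...
MISSING_NAME = re.compile(
    r"error: (?:use of undeclared identifier|unknown type name) '([^']+)'"
)


def missing_names(log_text):
    """Return unresolved identifiers and type names in first-seen order, without repeats."""
    seen = []
    for name in MISSING_NAME.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Excerpt of the ROCm-5.4.3 errors quoted above (hypothetical variable name).
sample_log = """\
csrc/mlp_hip.hip:65:53: error: unknown type name 'hipblasOperation_t'; did you mean 'hipsparseOperation_t'?
csrc/mlp_hip.hip:69:10: error: use of undeclared identifier 'HIPBLAS_OP_N'
csrc/mlp_hip.hip:71:10: error: use of undeclared identifier 'HIPBLAS_OP_T'
csrc/mlp_hip.hip:79:8: error: unknown type name 'hipblasStatus_t'; did you mean 'hipsparseStatus_t'?
"""

print(missing_names(sample_log))
```

Every name reported for this excerpt carries a hipblas prefix, which suggests that the file uses hipBLAS symbols without a hipBLAS header in scope.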
Rolling back to the commit from June 20 allows the build to complete:
Building apex from master with ROCm-5.5 and ROCm-5.6 fails with errors that are similar to each other but distinct from those seen with ROCm-5.4. Here are the steps I used to build with ROCm-5.6:
That fails with the following error:
csrc/mlp_hip.hip:91:10: error: use of undeclared identifier 'rocblas_status_excluded_from_build'
case rocblas_status_excluded_from_build:
^
csrc/mlp_hip.hip:104:10: error: use of undeclared identifier 'rocblas_status_arch_mismatch'; did you mean 'rocblas_status_size_query_mismatch'?
case rocblas_status_arch_mismatch:
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
rocblas_status_size_query_mismatch
/opt/rocm-5.6.0/include/rocblas/internal/rocblas-types.h:212:5: note: 'rocblas_status_size_query_mismatch' declared here
rocblas_status_size_query_mismatch = 8, /**< Unmatched start/stop size query */
^
csrc/mlp_hip.hip:104:10: error: duplicate case value 'rocblas_status_size_query_mismatch'
case rocblas_status_arch_mismatch:
^
csrc/mlp_hip.hip:96:10: note: previous case defined here
case rocblas_status_size_query_mismatch:
^
In this case, rolling back to the June 20 commit fails with a different error:
csrc/mlp_hip.hip:89:7: error: use of undeclared identifier 'rocblas_datatype_f64_r'
rocblas_datatype_f64_r,
^
csrc/mlp_hip.hip:92:7: error: use of undeclared identifier 'rocblas_datatype_f64_r'
rocblas_datatype_f64_r,
^
csrc/mlp_hip.hip:96:7: error: use of undeclared identifier 'rocblas_datatype_f64_r'
rocblas_datatype_f64_r,
^
csrc/mlp_hip.hip:99:7: error: use of undeclared identifier 'rocblas_datatype_f64_r'
rocblas_datatype_f64_r,
^
csrc/mlp_hip.hip:101:7: error: use of undeclared identifier 'rocblas_datatype_f64_r'
rocblas_datatype_f64_r,
^
csrc/mlp_hip.hip:102:7: error: use of undeclared identifier 'rocblas_gemm_algo_standard'
rocblas_gemm_algo_standard,
^
Building with the June 20 commit, I see that the csrc/mlp_hip.hip file contains the following for ROCm-5.5 and ROCm-5.6 (which fails):
/* Includes, cuda */
#include <hipblas/hipblas.h>
#include <hip/hip_runtime.h>
but it has the following for ROCm-5.4 (which builds):
/* Includes, cuda */
#include <rocblas/rocblas.h>
#include <hip/hip_runtime.h>
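The comparison above can be repeated on any generated file by checking which BLAS header it includes. This is a minimal sketch, assuming the include lines look like the two variants quoted above; the helper name `blas_headers` and the sample strings are illustrative.

```python
import re

# Captures the library directory of a hipBLAS or rocBLAS include, for example
#   #include <hipblas/hipblas.h>  ->  "hipblas"
BLAS_INCLUDE = re.compile(
    r"^\s*#\s*include\s*<(hipblas|rocblas)/\w+\.h>", re.MULTILINE
)


def blas_headers(source_text):
    """Return the sorted set of BLAS libraries ("hipblas", "rocblas") included by the source."""
    return sorted(set(BLAS_INCLUDE.findall(source_text)))


# The two header variants quoted in this report (hypothetical variable names).
generated_rocm56 = """\
/* Includes, cuda */
#include <hipblas/hipblas.h>
#include <hip/hip_runtime.h>
"""
generated_rocm54 = """\
/* Includes, cuda */
#include <rocblas/rocblas.h>
#include <hip/hip_runtime.h>
"""

print(blas_headers(generated_rocm56))
print(blas_headers(generated_rocm54))
```

To check a real build tree, the text of csrc/mlp_hip.hip can be read from disk and passed to `blas_headers` in the same way.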
Expected Behavior
Environment
adammoody changed the title from "Problems building apex main branch with ROCm-5.4, 5.5, and 5.6" to "Problems building apex with ROCm-5.4, 5.5, and 5.6" on Aug 25, 2023
I'm seeing this as well: a number of errors like those above while building the cuda_ext.
/apex/csrc/mlp_hip.hip:65:53: error: unknown type name 'hipblasOperation_t'; did you mean 'hipsparseOperation_t'?
static rocblas_operation hipOperationToRocOperation(hipblasOperation_t op)
Hi @adammoody and @loadams, if you are using PyTorch 2.0 or earlier, please use the master branch of apex. If you are using PyTorch 2.1+, please use the torch_2.1_higher branch.
There are some changes related to CUDA to HIP conversion in PyTorch.
export HIP_PLATFORM_HCC
export HIP_PLATFORM_AMD
These two commands are not needed.
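The branch guidance in this comment can be written down as a small check. This is a minimal sketch: the branch names come from the comment, while the helper name `apex_branch_for` and the sample version strings are assumptions made for illustration.

```python
import re


def apex_branch_for(torch_version):
    """Map a PyTorch version string to the apex branch suggested in this thread."""
    # Only the leading "major.minor" is used, so suffixes such as
    # "+rocm5.6" or "a0+git123" are ignored.
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized PyTorch version: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # PyTorch 2.1 and newer -> torch_2.1_higher; PyTorch 2.0 and older -> master.
    return "torch_2.1_higher" if (major, minor) >= (2, 1) else "master"


# Illustrative version strings, not taken from the report.
for version in ("1.13.1", "2.0.1+rocm5.4.2", "2.1.0", "2.2.0a0+git123"):
    print(version, "->", apex_branch_for(version))
```

In an environment where PyTorch is installed, passing `torch.__version__` to the helper gives the branch to check out before building.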