-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hmem/rocr: Add ROCr dmabuf support under verbs #9618
Conversation
@edgargabriel @nusislam for additional review |
src/hmem_rocr.c
Outdated
bool dmabuf_support = false, dmabuf_kernel = false; | ||
|
||
char *s; | ||
if (!(s = getenv("FI_HMEM_ROCR_USE_DMABUF")) || !atoi(s)) goto out; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use fi_param_define()
and fi_param_get_*()
to define and read environment variables. Only use getenv
for variables defined externally (e.g. MPI).
src/hmem_rocr.c
Outdated
} | ||
|
||
#if HAVE_HSA_AMD_PORTABLE_EXPORT_DMABUF | ||
const char kernel_opt1[] = "CONFIG_DMABUF_MOVE_NOTIFY=y"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't interleave declarations with statements. Either move the declarations up or add braces around the entire block. Same for the next function.
src/hmem_rocr.c
Outdated
} | ||
|
||
int rocr_hmem_get_dmabuf_fd(const void *addr, uint64_t size, int *dmabuf_fd, | ||
uint64_t *offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alignment is off.
src/hmem_rocr.c
Outdated
|
||
char *s; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the empty line after this line.
src/hmem_rocr.c
Outdated
// HSA_AMD_SYSTEM_INFO_DMABUF_SUPPORTED = 0x204, we use the number instead of var name | ||
// for backward compatibility reasons. | ||
hsa_ret = hsa_ops.hsa_system_get_info((hsa_system_info_t) 0x204, &dmabuf_support); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: here we usually don't add the empty line between the call and the return value checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the return value checking clause after this should also have { at same line?
int rocr_hmem_get_dmabuf_fd(const void *addr, uint64_t size, int *dmabuf_fd, | ||
uint64_t *offset) | ||
{ | ||
static bool supported = false, checked = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an empty line here.
src/hmem_rocr.c
Outdated
{ | ||
static bool supported = false, checked = false; | ||
if (!checked) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move {
to the end of the previous line.
src/hmem_rocr.c
Outdated
hsa_ret = hsa_ops.hsa_amd_portable_export_dmabuf(addr, size, dmabuf_fd, offset); | ||
|
||
if (hsa_ret != HSA_STATUS_SUCCESS) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
src/hmem_rocr.c
Outdated
@@ -1054,6 +1063,100 @@ int rocr_dev_reg_copy_from_hmem(uint64_t handle, void *dest, const void *src, | |||
return FI_SUCCESS; | |||
} | |||
|
|||
bool rocr_is_dmabuf_supported(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function intended to be used anywhere else? If not, maybe define it as static. If yes, consider moving the checked
logic from the caller side to here to avoid duplicated checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense and I made it static, the hmem_cuda
counterpart also have this function but non-static, I wonder if it had a specific reason there?
bot:aws:retest |
@j-xiong thank you for the review and comments, could you take a look at the updated commit as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please squash the third commit into the first one and update the PR by "git push -f". We don't do auto-squash here and don't expect one commit fixing another commit in the same PR.
src/hmem_rocr.c
Outdated
return -FI_EOPNOTSUPP; | ||
|
||
#if HAVE_HSA_AMD_PORTABLE_EXPORT_DMABUF | ||
hsa_status_t hsa_ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You forgot this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@j-xiong would you please check again?
8179285
to
10201e7
Compare
Windows build error:
|
I think this |
bot:aws:retest |
Signed-off-by: Tim Hu <[email protected]>
Signed-off-by: Tim Hu <[email protected]>
10201e7
to
9781e9d
Compare
bot:aws:retest
@j-xiong how about now? |
@AtlantaPepsi Let's wait for the CI test result. FYI, the |
@j-xiong I think all checks have been passed |
get_dmabuf_fd
for ROCr under HMEM interface, it acquires dmabuf handle through HSA runtime and is now invoked byverbs
providers.FI_HMEM_ROCR_USE_DMABUF
(please let us know if naming is ok) as a switch that will be checked during runtime