[Issue]: leak of all_gpu_id_array global in KMT. #255

Open
benvanik opened this issue Oct 28, 2024 · 2 comments

Comments

@benvanik

It looks like all_gpu_id_array is not freed when KMT is unloaded, so if KMT is initialized multiple times in the same process the array is leaked once per initialization. hsakmt_fmm_destroy_process_apertures seems to clean up the other global (gpu_mem) but does not free all_gpu_id_array, as it should.
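The pattern described above can be sketched in isolation. This is a minimal illustration, not the libhsakmt source: only the names all_gpu_id_array and gpu_mem come from the report, while the element types, sizes, the sketch_* functions, and the live_allocations counter are assumptions made up for the example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the two globals named in the report.
 * The real types and sizes in libhsakmt may differ. */
static uint32_t *all_gpu_id_array = NULL;
static void *gpu_mem = NULL;

/* Illustrative bookkeeping only; nothing like this exists in libhsakmt. */
static int live_allocations = 0;

/* Hypothetical init: allocates both globals on every call. */
int sketch_init(unsigned num_gpus) {
  all_gpu_id_array = malloc(num_gpus * sizeof(*all_gpu_id_array));
  if (!all_gpu_id_array) return -1;
  live_allocations++;

  gpu_mem = calloc(num_gpus, 64); /* 64 is an arbitrary placeholder size */
  if (!gpu_mem) {
    free(all_gpu_id_array);
    all_gpu_id_array = NULL;
    live_allocations--;
    return -1;
  }
  live_allocations++;
  return 0;
}

/* Leaky teardown: releases gpu_mem but forgets all_gpu_id_array, so the
 * next sketch_init() would overwrite the pointer and lose the old block. */
void sketch_destroy_leaky(void) {
  if (gpu_mem) {
    free(gpu_mem);
    gpu_mem = NULL;
    live_allocations--;
  }
}

/* Teardown that releases both globals and resets them to NULL, so a
 * later init/destroy cycle starts from a clean state. */
void sketch_destroy_fixed(void) {
  if (gpu_mem) {
    free(gpu_mem);
    gpu_mem = NULL;
    live_allocations--;
  }
  if (all_gpu_id_array) {
    free(all_gpu_id_array);
    all_gpu_id_array = NULL;
    live_allocations--;
  }
}
```

Calling sketch_init followed by sketch_destroy_leaky leaves one allocation outstanding per cycle, which is the kind of per-initialization leak a leak checker would report; with sketch_destroy_fixed the counter returns to zero after every cycle.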

From ASAN:

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x5ff5b2387bcf in malloc (/home/nod/src/iree-build/runtime/src/iree/hal/drivers/amdgpu/cts/amdgpu_all_driver_test+0x223bcf) (BuildId: 1530ccada4eb72df)
    #1 0x74e567024f56 in hsakmt_fmm_init_process_apertures /home/nod/src/ROCR-Runtime/libhsakmt/src/fmm.c:2642:22
    #2 0x74e567034da9 in hsaKmtAcquireSystemProperties /home/nod/src/ROCR-Runtime/libhsakmt/src/topology.c:2190:8
    #3 0x74e566ea3a10 in rocr::AMD::BuildTopology() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/amd_topology.cpp:306:36
    #4 0x74e566ea420e in rocr::AMD::Load() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/amd_topology.cpp:433:18
    #5 0x74e566ee96c2 in rocr::core::Runtime::Load() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/runtime.cpp:1995:17
    #6 0x74e566ee0945 in rocr::core::Runtime::Acquire() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/runtime.cpp:140:51
    #7 0x74e566eaaf83 in rocr::HSA::hsa_init() /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/runtime/hsa.cpp:206:52
    #8 0x74e566f567f5 in hsa_init /home/nod/src/ROCR-Runtime/runtime/hsa-runtime/core/common/hsa_table_interface.cpp:70:35
    #9 0x5ff5b243eeed in iree_hsa_init /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/util/libhsa_tables.h:11:1
    #10 0x5ff5b243e426 in iree_hal_amdgpu_libhsa_initialize /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/util/libhsa.c:498:14
    #11 0x5ff5b2400e80 in iree_hal_amdgpu_driver_load_libhsa /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/driver.c:231:26
    #12 0x5ff5b2400b63 in iree_hal_amdgpu_driver_create /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/driver.c:270:26
    #13 0x5ff5b23d4222 in iree_hal_amdgpu_driver_factory_try_create /home/nod/src/iree/runtime/src/iree/hal/drivers/amdgpu/registration/driver_module.c:40:26
    #14 0x5ff5b23fffaa in iree_hal_driver_registry_try_create /home/nod/src/iree/runtime/src/iree/hal/driver_registry.c:314:14
    #15 0x5ff5b23c94f9 in iree::hal::cts::TryGetDriver(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, iree_hal_driver_t**) /home/nod/src/iree/runtime/src/iree/hal/cts/cts_test_base.h:73:26
    #16 0x5ff5b23ca866 in iree::hal::cts::DriverTest::CreateDriver() /home/nod/src/iree/runtime/src/iree/hal/cts/driver_test.h:38:14
    #17 0x5ff5b23c81ee in iree::hal::cts::DriverTest_QueryAndCreateAvailableDevicesByOrdinal_Test::TestBody() /home/nod/src/iree/runtime/src/iree/hal/cts/driver_test.h:103:17
    #18 0x5ff5b2525ce8 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2635:10
    #19 0x5ff5b24e5491 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2671:14
    #20 0x5ff5b2498e23 in testing::Test::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2710:5
    #21 0x5ff5b249a796 in testing::TestInfo::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2856:11
    #22 0x5ff5b249bde6 in testing::TestSuite::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:3034:30
    #23 0x5ff5b24bef9e in testing::internal::UnitTestImpl::RunAllTests() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:5964:44
    #24 0x5ff5b252f928 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2635:10
    #25 0x5ff5b24ea6b6 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:2671:14
    #26 0x5ff5b24be225 in testing::UnitTest::Run() /home/nod/src/iree/third_party/googletest/googletest/src/gtest.cc:5543:10
    #27 0x5ff5b2400690 in RUN_ALL_TESTS() /home/nod/src/iree/third_party/googletest/googletest/include/gtest/gtest.h:2334:73
    #28 0x5ff5b24005b3 in main /home/nod/src/iree/runtime/src/iree/testing/gtest_main.cc:20:13
    #29 0x74e575c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

@ppanchad-amd

Hi @benvanik. An internal ticket has been created to investigate your issue. Thanks!

@zichguan-amd

Hi @benvanik, the recent commit c066ec1 should fix the leak. Thanks for reporting it.
