From 65fffd554acd3867ed8aad07860e7b119439f903 Mon Sep 17 00:00:00 2001
From: Michael Zingale
Date: Thu, 21 Nov 2024 10:27:15 -0500
Subject: [PATCH] add a bit more on debugging with precise memory

---
 sphinx_docs/source/olcf-workflow.rst | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/sphinx_docs/source/olcf-workflow.rst b/sphinx_docs/source/olcf-workflow.rst
index dc77cf6..a6e270e 100644
--- a/sphinx_docs/source/olcf-workflow.rst
+++ b/sphinx_docs/source/olcf-workflow.rst
@@ -208,7 +208,16 @@ If it doesn't crash with the trace, then try:
    interrupt
    bt
 
+It might say that the memory location is not precise. To enable precise
+memory, in the debugger, do:
 
+.. prompt::
+   :prompts: (gdb)
+
+   set amdgpu precise-memory on
+   show amdgpu precise-memory
+
+and rerun.
 
 
 
@@ -222,11 +231,11 @@ Workaround to prevent hangs for collectives:
 
    export FI_MR_CACHE_MONITOR=memhooks
 
-Some AMReX reports are that it hangs if the initial Arena size is too big, and we should do
+Some AMReX reports are that it hangs if the initial Arena size is too
+big, and we should do
 
 ::
 
    amrex.the_arena_init_size=0
 
-The arena size would then grow as needed with time. There is a suggestion that if the size is
-larger than
+The arena size would then grow as needed with time.