distributed_reset_configuration failed: python: distributed_interfaces/cutensornet_distributed_interface_mpi.c:44: unpackMpiCommunicator: Assertion `sizeof(MPI_Comm) == comm->commSize' failed. #28
-
Under the following setup. Hardware: INSPUR NF5488M5 (V100 version). When I run
But when I run
I have tried other samples and those work.
Replies: 8 comments 12 replies
-
Dear koichi-tsujino,
-
Could you please try building and running the tensornet_example_mpi_auto C sample on your machine (the samples are at https://github.com/NVIDIA/cuQuantum/tree/main/samples/cutensornet)? Before running the sample, please also check that the environment variable $CUTENSORNET_COMM_LIB points to the libcutensornet_distributed_interface_mpi.so wrapper library.
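The environment variable check suggested above can be sketched in a few lines of Python. This is only a minimal sketch: the function name check_comm_lib and the returned labels are hypothetical, and the only things taken from the thread are the variable name CUTENSORNET_COMM_LIB and the wrapper file name.

```python
import os
from pathlib import Path

# File name of the MPI wrapper library, as mentioned in this thread.
WRAPPER_NAME = "libcutensornet_distributed_interface_mpi.so"


def check_comm_lib(environ=os.environ):
    """Return a short diagnosis of the CUTENSORNET_COMM_LIB setting.

    The labels returned here are illustrative, not part of any cuQuantum API.
    """
    value = environ.get("CUTENSORNET_COMM_LIB")
    if not value:
        return "unset"
    path = Path(value)
    if not path.is_file():
        return "missing file"
    if path.name != WRAPPER_NAME:
        return "unexpected name"
    return "ok"


if __name__ == "__main__":
    # Run this in the same shell/environment that launches the failing job.
    print("CUTENSORNET_COMM_LIB:", check_comm_lib())
```

Running it inside the same environment (and under the same launcher) as the failing script matters, since the variable may differ between shells or conda environments.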
-
One possible reason for the crash is that, if multiple MPI libraries are present on your system, the MPI library linked by the sample you are running differs from the MPI library used by the MPI wrapper libcutensornet_distributed_interface_mpi.so. In the meantime, let me try to reproduce your issue locally ...
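To illustrate why mixing MPI libraries can trip the assertion in the title, here is a small sketch. It relies on the fact that MPICH-style implementations define MPI_Comm as an int while Open MPI defines it as a pointer, so the two handle types have different sizes on a 64-bit system. The names HANDLE_SIZES and comm_size_matches are hypothetical and only mimic the failing comparison; they are not cuTensorNet code.

```python
import ctypes

# Handle representations of the two common MPI families:
#   MPICH-style: MPI_Comm is an int
#   Open MPI:    MPI_Comm is a pointer to a communicator struct
HANDLE_SIZES = {
    "MPICH-style (int handle)": ctypes.sizeof(ctypes.c_int),
    "Open MPI (pointer handle)": ctypes.sizeof(ctypes.c_void_p),
}


def comm_size_matches(caller_size, wrapper_size):
    """Mimic the failing check: sizeof(MPI_Comm) == comm->commSize.

    caller_size:  size of the communicator handle passed in by the application
    wrapper_size: sizeof(MPI_Comm) in the MPI the wrapper library was built with
    """
    return caller_size == wrapper_size


if __name__ == "__main__":
    for family, size in HANDLE_SIZES.items():
        print(f"{family}: {size} bytes")
    sizes = list(HANDLE_SIZES.values())
    # On a typical 64-bit system the sizes differ, so the check fails.
    print("sizes match:", comm_size_matches(sizes[0], sizes[1]))
```

If mpi4py is installed, the handle size on the Python side can, as far as I know, be inspected with MPI._sizeof(MPI.COMM_WORLD); comparing it with the sizes above hints at which MPI family mpi4py was built against.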
-
On our local machine, the C/C++ sample tensornet_example_mpi_auto works fine with both MPICH and OpenMPI. My guess is that the issue is related to the Python environment setup or something similar ...
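As a first look at the Python environment, the following sketch lists the installed versions of the packages discussed in this thread using only the standard library. The helper name installed_versions is hypothetical, and the distribution names are an assumption: conda packages may register under different names than pip packages, in which case None is reported.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names assumed from the packages mentioned in this thread.
    for name, version in installed_versions(["cuquantum-python", "mpi4py"]).items():
        print(f"{name}: {version or 'not found'}")
```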
-
@koichi-tsujino Questions:
-
Thank you very much for your kind help. I have created a new environment, installed cuquantum-python and mpi4py, and it works fine. But I still have some questions.
You wrote:
But openmpi is not included here. Do I not need openmpi here?
Another situation I have:
Another point: on the page https://docs.nvidia.com/cuda/cuquantum/getting_started.html I read:
Note: To enable automatic MPI parallelism for cuTensorNet, you can install cuquantum with an MPI from conda-forge, e.g., conda install -c conda-forge cuquantum openmpi. For detailed guide, please refer to cuTensorNet Guide.
Install cuQuantum Python from conda-forge
Here I find:
mpi4py (optional, see mpi4py installation guide)
In this page,
Now, I understand
-
I have encountered another error. When I execute the following code,
I get:
Do you have any idea how to find the cause? I believe mpi4py is working; the code below works.
@koichi-tsujino Questions:
... conda list?
... nvidia channel? Asking because ... (which mpiexec)