Need help running a 37-qubit simulation using multi-GPU, multi-node on a supercomputer #170
-
mpiexec -n 8 --bind-to none --map-by node --oversubscribe -x UCX_TLS=^cma --mca coll_hcoll_enable 0 -x OMPI_MCA_coll_hcoll_enable=0 cuquantum-benchmarks circuit --frontend qiskit --backend cusvaer --benchmark quantum_volume --nqubits 35 --ngpus 1 --cusvaer-global-index-bits 1,1 --cusvaer-p2p-device-bits 1
2025-01-01 09:12:57,546 INFO * Running quantum_volume with 1 GPUs, and 35 qubits [qiskit-v1.0.2 | cusvaer-v0.4.0]:
-
mpiexec -n 32 --bind-to none --map-by node --oversubscribe -x UCX_TLS=^cma --mca coll_hcoll_enable 0 -x OMPI_MCA_coll_hcoll_enable=0 cuquantum-benchmarks circuit --frontend qiskit --backend cusvaer --benchmark quantum_volume --nqubits 35 --ngpus 1 --cusvaer-global-index-bits 3,1 --cusvaer-p2p-device-bits 3
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
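A quick sanity check on the launch parameters may help here. The sketch below assumes, based on my reading of the cuQuantum Appliance documentation and not on anything confirmed in this thread, that cusvaer expects one MPI process per GPU and that the total process count equals 2 raised to the sum of the values passed to --cusvaer-global-index-bits. The helper names (`parse_index_bits`, `expected_process_count`, `check_launch`) are made up for illustration and are not part of cuquantum-benchmarks.

```python
def parse_index_bits(flag_value: str) -> list[int]:
    """Parse a value such as "3,1" as passed to --cusvaer-global-index-bits."""
    return [int(part) for part in flag_value.split(",") if part.strip()]


def expected_process_count(global_index_bits: list[int]) -> int:
    """ASSUMPTION: one process per GPU, 2**sum(bits) processes in total."""
    return 2 ** sum(global_index_bits)


def check_launch(n_procs: int, flag_value: str) -> str:
    """Return a short verdict for an `mpiexec -n` / index-bits pair."""
    expected = expected_process_count(parse_index_bits(flag_value))
    if n_procs == expected:
        return f"ok: -n {n_procs} matches index bits {flag_value}"
    return (
        f"mismatch: -n {n_procs}, but index bits {flag_value} "
        f"imply {expected} processes"
    )


if __name__ == "__main__":
    # The three (-n, index-bits) pairs quoted in this thread.
    for n_procs, bits in [(8, "1,1"), (32, "3,1"), (16, "3,1")]:
        print(check_launch(n_procs, bits))
```

If that assumption holds, only the `-n 16` launch with `3,1` is self-consistent; the `-n 8` with `1,1` and `-n 32` with `3,1` launches would each start twice as many processes as the index bits describe. Please verify against the documentation for your Appliance version before relying on this.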
-
Hi @silicofeller,
-
Summary of MPI and CUDA-Awareness Issues on Supercomputer Cluster

Problem Statement: We are attempting to run a 36-qubit quantum volume simulation using cuquantum-benchmarks on the supercomputer cluster, leveraging NVIDIA's cuQuantum Appliance. The setup involves multiple nodes with NVIDIA A100-SXM4-40GB GPUs. Here are the key issues encountered:

Memory Allocation:
CUDA Awareness in Open MPI:
MPI Configuration and Binding:
Network and Host Issues:
Munge Configuration in PMIx:

Actions Taken:

Request for Resolution from NVIDIA: We kindly request assistance from NVIDIA with the following:

Memory Optimization for Simulations: Guidance on optimizing the cuquantum-benchmarks for higher qubit counts or alternative simulation strategies that might be less memory-intensive.
CUDA-Aware MPI Configuration:
Resolution of Munge/PMIx Issues:
Documentation or Known Issues:
Best Practices for Node and GPU Management:

We appreciate any insights, configurations, or patches that could help us overcome these technical hurdles and successfully run our quantum simulations.
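On the memory question, a back-of-the-envelope estimate is a useful starting point. The sketch below only accounts for the state vector itself (2^n complex amplitudes, 8 bytes each in single precision, 16 in double) split evenly across GPUs. It ignores any workspace or communication buffers the simulator allocates on top, so treat the numbers as a lower bound. The function names are illustrative, not part of any library.

```python
# Bytes per complex amplitude: complex64 for single, complex128 for double.
BYTES_PER_AMPLITUDE = {"single": 8, "double": 16}


def statevector_gib(nqubits: int, precision: str = "single") -> float:
    """Total size of a full 2**nqubits state vector, in GiB."""
    return (2 ** nqubits) * BYTES_PER_AMPLITUDE[precision] / 2 ** 30


def per_gpu_gib(nqubits: int, ngpus: int, precision: str = "single") -> float:
    """Share held by each GPU when the vector is split evenly across ngpus."""
    return statevector_gib(nqubits, precision) / ngpus


if __name__ == "__main__":
    for nqubits in (35, 36, 37):
        for precision in ("single", "double"):
            for ngpus in (16, 32):
                share = per_gpu_gib(nqubits, ngpus, precision)
                print(f"{nqubits} qubits, {precision}, {ngpus} GPUs: "
                      f"{share:.0f} GiB per GPU")
```

By this estimate, 36 qubits in single precision on 16 GPUs and 37 qubits in single precision on 32 GPUs both come to 32 GiB per GPU for the state vector alone, which is already close to the capacity of a 40 GB A100. Double precision at the same GPU counts would need 64 GiB per GPU and cannot fit.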
-
@ymagchi Would it be possible to get on a quick Zoom call at a time that suits you?
-
I have 4 nodes with 8 A100 GPUs each (40 GB per GPU). This is the command I ran:
mpiexec -n 16 --bind-to none --map-by node --oversubscribe -x UCX_TLS=^cma --mca coll_hcoll_enable 0 -x OMPI_MCA_coll_hcoll_enable=0 cuquantum-benchmarks circuit --frontend qiskit --backend cusvaer --benchmark quantum_volume --nqubits 36 --precision single --ngpus 1 --cusvaer-global-index-bits 3,1 --cusvaer-p2p-device-bits 3
2025-01-01 09:51:02,316 INFO * Running quantum_volume with 1 GPUs, and 36 qubits [qiskit-v1.0.2 | cusvaer-v0.4.0]:
Traceback (most recent call last):
  File "/opt/conda/envs/cuquantum-24.03/bin/cuquantum-benchmarks", line 8, in <module>
    sys.exit(run())
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/cuquantum_benchmarks/run.py", line 335, in run
    runner.run()
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 92, in run
    self._run()
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/cuquantum_benchmarks/run_interface.py", line 304, in _run
    preprocess_data = backend.preprocess_circuit(
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/cuquantum_benchmarks/backends/backend_qiskit.py", line 61, in preprocess_circuit
    self.transpiled_qc = qiskit.transpile(circuit, self.backend) # (circuit, basis_gates=['u3', 'cx'], backend=self.backend)
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/qiskit/compiler/transpiler.py", line 341, in transpile
    _check_circuits_coupling_map(circuits, coupling_map, backend)
  File "/opt/conda/envs/cuquantum-24.03/lib/python3.10/site-packages/qiskit/compiler/transpiler.py", line 455, in _check_circuits_coupling_map
    raise CircuitTooWideForTarget(
qiskit.transpiler.exceptions.CircuitTooWideForTarget: 'Number of qubits (36) in circuit-158 is greater than maximum (35) in the coupling_map'
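Because every MPI rank prints its own copy of the traceback, the combined output gets long quickly. A small standard-library helper like the one below can condense it by pulling the requested and maximum qubit counts out of each CircuitTooWideForTarget line and counting how often each pair occurs. The regular expression is written against the exact message shown above, so it is an assumption that other Qiskit versions word it the same way. The name `summarize_width_errors` is made up for this sketch.

```python
import re
from collections import Counter

# Matches the exception line exactly as it appears in the output above.
WIDTH_ERROR = re.compile(
    r"CircuitTooWideForTarget: 'Number of qubits \((\d+)\) in (\S+) "
    r"is greater than maximum \((\d+)\) in the coupling_map'"
)


def summarize_width_errors(log_text: str) -> Counter:
    """Count (requested_qubits, backend_maximum) pairs across all ranks."""
    return Counter(
        (int(match.group(1)), int(match.group(3)))
        for match in WIDTH_ERROR.finditer(log_text)
    )


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < combined_mpi_output.log
    for (requested, maximum), count in summarize_width_errors(sys.stdin.read()).items():
        print(f"{count} rank(s): circuit needs {requested} qubits, "
              f"backend allows {maximum}")
```

On the output above this yields the pair (36, 35): the check that fails is the transpiler comparing the 36-qubit circuit against a backend that reports a 35-qubit limit. It happens before any GPU memory is allocated.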