
Getting a NumPy not found error during run #12

Closed
benlee0423 opened this issue Mar 20, 2024 · 13 comments

@benlee0423
Contributor

benlee0423 commented Mar 20, 2024

Command to run the image

singularity run --bind /home/ubuntu/workspace/AWI_09_004:/ngen/ngen/data ciroh-ngen-singularity.sif "/ngen/ngen/data auto"

Command to run inside the running container

mpirun --allow-run-as-root -n 2 /dmod/bin/ngen-parallel ./config/datastream.gpkg all ./config/datastream.gpkg all ./config/realization.json ./partitions_2.json 

Error message

Running NextGen model framework in parallel mode
Found paritions file! ./partitions_2.json
NGen Framework 0.1.0
NGen Framework 0.1.0
terminate called after throwing an instance of 'pybind11::error_already_set'
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ModuleNotFoundError: No module named 'numpy'
  what():  ModuleNotFoundError: No module named 'numpy'

Some path variables inside the running container:

Singularity> module show mpi
-------------------------------------------------------------------------------------------------------------------------------------------------------------
   /usr/share/modulefiles/mpi/openmpi-x86_64:
-------------------------------------------------------------------------------------------------------------------------------------------------------------
conflict("mpi")
prepend_path("PATH","/usr/lib64/openmpi/bin")
prepend_path("LD_LIBRARY_PATH","/usr/lib64/openmpi/lib")
prepend_path("PKG_CONFIG_PATH","/usr/lib64/openmpi/lib/pkgconfig")
prepend_path("MANPATH",":/usr/share/man/openmpi-x86_64")
setenv("MPI_BIN","/usr/lib64/openmpi/bin")
setenv("MPI_SYSCONFIG","/etc/openmpi-x86_64")
setenv("MPI_FORTRAN_MOD_DIR","/usr/lib64/gfortran/modules/openmpi")
setenv("MPI_INCLUDE","/usr/include/openmpi-x86_64")
setenv("MPI_LIB","/usr/lib64/openmpi/lib")
setenv("MPI_MAN","/usr/share/man/openmpi-x86_64")
setenv("MPI_PYTHON3_SITEARCH","/usr/lib64/python3.9/site-packages/openmpi")
setenv("MPI_COMPILER","openmpi-x86_64")
setenv("MPI_SUFFIX","_openmpi")
setenv("MPI_HOME","/usr/lib64/openmpi")

Singularity> cat /usr/share/modulefiles/mpi/openmpi-x86_64
#%Module 1.0
#
#  OpenMPI module for use with 'environment-modules' package:
#
conflict		mpi
prepend-path 		PATH 		/usr/lib64/openmpi/bin
prepend-path 		LD_LIBRARY_PATH /usr/lib64/openmpi/lib
prepend-path 		PKG_CONFIG_PATH	/usr/lib64/openmpi/lib/pkgconfig
prepend-path		MANPATH		:/usr/share/man/openmpi-x86_64
setenv 			MPI_BIN		/usr/lib64/openmpi/bin
setenv			MPI_SYSCONFIG	/etc/openmpi-x86_64
setenv			MPI_FORTRAN_MOD_DIR	/usr/lib64/gfortran/modules/openmpi
setenv			MPI_INCLUDE	/usr/include/openmpi-x86_64
setenv	 		MPI_LIB		/usr/lib64/openmpi/lib
setenv			MPI_MAN		/usr/share/man/openmpi-x86_64
setenv			MPI_PYTHON3_SITEARCH	/usr/lib64/python3.9/site-packages/openmpi
setenv			MPI_COMPILER	openmpi-x86_64
setenv			MPI_SUFFIX	_openmpi
setenv	 		MPI_HOME	/usr/lib64/openmpi

Linux:

$ uname -a
Linux ip-172-31-65-149 6.5.0-1014-aws #14~22.04.1-Ubuntu SMP Thu Feb 15 15:27:06 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
@benlee0423
Contributor Author

This appears to be caused by the following path set in the module, which doesn't exist in the container. Not sure how to fix it.

Singularity> ls /usr/lib64/python3.9/site-packages/openmpi
ls: cannot access '/usr/lib64/python3.9/site-packages/openmpi': No such file or directory

@benlee0423
Contributor Author

-- NGen version: 0.1.0
-- Build configuration summary:
-- Generator: Unix Makefiles
-- Build type:
-- System: Linux
-- C Compiler: /usr/bin/cc
-- C Flags:
-- CXX Compiler: /usr/bin/c++
-- CXX Flags:
-- Flags:
-- NGEN_WITH_MPI: OFF
-- NGEN_WITH_NETCDF: ON
-- NGEN_WITH_SQLITE: ON
-- NGEN_WITH_UDUNITS: ON
-- NGEN_WITH_BMI_FORTRAN: ON
-- NGEN_WITH_BMI_C: ON
-- NGEN_WITH_PYTHON: ON
-- NGEN_WITH_ROUTING: ON
-- NGEN_WITH_TESTS: ON
-- NGEN_WITH_COVERAGE: OFF
-- NGEN_QUIET: ON
-- Extern Models:
-- NGEN_WITH_EXTERN_ALL: OFF
-- NGEN_WITH_EXTERN_SLOTH: ON
-- NGEN_WITH_EXTERN_TOPMODEL: ON
-- NGEN_WITH_EXTERN_CFE: ON
-- NGEN_WITH_EXTERN_PET: ON
-- NGEN_WITH_EXTERN_NOAH_OWP_MODULAR: ON
-- Environment summary:
-- Boost:
-- Version: 1.79.0
-- Include: /usr/include
-- NetCDF:
-- Version: 4.8.1
-- Library: /usr/lib64/libnetcdf.so
-- Library (CXX): /usr/local/lib64/libnetcdf-cxx4.so
-- Include: /usr/include
-- Include (CXX): /usr/local/include
-- Parallel: FALSE
-- SQLite:
-- Version: 3.34.1
-- Library: /usr/lib64/libsqlite3.so
-- Include: /usr/include
-- UDUNITS2:
-- Library: /usr/lib64/libudunits2.so
-- Include: /usr/include/udunits2
-- Fortran:
-- BMI_FORTRAN_ISO_C_LIB_PATH:
-- BMI_FORTRAN_ISO_C_LIB_NAME: OFF
-- BMI_FORTRAN_ISO_C_LIB_DIR: OFF
-- Python:
-- Version: 3.9.18
-- Virtual Env:
-- Executable: /usr/bin/python3.9
-- Interpreter Type: Python
-- Site Library: /usr/lib/python3.9/site-packages
-- Include: /usr/include/python3.9
-- Runtime Library: /usr/lib64
-- NumPy Version: 1.26.4
-- NumPy Include: /usr/local/lib64/python3.9/site-packages/numpy/core/include
-- pybind11 Version:
-- pybind11 Include: /ngen/extern/pybind11/include


-- Configuring done

@benlee0423
Contributor Author

This looks similar to the issue raised by Trupesh:
NOAA-OWP/ngen#655

@hellkite500

Can you run

ldd /usr/bin/python3.9

in the container runtime?

@benlee0423
Contributor Author

Singularity> ldd /usr/bin/python3.9
	linux-vdso.so.1 (0x00007ffc11394000)
	libpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x00007f71c2c62000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f71c2a59000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f71c297e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f71c2fd4000)
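The check being made here by eye (whether `libpython` shows up as a shared-library dependency of the interpreter) can be scripted. This is a minimal sketch, assuming `ldd` is available in the container; `links_libpython` and `check_interpreter` are hypothetical helper names, not part of ngen:

```python
import subprocess


def links_libpython(ldd_output: str) -> bool:
    """Return True if any line of `ldd` output names a libpython shared library.

    A dynamically linked interpreter lists e.g. libpython3.9.so.1.0; a statically
    linked one (the situation described for NOAA-OWP/ngen#655) does not.
    """
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("libpython"):
            return True
    return False


def check_interpreter(path: str = "/usr/bin/python3.9") -> bool:
    """Run `ldd` on the interpreter binary and report whether libpython is shared."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return links_libpython(result.stdout)


if __name__ == "__main__":
    # Excerpt of the ldd output posted above.
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc11394000)\n"
        "\tlibpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x00007f71c2c62000)\n"
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f71c2a59000)\n"
    )
    print(links_libpython(sample))  # True
```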

@hellkite500

So this isn't the same issue referenced above (NOAA-OWP/ngen#655), which is caused by the Python interpreter being statically linked.

My best guess is a path problem. This looks suspicious:

-- NumPy Include: /usr/local/lib64/python3.9/site-packages/numpy/core/include

It looks like numpy is installed/found in

/usr/local/

whereas the Python site library is

-- Site Library: /usr/lib/python3.9/site-packages

Can you simply open a Python interpreter in the container and import numpy?
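Beyond a bare import, it can help to print which interpreter is running, its prefixes, its `sys.path`, and where numpy resolves from (if anywhere). Comparing that with the CMake summary (Site Library `/usr/lib/python3.9/site-packages` vs. NumPy under `/usr/local/lib64/python3.9/site-packages`) shows whether the directory holding numpy is on the search path in a given context. A sketch; `describe_interpreter` is a hypothetical name:

```python
import importlib.util
import sys


def describe_interpreter(module: str = "numpy") -> dict:
    """Collect interpreter facts useful for comparing two Python environments."""
    # find_spec returns None for a top-level module that cannot be found.
    spec = importlib.util.find_spec(module)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "path": list(sys.path),
        "module_origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key, value in info.items():
        if key == "path":
            print("sys.path:")
            for entry in value:
                print("   ", entry)
        else:
            print(f"{key}: {value}")
```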

@benlee0423
Contributor Author

benlee0423 commented Mar 22, 2024

I'm able to import numpy in Python:

Singularity> python
Python 3.9.18 (main, Jan  4 2024, 00:00:00) 
[GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> array = numpy.array([1,2,3,4,5])
>>> print(array)
[1 2 3 4 5]
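Since the interactive import works, one way to narrow things down is to check whether numpy is only reachable through the `site` module's path setup, which is presumably how `/usr/local/lib64/python3.9/site-packages` ends up on `sys.path` here. The sketch below (the `can_import` helper is hypothetical) imports a module in a fresh interpreter, optionally with `-S`, which disables `site`:

```python
import subprocess
import sys


def can_import(module: str, *flags: str) -> bool:
    """Try `import <module>` in a fresh interpreter started with the given flags."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Default startup: the `site` module adds site-packages directories to sys.path.
    print("default:", can_import("numpy"))
    # -S skips `site`, so no site-packages directories are added.
    print("-S     :", can_import("numpy", "-S"))
```

If numpy imports by default but not under `-S`, it depends on `site`-added directories, and an embedded interpreter that builds `sys.path` differently could miss it. That is a hypothesis to test, not something confirmed in this thread.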

numpy is also present at that location:

Singularity> ls /usr/local/lib64/python3.9/site-packages/numpy/core/include
numpy

Python location:

Singularity> whereis python
python: /usr/bin/python

No numpy in /usr/lib/python3.9/site-packages:

Singularity> ls /usr/lib/python3.9/site-packages
__pycache__      distutils-precedence.pth  mockbuild                 pip-21.2.3.dist-info       python_dateutil-2.8.1-py3.9.egg-info  six.py
_distutils_hack  dnf                       packaging                 pkg_resources              setuptools
asciidocapi.py   dnf-plugins               packaging-20.9.dist-info  pyparsing-2.4.7.dist-info  setuptools-53.0.0.dist-info
dateutil         dnfpluginscore            pip                       pyparsing.py               six-1.15.0.dist-info
Singularity> ls -l /usr/bin/python
lrwxrwxrwx 1 root root 16 Mar 22 02:17 /usr/bin/python -> /usr/bin/python3
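The same directory check can be done in one pass. A sketch, where `dirs_containing` is a hypothetical helper and the candidate list is just the site-packages paths seen in this thread plus whatever `site.getsitepackages()` reports for the running interpreter:

```python
import os
import site
from typing import Iterable, List


def dirs_containing(package: str, candidates: Iterable[str]) -> List[str]:
    """Return the candidate directories that contain <dir>/<package>/__init__.py."""
    return [
        d for d in candidates
        if os.path.isfile(os.path.join(d, package, "__init__.py"))
    ]


if __name__ == "__main__":
    candidates = [
        "/usr/lib/python3.9/site-packages",
        "/usr/local/lib64/python3.9/site-packages",
        *site.getsitepackages(),
    ]
    # dict.fromkeys removes duplicates while keeping order.
    print(dirs_containing("numpy", dict.fromkeys(candidates)))
```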

@hellkite500

Have you tried building and running ngen within a virtual environment?
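For reference, a minimal sketch of creating such an environment with the standard `venv` module. `make_env` is a hypothetical helper, the `bin/python` layout assumes a POSIX system like this container, and installing numpy needs network access or a local package index:

```python
import subprocess
import venv
from pathlib import Path


def make_env(env_dir: str, install_numpy: bool = False) -> Path:
    """Create a virtual environment and return the path to its python executable."""
    # pip is only bootstrapped when we actually need to install something.
    venv.EnvBuilder(with_pip=install_numpy).create(env_dir)
    python = Path(env_dir) / "bin" / "python"  # POSIX layout
    if install_numpy:
        subprocess.run([str(python), "-m", "pip", "install", "numpy"], check=True)
    return python
```

For example, `make_env("/path/to/venv", install_numpy=True)` (placeholder path), then activate that environment before configuring ngen. The build summary has a `Virtual Env:` field (empty above), which suggests the build can detect one, but exactly how ngen's CMake selects the interpreter isn't confirmed in this thread.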

@benlee0423
Contributor Author

No virtual environment is used.

@hellkite500

What ngen commit are you building? A pybind update was merged a couple of days ago:

NOAA-OWP/ngen#755

@benlee0423
Contributor Author

I just built the image from the ngen master branch and am still getting the same error.

@benlee0423
Contributor Author

Getting the same error in the Docker build as well:

#17 48.05 terminate called after throwing an instance of 'pybind11::error_already_set'
#17 48.05   what():  ModuleNotFoundError: No module named 'numpy'
#17 48.18 [ 64%] Built target test_geojson
#17 48.18 CMake Error at /usr/local/lib64/python3.9/site-packages/cmake/data/share/cmake-3.28/Modules/GoogleTestAddTests.cmake:112 (message):
#17 48.18   Error running test executable.
#17 48.18 
#17 48.18     Path: '/ngen/ngen/cmake_build_serial/test/test_routing_pybind'
#17 48.18     Result: Subprocess aborted
#17 48.18     Output:

@benlee0423
Contributor Author

Build and run succeed with commit f91e2ea of the ngen repo.


4 participants