Alternatives to lapacke.h #300

Open
LecrisUT opened this issue Nov 22, 2024 · 7 comments

@LecrisUT
Collaborator

I think it is worth considering the new C++26 proposal for <linalg>, specifically section 9.2:

Nevertheless, we have excluded LAPACK-like functionality from this proposal, for the following reasons:

  1. LAPACK is a Fortran library, unlike the BLAS, which is a multilanguage standard.
  2. We intend to support more general element types, beyond the four that LAPACK supports. It’s much more straightforward to make a C++ BLAS work for general element types, than to make LAPACK algorithms work generically.

First, unlike the BLAS, LAPACK is a Fortran library, not a standard. LAPACK was developed concurrently with the “level 3” BLAS functions, and the two projects share contributors. Nevertheless, only the BLAS and not LAPACK got standardized

...

For these reasons, we have left LAPACK-like functionality for future work. It would be natural for a future LAPACK-like C++ library to build on our proposal.

It seems unlikely that the LAPACK interface will be ported to the standard library, and once BLAS is ported to the C++ standard library, I don't believe the BLAS provider projects would offer much benefit. Even now there are issues:

  • lapacke.h is not packaged in all providers
  • distro packages are outdated in their build systems and in what they provide for pkg-config support
  • FindLAPACK.cmake does not actually add the relevant include directory, so lapacke.h does not work properly

The paper recommends other C++ native libraries like Armadillo, Eigen3, etc. It might be useful to look into what support for these would look like. But in the meantime, let's discuss a plan of action.

Short-term

Continue to use lapacke.h, test for the presence of the header, and fail the build if it is not present. Users would have to define overrides such as BLA_VENDOR themselves to select a compatible vendor such as OpenBLAS. There are various bugs that need to be addressed upstream:

  • distro packages should add the pkg-config files
  • FindLAPACK.cmake should have an interface to override the include directory
  • phono3py would need an intermediate support package until these issues are fixed upstream
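
For illustration, the "test for the header and fail early" step could be sketched on the Python side as below. This is only a sketch: the helper names `find_header` and `require_header` and the candidate directories are assumptions made for this example, not part of phono3py or CMake, and the real build would more likely do this check in CMakeLists.txt.

```python
from pathlib import Path


def find_header(name, include_dirs):
    """Return the first existing path to `name` in `include_dirs`, or None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def require_header(name, include_dirs):
    """Like find_header, but fail loudly so the build stops early."""
    path = find_header(name, include_dirs)
    if path is None:
        searched = ", ".join(str(d) for d in include_dirs)
        raise FileNotFoundError(
            f"{name} not found in: {searched}. "
            "Select a vendor that ships it or pass the include directory explicitly."
        )
    return path


# Hypothetical candidate directories; real locations differ between distros.
candidate_dirs = ["/usr/include", "/usr/include/openblas", "/usr/local/include"]
found = find_header("lapacke.h", candidate_dirs)
print("lapacke.h:", found if found is not None else "not found")
```

Keeping the search list explicit is what an include-directory override in FindLAPACK.cmake would give the user: a way to point the build at the header when the default locations are wrong.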

With regard to the packaged wheels, these will contain bundled BLAS and LAPACK implementations compatible with cibuildwheel. How well this works remains to be seen, and we will have to work with the cibuildwheel folks. But hopefully users can rely on the sdist builds to work in their arbitrary environments.

Future

For the future support we should consider some other options:

  • bundling lapacke wrappers
  • alternative C++ eigensolver packages that can be pulled in with FetchContent in case the user does not have the package
  • using scipy or other PyPI friendly packages
  • splitting the performance critical backend from the python interface

If it's possible to get some performance benchmarking or profiling, that would be very helpful for deciding which approach is most suitable.

@atztogo
Member

atztogo commented Nov 23, 2024

I think we can probably avoid calling LAPACKE routines in the C code of phono3py. These routines are called in two ways:

  1. Calling a function that blocks the calculation (diagonalization, pinv).
  2. Many small calls of a function that are expected to run in parallel. In this case, a single-threaded BLAS should be used, because my implementation uses OpenMP to call the function concurrently. If I remember correctly, the only use of this kind is the following diagonalization (zheev) of dynamical matrices over many q-points:
    info = phonopy_zheev(freqs_tmp, eigenvectors + num_band * num_band * gp,

For 1, calling the LAPACK routines via scipy is fine as long as the BLAS is multithreaded. In the old days, installing scipy was sometimes difficult, so I tried to avoid relying on it. But now I am comfortable depending on scipy, so apart from experimental usage (https://phonopy.github.io/phono3py/direct-solution.html#solver-choice-for-diagonalization), LAPACKE can be avoided, at least in the release version.
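
As a rough illustration of case 1, the two blocking operations could be called through scipy as below. This is a minimal sketch on a random symmetric matrix, not phono3py's actual code path; the matrix size and variable names are assumptions made for the example, and whether the calls run multithreaded depends on the BLAS that scipy is linked against.

```python
import numpy as np
from scipy import linalg

# Random real symmetric matrix as a stand-in for the large matrix
# that is diagonalized or pseudo-inverted in one blocking call.
rng = np.random.default_rng(0)
n = 100
a = rng.standard_normal((n, n))
sym = (a + a.T) / 2

# Blocking diagonalization: eigenvalues in ascending order,
# eigenvectors stored as columns.
evals, evecs = linalg.eigh(sym)

# Blocking Moore-Penrose pseudo-inverse.
sym_pinv = linalg.pinv(sym)

print("smallest eigenvalue:", evals[0])
print("pinv shape:", sym_pinv.shape)
```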

For 2, I currently have no idea how to selectively call multithreaded and single-threaded BLAS in one installation of phono3py. So I usually choose a multithreaded BLAS, and the zheev in the above case is called sequentially over q-points. This diagonalization can be replaced by numpy eigh in Python. I am not sure about the performance of calling numpy over many q-points (i.e., the overhead of Python and of numpy's LAPACK wrapper), but if this part turns out not to be a bottleneck of the phono3py calculation, we can avoid using LAPACKE.
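
To get a first feeling for that overhead, a small timing sketch like the following could be used. It compares one numpy.linalg.eigh call per q-point against a single call on the stacked array (numpy.linalg.eigh accepts input of shape (..., M, M)). The random Hermitian matrices, the sizes, and all function names here are assumptions for illustration; real dynamical matrices and grid sizes from phono3py would be needed for a meaningful benchmark.

```python
import time

import numpy as np


def random_hermitian_stack(num_gp, num_band, seed=0):
    """Random Hermitian matrices of shape (num_gp, num_band, num_band).

    Stand-ins for dynamical matrices at num_gp q-points.
    """
    rng = np.random.default_rng(seed)
    shape = (num_gp, num_band, num_band)
    a = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return (a + a.conj().transpose(0, 2, 1)) / 2


def eigh_loop(dynmats):
    """One numpy.linalg.eigh call per q-point, with Python overhead each time."""
    results = [np.linalg.eigh(dm) for dm in dynmats]
    evals = np.array([w for w, _ in results])
    evecs = np.array([v for _, v in results])
    return evals, evecs


def eigh_batched(dynmats):
    """A single call; numpy iterates over the leading axis internally."""
    evals, evecs = np.linalg.eigh(dynmats)
    return evals, evecs


def best_time(func, arg, repeat=3):
    """Best wall-clock time over `repeat` runs, plus the last result."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(arg)
        best = min(best, time.perf_counter() - start)
    return best, result


dynmats = random_hermitian_stack(num_gp=300, num_band=12)
t_loop, (evals_loop, evecs_loop) = best_time(eigh_loop, dynmats)
t_batch, (evals_batch, evecs_batch) = best_time(eigh_batched, dynmats)
print(f"loop:    {t_loop:.4f} s")
print(f"batched: {t_batch:.4f} s")
```

The stacked layout (q-point index first, then a num_band x num_band block) matches the offset `num_band * num_band * gp` used in the C call above, so the batched result can be indexed by q-point in the same way.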

Overall, avoiding LAPACKE in the way described above is a worthwhile challenge. I simply had no reason to try it before.

@atztogo
Member

atztogo commented Nov 23, 2024

@LecrisUT, I have added an option to remove the BLAS and LAPACKE calls from the C code (#301), so those libraries no longer need to be installed or linked. I haven't conducted an extensive performance evaluation, but it doesn't seem bad. I believe this version is suitable for both conda and pip wheel packages. CMakeLists.txt became even messier, but phono3py can be compiled with this option as follows:

% BUILD_WITHOUT_LAPACKE=ON pip install -e . -vvv 

@LecrisUT
Collaborator Author

Ok, I'll look into the changes and rebase. BTW scipy is already an indirect dependency because phono3py -> phonopy -> scipy. So for now we should continue having both in parallel? For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?

@atztogo
Member

atztogo commented Nov 23, 2024

Yes, scipy is absolutely necessary for phono3py if we don't use LAPACKE.

> For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?

It is difficult to understand this sentence... The lapack approach is unnecessary.

@LecrisUT
Collaborator Author

> > For cibuildwheel is the scipy/numpy alternative complete, and should it include the lapack approach or just the numpy?
>
> It is difficult to understand this sentence... The lapack approach is unnecessary.

I am just checking whether all the lapacke calls are redirected to numpy/scipy in that recent PR, so that the user would not have a run failure when it's not built with lapack support.
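
To make the concern concrete, the kind of redirect I mean could look like the sketch below. The function `diagonalize` and its `lapacke_eigh` argument are hypothetical names for this example only and are not taken from phono3py or from #301; passing `None` models a build without LAPACKE.

```python
import numpy as np


def diagonalize(matrix, lapacke_eigh=None):
    """Diagonalize a Hermitian matrix.

    `lapacke_eigh` is a hypothetical callable standing in for a
    LAPACKE-backed wrapper from the compiled extension. When it is None
    (build without LAPACKE), fall back to numpy instead of failing at run time.
    """
    if lapacke_eigh is not None:
        return lapacke_eigh(matrix)
    evals, evecs = np.linalg.eigh(matrix)
    return evals, evecs


# Build without LAPACKE: the numpy fallback is used.
evals, evecs = diagonalize(np.diag([2.0, 1.0]))
print(evals)
```

Every call site that used LAPACKE would need a branch like this for a LAPACKE-free build to be safe at run time.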

@atztogo
Member

atztogo commented Nov 24, 2024

> the user would not have a run failure when it's not built with lapack support.

I think so. The following is the test,
https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda-nolapacke.yml
where lapacke is not used:

  loading initial cache file /tmp/tmpmyqadjbh/build/CMakeInit.txt
  -- Build nanobind module of phono3py
  -- The C compiler identification is GNU 13.3.0
  -- The CXX compiler identification is GNU 13.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
  -- Build type: Release
  -- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpmyqadjbh/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
  -- /home/runner/miniconda3/envs/test
  -- Find OpenMP library
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
  -- OpenMP flags: -fopenmp
  -- Configuring done (1.0s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpmyqadjbh/build

The test with lapacke (https://github.com/phonopy/phono3py/blob/develop/.github/workflows/phono3py-pytest-conda.yml):

  loading initial cache file /tmp/tmpq8dtrmuy/build/CMakeInit.txt
  -- Build nanobind module of phono3py
  -- The C compiler identification is GNU 13.3.0
  -- The CXX compiler identification is GNU 13.3.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /home/runner/miniconda3/envs/test/bin/x86_64-conda-linux-gnu-c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found Python: /home/runner/miniconda3/envs/test/bin/python3.12 (found suitable version "3.12.7", minimum required is "3.8") found components: Interpreter Development.Module Development.SABIModule
  -- Build type: Release
  -- CMAKE_SYSTEM_PREFIX_PATH: /home/runner/miniconda3/envs/test/bin/../x86_64-conda-linux-gnu/sysroot/usr;/usr/local;/usr;/;/home/runner/miniconda3/envs/test;/tmp/tmpq8dtrmuy/wheel/platlib;/usr/X11R6;/usr/pkg;/opt;/home/runner/miniconda3/envs/test
  -- /home/runner/miniconda3/envs/test
  -- Find OpenMP library
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- OpenMP libs: /home/runner/miniconda3/envs/test/lib/libgomp.so;/home/runner/miniconda3/envs/test/x86_64-conda-linux-gnu/sysroot/usr/lib/libpthread.so
  -- OpenMP flags: -fopenmp
  -- Looking for sgemm_
  -- Looking for sgemm_ - not found
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- Looking for sgemm_
  -- Looking for sgemm_ - found
  -- Found BLAS: /home/runner/miniconda3/envs/test/lib/libopenblas.so
  -- BLAS libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so
  -- BLAS flags:
  -- Looking for cheev_
  -- Looking for cheev_ - found
  -- Found LAPACK: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
  -- LAPACK libs: /home/runner/miniconda3/envs/test/lib/libopenblas.so;-lpthread;-lm;-ldl
  -- LAPACK flags:
  -- OpenBLAS detected.
  -- Set C-macro MULTITHREADED_BLAS to avoid nested OpenMP calls.
  -- Configuring done (1.5s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpq8dtrmuy/build

@atztogo
Member

atztogo commented Dec 26, 2024

@LecrisUT, in the latest release of phono3py, the option from #301 was made the default compilation setting.
