Distributed dense sparse matrix multiplication in NTPoly #173

yaoyi92 · 2021-09-27T19:38:48Z

I am wondering whether it is possible to perform dense sparse matrix multiplication via NTPoly subroutines. In FHI-aims, we need to perform some matrix multiplication for rotation matrices (sparse) on some other matrices (dense, specifically, the screened Coulomb matrices represented in the auxiliary basis). The dense matrices are distributed in BLACS format. Do you have any suggestions or possibly point me to a subroutine in NTPoly I should look into? Thank you very much in advance.

william-dawson · 2021-09-28T14:54:59Z

Thank you for your question. When NTPoly detects that a block of the matrix is dense, it will automatically switch to a dense multiplication routine. But I am not sure how the performance compares to other libraries in this case.

I would like to prepare an example driver routine for you that shows off this feature. Do you also need help converting from BLACS format matrices to NTPoly?

Also, are your matrices square for these calculations?

yaoyi92 · 2021-09-28T17:59:42Z

Thank you so much for offering the help. Yes, an example would be really helpful.

I believe I can use subroutines in ELSI for the format conversion.
They are square matrices.

bhourahine · 2021-09-28T21:16:38Z

@yaoyi92 are you asking for a dense * sparse -> dense operation? That would also be useful for things like the Sternheimer equation.

yaoyi92 · 2021-09-28T21:54:05Z

@bhourahine, yes, that's exactly the operation I am looking for. My primary goal is to reduce the memory cost for my sparse rotation matrices. The MM timing is also not negligible based on my test. I hope the sparsity can be also used for some speedup if possible.

william-dawson · 2021-09-30T06:53:21Z

Thank you for the clarification. Right now what happens in NTPoly is that when two blocks of the matrix are multiplied, it will measure their sparsity. If both matrices appear quite dense, it will switch to a BLAS call. Otherwise, it uses the sparse matrix multiplication routine I wrote. Unfortunately, that means there is no special optimization for sparse * dense operations. See line 58 of Source/Fortran/sparse_includes/GemmMatrix.f90 for details.

Nonetheless, NTPoly's performance might be good enough for you even if the sparse*sparse routine is employed. And it will definitely help you the memory costs. I've attached a small driver program here which I hope shows you how you can call the multiplication routines.

In the future, it wouldn't be too hard to add a specialized sparse-dense routine at the block level. The key thing will be to develop good benchmarks, since usually sparse - dense multiplication is done in a matrix free way.

!> Demonstrate the matrix multiplication routines.
program sparsedens
  use datatypesmodule
  use loggingmodule
  use matrixmapsmodule
  use pmatrixmemorypoolmodule
  use psmatrixalgebramodule
  use psmatrixmodule
  use processgridmodule
  use timermodule
  use tripletmodule
  use tripletlistmodule
  use mpi
  implicit none
  real(ntreal), parameter :: thresh = 1e-6_NTREAL
  integer, parameter :: dimension = 1024
  type(matrix_ps) :: sparse, dense, product
  type(matrixmemorypool_p) :: pool
  integer :: provided, ierr, ii

  !! setup ntpoly
  CALL mpi_init_thread(MPI_THREAD_SERIALIZED, provided, ierr)
  CALL constructprocessgrid(mpi_comm_world)
  if (isroot()) then
     call activatelogger
  end if

  !! construct two matrices: a sparse one and dense one.
  !! (you can probably ignore these details and just use the elsi conversion)
  call create_dense(dense, dimension)
  call mapmatrix_psr(dense, sparse, sparsify)

  !! print out the matrix properties
  call writeheader("sparse matrix properties")
  call entersublog
     call printmatrixinformation(sparse)
  call exitsublog
  CALL writeheader("dense matrix properties")
  call entersublog
     call printmatrixinformation(dense)
  call exitsublog

  !! in this routine, we will do the actual multiplications.
  call run

  !! cleanup
  call printalltimersdistributed()
  if (isroot()) then
     call deactivatelogger
  end if

  call destructmatrixmemorypool(pool)
  call destructmatrix(sparse)
  call destructmatrix(dense)
  call destructmatrix(product)

  call destructprocessgrid()
  CALL mpi_finalize(ierr)

contains
  !> This subroutine will demonstrate the multiplications
  subroutine run()
    call registertimer("sparse-dense")
    call starttimer("sparse-dense")
    do ii = 1, 20
       !! arguments are matrix A, matrix B, matrix AB (the product),
       !! the threshold for removing small elements, and the memory pool.
       !! The memory pool is an optional parameter, but if you are doing
       !! repeated multiplications it will save you the time spent allocating
       !! work arrays.
       call matrixmultiply(sparse, dense, product, &
                           threshold_in=thresh, memory_pool_in=pool)
    end do
    call stoptimer("sparse-dense")

    call registertimer("dense-sparse")
    call starttimer("dense-sparse")
    do ii = 1, 20
       call matrixmultiply(dense, sparse, product, &
                           threshold_in=thresh, memory_pool_in=pool)
    end do
    call stoptimer("dense-sparse")

    call registertimer("sparse-sparse")
    call starttimer("sparse-sparse")
    do ii = 1, 20
       call matrixmultiply(sparse, sparse, product, &
                           threshold_in=thresh, memory_pool_in=pool)
    end do
    call stoptimer("sparse-sparse")

    call registertimer("dense-dense")
    call starttimer("dense-dense")
    do ii = 1, 20
       call matrixmultiply(dense, dense, product, &
                           threshold_in=thresh, memory_pool_in=pool)
    end do
    call stoptimer("dense-dense")

  end subroutine run

  !> create a random dense matrix.
  subroutine create_dense(dense, dimension)
    !> the matrix to create
    type(matrix_ps), intent(inout) :: dense
    !> the dimension of the matrix
    integer :: dimension
    type(tripletlist_r) :: tlist
    type(triplet_r) :: trip
    integer :: ii, jj

    call constructemptymatrix(dense, dimension)
    call constructtripletlist(tlist)
    do ii = 1, dimension
       do jj = 1, dimension
          trip%point_value = rand()
          trip%index_column = ii
          trip%index_row = jj
          call appendtotripletlist(tlist, trip)
       end do
    end do
    call fillmatrixfromtripletlist(dense, tlist)

    call destructtripletlist(tlist)

  end subroutine create_dense

  !> create a sparse matrix by removing far away elements from a dense one
  function sparsify(row, col, val) result(valid)
     integer, intent(inout), optional :: row
     integer, intent(inout), optional :: col
     real(ntreal), intent(inout), optional :: val
     logical :: valid

     if (abs(row - col) < 20) then
        valid = .true.
        val = rand()
     else
        valid = .false.
     end if
  end function sparsify

end program

yaoyi92 · 2021-09-30T13:48:34Z

Thank you so much!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Distributed dense sparse matrix multiplication in NTPoly #173

Distributed dense sparse matrix multiplication in NTPoly #173

yaoyi92 commented Sep 27, 2021

william-dawson commented Sep 28, 2021 •

edited

Loading

yaoyi92 commented Sep 28, 2021

bhourahine commented Sep 28, 2021

yaoyi92 commented Sep 28, 2021

william-dawson commented Sep 30, 2021

yaoyi92 commented Sep 30, 2021

Distributed dense sparse matrix multiplication in NTPoly #173

Distributed dense sparse matrix multiplication in NTPoly #173

Comments

yaoyi92 commented Sep 27, 2021

william-dawson commented Sep 28, 2021 • edited Loading

yaoyi92 commented Sep 28, 2021

bhourahine commented Sep 28, 2021

yaoyi92 commented Sep 28, 2021

william-dawson commented Sep 30, 2021

yaoyi92 commented Sep 30, 2021

william-dawson commented Sep 28, 2021 •

edited

Loading