[ITensors] Improve type stability of svdMPO and qn_svdMPO #1183

Merged

Conversation

terasakisatoshi
Contributor

@terasakisatoshi commented Aug 24, 2023

Description

This PR improves the performance of svdMPO and qn_svdMPO, which are called from MPO(os::OpSum, sites::Vector{<:Index})::MPO.

My investigation revealed the following:

Computing ValType = determineValType(terms(os)) inside the method svdMPO and then defining Vs = [Matrix{ValType}(undef, 1, 1) for n in 1:N] without a type annotation causes type instability: ValType is only known at run time, so the element type of Vs cannot be inferred.

This type instability causes the following code section to run slowly (a sketch of the function-barrier fix is shown after the snippet):

if A_row == -1 && A_col == -1 #onsite term
  M[end, 1] += ct
elseif A_row == -1 #term starting on site n
  for c in 1:size(VR, 2)
    z = ct * VR[A_col, c]
    M[end, 1 + c] += z
  end
elseif A_col == -1 #term ending on site n
  for r in 1:size(VL, 2)
    z = ct * conj(VL[A_row, r])
    M[1 + r, 1] += z
  end
else
  for r in 1:size(VL, 2), c in 1:size(VR, 2)
    z = ct * conj(VL[A_row, r]) * VR[A_col, c]
    M[1 + r, 1 + c] += z
  end
end
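For illustration, here is a minimal, self-contained sketch of the function-barrier pattern that fixes this kind of instability. The names valtype_of, accumulate_slow, accumulate_fast, and _accumulate are invented for this example and are not part of ITensors:

# Sketch of the function-barrier pattern (illustrative names, not ITensors code).
# valtype_of mimics determineValType: the element type is only known at run time.
valtype_of(coeffs) = any(c -> !isreal(c), coeffs) ? ComplexF64 : Float64

function accumulate_slow(coeffs, N)
  ValType = valtype_of(coeffs)               # a run-time value, not a compile-time constant
  Vs = [zeros(ValType, 2, 2) for _ in 1:N]   # element type of Vs is not inferable here
  for (n, c) in enumerate(coeffs)
    Vs[mod1(n, N)] .+= c                     # dynamic dispatch in the hot loop
  end
  return Vs
end

# Function barrier: compute ValType once, then do the hot work in an inner method
# where the element type is a static type parameter.
function accumulate_fast(coeffs, N)
  ValType = valtype_of(coeffs)
  return _accumulate(ValType, coeffs, N)
end

function _accumulate(::Type{T}, coeffs, N) where {T}
  Vs = [zeros(T, 2, 2) for _ in 1:N]         # Vector{Matrix{T}}: concrete and inferable
  for (n, c) in enumerate(coeffs)
    Vs[mod1(n, N)] .+= c
  end
  return Vs
end

Inside _accumulate the element type T is known to the compiler, so the loop compiles to type-stable code; this PR applies the same idea to svdMPO and qn_svdMPO.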


Minimal demonstration of previous behavior

  • Step 1: Prepare the following Python script to generate the test data:
# qph.py
from openfermion import MolecularData
from openfermionpyscf import run_pyscf

# pip3 install quri-parts-openfermion
from quri_parts.openfermion.transforms import jordan_wigner

basis = "6-31g"
multiplicity = 1
charge = 0
distance = 1.6
geometry = [("Li", (0, 0, 0)), ("H", (0, 0, distance))]

pyscf_molecule = MolecularData(geometry, basis, multiplicity, charge)
molecule = run_pyscf(pyscf_molecule)
n_orbs = molecule.n_orbitals
n_elecs = molecule.n_electrons


op_mapper = jordan_wigner.get_of_operator_mapper()

n_active_orbs = 8

qubit_count = 2 * n_active_orbs
hamiltonian = molecule.get_molecular_hamiltonian(
    range(n_elecs // 2), range(n_active_orbs)
)
qp_h = op_mapper(hamiltonian)

li = []

# Collect the Pauli labels; each label is a set of (qubit index, Pauli operator) pairs.
# The coefficients are regenerated randomly on the Julia side, so they are not needed here.
for pauli, _coeff in qp_h.items():
    li.append(pauli)

# Emit Julia code: the qubit count N and a vector `paulis` whose entries have the form
# ["X", i, "Z", j, ...], using 1-based site indices as expected by ITensors.
print(f"N={qubit_count}")
print("paulis = [")
for e in sorted(li, key=len):
    if len(e) == 0:
        continue
    sorted_list = sorted(e, key=lambda t: t[0])
    sorted_list = [(1 + s[0], s[1].name) for s in sorted_list]
    out = "    ["
    seqs = []
    for s1, s2 in sorted_list:
        seqs.append('"' + s2 + '"')
        seqs.append(str(s1))
    out += ",".join(seqs)
    out += "],"
    print(out)
print("]")
  • Step 2: Run the following command:
python qph.py > paulis.jl

This creates a file named paulis.jl.

You can also find paulis.jl in my gist.
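
For reference, the generated file has the following shape (the entries below are illustrative placeholders; the actual operators and site indices depend on the molecule):

# paulis.jl (illustrative excerpt, not the actual generated data)
N=16
paulis = [
    ["Z",1],
    ["Z",2],
    ["X",1,"Z",2,"X",3],
    ["Z",1,"Z",5],
]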

  • Step 3: Prepare the following Julia script, mpo_bench.jl, to show that this pull request improves the performance of constructing the MPO.
# mpo_bench.jl
using ITensors

include("paulis.jl") # defines paulis and N

for conserve_qns in [true, false]
  sites = siteinds("Qubit", N; conserve_qns)
  os = ITensors.OpSum()

  for p in paulis
    coeff = rand()
    os += ITensors.Ops.op_term((coeff, p...))
  end

  # The first call includes compilation time; the later calls measure run time only.
  @time MPO(os, sites)
  @time MPO(os, sites)
  @time MPO(os, sites)
end

On the main branch we have:

$ julia mpo_bench.jl 
 26.777017 seconds (84.34 M allocations: 4.188 GiB, 5.44% gc time, 90.48% compilation time)
  2.426720 seconds (44.33 M allocations: 1.601 GiB, 14.37% gc time)
  2.540036 seconds (44.33 M allocations: 1.601 GiB, 13.92% gc time)
 29.800494 seconds (263.40 M allocations: 10.456 GiB, 8.58% gc time, 36.43% compilation time)
 20.952781 seconds (246.38 M allocations: 9.343 GiB, 9.42% gc time)
 16.334621 seconds (246.38 M allocations: 9.343 GiB, 9.50% gc time)

Minimal demonstration of new behavior

This pull request reduces the number of allocations and improves the performance of MPO(os, sites).

$ julia mpo_bench.jl
 26.626078 seconds (44.08 M allocations: 3.383 GiB, 5.20% gc time, 95.01% compilation time)
  1.958892 seconds (6.27 M allocations: 968.143 MiB, 17.07% gc time)
  1.869094 seconds (6.27 M allocations: 968.143 MiB, 16.23% gc time)
 20.243849 seconds (21.06 M allocations: 6.434 GiB, 5.73% gc time, 46.55% compilation time)
 11.731973 seconds (3.96 M allocations: 5.315 GiB, 6.59% gc time)
  9.871340 seconds (3.96 M allocations: 5.315 GiB, 5.76% gc time)
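
Note that the first measurement in each group includes compilation time. As an additional cross-check (not part of this PR), one could time the construction with BenchmarkTools.jl, which excludes compilation and reports statistics over many samples; a minimal sketch, assuming BenchmarkTools.jl is installed and paulis.jl from Step 2 is available:

# benchmark_mpo.jl (illustrative sketch, not part of this PR)
using ITensors
using BenchmarkTools

include("paulis.jl") # defines paulis and N

function build_opsum(paulis)
  os = ITensors.OpSum()
  for p in paulis
    os += ITensors.Ops.op_term((rand(), p...))
  end
  return os
end

sites = siteinds("Qubit", N; conserve_qns=true)
os = build_opsum(paulis)

# Interpolate the globals so that only the MPO construction itself is timed.
@btime MPO($os, $sites)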

How Has This Been Tested?

$ julia -q
julia> ]            # press ] to enter Pkg mode
pkg> activate
pkg> instantiate
pkg> test
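
A non-interactive way to run the same tests from the repository root (a sketch, e.g. from ~/.julia/dev/ITensors):

$ julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.test()'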

Checklist:

  • [ x ] My code follows the style guidelines of this project. Please run using JuliaFormatter; format(".") in the base directory of the repository (~/.julia/dev/ITensors) to format your code according to our style guidelines.
  • [ x ] I have performed a self-review of my own code.
  • [ x ] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have added tests that verify the behavior of the changes I made.
    • This PR does not change the user interface.
  • [ ] I have made corresponding changes to the documentation.
  • [ x ] My changes generate no new warnings.
  • [ ] Any dependent changes have been merged and published in downstream modules.
    • No additional dependencies are required for this PR.

This change leads to significant improvements in terms of performance
@codecov-commenter

codecov-commenter commented Aug 24, 2023

Codecov Report

Patch coverage: 87.27% and project coverage change: -18.03% ⚠️

Comparison is base (b54e8d7) 85.40% compared to head (0a53cec) 67.37%.
Report is 1 commit behind head on main.

❗ Current head 0a53cec differs from the pull request's most recent head 8e8e85b. Consider uploading reports for the commit 8e8e85b to get more accurate results.


Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1183       +/-   ##
===========================================
- Coverage   85.40%   67.37%   -18.03%     
===========================================
  Files          88       87        -1     
  Lines        8405     8389       -16     
===========================================
- Hits         7178     5652     -1526     
- Misses       1227     2737     +1510     
Files Changed Coverage Δ
src/physics/autompo/opsum_to_mpo.jl 0.00% <0.00%> (-100.00%) ⬇️
src/qn/flux.jl 91.66% <81.81%> (-8.34%) ⬇️
src/global_variables.jl 100.00% <100.00%> (ø)
src/physics/autompo/opsum_to_mpo_qn.jl 97.66% <100.00%> (-1.75%) ⬇️
src/qn/qnitensor.jl 69.54% <100.00%> (-14.80%) ⬇️
src/tensor_operations/tensor_algebra.jl 87.27% <100.00%> (-2.59%) ⬇️

... and 32 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

@mtfishman
Member

mtfishman commented Aug 25, 2023

Hi @terasakisatoshi, thanks a lot for the PR. Nice to see that a relatively small change leads to big improvements in speed and memory allocations.

Your pull request inspired me to make my own here: #1184, which essentially implements the same function-barrier design you suggest but also updates some of the style of the surrounding code, which I noticed was a bit outdated. Would you mind benchmarking that PR to see whether it leads to the same performance improvements as this one for your use case?

Also, it's neat to see that you are passing Hamiltonians generated by OpenFermion into ITensor. We have similar functionality in an experimental package called ITensorChemistry.jl, which calls out to PySCF using PythonCall.jl. We found that generating MPOs for generic quantum chemistry Hamiltonians can scale pretty badly with the number of orbitals. That was one of the inspirations for ITensorParallel.jl, which is based around splitting the Hamiltonian into parts and generating a sum of MPOs instead of a single MPO, which can be more efficient.
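
(For context, here is a minimal sketch of that sum-of-MPOs idea using plain ITensors rather than ITensorParallel.jl itself; the function split_opsum and the odd/even split of terms are invented for this illustration, and paulis/N are taken from the benchmark above:)

# Sketch: build the Hamiltonian as a sum of two MPOs instead of one large MPO.
using ITensors

include("paulis.jl") # defines paulis and N

function split_opsum(paulis)
  os1, os2 = ITensors.OpSum(), ITensors.OpSum()
  for (k, p) in enumerate(paulis)
    term = ITensors.Ops.op_term((rand(), p...))
    if isodd(k)
      os1 += term
    else
      os2 += term
    end
  end
  return os1, os2
end

sites = siteinds("Qubit", N)
os1, os2 = split_opsum(paulis)
H1, H2 = MPO(os1, sites), MPO(os2, sites)
# Algorithms such as dmrg accept a vector of MPOs, e.g. dmrg([H1, H2], psi0, sweeps),
# so the Hamiltonian never has to be assembled into a single large MPO.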

@terasakisatoshi
Contributor Author

> Also, it's neat to see that you are passing Hamiltonians generated by OpenFermion into ITensor. We have similar functionality in an experimental package called ITensorChemistry.jl, which calls out to PySCF using PythonCall.jl. We found that generating MPOs for generic quantum chemistry Hamiltonians can scale pretty badly with the number of orbitals. That was one of the inspirations for ITensorParallel.jl, which is based around splitting the Hamiltonian into parts and generating a sum of MPOs instead of a single MPO, which can be more efficient.

This advice is very helpful to me. I would definitely like to try the package you suggested.

@emstoudenmire
Collaborator

Thanks for catching this issue!

@mtfishman
Member

@terasakisatoshi I made some suggestions for comments to add to remind us why the code is written the way it is. Besides that, it looks good from my end.

terasakisatoshi and others added 4 commits August 31, 2023 09:21
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
@terasakisatoshi
Contributor Author

Is it time to merge?

@mtfishman merged commit 185a69b into ITensor:main Sep 5, 2023
@terasakisatoshi deleted the improve-type-stability-of-svdMPO branch September 5, 2023 17:31