[ITensors] Improve type stability of svdMPO and qn_svdMPO #1183

Merged

Conversation

terasakisatoshi
Contributor

@terasakisatoshi commented Aug 24, 2023

Description

This PR improves the performance of svdMPO and qn_svdMPO, which are called from MPO(os::OpSum, sites::Vector{<:Index})::MPO.

My investigation revealed the following:

Computing ValType = determineValType(terms(os)) inside the method svdMPO and then defining Vs = [Matrix{ValType}(undef, 1, 1) for n in 1:N] without a type annotation causes type instability: ValType is only known at run time, so the element type of Vs cannot be inferred.

This type instability causes the following code section to run slowly (a sketch of the function-barrier fix is shown after the snippet):

if A_row == -1 && A_col == -1 #onsite term
  M[end, 1] += ct
elseif A_row == -1 #term starting on site n
  for c in 1:size(VR, 2)
    z = ct * VR[A_col, c]
    M[end, 1 + c] += z
  end
elseif A_col == -1 #term ending on site n
  for r in 1:size(VL, 2)
    z = ct * conj(VL[A_row, r])
    M[1 + r, 1] += z
  end
else
  for r in 1:size(VL, 2), c in 1:size(VR, 2)
    z = ct * conj(VL[A_row, r]) * VR[A_col, c]
    M[1 + r, 1 + c] += z
  end
end
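For illustration, here is a minimal, self-contained sketch of the function-barrier pattern that fixes this kind of instability. The names valtype_of, accumulate_slow, accumulate_fast, and _accumulate are invented for this example and are not part of ITensors:

# Sketch of the function-barrier pattern (illustrative names, not ITensors code).
# valtype_of mimics determineValType: the element type is only known at run time.
valtype_of(coeffs) = any(c -> !isreal(c), coeffs) ? ComplexF64 : Float64

function accumulate_slow(coeffs, N)
  ValType = valtype_of(coeffs)               # a run-time value, not a compile-time constant
  Vs = [zeros(ValType, 2, 2) for _ in 1:N]   # element type of Vs is not inferable here
  for (n, c) in enumerate(coeffs)
    Vs[mod1(n, N)] .+= c                     # dynamic dispatch in the hot loop
  end
  return Vs
end

# Function barrier: compute ValType once, then do the hot work in an inner method
# where the element type is a static type parameter.
function accumulate_fast(coeffs, N)
  ValType = valtype_of(coeffs)
  return _accumulate(ValType, coeffs, N)
end

function _accumulate(::Type{T}, coeffs, N) where {T}
  Vs = [zeros(T, 2, 2) for _ in 1:N]         # Vector{Matrix{T}}: concrete and inferable
  for (n, c) in enumerate(coeffs)
    Vs[mod1(n, N)] .+= c
  end
  return Vs
end

Inside _accumulate the element type T is known to the compiler, so the loop compiles to type-stable code; this PR applies the same idea to svdMPO and qn_svdMPO.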


Minimal demonstration of previous behavior

  • Step 1: Prepare the following Python script to generate the test data:
# qph.py
from openfermion import MolecularData
from openfermionpyscf import run_pyscf

# pip3 install quri-parts-openfermion
from quri_parts.openfermion.transforms import jordan_wigner

basis = "6-31g"
multiplicity = 1
charge = 0
distance = 1.6
geometry = [("Li", (0, 0, 0)), ("H", (0, 0, distance))]

pyscf_molecule = MolecularData(geometry, basis, multiplicity, charge)
molecule = run_pyscf(pyscf_molecule)
n_orbs = molecule.n_orbitals
n_elecs = molecule.n_electrons


op_mapper = jordan_wigner.get_of_operator_mapper()

n_active_orbs = 8

qubit_count = 2 * n_active_orbs
hamiltonian = molecule.get_molecular_hamiltonian(
    range(n_elecs // 2), range(n_active_orbs)
)
qp_h = op_mapper(hamiltonian)

li = []

# Collect the Pauli labels; each label is a set of (qubit index, Pauli operator) pairs.
# The coefficients are regenerated randomly on the Julia side, so they are not needed here.
for pauli, _coeff in qp_h.items():
    li.append(pauli)

# Emit Julia code: the qubit count N and a vector `paulis` whose entries have the form
# ["X", i, "Z", j, ...], using 1-based site indices as expected by ITensors.
print(f"N={qubit_count}")
print("paulis = [")
for e in sorted(li, key=len):
    if len(e) == 0:
        continue
    sorted_list = sorted(e, key=lambda t: t[0])
    sorted_list = [(1 + s[0], s[1].name) for s in sorted_list]
    out = "    ["
    seqs = []
    for s1, s2 in sorted_list:
        seqs.append('"' + s2 + '"')
        seqs.append(str(s1))
    out += ",".join(seqs)
    out += "],"
    print(out)
print("]")
  • Step 2: Run the following command:
python qph.py > paulis.jl

This creates a file named paulis.jl.

You can also find paulis.jl in my gist.
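
For reference, the generated file has the following shape (the entries below are illustrative placeholders; the actual operators and site indices depend on the molecule):

# paulis.jl (illustrative excerpt, not the actual generated data)
N=16
paulis = [
    ["Z",1],
    ["Z",2],
    ["X",1,"Z",2,"X",3],
    ["Z",1,"Z",5],
]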

  • Step 3: Prepare the following Julia script, mpo_bench.jl, to show that this pull request improves the performance of constructing the MPO.
# mpo_bench.jl
using ITensors

include("paulis.jl") # defines paulis and N

for conserve_qns in [true, false]
  sites = siteinds("Qubit", N; conserve_qns)
  os = ITensors.OpSum()

  for p in paulis
    coeff = rand()
    os += ITensors.Ops.op_term((coeff, p...))
  end

  # The first call includes compilation time; the later calls measure run time only.
  @time MPO(os, sites)
  @time MPO(os, sites)
  @time MPO(os, sites)
end

On the main branch we have:

$ julia mpo_bench.jl 
 26.777017 seconds (84.34 M allocations: 4.188 GiB, 5.44% gc time, 90.48% compilation time)
  2.426720 seconds (44.33 M allocations: 1.601 GiB, 14.37% gc time)
  2.540036 seconds (44.33 M allocations: 1.601 GiB, 13.92% gc time)
 29.800494 seconds (263.40 M allocations: 10.456 GiB, 8.58% gc time, 36.43% compilation time)
 20.952781 seconds (246.38 M allocations: 9.343 GiB, 9.42% gc time)
 16.334621 seconds (246.38 M allocations: 9.343 GiB, 9.50% gc time)

Minimal demonstration of new behavior

This pull request reduces the number of allocations and improves the performance of MPO(os, sites).

$ julia mpo_bench.jl
 26.626078 seconds (44.08 M allocations: 3.383 GiB, 5.20% gc time, 95.01% compilation time)
  1.958892 seconds (6.27 M allocations: 968.143 MiB, 17.07% gc time)
  1.869094 seconds (6.27 M allocations: 968.143 MiB, 16.23% gc time)
 20.243849 seconds (21.06 M allocations: 6.434 GiB, 5.73% gc time, 46.55% compilation time)
 11.731973 seconds (3.96 M allocations: 5.315 GiB, 6.59% gc time)
  9.871340 seconds (3.96 M allocations: 5.315 GiB, 5.76% gc time)
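
Note that the first measurement in each group includes compilation time. As an additional cross-check (not part of this PR), one could time the construction with BenchmarkTools.jl, which excludes compilation and reports statistics over many samples; a minimal sketch, assuming BenchmarkTools.jl is installed and paulis.jl from Step 2 is available:

# benchmark_mpo.jl (illustrative sketch, not part of this PR)
using ITensors
using BenchmarkTools

include("paulis.jl") # defines paulis and N

function build_opsum(paulis)
  os = ITensors.OpSum()
  for p in paulis
    os += ITensors.Ops.op_term((rand(), p...))
  end
  return os
end

sites = siteinds("Qubit", N; conserve_qns=true)
os = build_opsum(paulis)

# Interpolate the globals so that only the MPO construction itself is timed.
@btime MPO($os, $sites)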

How Has This Been Tested?

$ julia -q
julia> ]            # press ] to enter Pkg mode
pkg> activate
pkg> instantiate
pkg> test
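
A non-interactive way to run the same tests from the repository root (a sketch, e.g. from ~/.julia/dev/ITensors):

$ julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.test()'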

Checklist:

  • [ x ] My code follows the style guidelines of this project. Please run using JuliaFormatter; format(".") in the base directory of the repository (~/.julia/dev/ITensors) to format your code according to our style guidelines.
  • [ x ] I have performed a self-review of my own code.
  • [ x ] I have commented my code, particularly in hard-to-understand areas.
  • [ ] I have added tests that verify the behavior of the changes I made.
    • This PR does not change the user interface.
  • [ ] I have made corresponding changes to the documentation.
  • [ x ] My changes generate no new warnings.
  • [ ] Any dependent changes have been merged and published in downstream modules.
    • No additional dependencies are required for this PR.

This change leads to significant improvements in terms of performance
@codecov-commenter

codecov-commenter commented Aug 24, 2023

Codecov Report

Patch coverage: 87.27% and project coverage change: -18.03% ⚠️

Comparison is base (b54e8d7) 85.40% compared to head (0a53cec) 67.37%.
Report is 1 commit behind head on main.

❗ Current head 0a53cec differs from the pull request's most recent head 8e8e85b. Consider uploading reports for the commit 8e8e85b to get more accurate results.


Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1183       +/-   ##
===========================================
- Coverage   85.40%   67.37%   -18.03%     
===========================================
  Files          88       87        -1     
  Lines        8405     8389       -16     
===========================================
- Hits         7178     5652     -1526     
- Misses       1227     2737     +1510     
Files Changed Coverage Δ
src/physics/autompo/opsum_to_mpo.jl 0.00% <0.00%> (-100.00%) ⬇️
src/qn/flux.jl 91.66% <81.81%> (-8.34%) ⬇️
src/global_variables.jl 100.00% <100.00%> (ø)
src/physics/autompo/opsum_to_mpo_qn.jl 97.66% <100.00%> (-1.75%) ⬇️
src/qn/qnitensor.jl 69.54% <100.00%> (-14.80%) ⬇️
src/tensor_operations/tensor_algebra.jl 87.27% <100.00%> (-2.59%) ⬇️

... and 32 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

@mtfishman
Member

mtfishman commented Aug 25, 2023

Hi @terasakisatoshi, thanks a lot for the PR. Nice to see that a relatively small change leads to big improvements in speed and memory allocations.

Your pull request inspired me to make my own here: #1184, which essentially implements the same function-barrier design you suggest but also updates some of the style of the surrounding code, which I noticed was a bit outdated. Would you mind benchmarking that PR to see whether it leads to the same performance improvements as this one for your use case?

Also, it's neat to see that you are passing Hamiltonians generated by OpenFermion into ITensor. We have similar functionality in an experimental package called ITensorChemistry.jl, which calls out to PySCF using PythonCall.jl. We found that generating MPOs for generic quantum chemistry Hamiltonians can scale pretty badly with the number of orbitals. That was one of the inspirations for ITensorParallel.jl, which is based around splitting the Hamiltonian into parts and generating a sum of MPOs instead of a single MPO, which can be more efficient.
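
(For context, here is a minimal sketch of that sum-of-MPOs idea using plain ITensors rather than ITensorParallel.jl itself; the function split_opsum and the odd/even split of terms are invented for this illustration, and paulis/N are taken from the benchmark above:)

# Sketch: build the Hamiltonian as a sum of two MPOs instead of one large MPO.
using ITensors

include("paulis.jl") # defines paulis and N

function split_opsum(paulis)
  os1, os2 = ITensors.OpSum(), ITensors.OpSum()
  for (k, p) in enumerate(paulis)
    term = ITensors.Ops.op_term((rand(), p...))
    if isodd(k)
      os1 += term
    else
      os2 += term
    end
  end
  return os1, os2
end

sites = siteinds("Qubit", N)
os1, os2 = split_opsum(paulis)
H1, H2 = MPO(os1, sites), MPO(os2, sites)
# Algorithms such as dmrg accept a vector of MPOs, e.g. dmrg([H1, H2], psi0, sweeps),
# so the Hamiltonian never has to be assembled into a single large MPO.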

@terasakisatoshi
Contributor Author

> Also, it's neat to see that you are passing Hamiltonians generated by OpenFermion into ITensor. We have similar functionality in an experimental package called ITensorChemistry.jl, which calls out to PySCF using PythonCall.jl. We found that generating MPOs for generic quantum chemistry Hamiltonians can scale pretty badly with the number of orbitals. That was one of the inspirations for ITensorParallel.jl, which is based around splitting the Hamiltonian into parts and generating a sum of MPOs instead of a single MPO, which can be more efficient.

This advice is very helpful to me. I would definitely like to try the package you suggested.

@emstoudenmire
Collaborator

Thanks for catching this issue!

@mtfishman
Member

@terasakisatoshi I made some suggestions for comments to add to remind us why the code is written the way it is. Besides that, it looks good from my end.

terasakisatoshi and others added 4 commits August 31, 2023 09:21
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
add a comment

Co-authored-by: Matt Fishman <[email protected]>
@terasakisatoshi
Contributor Author

Is it time to merge?

@mtfishman merged commit 185a69b into ITensor:main Sep 5, 2023
@terasakisatoshi deleted the improve-type-stability-of-svdMPO branch September 5, 2023 17:31