Add Rust-based SparsePauliOp.to_matrix and Miri tests (Qiskit#11388)

* Add Rust-based `SparsePauliOp.to_matrix`

  This rewrites the numerical version of `SparsePauliOp.to_matrix` in parallelised Rust, building up the matrices row-by-row rather than converting each contained operator to a matrix individually and summing them.

  The new algorithms are completely row-based, which is embarrassingly parallel for dense matrices, and parallelisable with additional copies and cumulative sums in the CSR case.

  The two new algorithms are an asymptotic complexity improvement for both dense and sparse matrices over the "sum the individual matrices" version. In the dense case, the cost goes from

      O(4 ** num_qubits * num_ops)

  to

      O(4 ** num_qubits + (2 ** num_qubits) * reduced_num_ops)

  where the first term is from the zeroing of the matrix, and the second is from updating the elements in-place. `reduced_num_ops` is the number of explicitly stored operations after Pauli-operator uniqueness compaction, so is upper-bounded as `4 ** num_qubits`. (The Pauli compaction goes as `O(num_ops)`, so is irrelevant to the complexity discussion.)

  The CSR form of the algorithm goes as

      O(2 ** num_qubits * reduced_num_ops * lg(reduced_num_ops))

  which (I think! - I didn't fully calculate it) is asymptotically the same as before, but in practice the constant factors and intermediate memory use are *dramatically* reduced, and the new algorithm is threadable with an additional `O(2 ** num_qubits * reduced_num_ops)` intermediate memory overhead (the serial form has only `O(reduced_num_ops)` memory overhead).

  The `object`-coefficients form is left as-is to avoid exploding the complexity in Rust space; these objects are already slow and unsuited for high-performance code, so the effects should be minimal.

* Add non-blocking Miri to CI

  As we begin to include more `unsafe` code in the Rust-accelerated components, it is becoming more important for us to test these in an undefined-behaviour sanitiser. This is done in a separate CI job because:

  - we do not yet know how stable Miri will be, so we don't want to block on it.
  - some dependencies need their version-resolution patching to Miri-compatible versions, but we want to run our regular test suites with the same versions of packages we will be building against.

* Parallelise cumulative nnz sum

  This parallelises the previously serial-only cumulative sum of the `indptr` array of number of non-zero entries at the end. In practice, I didn't actually see any change in performance from this, but philosophically it feels like the right thing to do.

* Update Miri pin to later version of crossbeam-epoch

* Improve error handling and messages

* Simplify unnecessary match

* Add link to environment variable configuration

* Add link to Rayon plumbing README

* Add explicit test of serial and parallel modes
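
The row-by-row idea behind the dense case can be illustrated with a small serial sketch. This is not the code added by this commit: the `Term` type, the `(f64, f64)` complex representation, and the function names are hypothetical, and it assumes the usual symplectic encoding (bit `q` of `x`/`z` marks an X/Z factor on qubit `q`, with Y counted as `i * X * Z`) and a little-endian basis ordering. Each output row depends only on its own index, which is why the dense case parallelises over rows without coordination.

```rust
/// One term of a Pauli sum in symplectic form (hypothetical type for this sketch).
#[derive(Clone, Copy)]
struct Term {
    x: usize,          // bit q set => X or Y acts on qubit q
    z: usize,          // bit q set => Z or Y acts on qubit q
    coeff: (f64, f64), // (real, imaginary) parts of the coefficient
}

/// Multiply the complex number `c` by `i^power`.
fn mul_i_pow(c: (f64, f64), power: u32) -> (f64, f64) {
    match power % 4 {
        0 => c,
        1 => (-c.1, c.0),
        2 => (-c.0, -c.1),
        _ => (c.1, -c.0),
    }
}

/// Build the dense, row-major matrix of the sum of `terms` on `num_qubits` qubits.
fn to_matrix_dense(num_qubits: u32, terms: &[Term]) -> Vec<(f64, f64)> {
    let side = 1usize << num_qubits;
    // Zeroing the output is the O(4 ** num_qubits) part of the cost.
    let mut out = vec![(0.0, 0.0); side * side];
    // Each row is written independently of every other row.
    for (row, out_row) in out.chunks_mut(side).enumerate() {
        for term in terms {
            // A Pauli string has exactly one non-zero entry per row.
            let col = row ^ term.x;
            // Each Y contributes a factor of `i`; each Z-type factor acting on a
            // set bit of the column index contributes a factor of -1 = i^2.
            let num_y = (term.x & term.z).count_ones();
            let sign_flips = (col & term.z).count_ones();
            let value = mul_i_pow(term.coeff, num_y + 2 * sign_flips);
            out_row[col].0 += value.0;
            out_row[col].1 += value.1;
        }
    }
    out
}
```

For example, passing the two single-qubit terms `X` (`x: 1, z: 0`) and `Z` (`x: 0, z: 1`) with unit coefficients gives the row-major entries of `[[1, 1], [1, -1]]`.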
amaloney · Apr 26, 2024 · d084aeb
1 parent 1802732

Showing 9 changed files with 983 additions and 14 deletions.
diff --git a/.github/workflows/miri.yml b/.github/workflows/miri.yml
@@ -0,0 +1,46 @@
+name: Miri
+on:
+  push:
+  pull_request:
+concurrency:
+  group: ${{ github.repository }}-${{ github.ref }}-${{ github.head_ref }}-${{ github.workflow }}
+  # Only cancel in PR mode.  In push mode, don't cancel so we don't see spurious test "failures",
+  # and we get coverage reports on Coveralls for every push.
+  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
+
+jobs:
+  miri:
+    if: github.repository_owner == 'Qiskit'
+    name: Miri
+    runs-on: ubuntu-latest
+    env:
+      RUSTUP_TOOLCHAIN: nightly
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Rust toolchain
+        uses: dtolnay/rust-toolchain@nightly
+        with:
+          components: miri
+
+      - name: Prepare Miri
+        run: |
+          set -e
+          # Some of our dependencies aren't Miri-safe with their current release versions.  These
+          # need overriding with known-good versions to run against until the Miri-safe versions are
+          # released and updated in our Cargo.lock.
+          cat >>Cargo.toml <<EOF
+
+          [patch.crates-io]
+          crossbeam-epoch = { git = "https://github.com/crossbeam-rs/crossbeam", rev = "9e859610" }
+          EOF
+          cargo miri setup
+
+      - name: Run Miri
+        run: cargo miri test
+        env:
+          # - `tree-borrows` is required for crossbeam components.
+          # - `symbolic-alignment-check` is extra checking.
+          # - `strict-provenance` is extra checking.
+          MIRIFLAGS: '-Zmiri-tree-borrows -Zmiri-symbolic-alignment-check -Zmiri-strict-provenance'
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -556,6 +556,58 @@ you just need to update the reference images as follows:
 
 Note: If you have run `test/ipynb/mpl_tester.ipynb` locally it is possible some file metadata has changed, **please do not commit and push changes to this file unless they were intentional**.
 
+
+### Testing Rust components
+
+Rust-accelerated functions are generally tested from Python space, but in cases
+where there are Rust-specific internal details to be tested, `#[test]` functions
+can be included inline.  Typically it's most convenient to place these in a
+separate inline module that is only conditionally compiled in, such as
+
+```rust
+#[cfg(test)]
+mod tests {
+    #[test]
+    fn my_first_test() {
+        assert_eq!(2, 1 + 1);
+    }
+}
+```
+
+To run the Rust-space tests, do
+
+```bash
+cargo test --no-default-features
+```
+
+Our Rust-space components are configured such that setting the
+`--no-default-features` flag will compile the test runner, but not attempt to
+build a linked CPython extension module, which would cause linker failures.
+
+### Unsafe code and Miri
+
+Any `unsafe` code added to the Rust logic should be exercised by Rust-space
+tests, in addition to the more complete Python test suite.  In CI, we run the
+Rust test suite under [Miri](https://github.com/rust-lang/miri) as an
+undefined-behavior sanitizer.
+
+Miri is currently only available on `nightly` Rust channels, so to run it
+locally you will need to ensure you have that channel available, such as by
+```bash
+rustup toolchain install nightly --component miri
+```
+
+After this, you can run the Miri test suite with
+```bash
+MIRIFLAGS="<flags go here>" cargo +nightly miri test
+```
+
+For the current set of `MIRIFLAGS` used by Qiskit's CI, see the
+[`miri.yml`](https://github.com/Qiskit/qiskit/blob/main/.github/workflows/miri.yml)
+GitHub Action file.  This same file may also include patches to dependencies to
+make them compatible with Miri, which you would need to temporarily apply as
+well.
+
 ## Style and lint
 
 Qiskit uses three tools to verify code formatting and lint checking. The

diff --git a/crates/accelerate/src/lib.rs b/crates/accelerate/src/lib.rs
@@ -31,6 +31,10 @@ pub mod two_qubit_decompose;
 pub mod utils;
 pub mod vf2_layout;
 
+mod rayon_ext;
+#[cfg(test)]
+mod test;
+
 #[inline]
 pub fn getenv_use_multiple_threads() -> bool {
     let parallel_context = env::var("QISKIT_IN_PARALLEL")

diff --git a/crates/accelerate/src/rayon_ext.rs b/crates/accelerate/src/rayon_ext.rs
@@ -0,0 +1,171 @@
+// This code is part of Qiskit.
+//
+// (C) Copyright IBM 2023
+//
+// This code is licensed under the Apache License, Version 2.0. You may
+// obtain a copy of this license in the LICENSE.txt file in the root directory
+// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
+//
+// Any modifications or derivative works of this code must retain this
+// copyright notice, and modified files need to carry a notice indicating
+// that they have been altered from the originals.
+
+//! Extension structs for use with Rayon parallelism.
+
+// See https://github.com/rayon-rs/rayon/blob/v1.10.0/src/iter/plumbing/README.md (or a newer
+// version) for more of an explanation of how Rayon's plumbing works.
+
+use rayon::iter::plumbing::*;
+use rayon::prelude::*;
+
+pub trait ParallelSliceMutExt<T: Send>: ParallelSliceMut<T> {
+    /// Create a parallel iterator over mutable chunks of uneven lengths for this slice.
+    ///
+    /// # Panics
+    ///
+    /// Panics if the given chunk lengths do not sum to the length of the slice.
+    #[track_caller]
+    fn par_uneven_chunks_mut<'len, 'data>(
+        &'data mut self,
+        chunk_lengths: &'len [usize],
+    ) -> ParUnevenChunksMut<'len, 'data, T> {
+        let mut_slice = self.as_parallel_slice_mut();
+        let chunk_sum = chunk_lengths.iter().sum::<usize>();
+        let slice_len = mut_slice.len();
+        if chunk_sum != slice_len {
+            panic!("given slices of total size {chunk_sum} for a chunk of length {slice_len}");
+        }
+        ParUnevenChunksMut {
+            chunk_lengths,
+            data: mut_slice,
+        }
+    }
+}
+
+impl<T: Send, S: ?Sized> ParallelSliceMutExt<T> for S where S: ParallelSliceMut<T> {}
+
+/// Very similar to Rayon's [rayon::slice::ChunksMut], except that the lengths of the individual
+/// chunks are arbitrary, provided they sum to the total length of the slice.
+#[derive(Debug)]
+pub struct ParUnevenChunksMut<'len, 'data, T> {
+    chunk_lengths: &'len [usize],
+    data: &'data mut [T],
+}
+
+impl<'len, 'data, T: Send + 'data> ParallelIterator for ParUnevenChunksMut<'len, 'data, T> {
+    type Item = &'data mut [T];
+
+    #[track_caller]
+    fn drive_unindexed<C: UnindexedConsumer<Self::Item>>(self, consumer: C) -> C::Result {
+        bridge(self, consumer)
+    }
+}
+
+impl<'len, 'data, T: Send + 'data> IndexedParallelIterator for ParUnevenChunksMut<'len, 'data, T> {
+    #[track_caller]
+    fn drive<C: Consumer<Self::Item>>(self, consumer: C) -> C::Result {
+        bridge(self, consumer)
+    }
+
+    fn len(&self) -> usize {
+        self.chunk_lengths.len()
+    }
+
+    #[track_caller]
+    fn with_producer<CB: ProducerCallback<Self::Item>>(self, callback: CB) -> CB::Output {
+        callback.callback(UnevenChunksMutProducer {
+            chunk_lengths: self.chunk_lengths,
+            data: self.data,
+        })
+    }
+}
+
+struct UnevenChunksMutProducer<'len, 'data, T: Send> {
+    chunk_lengths: &'len [usize],
+    data: &'data mut [T],
+}
+
+impl<'len, 'data, T: Send + 'data> Producer for UnevenChunksMutProducer<'len, 'data, T> {
+    type Item = &'data mut [T];
+    type IntoIter = UnevenChunksMutIter<'len, 'data, T>;
+
+    fn into_iter(self) -> Self::IntoIter {
+        Self::IntoIter::new(self.chunk_lengths, self.data)
+    }
+
+    #[track_caller]
+    fn split_at(self, index: usize) -> (Self, Self) {
+        // Technically quadratic for a full-depth split, but let's worry about that later if needed.
+        let data_mid = self.chunk_lengths[..index].iter().sum();
+        let (chunks_left, chunks_right) = self.chunk_lengths.split_at(index);
+        let (data_left, data_right) = self.data.split_at_mut(data_mid);
+        (
+            Self {
+                chunk_lengths: chunks_left,
+                data: data_left,
+            },
+            Self {
+                chunk_lengths: chunks_right,
+                data: data_right,
+            },
+        )
+    }
+}
+
+#[must_use = "iterators do nothing unless consumed"]
+struct UnevenChunksMutIter<'len, 'data, T> {
+    chunk_lengths: &'len [usize],
+    // The extra `Option` wrapper here is to satisfy the borrow checker while we're splitting the
+    // `data` reference.  We need to consume `self`'s reference during the split before replacing
+    // it, which means we need to temporarily set the `data` ref to some unowned value.
+    // `Option<&mut [T]>` means we can replace it temporarily with the null reference, ensuring the
+    // mutable aliasing rules are always upheld.
+    data: Option<&'data mut [T]>,
+}
+
+impl<'len, 'data, T> UnevenChunksMutIter<'len, 'data, T> {
+    fn new(chunk_lengths: &'len [usize], data: &'data mut [T]) -> Self {
+        Self {
+            chunk_lengths,
+            data: Some(data),
+        }
+    }
+}
+
+impl<'len, 'data, T> Iterator for UnevenChunksMutIter<'len, 'data, T> {
+    type Item = &'data mut [T];
+
+    #[track_caller]
+    fn next(&mut self) -> Option<Self::Item> {
+        if self.chunk_lengths.is_empty() {
+            return None;
+        }
+        let (out_data, rem_data) = self
+            .data
+            .take()
+            .unwrap()
+            .split_at_mut(self.chunk_lengths[0]);
+        self.chunk_lengths = &self.chunk_lengths[1..];
+        self.data = Some(rem_data);
+        Some(out_data)
+    }
+
+    fn size_hint(&self) -> (usize, Option<usize>) {
+        (self.chunk_lengths.len(), Some(self.chunk_lengths.len()))
+    }
+}
+impl<'len, 'data, T> ExactSizeIterator for UnevenChunksMutIter<'len, 'data, T> {}
+impl<'len, 'data, T> DoubleEndedIterator for UnevenChunksMutIter<'len, 'data, T> {
+    #[track_caller]
+    fn next_back(&mut self) -> Option<Self::Item> {
+        if self.chunk_lengths.is_empty() {
+            return None;
+        }
+        let pos = self.chunk_lengths.len() - 1;
+        let data_pos = self.data.as_ref().map(|x| x.len()).unwrap() - self.chunk_lengths[pos];
+        let (rem_data, out_data) = self.data.take().unwrap().split_at_mut(data_pos);
+        self.chunk_lengths = &self.chunk_lengths[..pos];
+        self.data = Some(rem_data);
+        Some(out_data)
+    }
+}
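
The uneven-chunk splitting above can be illustrated without Rayon. The following is a standard-library-only sketch, not part of this commit: `indptr_from_counts`, `uneven_chunks_mut`, and `fill_rows` are hypothetical names, and scoped threads stand in for Rayon's work-stealing. It assumes the chunk lengths are per-row non-zero counts of a CSR matrix, so each worker receives exactly the part of the data array that belongs to its row. `std::mem::take` plays the same role here as the `Option::take` in `UnevenChunksMutIter`: it temporarily swaps an empty slice into the place of `data`, so that the full-lifetime borrow can be split.

```rust
/// Cumulative sum of per-row non-zero counts, giving a CSR-style `indptr`.
fn indptr_from_counts(counts: &[usize]) -> Vec<usize> {
    let mut indptr = Vec::with_capacity(counts.len() + 1);
    let mut total = 0;
    indptr.push(total);
    for &count in counts {
        total += count;
        indptr.push(total);
    }
    indptr
}

/// Split `data` into consecutive mutable chunks with the given lengths.
///
/// Panics if the lengths do not sum to `data.len()`.
fn uneven_chunks_mut<'a, T>(mut data: &'a mut [T], lengths: &[usize]) -> Vec<&'a mut [T]> {
    assert_eq!(
        lengths.iter().sum::<usize>(),
        data.len(),
        "chunk lengths must sum to the slice length"
    );
    let mut chunks = Vec::with_capacity(lengths.len());
    for &len in lengths {
        // Swap an empty slice into `data` so the whole `'a` borrow can be split.
        let (head, tail) = std::mem::take(&mut data).split_at_mut(len);
        chunks.push(head);
        data = tail;
    }
    chunks
}

/// Fill each uneven chunk with its row index, one scoped thread per chunk.
fn fill_rows(data: &mut [u32], lengths: &[usize]) {
    let chunks = uneven_chunks_mut(data, lengths);
    std::thread::scope(|scope| {
        for (row, chunk) in chunks.into_iter().enumerate() {
            // The chunks are disjoint, so no synchronisation is needed.
            scope.spawn(move || chunk.fill(row as u32));
        }
    });
}
```

With row counts `[1, 3, 2]`, `indptr_from_counts` gives `[0, 1, 4, 6]`, and `fill_rows` on a six-element buffer writes `[0, 1, 1, 1, 2, 2]`. Spawning one thread per chunk is only reasonable for a toy example; a thread pool such as Rayon's avoids that overhead.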