feat: In-place crypto #2385

Merged: 28 commits merged into mozilla:main on Feb 4, 2025

Conversation

@larseggert (Collaborator) commented Jan 23, 2025

Fixes #2246 (eventually)

Only in-place encryption so far, and only for the main data path.

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 99.57265% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.29%. Comparing base (384b4bc) to head (9df3b7b).
Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| neqo-crypto/src/aead.rs | 98.03% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2385      +/-   ##
==========================================
+ Coverage   95.26%   95.29%   +0.02%     
==========================================
  Files         114      114              
  Lines       36903    37113     +210     
  Branches    36903    37113     +210     
==========================================
+ Hits        35155    35365     +210     
  Misses       1742     1742              
  Partials        6        6              


github-actions bot commented Jan 23, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 108fb8d.

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

github-actions bot commented Jan 23, 2025

Benchmark results

Performance differences relative to e365730.

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [11.793 µs 11.849 µs 11.930 µs]
       change: [-4.1333% -1.2448% +0.5654%] (p = 0.49 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [2.9058 ms 2.9165 ms 2.9286 ms]
       change: [-0.1476% +0.3278% +0.8637%] (p = 0.20 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
12 (12.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [19.717 µs 19.848 µs 20.053 µs]
       change: [-0.5297% +0.0020% +0.5684%] (p = 0.99 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
1 (1.00%) high mild
15 (15.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [4.7132 ms 4.7244 ms 4.7371 ms]
       change: [-0.2998% +0.0468% +0.4013%] (p = 0.78 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
12 (12.00%) high severe

decode 4096 bytes, mask 3f: Change within noise threshold.
       time:   [6.2397 µs 6.2782 µs 6.3218 µs]
       change: [+0.1124% +0.8835% +1.6191%] (p = 0.02 < 0.05)

Found 18 outliers among 100 measurements (18.00%)
5 (5.00%) low mild
1 (1.00%) high mild
12 (12.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [2.1124 ms 2.1192 ms 2.1262 ms]
       change: [-0.6450% -0.1280% +0.3322%] (p = 0.63 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [93.302 ns 93.606 ns 93.914 ns]
       change: [-0.9063% +0.0093% +0.7964%] (p = 0.98 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) high mild
3 (3.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [110.93 ns 111.31 ns 111.73 ns]
       change: [-1.3922% -0.1714% +0.7161%] (p = 0.80 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
3 (3.00%) low mild
1 (1.00%) high mild
14 (14.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [110.38 ns 110.72 ns 111.14 ns]
       change: [-0.3298% +0.1731% +0.7170%] (p = 0.55 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
4 (4.00%) low mild
5 (5.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [92.566 ns 92.744 ns 92.990 ns]
       change: [-0.7984% -0.0234% +0.8683%] (p = 0.96 > 0.05)

Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.78 ms 111.84 ms 111.89 ms]
       change: [-0.1886% -0.1201% -0.0530%] (p = 0.00 < 0.05)

Found 10 outliers among 100 measurements (10.00%)
8 (8.00%) low mild
2 (2.00%) high mild

SentPackets::take_ranges: No change in performance detected.
       time:   [5.2445 µs 5.4303 µs 5.6259 µs]
       change: [-2.4306% +0.4653% +3.4569%] (p = 0.76 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild

transfer/pacing-false/varying-seeds: 💚 Performance has improved.
       time:   [37.047 ms 37.122 ms 37.195 ms]
       change: [-8.2698% -7.9849% -7.7178%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/varying-seeds: 💚 Performance has improved.
       time:   [37.290 ms 37.358 ms 37.426 ms]
       change: [-7.5052% -7.2561% -7.0183%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: 💚 Performance has improved.
       time:   [36.858 ms 36.926 ms 36.992 ms]
       change: [-8.5997% -8.3275% -8.0696%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: 💚 Performance has improved.
       time:   [37.691 ms 37.757 ms 37.824 ms]
       change: [-8.1079% -7.8677% -7.6187%] (p = 0.00 < 0.05)

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
       time:   [835.52 ms 844.39 ms 853.44 ms]
       thrpt:  [117.17 MiB/s 118.43 MiB/s 119.69 MiB/s]
change:
       time:   [-4.4010% -2.8467% -1.1933%] (p = 0.00 < 0.05)
       thrpt:  [+1.2077% +2.9301% +4.6036%]

1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [315.26 ms 318.59 ms 321.90 ms]
       thrpt:  [31.065 Kelem/s 31.389 Kelem/s 31.720 Kelem/s]
change:
       time:   [-2.9558% -1.5039% -0.0329%] (p = 0.04 < 0.05)
       thrpt:  [+0.0329% +1.5268% +3.0458%]

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [25.601 ms 25.759 ms 25.922 ms]
       thrpt:  [38.578  elem/s 38.821  elem/s 39.061  elem/s]
change:
       time:   [-0.3503% +0.5074% +1.3684%] (p = 0.25 > 0.05)
       thrpt:  [-1.3499% -0.5049% +0.3515%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-100mb-resp/mtu-1504 (aka. Upload)/client: No change in performance detected.
       time:   [1.8285 s 1.8459 s 1.8633 s]
       thrpt:  [53.669 MiB/s 54.175 MiB/s 54.691 MiB/s]
change:
       time:   [-2.5204% -1.1058% +0.3205%] (p = 0.14 > 0.05)
       thrpt:  [-0.3195% +1.1182% +2.5855%]

Client/server transfer results

Transfer of 33554432 bytes over loopback.

| Client | Server | CC | Pacing | MTU | Mean [ms] | Min [ms] | Max [ms] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gquiche | gquiche | | | 1504 | 518.7 ± 11.9 | 505.7 | 543.6 |
| neqo | gquiche | reno | on | 1504 | 728.5 ± 9.0 | 716.7 | 741.7 |
| neqo | gquiche | reno | | 1504 | 737.3 ± 11.8 | 720.8 | 754.0 |
| neqo | gquiche | cubic | on | 1504 | 762.7 ± 18.4 | 741.0 | 805.3 |
| neqo | gquiche | cubic | | 1504 | 748.9 ± 25.8 | 730.0 | 818.4 |
| msquic | msquic | | | 1504 | 145.4 ± 92.1 | 89.3 | 390.5 |
| neqo | msquic | reno | on | 1504 | 211.5 ± 12.1 | 196.7 | 237.5 |
| neqo | msquic | reno | | 1504 | 207.2 ± 11.0 | 196.7 | 226.3 |
| neqo | msquic | cubic | on | 1504 | 204.2 ± 10.2 | 193.1 | 223.0 |
| neqo | msquic | cubic | | 1504 | 208.6 ± 12.3 | 196.6 | 227.0 |
| gquiche | neqo | reno | on | 1504 | 643.7 ± 85.1 | 524.5 | 765.6 |
| gquiche | neqo | reno | | 1504 | 639.2 ± 81.2 | 528.1 | 761.6 |
| gquiche | neqo | cubic | on | 1504 | 635.4 ± 75.8 | 530.3 | 762.6 |
| gquiche | neqo | cubic | | 1504 | 647.2 ± 85.0 | 522.4 | 812.0 |
| msquic | neqo | reno | on | 1504 | 448.6 ± 12.5 | 432.0 | 474.2 |
| msquic | neqo | reno | | 1504 | 478.3 ± 80.6 | 425.7 | 637.3 |
| msquic | neqo | cubic | on | 1504 | 437.4 ± 12.5 | 421.2 | 463.8 |
| msquic | neqo | cubic | | 1504 | 424.7 ± 9.6 | 415.8 | 442.8 |
| neqo | neqo | reno | on | 1504 | 423.7 ± 12.2 | 410.3 | 448.8 |
| neqo | neqo | reno | | 1504 | 418.5 ± 9.4 | 406.4 | 430.0 |
| neqo | neqo | cubic | on | 1504 | 422.2 ± 8.7 | 410.9 | 435.5 |
| neqo | neqo | cubic | | 1504 | 419.7 ± 9.9 | 409.0 | 438.5 |

@larseggert (Collaborator, Author)

@mxinden when you have a moment, would you take a look at the borrow-checker issue in PublicPacket::decode? It's the last one I couldn't figure out how to address.

@mxinden (Collaborator) commented Jan 24, 2025

Took a quick look.

            let dcid = Self::opt(dcid_decoder.decode_cid(&mut decoder))?;
            if decoder.remaining() < SAMPLE_OFFSET + SAMPLE_SIZE {
                return Err(Error::InvalidPacket);
            }
            let header_len = decoder.offset();
            return Ok((
                Self {
                    packet_type: PacketType::Short,
                    dcid,
                    scid: None,
                    token: &[],
                    header_len,
                    version: None,
                    data,
                },
                &[],
                &mut [],
            ));
  • decoder has an immutable reference to data.
  • Through decoder dcid has an immutable reference to data.
  • The function returns both dcid (immutable reference to data) AND data. In other words, it returns both an immutable and a mutable reference to data, which is disallowed.
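
Reduced to a stand-alone shape (the function name and the fixed slice range below are invented for illustration, not the actual decode code), the conflict looks like this; the borrow checker rejects it because dcid still borrows from data when data is handed back mutably:

// Hypothetical reduction of the conflict described above (does not compile):
fn decode(data: &mut [u8]) -> (&[u8], &mut [u8]) {
    let dcid = &data[1..9]; // immutable borrow of `data` (the decoder does this in the real code)
    (dcid, data)            // rejected: `data` is still immutably borrowed by `dcid`
}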

I can take a deeper look and try to fix it.

@larseggert (Collaborator, Author)

Thanks for the analysis! Wonder if we can make dcid a Range into data...
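
For illustration only, a minimal sketch of that Range idea (the type and method names are invented, not neqo's actual definitions): store offsets into data and re-slice on demand, so no second borrow is held inside the struct.

use std::ops::Range;

// Hypothetical sketch: keep indices into `data` instead of a borrowed slice.
struct PublicPacketSketch<'a> {
    dcid: Range<usize>, // offsets of the DCID within `data`
    data: &'a mut [u8],
}

impl PublicPacketSketch<'_> {
    fn dcid(&self) -> &[u8] {
        &self.data[self.dcid.clone()]
    }
}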

@mxinden (Collaborator) commented Jan 24, 2025

Ah, never seen this before. That would be error-prone, as the bytes within the range in data could change at any point in time, right? I will give this more thought.

@mxinden (Collaborator) commented Jan 25, 2025

The issue described above, namely that dcid (an immutable borrow of data) and data itself (a mutable borrow) conflict, can be fixed by "allocating" dcid on the stack via an owned ConnectionId backed by a SmallVec:

diff --git a/neqo-transport/src/packet/mod.rs b/neqo-transport/src/packet/mod.rs
index 73b47bcc..779ca72b 100644
--- a/neqo-transport/src/packet/mod.rs
+++ b/neqo-transport/src/packet/mod.rs
@@ -563,7 +563,7 @@ pub struct PublicPacket<'a> {
     /// The packet type.
     packet_type: PacketType,
     /// The recovered destination connection ID.
-    dcid: ConnectionIdRef<'a>,
+    dcid: ConnectionId,
     /// The source connection ID, if this is a long header packet.
     scid: Option<ConnectionIdRef<'a>>,
     /// Any token that is included in the packet (Retry always has a token; Initial sometimes
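
As a rough sketch of what an owned, SmallVec-backed connection ID buys here (the exact capacity and field layout of neqo's real ConnectionId may differ): QUIC connection IDs are at most 20 bytes, so the bytes fit inline on the stack and no heap allocation is needed in the common case.

use smallvec::SmallVec;

// Sketch only: an owned connection ID whose bytes live inline (up to 20 bytes)
// rather than borrowing from the receive buffer.
pub struct OwnedConnectionId {
    cid: SmallVec<[u8; 20]>,
}

impl OwnedConnectionId {
    pub fn new(bytes: &[u8]) -> Self {
        Self { cid: SmallVec::from_slice(bytes) }
    }

    pub fn as_slice(&self) -> &[u8] {
        &self.cid
    }
}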

That leaves us with another issue, namely rustc not being able to infer that early returns of data don't interfere with the final return of remainder. I have an idea which I will explore tomorrow.
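
For context, this is the well-known borrow-checker limitation (the case the Polonius work aims to accept) where a borrow returned from one branch is treated as live for the rest of the function. A classic stand-alone example, unrelated to neqo, that rustc rejects even though it is sound:

use std::collections::HashMap;

// Rejected by today's borrow checker: the borrow returned in the `if let`
// branch is treated as lasting until the end of the function, so the later
// insert and lookup conflict with it, even though that branch has returned.
fn get_or_insert(map: &mut HashMap<u32, String>) -> &String {
    if let Some(value) = map.get(&0) {
        return value;
    }
    map.insert(0, String::new());
    map.get(&0).unwrap()
}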

@mxinden (Collaborator) commented Jan 26, 2025

Okay, I got it.

Let's take a look at PublicPacket on main:

/// `PublicPacket` holds information from packets that is public only.  This allows for
/// processing of packets prior to decryption.
pub struct PublicPacket<'a> {
    /// The packet type.
    packet_type: PacketType,
    /// The recovered destination connection ID.
    dcid: ConnectionIdRef<'a>,
    /// The source connection ID, if this is a long header packet.
    scid: Option<ConnectionIdRef<'a>>,
    /// Any token that is included in the packet (Retry always has a token; Initial sometimes
    /// does). This is empty when there is no token.
    token: &'a [u8],
    /// The size of the header, not including the packet number.
    header_len: usize,
    /// Protocol version, if present in header.
    version: Option<WireVersion>,
    /// A reference to the entire packet, including the header.
    data: &'a [u8],
}

dcid, scid, token and data are all immutable references into the same underlying memory allocation, here our long-lived receive buffer.

This pull request introduces the following change:

@@ -564,7 +574,7 @@ pub struct PublicPacket<'a> {
     /// Protocol version, if present in header.
     version: Option<WireVersion>,
     /// A reference to the entire packet, including the header.
-    data: &'a [u8],
+    data: &'a mut [u8],
 }

While dcid, scid and token are untouched, data is now a mutable reference. Having both immutable and mutable references to the same memory allocation is illegal, thus the compiler error.

An easy fix would be to make dcid, scid and token owned types. Given their small footprint, this is likely fine. There might be some additional optimizations, but I doubt they are worth it.

diff --git a/neqo-transport/src/packet/mod.rs b/neqo-transport/src/packet/mod.rs
index 73b47bcc..dc85bbd0 100644
--- a/neqo-transport/src/packet/mod.rs
+++ b/neqo-transport/src/packet/mod.rs
@@ -563,12 +563,12 @@ pub struct PublicPacket<'a> {
     /// The packet type.
     packet_type: PacketType,
     /// The recovered destination connection ID.
-    dcid: ConnectionIdRef<'a>,
+    dcid: ConnectionId,
     /// The source connection ID, if this is a long header packet.
-    scid: Option<ConnectionIdRef<'a>>,
+    scid: Option<ConnectionId>,
     /// Any token that is included in the packet (Retry always has a token; Initial sometimes
     /// does). This is empty when there is no token.
-    token: &'a [u8],
+    token: Vec<u8>,
     /// The size of the header, not including the packet number.
     header_len: usize,
     /// Protocol version, if present in header.

The above, plus a couple of smaller lifetime changes, resolves the borrow-checker issues.

I will propose a commit with my local changes.

@mxinden (Collaborator) commented Jan 26, 2025

@larseggert let me know what you think of larseggert#34.

Note that it only addresses the borrow-checker issues in neqo-transport/src/packet/mod.rs.

Happy to look at the neqo-http3 failures as well.

@martinthomson (Member)

FWIW, I looked at the aead crate interface here. I don't think that is going to fit our needs very well.

@larseggert marked this pull request as ready for review on January 29, 2025 at 17:52
@larseggert (Collaborator, Author)

> FWIW, I looked at the aead crate interface here. I don't think that is going to fit our needs very well.

I actually think they might? Our API after the last round of changes is very similar. (Or I'm missing something.)

@martinthomson (Member)

The challenge I see is twofold:

  1. Their encryption API returns the tag as a separate return value, where we'd probably prefer that it simply be appended at the end of the buffer (I really don't like APIs that treat tags as separate; I just want a seal()/open() API), which would move that cost to the caller.
  2. Their decryption API relies on being passed one of their Buffer traits (using dyn, no less!). That trait is not implemented by slices, because it includes the option to resize, which the decryption API uses to leave the modified buffer 16 bytes shorter.

The latter is worse for us because we're passing in a mutable slice, which can't be resized in that way (we might do something with &mut &mut [u8], but ugh, gross).
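
For illustration, here is a rough sketch of the seal()/open() shape argued for above, with ciphertext and tag sharing one caller-provided buffer; the trait, its method signatures, and the simplified packet-number parameter are hypothetical, not neqo's actual Aead API.

/// Hypothetical in-place AEAD shape: no separate tag value and no resizable
/// buffer trait; the caller reserves 16 spare bytes and gets back a length.
pub trait InPlaceAead {
    type Error;

    /// `data[..len]` holds plaintext on input; on success, ciphertext plus a
    /// 16-byte tag occupy `data[..len + 16]` and the new length is returned.
    /// `data` must therefore be at least 16 bytes longer than `len`.
    fn seal(&self, count: u64, aad: &[u8], data: &mut [u8], len: usize)
        -> Result<usize, Self::Error>;

    /// `data` holds ciphertext plus tag on input; on success, the plaintext
    /// occupies `data[..n]` for the returned length `n`.
    fn open(&self, count: u64, aad: &[u8], data: &mut [u8])
        -> Result<usize, Self::Error>;
}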

@martinthomson (Member) left a comment

I have a few gotchas here, but I like how this is shaping up.

The real question I have is: does this really make it go faster? The benchmarks show some improvements. Are those consistent enough for you to be happy? I see that the improvements aren't 100% consistent.

larseggert and others added 7 commits January 31, 2025 11:48
Co-authored-by: Martin Thomson <[email protected]>
Signed-off-by: Lars Eggert <[email protected]>
@larseggert (Collaborator, Author)

> I have a few gotchas here, but I like how this is shaping up.
>
> The real question I have is: does this really make it go faster? The benchmarks show some improvements. Are those consistent enough for you to be happy? I see that the improvements aren't 100% consistent.

I see a few percentage points (3-5%) locally. It's not a lot; I guess those extra heap allocations aren't that costly.

1-conn/1-100mb-resp (aka. Download)/client
                        time:   [293.84 ms 294.75 ms 295.65 ms]
                        thrpt:  [338.24 MiB/s 339.27 MiB/s 340.32 MiB/s]
                 change:
                        time:   [-4.0668% -3.3831% -2.8379%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9208% +3.5015% +4.2392%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

@larseggert added this pull request to the merge queue on Feb 3, 2025
The github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Feb 3, 2025
@larseggert (Collaborator, Author)

I'm kinda surprised the Upload test isn't seeing any improvement though.

@martinthomson (Member)

It's quite possible that the work we do to generate a packet is still dominated by other factors. There is a non-significant improvement according to the runs; maybe try running it 10x more to get the noise down some more.

@larseggert added this pull request to the merge queue on Feb 4, 2025
The github-merge-queue bot removed this pull request from the merge queue due to failed status checks on Feb 4, 2025
@larseggert merged commit 2406bfa into mozilla:main on Feb 4, 2025
66 of 69 checks passed
@larseggert deleted the feat-inplace-crypto branch on February 4, 2025 at 09:16
Development

Successfully merging this pull request may close these issues.

perf: consider in-place en- and decryption