[8/5] Reduce VMM reservation contention #7533

smklein · 2025-02-12T21:10:14Z

#7498 was introduced to benchmark the cost of concurrent instance provisioning, and it demonstrated that, under contention, performance can degrade significantly on the VMM reservation pathway.

This PR optimizes that pathway by removing the VMM reservation transaction and replacing it with a few non-transactional queries:

1. First, we query to see whether the VMM reservation has already succeeded (for idempotency).
2. Next, we query for all viable sled targets and affinity information (sled_find_targets_query).
3. After parsing that data and picking a sled, we call sled_insert_resource_query to INSERT the desired VMM record and to re-validate our constraints.
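For readers who want the shape of this flow without reading the SQL, here is a minimal in-memory sketch of the three steps. It is not the code in this PR: `Db`, `existing`, `find_targets`, `insert_if_valid`, and `reserve` are hypothetical names, the only constraint modeled is reservoir RAM capacity (no affinity rules), and falling through to the next candidate when an insert is rejected is an assumption of the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the sled and VMM resource tables.
#[derive(Default)]
struct Db {
    /// sled id -> total reservoir size
    sleds: HashMap<u32, u64>,
    /// vmm id -> (sled id, reserved reservoir RAM)
    reservations: HashMap<u32, (u32, u64)>,
}

impl Db {
    /// Sum of reservoir RAM already reserved on `sled`.
    fn used(&self, sled: u32) -> u64 {
        self.reservations
            .values()
            .filter(|(s, _)| *s == sled)
            .map(|(_, ram)| *ram)
            .sum()
    }

    /// Step 1: has this VMM already been given a sled? (idempotency)
    fn existing(&self, vmm: u32) -> Option<u32> {
        self.reservations.get(&vmm).map(|(sled, _)| *sled)
    }

    /// Step 2: stands in for sled_find_targets_query. Returns every sled
    /// where the VMM would fit right now, in ascending id order.
    fn find_targets(&self, ram: u64) -> Vec<u32> {
        let mut targets: Vec<u32> = self
            .sleds
            .iter()
            .filter(|(id, size)| self.used(**id) + ram <= **size)
            .map(|(id, _)| *id)
            .collect();
        targets.sort();
        targets
    }

    /// Step 3: stands in for sled_insert_resource_query. Re-checks the
    /// capacity constraint at insert time and inserts only if it still holds.
    fn insert_if_valid(&mut self, vmm: u32, sled: u32, ram: u64) -> bool {
        let Some(&size) = self.sleds.get(&sled) else {
            return false;
        };
        if self.used(sled) + ram > size {
            return false;
        }
        self.reservations.insert(vmm, (sled, ram));
        true
    }
}

/// Runs the three steps without wrapping them in one transaction.
fn reserve(db: &mut Db, vmm: u32, ram: u64) -> Option<u32> {
    if let Some(sled) = db.existing(vmm) {
        return Some(sled);
    }
    // Sketch assumption: try candidates in order until one insert succeeds.
    for sled in db.find_targets(ram) {
        if db.insert_if_valid(vmm, sled, ram) {
            return Some(sled);
        }
    }
    None
}
```

The point of the sketch is that correctness rests on the re-check inside `insert_if_valid`, not on the read in `find_targets`: the candidate list may be stale by the time the insert runs, and the insert refuses to proceed if the constraint no longer holds.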

This change significantly improves performance in the vmm-reservation benchmark while upholding the constraints implicit in VMM provisioning.

smklein · 2025-02-14T22:40:11Z

nexus/db-queries/src/db/queries/sled_reservation.rs

+                COALESCE(SUM(CAST(sled_resource_vmm.reservoir_ram AS INT8)), 0) + "
+            ).param().sql(" <= sled.reservoir_size
+        ),
+        our_aa_groups AS (


So, here's a small detail that might be worth fixing...

This query is used alongside sled_find_targets_query, so we do:

1. sled_find_targets_query
2. Pick a candidate sled (in Rust)
3. sled_insert_resource_query, to insert the sled reservation if it's still valid

So, for the "really bad cases" (e.g., no space, affinity group changes with policy = fail), we prevent the reservation if some concurrent action has changed the state of the world from underneath us.

HOWEVER, it is technically possible that we pick an unfavorable sled due to a concurrent operation.

For example:

1. We get a set of sleds for an instance in an anti-affinity group with policy = allow.
2. We pick a sled, S, for our VMM (maybe no other instances are using S).
3. Someone else concurrently provisions another VMM to S, and that VMM's instance belongs to our anti-affinity group.
4. We call sled_insert_resource_query, thinking that this is a reasonable choice for a sled target. And with this query as-written, S is still technically an allowed choice (as long as our VMM still fits). However, because of the concurrent action in step (3), it's not a "good" choice: there's a member of our anti-affinity group co-located with us. Ideally, we would try a different sled.

Today, this means we can pick sled targets less favorably than we'd like, but we do still prevent co-locating anti-affinity group members with policy = fail, and we still prevent separating affinity group members with policy = fail. It's the more permissive policy = allow case that could use some cleanup.
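To make the gap concrete, here is a small sketch of the insert-time decision for a single anti-affinity group. It is one possible shape for the cleanup, not what this PR implements: `Policy`, `Decision`, `decide`, and the `strict` flag are all hypothetical, and the real check lives in SQL rather than Rust. `seen` is the number of group members on the sled when we picked it, and `now` is the number at insert time.

```rust
/// Hypothetical model of an anti-affinity group's policy.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Policy {
    Allow,
    Fail,
}

/// Hypothetical outcome of re-validating a chosen sled at insert time.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Insert the reservation on this sled.
    Insert,
    /// The sled is allowed but has become unfavorable; pick another one.
    Retry,
    /// The sled violates a hard constraint; do not insert.
    Reject,
}

/// `strict = false` models the behavior described above: with
/// policy = allow, a newly co-located group member does not block the insert.
/// `strict = true` models the possible cleanup: retry on a different sled.
fn decide(policy: Policy, fits: bool, seen: usize, now: usize, strict: bool) -> Decision {
    if !fits {
        return Decision::Reject;
    }
    if now == 0 {
        // No group members on this sled, so nothing to object to.
        return Decision::Insert;
    }
    match policy {
        Policy::Fail => Decision::Reject,
        // Only retry when the sled got worse after we picked it.
        Policy::Allow if strict && now > seen => Decision::Retry,
        Policy::Allow => Decision::Insert,
    }
}
```

With `strict = false`, the race in the example resolves to `Decision::Insert` (`seen = 0`, `now = 1`, policy = allow). With `strict = true` the same inputs give `Decision::Retry`, while a sled that already held a group member when we chose it (`seen == now`) is still accepted.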

smklein · 2025-02-14T22:54:05Z

Before:

After:

(Please note the difference in scale on the X-axis; it is significant.)

smklein · 2025-02-14T23:31:41Z

Similarly, with affinity + anti-affinity groups:

Before:

After:

(The results align with those of the group-less benchmarks)

smklein added 3 commits February 12, 2025 11:19

Start working towards a reduced-contention sled reservation

2eaef4f

passing tests

c9bb4c7

cleanup

8e5b1b0

hawkw self-requested a review February 12, 2025 21:18

smklein added 7 commits February 12, 2025 15:18

Updated expectorate output

d1356f8

Testing contention more explicitly

7abc2b4

fmt

bd14e7b

Patch benchmark

d55e540

Partway through affinity group testing

f710687

cache instance/group records

82b028c

Merge branch 'vmm-reserve-bench' into vmm-reduce-contention

7e80208

smklein mentioned this pull request Feb 14, 2025

(7/5) [nexus-db-queries] Benchmark for VMM reservation #7498

Open

Merge branch 'vmm-reserve-bench' into vmm-reduce-contention

df9a1f8

Remove unused test output

2ea4dea

smklein marked this pull request as ready for review February 14, 2025 22:54

smklein requested a review from gjcolombo February 14, 2025 22:56
