Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ingest new Propolis VM creation API #7211

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

gjcolombo
Copy link
Contributor

(NOTE: This draft points to an unmerged Propolis commit. I don't expect many, if any, additional changes on the Propolis side that will affect this PR, though. Once the Propolis changes have merged I'll update the pointers here and remove the draft tag.)

Propolis has a shiny new VM creation API that takes instance specs instead of the now-obsolete InstanceEnsureRequest. See oxidecomputer/propolis#804 for details and a list of related PRs.

In sled agent, construct instance specs (for newly-starting VMs) or replacement component lists (for migrating VMs) and pass them to Propolis in place of an InstanceEnsureRequest. This logic will become the foundation for a more general "virtual platform" abstraction in the future; for now the goal is just to adopt the new VM creation API while creating VMs as Propolis would have created them.

Slightly adjust the sled agent instance APIs so that Nexus specifies disks and boot orderings using sled-agent-defined types and not re-exported Propolis types.

Finally, update Nexus to handle a change to the way some Crucible types are exposed to Omicron. Previously, propolis_client's generated types contained generated definitions of the VolumeConstructionRequest and CrucibleOpts types, which sled_agent_client re-exported for Nexus to use. Propolis's new API definition doesn't include these types; instead, propolis_client directly exports these types from its dependency on crucible_client_types. Update sled agent's exports to reflect this, then update Nexus code broken by this change. This mainly involves storing SocketAddrs in their native form instead of converting them to/from strings.

Tests: cargo nextest; started a dev cluster, created an instance, and verified that it had the expected hardware in the expected PCI slots.

This ends up touching a substantial number of files because of the way
Nexus accessed Crucible client types:

- Nexus imported Crucible types from the generated sled agent client
- Sled agent's code included generated Crucible types from the Propolis
  client
- The Propolis client generates Crucible types because the Propolis
  server API used a DiskRequest type that referred to the Crucible
  client library's VolumeConstructionRequest and CrucibleOpts types

The Propolis server API no longer has a DiskRequest type, so generated
Propolis clients no longer include the Crucible types, and neither do
the generated sled agent clients.

Deriving Crucible types from the Propolis client is useful because it
ensures that Nexus's conception of a VolumeConstructionRequest is the
same as Propolis's. To maintain this property, the Propolis client
depends on the crucible-client-types crate and re-exports types from it;
the sled agent client then re-exports from the Propolis client
library.

The catch is that Progenitor-generated versions of these types use
Strings for socket addresses, while the native types use native
SocketAddrs. This requires a fair bit of munging in the tests to remove
some conversions between strings and socket addresses.
Copy link
Contributor Author

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making notes of a couple of things I saw while writing the PR description.

I also need to check that disk snapshots against a running instance work as intended, since that API also changed (I've tested this on the Propolis side but need to make sure it works here too).

sled-agent/types/src/instance.rs Outdated Show resolved Hide resolved
sled-agent/src/instance.rs Show resolved Hide resolved
@gjcolombo gjcolombo force-pushed the gjcolombo/ingest-propolis-ensure-api branch from 222ff66 to dd376b0 Compare December 6, 2024 19:53
@gjcolombo
Copy link
Contributor Author

I also need to check that disk snapshots against a running instance work as intended, since that API also changed (I've tested this on the Propolis side but need to make sure it works here too).

I did this by spinning up a dev cluster, creating an instance and logging into it, touching some files there, and taking a snapshot of the image's disk while it was still running. The Propolis logs showed it servicing the snapshot request, and afterward I had a usable snapshot that I could use to create a new image/disk/instance that contains the files I'd touched pre-snapshot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant