Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: consolidate VM creation APIs #806

Closed
wants to merge 25 commits into from

Conversation

gjcolombo
Copy link
Contributor

(N.B. I'm still seeing if I can spin parts of this off into separate PRs, but I'm putting this up as-is for now (a) in case folks want to take a look as-is, and (b) to have commit hashes to refer to in my falcon/omicron clones so I can stop using path references everywhere.)

Consolidate the two separate instance-ensure APIs into a single API using the approach described in #804. There are several related changes in this PR, which are:

  • The server now has a single instance PUT endpoint that takes the revised version of the InstanceEnsureRequest described in two instance ensure APIs is one too many #804.
  • Instance specs now use a SpecKey enum for their component map keys. This enum has UUID and string variants. This allows the control plane to supply UUID keys when it's possible and fall back to strings when it's not, and it allows Propolis to store/compare UUIDs in binary form when the control plane supplies them.
  • Components' spec keys are now used as identifiers in a VM's device, block backend, and Crucible backend maps.
  • During live migration, the target initializes devices using an amended version of the source's instance spec. The amendments are supplied by the target's caller, which may need to replace Crucible VCRs and VNIC names to allow the VM to function properly after migration. One consequence of this is that device-level migration compatibility checks are no longer needed--the target's device list is the source's device list, so they're always compatible.
  • propolis-server processes no longer use config TOMLs. Instead, they take a bootrom path (and optional bootrom version string) on the command line, and expect all devices to specified through the ensure endpoint. To allow existing users to continue using their config TOMLs, add a TOML-to-instance-spec library to the propolis-config-toml crate and teach propolis-cli to read config TOMLs and create specs from them.

The main mechanical challenge in this PR is that it's no longer possible to create the VMM tokio runtime before starting the state driver task, because the runtime's thread count depends on the guest's vCPU count, and during a migration that's not known until after the migration protocol has begun to run. This is not intractable but accounts for the most significant changes to propolis-server/src/lib/vm.

Tests (so far):

  • cargo test
  • PHD (including migration variations, which now use the replacement-component functionality in several places)
  • created some ad hoc VMs using propolis-cli and verified they had the expected components and the expected number of tokio runtime threads
  • drafted changes to Falcon and Omicron that depend on this new API
    • tested a two-node Falcon topology with a softnpu link between the nodes & verified that the expected components ended up attached to the expected VMs
    • ran cargo test in Omicron (exercises the stripped-down mock server)
    • deployed an Omicron dev cluster, booted a VM there, verified that its component list is correct

Fixes #804. Fixes #776. Fixes #772.

This is the first step in a long road towards two goals:

1. Get live migration targets to initialize from their sources
2. Allow the control plane to configure instances using instance specs
   to access all the features they present that instance ensure requests
   do not

This particular commit does a couple of different things:

- Define the One True Instance Initialization Request type
- Change the key type in instance specs from String to a special
  SpecKey type

# One True Init Type

There are now three ways for Propolis clients to initialize a VM:

- From its config TOML
- From an instance spec provided as an argument
- Via live migration

The first option is meant for local development. The second is meant to
be used when starting new VMs. The third is used when migrating a VM.
The "migrate" option allows the client to specify new backend
configurations for some or all of the backends in the source's instance
spec. This allows the control plane to substitute things like Crucible
generation numbers or OPTE port names that change when a VMM migrates.

For this scheme to work, the control plane needs to be able to name the
backends that it's replacing (so that the source and target can line up
the backend replacements during the sync phase). This requires a more
disciplined approach to instance spec component naming.

# SpecKey

Today instance specs use strings as their key names. Using UUIDs instead
has the nice property that it becomes very easy to correlate control
plane objects (e.g. a Disk or a NetworkInterface record) with the
Propolis components that instantiate them. It is also cheaper to compare
UUIDs directly than to compare their string representations.
Unfortunately, it's not possible to change the existing SpecKey type
alias from String to Uuid, for (at least) two reasons:

1. The config TOML method of initializing an instance accepts arbitrary
   strings as device and backend names. Having to generate UUIDs for
   these would be a hassle for local development.
2. More importantly: Some control plane objects need more than one spec
   element; for example, a disk has both a device and a backend
   component, and they can't use the same ID.

To get around these problems, this commit introduces the `SpecKey` enum,
which has `Uuid` and `Name` variants and uses adapters from the
`serde_with` crate to serialize and deserialize it using its `Display`
and `FromStr` implementations. This allows the key to be used in the
components map while keeping the map `Serialize`. String keys that can
convert to UUIDs are stored in that representation; otherwise they're
stored as name strings. This allows Nexus to use an ID for one half of a
device and then append a disambiguator to the other component.

This commit fixes the build for most of the instance spec-related
functions in the server, but does not deal with anything in the VM state
machine or the migration protocol, where the build is still completely
on the floor.
Tidy up a few more easy-to-fix errors (usually use of strings where we
want the use of SpecKeys) before tackling the bigger problem of how to
rearrange VM initialization.
This is needed because the spec is not known right away during a
migration in.
The VCR replacement endpoint takes a disk name (in the request body) and
a "disk ID" (in the request path). The former item is used to look up
the Crucible element in the instance spec; the latter is used to look it
up in the VM's Crucible backend map. Change this around as follows:

- The VCR replacement request path takes a SpecKey. The request body no
  longer takes a name.
- Crucible backend components are now identified by their SpecKeys. The
  UUID obtained from the Crucible backend itself is no longer used for
  this purpose.
- Enable metrics for Crucible disks only if they're identified by a
  UUID. This is a constraint of the virtual disk provider, which expects
  disk IDs to be used in its time-series identifiers. In practice this
  means that Nexus will need to identify Crucible backends using disk
  IDs and use some other name for the corresponding devices (which is
  probably what we were going to do anyway, so no big deal).
- Change the other Crucible-dependent endpoints (snapshot, volume
  status) to use similar path parameters.
Refactor the `VmEnsure` type state machine so that it no longer assumes
that an instance spec is present at all times (it might be injected
later during live migration).

The biggest challenge here is that it's no longer possible to run the
entire VM initialization process on the VMM runtime, because there might
not be an instance spec when initialization starts, and the VMM runtime
can't be created until there's a spec that specifies how many vCPUs the
VM will have. To get around this, move some things around so that the
`VmEnsure` state machine creates the runtime instead of running on the
previously-created runtime.

Slightly tweak the migration destination code so that it's ready to
provide an instance spec to the ensure logic if the sync phase produces
one from the preamble (it doesn't, but after the next commit it will!).
Instead of comparing the source and target instance specs for migration
compatibility, have the target read the source's spec, substitute in any
new backend information the target has, and then attempt to initialize
itself from the resulting spec.

This deletes 900+ lines of compat checking code that's no longer needed.

At this point the server builds, so it should be possible to run tests
on it and then regenerate the OpenAPI endpoints, after which we'll see
if the new instance ensure API is worth a darn or not.
Clean up the first couple of bits of build fallout from the new API
spec:

- Crucible client types no longer appear directly in the propolis-client
  generated types (because DiskRequest is gone and instance specs expect
  to get serialized VCRs instead). Have PHD depend directly on
  crucible-client-types for these.
- Remove most of the InstanceEnsureRequest validation code from the mock
  server. Most of these checks are for the validity of slot inputs,
  and slot conversions aren't part of the API anymore. The mock serial
  port logic (which is the bit that the Omicron integration tests really
  make use of) remains unchanged.
There's no longer a mode of operation where the server combines config
TOML devices with ensure requests to create a complete VM configuration.
Move the TOML-entry-to-spec-component conversion logic into the
`propolis-server-config` crate so that it's available to clients who
want it, then remove config-TOML device parsing from the server
entirely.

The server still needs the config TOML contents to get the bootrom path.
That won't change, at least for now.
Change PHD's instance ensure paths to use the new VM creation APIs.
Remove two tests that are no longer relevant: the "incompatible VMs"
migration test is no longer useful now that targets inherit
configuration from their sources, and the "ensure API" test is no longer
useful now that everyone has to use a single ensure API.

One minor annoyance here (of which I've made a note) is that instance
GET requests always return their specs in an Option, causing a couple of
test cases to sprout new `expect`s. Maybe we can improve on this.

Tested with Alpine and Debian 11 guests.
propolis-server no longer has an option to read VM device configuration
from a TOML file. This is a useful option for local development. The
goal of this change is to lay the groundwork for propolis-cli to read
existing config TOMLs and repackage them into instance specs that can be
used with the new VM creation API.

propolis-server reads config TOMLs using the propolis-server-config
crate, which (now) contains logic to convert them to propolis-api-types
instance specs. This logic is obnoxious to move to propolis-cli, because
the CLI uses propolis-client, whose spec types are Progenitor-generated
and don't match the propolis-api-types types.

Instead of building dozens of `From` impls or festooning the
propolis-client generator with `replace` directives, this commit
completely breaks propolis-server's dependency on
propolis-server-config. This is possible because the server only reads
two pieces of information from the config TOML, both of which are easily
replaced:

1. The bootrom path is now specified as a command-line argument to the
   server (where the config TOML path would have gone in times before).
2. The "migration failure" test device can now be specified in an
   instance spec (though the server will only accept it if it was built
   in the correct configuration, i.e. without the omicron-build
   feature).

Once this dependency is removed, propolis-server-config can take a
dependency on propolis-client without creating a circular dependency or
layering violation. (It would probably also make sense to rename the
crate to something like `propolis-config-toml`, but I haven't done that
here.) This in turn allows propolis-cli to pick up
propolis-server-config and use it to convert config TOMLs into
propolis-client instance specs.
This test was assuming that it could write a new instance spec for its
target and that the target would pick it up. The right way to do that is
with the replacement list, so use that instead.
Now that config TOMLs don't directly configure propolis-server binaries,
the old name makes less sense, so switch to a newer, more sensible name.
Up in Omicron, Nexus modules that need to work with volume construction
requests import `sled_agent_client::types::VolumeConstructionRequest`.
Sled agent's generated client includes the VCR type because the sled
agent API includes a `propolis_client::types::DiskRequest`, which (when
it existed) contained a `crucible_client_types::DiskRequest`.

The advantage of this scheme is that Nexus code that works with VCRs is
guaranteed to use the same VCR definition as Propolis does. To maintain
it, have `propolis_client` re-export its VCR definition directly (since
it no longer appears in the generated types).
The propolis-config-toml library wasn't inserting any SoftNpuPort
devices into the specs it generated, which is a regression from the
previous version of the config code. I need to figure out why I didn't
catch this in Falcon testing!

Also, reorganize things a bit to clarify the following:

- a `SoftNpuPort` is a data port on a virtual ASIC
- the data ports have names that are distinct from their instance spec
  keys

SoftNPU treats port names as part of the overall virtual ASIC
configuration, so they should go in the actual port specs; spec keys
exist solely to identify components, and changing a component's key
shouldn't affect how it operates (provided that any other components
that refer to it also have their names updated).
@gjcolombo
Copy link
Contributor Author

This is split into multiple PRs by the series beginning with #807.

@gjcolombo gjcolombo closed this Nov 18, 2024
@gjcolombo gjcolombo deleted the gjcolombo/vm-create-api-rework branch November 18, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant