Add docs on full local snapshots
Signed-off-by: Amory Hoste <[email protected]>
amohoste committed Jun 12, 2022
1 parent 7bde409 commit c990d06
Showing 5 changed files with 69 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -3,6 +3,7 @@
## [Unreleased]

### Added
- Add support for the [full local snapshots](docs/fulllocal_snapshots.md) mode

### Changed

7 changes: 7 additions & 0 deletions configs/.wordlist.txt
@@ -107,7 +107,9 @@ DCNs
De
debian
deployer
deterministically
dev
devicemapper
devmapper
df
DialGRPCWithUnaryInterceptor
@@ -285,6 +287,7 @@ microarchitectural
Microarchitecture
microbenchmark
microbenchmarks
microVM
microVMs
minio
MinIO
@@ -395,11 +398,13 @@ rebasing
repo
Repos
roadmap
rootfs
RPC
rperf
RPerf
RPERF
rsquo
rsync
runc
runtime
runtimes
@@ -432,6 +437,7 @@ SinkBinding
SinkBindings
sms
SMT
snapshotted
snapshotting
SoC
SOCACHE
@@ -461,6 +467,7 @@ TestProfileIncrementConfiguration
TestProfileSingleConfiguration
TextFormatter
th
thinpool
Timeseries
timeseriesdb
TimeseriesDB
4 changes: 3 additions & 1 deletion docs/developers_guide.md
@@ -108,14 +108,16 @@ We also offer self-hosted stock-Knative environments powered by KinD. To be able

* vHive supports both the baseline Firecracker snapshots and our advanced
Record-and-Prefetch (REAP) snapshots.

* vHive integrates with Kubernetes and Knative via its built-in CRI support.
Currently, only Knative Serving is supported.

* vHive supports arbitrary distributed setup of a serverless cluster.

* vHive supports arbitrary functions deployed with OCI (Docker images).

* Remote snapshot restore can be built on top of the [full local snapshot functionality](./fulllocal_snapshots.md).

* vHive has robust Continuous Integration and our team is committed to delivering
high-quality code.

61 changes: 56 additions & 5 deletions docs/fulllocal_snapshots.md
@@ -1,7 +1,58 @@
# vHive fulllocal snapshots guide
# vHive full local snapshots

The default snapshots in vHive use an offloading based technique that leaves the shim and other resources running upon shutting down a VM such that it can be re-used in the future. This technique has the advantage that a shim does not have to be recreated and the block and network devices of the previously stopped VM can be reused. This approach does however limit the amount of VMs that can be booted from a snapshot to the amount of VMs that have been offloaded. An alternative approach is to allow loading an arbitrary amount of VMs from a single snapshot by creating a new shim, block and network devices upon loading a snapshot. This functionality can be enabled by running vHive using the `-snapshots -fulllocal` flags. Additionally, the following flags can be used to further configure the fullLocal snapshots
When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a
full local mode. The default snapshot mode uses an offloading-based technique that leaves the shim and other resources
running when a microVM is shut down, so that they can be reused in the future. This technique has the advantage that
the shim does not have to be recreated and that the block and network devices of the previously stopped microVM can be
reused, but it limits the number of microVMs that can be booted from a snapshot to the number of microVMs that have
been offloaded. The full local snapshot mode instead allows loading an arbitrary number of microVMs from a single
snapshot. This is done by creating a new shim and the required block and network devices upon loading a snapshot, and
by creating an extra patch file at snapshot creation time that captures the filesystem differences written by the
microVM. To enable the full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal`
flags. In addition, the full local snapshot mode can be further configured using the following flags, illustrated in
the sketch after this list:

* `-isSparseSnaps`: store the memory file as a sparse file to make the storage size closer to the actual memory utilized by the VM, rather than the memory allocated to the VM
* `-snapsStorageSize [capacityGiB]`: specify the amount of capacity that can be used to store snapshots
* `-netPoolSize [capacity]`: keep around a pool of [capacity] network devices that can be used by VMs to keep network creation off the cold start path
- `isSparseSnaps`: store the memory file as a sparse file to make its storage size closer to the actual size of the memory utilized by the microVM, rather than the memory allocated to the microVM
- `snapsStorageSize [capacityGiB]`: the storage capacity, in GiB, that can be used to store snapshots
- `netPoolSize [capacity]`: the number of network devices kept in the network pool, which microVMs can draw from to keep network initialization off the cold start path
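
For illustration, below is a minimal sketch of how these flags could be parsed with Go's standard `flag` package; the
default values shown are assumptions for the sake of the example, not vHive's actual defaults.

```go
// Minimal sketch of parsing the snapshot-related flags with Go's standard
// flag package. The default values here are illustrative assumptions,
// not vHive's actual defaults.
package main

import (
	"flag"
	"fmt"
)

func main() {
	snapshots := flag.Bool("snapshots", false, "boot microVMs from snapshots after the 2nd invocation")
	fullLocal := flag.Bool("fulllocal", false, "enable the full local snapshot mode")
	isSparseSnaps := flag.Bool("isSparseSnaps", false, "store memory files as sparse files")
	snapsStorageSize := flag.Int("snapsStorageSize", 100, "capacity (GiB) usable for storing snapshots")
	netPoolSize := flag.Int("netPoolSize", 10, "number of pre-created network devices in the pool")
	flag.Parse()

	fmt.Printf("snapshots=%v fulllocal=%v sparse=%v storage=%dGiB netPool=%d\n",
		*snapshots, *fullLocal, *isSparseSnaps, *snapsStorageSize, *netPoolSize)
}
```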

## Remote snapshots

Rather than only using the snapshots available locally on a node, snapshots can also be transferred between nodes to
potentially accelerate cold start times and reduce memory utilization, given that proper mechanisms are in place to
minimize the snapshot network transfer latency. This could be done by storing snapshots in a global storage solution
such as S3, or by distributing snapshots directly between compute nodes. The full local snapshot functionality in
vHive can serve as a building block for such remote snapshots. To restore a remote snapshot, the container image used
by the snapshotted microVM must be available on the node performing the restore. This container image can then be
combined with the filesystem changes stored in the snapshot patch file to create a device mapper snapshot that
contains the root filesystem needed by the restored microVM. After the root filesystem block device has been
recreated, the microVM can be created from the fetched memory file and microVM state, similarly to how this is done
for full local snapshots.
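
Sketched in code, this restore flow could look roughly as follows. Every helper function, type, and field name in
this sketch is a hypothetical placeholder rather than vHive's actual API.

```go
// Hypothetical sketch of restoring a snapshot fetched from another node or an
// object store; every helper used here is a placeholder, not vHive's API.

// snapshotArtifacts bundles the files a remote snapshot restore needs.
type snapshotArtifacts struct {
	MemFilePath   string // guest memory file
	StateFilePath string // microVM state file
	PatchFilePath string // filesystem changes written on top of the container image
}

func restoreRemoteSnapshot(ctx context.Context, vmID, image, snapURL string) error {
	// The container image of the snapshotted microVM must be available
	// locally before the snapshot can be restored.
	if err := pullImageIfMissing(ctx, image); err != nil {
		return err
	}
	// Fetch the memory file, state file and patch file from remote storage.
	snap, err := fetchSnapshotArtifacts(ctx, snapURL)
	if err != nil {
		return err
	}
	// Combine the local image with the patch file into a device mapper
	// snapshot that holds the restored microVM's root filesystem.
	rootDev, err := createPatchedDeviceSnapshot(ctx, image, snap.PatchFilePath)
	if err != nil {
		return err
	}
	// Boot the microVM from the fetched memory and state files, as is done
	// for full local snapshots.
	return loadSnapshot(ctx, vmID, rootDev, snap.MemFilePath, snap.StateFilePath)
}
```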

## Incompatibilities and limitations

### Snapshot filesystem changes capture and restoration

Currently, the filesystem changes are captured in a “patch file”, which is created by mounting both the original
container image and the microVM block device and extracting the differences between the two using rsync. Even though
rsync applies some optimizations, such as using timestamps and file sizes to limit the amount of reads, this procedure
is quite inefficient. It could be sped up by directly extracting the changed block offsets from the thinpool metadata
device and reading these blocks from the microVM rootfs block device. The extracted blocks could then be written back
at the correct offsets on top of the base image block device to create a root filesystem for the microVM to be
restored. Support for this alternative approach is provided through the `ForkContainerSnap` and `CreateDeviceSnapshot`
functions. However, for this approach to work across nodes for remote snapshots, support to [deterministically flatten a container image into a filesystem](https://www.youtube.com/watch?v=A-7j0QlGwFk)
would be required to ensure that the block devices of identical images pulled to different nodes are bit identical.
In addition, further optimizations would be needed to extract filesystem changes from the thinpool metadata device
more efficiently than the current method, which relies on the devicemapper `reserve_metadata_snap` message to create
a snapshot of the current metadata state, in combination with `thin_delta` to extract the changed blocks.
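
As a rough illustration of this current method, the sketch below shells out to the standard `dmsetup` and `thin_delta`
tools from Go; the pool and metadata device paths used here are assumptions for the example, not vHive's actual
configuration.

```go
// Sketch of extracting the changed block ranges of a thin device using a
// devicemapper metadata snapshot. Device names are illustrative assumptions.
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// changedBlocks diffs the block mappings of two thin device IDs and returns
// thin_delta's XML description of the differing ranges.
func changedBlocks(pool, tmeta string, baseID, snapID int) ([]byte, error) {
	// Reserve a metadata snapshot so thin_delta can read consistent
	// metadata while the pool is live.
	if err := exec.Command("dmsetup", "message", pool, "0", "reserve_metadata_snap").Run(); err != nil {
		return nil, err
	}
	// Always release the reserved metadata snapshot afterwards.
	defer exec.Command("dmsetup", "message", pool, "0", "release_metadata_snap").Run()

	// -m makes thin_delta read from the reserved metadata snapshot.
	return exec.Command("thin_delta", "-m",
		"--thin1", strconv.Itoa(baseID),
		"--thin2", strconv.Itoa(snapID),
		tmeta).Output()
}

func main() {
	out, err := changedBlocks("/dev/mapper/fc-dev-thinpool", "/dev/mapper/fc-dev-thinpool_tmeta", 1, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```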

### Performance limitations

The full local snapshot mode requires a new block device and network device with the exact state of the snapshotted
microVM to be created before restoring the snapshot. The network namespace and devicemapper block device creation turn
out to be a bottleneck when many snapshots are restored concurrently. Approaches that reduce the impact of these
operations could further reduce the microVM snapshot restore latency at high load.
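
The pooling mitigation behind `-netPoolSize` can be illustrated with a toy sketch: devices are pre-created into a
buffered channel so that a restore only pays the creation cost once the pool is drained. All names here are
illustrative, not vHive's implementation.

```go
// Toy illustration of the pooling idea behind -netPoolSize: network devices
// are pre-created so a snapshot restore can take one off the cold start path.
package main

import "fmt"

type netDevice struct{ name string }

type netPool struct{ devices chan *netDevice }

func newNetPool(capacity int, create func(i int) *netDevice) *netPool {
	p := &netPool{devices: make(chan *netDevice, capacity)}
	for i := 0; i < capacity; i++ {
		p.devices <- create(i) // fill the pool ahead of time
	}
	return p
}

// get hands out a pre-created device when one is available and only falls
// back to slow on-demand creation when the pool is drained.
func (p *netPool) get(create func(i int) *netDevice) *netDevice {
	select {
	case d := <-p.devices:
		return d
	default:
		return create(-1)
	}
}

// put returns a device to the pool for reuse, dropping it when the pool is full.
func (p *netPool) put(d *netDevice) {
	select {
	case p.devices <- d:
	default:
	}
}

func main() {
	pool := newNetPool(2, func(i int) *netDevice {
		return &netDevice{name: fmt.Sprintf("veth%d", i)}
	})
	d := pool.get(func(i int) *netDevice { return &netDevice{name: "on-demand"} })
	fmt.Println("restored microVM uses", d.name)
	pool.put(d)
}
```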

### UPF snapshot compatibility

The full local snapshot functionality is currently not integrated with the [Record-and-Prefetch (REAP)](papers/REAP_ASPLOS21.pdf)
accelerated snapshots and thus cannot be used in combination with the `-upf` flag.
2 changes: 2 additions & 0 deletions docs/quickstart_guide.md
@@ -130,6 +130,8 @@ SSD-equipped nodes are highly recommended. Full list of CloudLab nodes can be fo
> By default, the microVMs are booted; `-snapshots` enables snapshots after the 2nd invocation of each function.
>
> If `-snapshots` and `-upf` are specified, the snapshots are accelerated with the Record-and-Prefetch (REAP) technique that we described in our ASPLOS'21 paper ([extended abstract][ext-abstract], [full paper](papers/REAP_ASPLOS21.pdf)).
>
> If `-snapshots` and `-fulllocal` are specified, a single snapshot can be used to restore many microVMs ([full local snapshots](./fulllocal_snapshots.md)). Note that this mode is currently not compatible with the REAP technique.
### 3. Configure Master Node
**On the master node**, execute the following instructions below **as a non-root user with sudo rights** using **bash**:
