What is the Veracruz programming model?
The Veracruz runtime exposes a set of host functions (or H-calls) to the WASM program running on top of the runtime. These H-calls serve a similar purpose to syscalls in operating systems: they provide a programming interface through which relatively unprivileged programs access services managed by the privileged runtime or operating system. Whilst a typical desktop operating system may provide hundreds of different syscalls, the Veracruz runtime provides a very limited interface to programs. The following table describes this interface in its entirety:
| H-call name | Description |
|---|---|
| `getrandom(buf, sz)` | Writes `sz` bytes of random data, taken from a trusted entropy source, into the buffer `buf`. The precise source used differs depending on the particular containerisation technology being used by the Veracruz computation. If no suitable entropy source is available, or if random number generation fails for some reason, the H-call returns an appropriate error. |
| `input_count()` | Returns the number of secret inputs, provisioned by the data providers, that can be accessed by the program. This number always matches the number of data sources specified by the global policy. The H-call exists so that programs can discover that number at run time, which makes it possible to write programs that are parametric in the number of inputs. |
| `input_size(idx)` | Returns the size, in bytes, of input `idx`. Note that data providers can provision their data sets in an arbitrary order, as part of the Veracruz provisioning process. However, once all data sets are provisioned, the runtime sorts the data sets to match the order specified in the global policy. As a consequence, the result of this H-call is always well-defined. Returns an error if `idx` is greater than the number of data sets provisioned by the data providers. |
| `get_input(buf, sz, idx)` | Reads input `idx` into the buffer `buf` with size `sz`. Returns an error if `idx` is greater than the number of data sets provisioned by the data providers, or if the size of input `idx` is greater than `sz`. |
| `write_result(buf, sz)` | Writes the result of a computation by copying the content of buffer `buf` with size `sz` into a buffer in the trusted Veracruz runtime. Returns an error if a result has already been written. |
The table above is provided for illustrative purposes only: the Veracruz H-call
layer is explicitly not stable, and may change over time. As is the case
with a typical operating system, programmers are not expected to invoke the
trusted Veracruz runtime's H-calls directly from application code, but are
instead expected to make use of an abstraction layer. With Unix-family
operating systems, this abstraction layer is `libc`; for Veracruz, it is
`libveracruz`, which provides a stable layer over the Veracruz H-call API, and
also provides higher-level functions, more amenable for use in typical
application code, that build on top of the raw H-call layer.
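To make the layering concrete, the following is a minimal sketch of how a higher-level helper might be built from the size-then-read pattern of `input_size` and `get_input`. It is not the `libveracruz` implementation: the `RawHost` trait, the `MockHost` type, and the error code values are all hypothetical stand-ins introduced so that the sketch can run outside of Veracruz.

```rust
/// Hypothetical stand-in for the raw H-call layer. The real H-calls are
/// provided by the runtime and their exact signatures may differ.
pub trait RawHost {
    /// Mirrors `input_count()`.
    fn input_count(&self) -> u32;
    /// Mirrors `input_size(idx)`; `Err` carries a raw error code.
    fn input_size(&self, idx: u32) -> Result<usize, i32>;
    /// Mirrors `get_input(buf, sz, idx)`; the slice length plays the role of `sz`.
    fn get_input(&self, buf: &mut [u8], idx: u32) -> Result<(), i32>;
}

/// Illustrative error codes (assumed values, chosen for this sketch only).
pub const ERR_BAD_INDEX: i32 = -1;
pub const ERR_BUFFER_TOO_SMALL: i32 = -2;

/// In-memory mock used to exercise the wrapper outside of Veracruz.
pub struct MockHost {
    pub inputs: Vec<Vec<u8>>,
}

impl RawHost for MockHost {
    fn input_count(&self) -> u32 {
        self.inputs.len() as u32
    }

    fn input_size(&self, idx: u32) -> Result<usize, i32> {
        self.inputs
            .get(idx as usize)
            .map(|input| input.len())
            .ok_or(ERR_BAD_INDEX)
    }

    fn get_input(&self, buf: &mut [u8], idx: u32) -> Result<(), i32> {
        let input = self.inputs.get(idx as usize).ok_or(ERR_BAD_INDEX)?;
        if input.len() > buf.len() {
            return Err(ERR_BUFFER_TOO_SMALL);
        }
        buf[..input.len()].copy_from_slice(input);
        Ok(())
    }
}

/// Higher-level helper: query the size, allocate exactly that much, then read.
pub fn read_input<H: RawHost>(host: &H, idx: u32) -> Result<Vec<u8>, i32> {
    let size = host.input_size(idx)?;
    let mut buf = vec![0u8; size];
    host.get_input(&mut buf, idx)?;
    Ok(buf)
}
```

The caller of `read_input` never handles a raw buffer or size, which is the kind of convenience an abstraction layer over the H-calls can offer while the raw interface remains free to change.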
At the time of writing, a version of `libveracruz` is only provided for the Rust
programming language, and therefore all Veracruz programs must be written in
Rust. More precisely, Rust is the only language in which it is convenient to
write Veracruz programs without manually invoking the unstable H-call API.
Support for other languages, including C and C++, is planned but not a priority
at present.
In the next subsection, we walk through a simple Veracruz program, written using
`libveracruz`, and make reference to various notable aspects of the Veracruz
programming model.
We will now walk through a simple Veracruz program, built using `libveracruz`,
which expects a single data source of binary-encoded 64-bit floating point
values as input. The program will then perform a linear regression on the
input data set, before returning a binary-encoded answer as its result. Note
that this example is both complete and executable on Veracruz, and can be found
in the `$VERACRUZ/sdk/examples/linear-regression` directory.

    #![no_std]
    extern crate alloc;
    extern crate veracruz_rt;

    use alloc::vec::Vec;
    use libveracruz::{generate_write_result, host, return_code};
    use serde::Serialize;

For this example, we do not need to link against the full Rust standard library,
`std`, but rather a subset of this library called `alloc`, which assumes the
presence of a global allocator, and provides a host of useful data structures
that need to allocate memory. As a result, we mark the code as `#![no_std]`
using the attribute on the first line, above. Veracruz supports programming
with or without the Rust standard library, as is appropriate. In `#![no_std]`
contexts, we also provide a support library, `veracruz_rt`, which can be linked
against the program, and which sets up a global allocator and performs other
low-level tasks, for convenience.

    fn read_input() -> Result<Vec<(f64, f64)>, i32> {
        // This program expects exactly one provisioned input.
        if host::input_count() != 1 {
            return return_code::fail_invariant_failed();
        } else {
            // Fetch the raw bytes of input 0, then try to deserialize them.
            let dataset: Vec<u8> = host::read_input(0).unwrap();
            match pinecone::from_bytes(&dataset) {
                Err(_err) => return_code::fail_bad_input(),
                Ok(dataset) => Ok(dataset),
            }
        }
    }

Above, the program uses functions exported from the `host` module of
`libveracruz` to first query the number of inputs that have been provisioned by
the data providers, and are therefore available to it. In this instance, the
program is hardwired to expect a single input (other programs can work
parametrically in the number of inputs, as appropriate) and returns a
suitable error code if this is not the case.
Once this test is completed, the program grabs the raw bytes of the first input
(at index 0) from the Veracruz runtime. This step should never fail, as we've
already checked that an input is available, so the call to
`read_input(0).unwrap()` is safe. In our examples and test code, we've
fixed on using the Rust `pinecone` library as a way of serializing and
deserializing Rust data structures to and from raw bytes. We therefore try to
deserialize the collection of pairs of 64-bit floats, returning an error code if
this fails.
Generally speaking, the Veracruz runtime doesn't care how inputs and outputs are serialized, as all it sees is unstructured byte data. Any structure imposed on these bytes is something computation participants need to negotiate out-of-band, before the computation begins.
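As an illustration of what such an out-of-band agreement might look like, the sketch below defines a hand-rolled wire format in which each `(x, y)` pair is stored as two consecutive little-endian `f64` values. This layout, and the function names `encode_pairs` and `decode_pairs`, are assumptions made for this example only; it is not the `pinecone` encoding used by the program in this walkthrough.

```rust
/// Encodes (x, y) pairs as consecutive little-endian `f64` values.
/// The layout is an assumption of this sketch, not the pinecone format.
pub fn encode_pairs(pairs: &[(f64, f64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * 16);
    for (x, y) in pairs {
        out.extend_from_slice(&x.to_le_bytes());
        out.extend_from_slice(&y.to_le_bytes());
    }
    out
}

/// Decodes the layout produced by `encode_pairs`, returning `None` if the
/// byte count is not a whole number of 16-byte pairs.
pub fn decode_pairs(bytes: &[u8]) -> Option<Vec<(f64, f64)>> {
    if bytes.len() % 16 != 0 {
        return None;
    }
    let mut pairs = Vec::with_capacity(bytes.len() / 16);
    for chunk in bytes.chunks_exact(16) {
        let mut x = [0u8; 8];
        let mut y = [0u8; 8];
        x.copy_from_slice(&chunk[..8]);
        y.copy_from_slice(&chunk[8..]);
        pairs.push((f64::from_le_bytes(x), f64::from_le_bytes(y)));
    }
    Some(pairs)
}
```

A data provider would run the encoder before provisioning, and the program would run the decoder on the bytes it receives. Both sides must agree on the layout in advance, because the runtime passes the bytes through without interpreting them.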
Below, we introduce a simple data structure that will capture the result of our linear regression algorithm, if it is successful:

    #[derive(Serialize)]
    struct LinearRegression {
        gradient: f64,
        intercept: f64,
    }

Note that, above, we use the `Serialize` trait with the `LinearRegression`
struct. This is a standard trait from the Rust Serde library, a common
framework for serializing and deserializing data structures. Moreover, earlier,
we used the `pinecone` library, which is another "off-the-shelf" Rust
library. Any Rust library that can be compiled to WASM can be used when
programming Veracruz computations. To a first approximation, this means
any library written in pure Rust that depends neither on external C
libraries nor on operating system-specific functionality, such as devices
or filesystems.
The code below is the meat of the linear regression implementation. This is straightforward, not tied to Veracruz, and is left here for completeness. We offer no comment on this code.

    fn means(dataset: &[(f64, f64)]) -> (f64, f64) {
        let mut xsum: f64 = 0.0;
        let mut ysum: f64 = 0.0;
        let length = dataset.len();
        for (x, y) in dataset.iter() {
            xsum += *x;
            ysum += *y;
        }
        (xsum / length as f64, ysum / length as f64)
    }

    fn linear_regression(data: &[(f64, f64)]) -> LinearRegression {
        let (xmean, ymean) = means(&data);
        let mut n: f64 = 0.0;
        let mut d: f64 = 0.0;
        for datum in data {
            n += (datum.0 - xmean) * (datum.1 - ymean);
            d += (datum.0 - xmean) * (datum.0 - xmean);
        }
        LinearRegression {
            gradient: n / d,
            intercept: ymean - (n / d) * xmean,
        }
    }

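One property of the regression code is worth spelling out: it divides by `d`, the sum of squared deviations of the x-values, without checking it. If the data set is empty, or if every x-value is identical, the division is `0.0 / 0.0` and the result is `NaN`. The sketch below is a hypothetical guarded variant, not part of the example program, that returns `None` in those cases. The names `Fit` and `checked_linear_regression` are introduced here for illustration.

```rust
/// Result of a least-squares fit: y = gradient * x + intercept.
#[derive(Debug, PartialEq)]
pub struct Fit {
    pub gradient: f64,
    pub intercept: f64,
}

/// Hypothetical guarded variant of the regression: returns `None` when the
/// fit is undefined (empty input, or all x-values identical).
pub fn checked_linear_regression(data: &[(f64, f64)]) -> Option<Fit> {
    if data.is_empty() {
        return None;
    }
    let len = data.len() as f64;
    let xmean = data.iter().map(|(x, _)| x).sum::<f64>() / len;
    let ymean = data.iter().map(|(_, y)| y).sum::<f64>() / len;

    // n accumulates the covariance term, d the variance of x (both unscaled).
    let mut n = 0.0;
    let mut d = 0.0;
    for (x, y) in data {
        n += (x - xmean) * (y - ymean);
        d += (x - xmean) * (x - xmean);
    }
    if d == 0.0 {
        return None;
    }
    let gradient = n / d;
    Some(Fit {
        gradient,
        intercept: ymean - gradient * xmean,
    })
}
```

As a worked check, the points (0, 1), (1, 3), (2, 5) lie on the line y = 2x + 1. Their means are 1 and 3, so n = (-1)(-2) + 0 + (1)(2) = 4 and d = 1 + 0 + 1 = 2, giving a gradient of 2 and an intercept of 3 - 2 * 1 = 1.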
We now come to the entry point of the Veracruz computation. Like all standard
Rust and C programs, a Veracruz program uses `main()` as its entry:

    fn main() -> return_code::Veracruz {
        let data = read_input()?;
        let result = linear_regression(&data);
        write_result(result)
    }

`libveracruz` offers a range of pre-baked error codes that can be used for
signalling the different kinds of failure of a Veracruz computation. We've
already seen some above in `read_input()`. These errors are "floated up" to
`main()`, and used as the return code of the program. The Veracruz runtime can
recognize these error codes, and forward them as appropriate to participants of
the computation.
The `main` function above captures the typical form of a Veracruz computation:
- Inputs are checked, grabbed from the runtime, and deserialized.
- The main algorithm is run on the deserialized inputs, with the result serialized into the correct, negotiated form.
- The result is written back to the Veracruz runtime. Here, this "write back" step is achieved by calling the synthesized `write_result()` function with the result. Its definition does not appear in the listing above; it is presumably generated via the `generate_write_result` item imported from `libveracruz` at the top of the program.
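The way error codes "float up" through `?` can be sketched in plain Rust, independently of Veracruz. In the sketch below, the constants, the helper functions, and `run` (a stand-in for `main()`) are all hypothetical, and the numeric values are assumptions that do not correspond to the actual `return_code` values in `libveracruz`.

```rust
/// Illustrative codes; the values are assumptions for this sketch only.
pub const SUCCESS: i32 = 0;
pub const FAIL_BAD_INPUT: i32 = -2;
pub const FAIL_INVARIANT_FAILED: i32 = -3;

/// Fails with an error code unless exactly one input is present.
pub fn check_inputs(inputs: &[Vec<u8>]) -> Result<&[u8], i32> {
    match inputs {
        [only] => Ok(only.as_slice()),
        _ => Err(FAIL_INVARIANT_FAILED),
    }
}

/// Fails with an error code unless the input is exactly eight bytes.
pub fn parse_value(bytes: &[u8]) -> Result<f64, i32> {
    let mut raw = [0u8; 8];
    if bytes.len() != raw.len() {
        return Err(FAIL_BAD_INPUT);
    }
    raw.copy_from_slice(bytes);
    Ok(f64::from_le_bytes(raw))
}

/// Stand-in for `main()`: each `?` returns early with the helper's code.
pub fn run(inputs: &[Vec<u8>]) -> Result<f64, i32> {
    let bytes = check_inputs(inputs)?;
    let value = parse_value(bytes)?;
    Ok(value * 2.0)
}

/// Collapses the outcome into the single integer reported to the caller.
pub fn exit_code(outcome: &Result<f64, i32>) -> i32 {
    match outcome {
        Ok(_) => SUCCESS,
        Err(code) => *code,
    }
}
```

Because every helper shares the same error type, the entry point stays a straight-line sequence of check, compute, and write-back steps, and whichever step fails first determines the code that is reported.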
Veracruz programs are compiled for a custom 32-bit WASM compile target, called
`wasm32-arm-veracruz`. A target specification file is provided as part of the
Veracruz SDK.
It would be awkward to have to develop and debug a program, like the one above,
in a distributed setting. As a result, the Veracruz SDK provides a freestanding
version of the execution engine used inside the Veracruz runtime for offline
testing before deployment. This can be found in
`$VERACRUZ/sdk/freestanding-chihuahua`. We can execute our linear regression
example, above, using this as follows:

    $ RUST_LOG=info \
        ./target/release/freestanding-chihuahua \
        --program ../examples/linear-regression/target/wasm32-arm-veracruz/release/linear-regression.wasm \
        --data ../datasets/melbourne-houses-distance-price.dat \
        --execution-strategy jit

The program executes successfully, returning a success code, and produces
16 bytes of `pinecone`-encoded data corresponding to a `LinearRegression`
structure.
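The 16-byte figure is consistent with two 8-byte `f64` fields stored back to back. A party receiving the result could split it along the lines of the sketch below, which assumes that the fields appear in declaration order (`gradient`, then `intercept`) and that each is a little-endian `f64`. That layout is an assumption about the encoding rather than something stated here, and `decode_result` is a hypothetical helper. In practice, deserializing with the same library that produced the bytes is the safer choice.

```rust
/// Splits a 16-byte result into (gradient, intercept), assuming the two
/// fields are stored back to back as little-endian `f64` values.
pub fn decode_result(bytes: &[u8]) -> Option<(f64, f64)> {
    if bytes.len() != 16 {
        return None;
    }
    let mut gradient = [0u8; 8];
    let mut intercept = [0u8; 8];
    gradient.copy_from_slice(&bytes[..8]);
    intercept.copy_from_slice(&bytes[8..]);
    Some((f64::from_le_bytes(gradient), f64::from_le_bytes(intercept)))
}
```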
Many prospective Veracruz programs depend on the ability to generate random numbers, and Rust provides a series of programmer-friendly libraries for working with random number sources, generating collections of random numbers, and working with different distributions of random numbers.
As part of the Veracruz SDK, we have ported the standard Rust `getrandom` and
`rand` crates to the `wasm32-arm-veracruz` target, making these libraries
available to programmers wishing to target Veracruz. Note that our fork of
`getrandom` ultimately calls out to the `getrandom()` H-call provided by
the Veracruz runtime described above.
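Because the underlying H-call can fail when no entropy source is available, code that draws random values should be prepared to propagate an error. The sketch below shows one way to structure this. The `EntropySource` trait and both implementations are hypothetical stand-ins used for illustration and testing; they are not the API of the ported `getrandom` or `rand` crates.

```rust
/// Hypothetical stand-in for a fallible, `getrandom`-style entropy source.
pub trait EntropySource {
    /// Fills `buf` completely with random bytes, or returns an error code.
    fn fill(&mut self, buf: &mut [u8]) -> Result<(), i32>;
}

/// Deterministic source for tests only; a counter is not real entropy.
pub struct CountingSource(pub u8);

impl EntropySource for CountingSource {
    fn fill(&mut self, buf: &mut [u8]) -> Result<(), i32> {
        for byte in buf.iter_mut() {
            *byte = self.0;
            self.0 = self.0.wrapping_add(1);
        }
        Ok(())
    }
}

/// Source that always fails, modelling an unavailable entropy source.
pub struct FailingSource;

impl EntropySource for FailingSource {
    fn fill(&mut self, _buf: &mut [u8]) -> Result<(), i32> {
        Err(-1)
    }
}

/// Draws a `u64` from the source, propagating any failure to the caller.
pub fn random_u64<S: EntropySource>(source: &mut S) -> Result<u64, i32> {
    let mut buf = [0u8; 8];
    source.fill(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}
```

Keeping the source behind a trait also lets a program be tested offline with a deterministic source, then run against the real entropy source when deployed.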
The WebAssembly System Interface (WASI, henceforth) aims to provide a POSIX-style interface to system resources for WASM programs. We do not use this interface, preferring our own, described above, because WASI is overkill for our purposes. A Veracruz program is relatively constrained: it reads inputs, processes them, and writes outputs. By design, it cannot open files on a delegate's machine, so large chunks of the WASI API are irrelevant for Veracruz.
Whilst we could stub out the aspects of WASI that are irrelevant for our purposes, we instead take an alternative approach, beginning with a minimalist host API, and slowly adding features when there is a demonstrated need for them.
Also: see the Veracruz homepage for the latest project news.