What is the Veracruz programming model?

dominic-mulligan-arm edited this page Oct 31, 2020 · 1 revision

Host calls, and libveracruz

The Veracruz runtime exposes a set of host functions (or H-calls) to the WASM program running on top of the runtime. These H-calls serve a similar purpose to syscalls in operating systems, namely to provide a programming interface through which relatively unprivileged programs can access services managed by the privileged runtime or operating system. Whilst a typical desktop operating system may provide hundreds of different syscalls, the Veracruz runtime provides a very limited interface to programs. The following table describes this interface in its entirety:

| H-call name | Description |
| --- | --- |
| getrandom(buf, sz) | Writes sz bytes of random data, taken from a trusted entropy source, into the buffer buf. The precise source used differs depending on the particular containerisation technology being used by the Veracruz computation. If no suitable entropy source is available, or if random number generation fails for some reason, the H-call returns an appropriate error. |
| input_count() | Returns the number of secret inputs, provisioned by the data providers, that can be accessed by the program. This number always matches the number of data sources specified by the global policy. The H-call lets a program discover that number at run time, so that programs parametric in the number of inputs can be written. |
| input_size(idx) | Returns the size, in bytes, of input idx. Data providers can provision their data sets in an arbitrary order as part of the Veracruz provisioning process. However, once all data sets are provisioned, the runtime sorts them to match the order specified in the global policy, so the result of this H-call is always well-defined. Returns an error if idx does not refer to a provisioned data set (inputs are indexed from 0). |
| get_input(buf, sz, idx) | Reads input idx into the buffer buf with size sz. Returns an error if idx does not refer to a provisioned data set, or if the size of input idx is greater than sz. |
| write_result(buf, sz) | Writes the result of a computation by copying the content of buffer buf with size sz into a buffer in the trusted Veracruz runtime. Returns an error if a result has already been written. |

The table above is provided for illustrative purposes only: the Veracruz H-call layer is explicitly not stable, and may change over time. As is the case with a typical operating system, programmers are not expected to invoke the trusted Veracruz runtime's H-calls directly from application code, but are instead expected to make use of an abstraction layer. With Unix-family operating systems, this abstraction layer is libc. For Veracruz, it is libveracruz, which provides a stable layer over the Veracruz H-call API, and also provides higher-level functions that build on top of the raw H-call layer and are more amenable to use in typical application code.
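
To make the idea of an abstraction layer concrete, the following is a minimal sketch of how a higher-level read_input function could be built from the size-query and copy H-calls. It is not the libveracruz implementation: the RawHost trait, the MockHost type, and the error code constants are all hypothetical names invented for this illustration, and the real H-call signatures may differ.

```rust
/// Hypothetical stand-in for the raw H-call layer. The real interface is
/// unstable and is not reproduced here.
pub trait RawHost {
    /// Number of provisioned inputs.
    fn input_count(&self) -> usize;
    /// Size in bytes of input `idx`, or an error code if `idx` is invalid.
    fn input_size(&self, idx: usize) -> Result<usize, i32>;
    /// Copies input `idx` into `buf`; fails if `buf` is too small.
    fn get_input(&self, buf: &mut [u8], idx: usize) -> Result<(), i32>;
}

/// Made-up error codes for this sketch.
pub const ERR_BAD_INDEX: i32 = -1;
pub const ERR_BUFFER_TOO_SMALL: i32 = -2;

/// Higher-level wrapper: query the size, allocate a buffer of exactly that
/// size, then fetch the input into it.
pub fn read_input<H: RawHost>(host: &H, idx: usize) -> Result<Vec<u8>, i32> {
    let size = host.input_size(idx)?;
    let mut buf = vec![0u8; size];
    host.get_input(&mut buf, idx)?;
    Ok(buf)
}

/// In-memory host used only to exercise the wrapper.
pub struct MockHost {
    pub inputs: Vec<Vec<u8>>,
}

impl RawHost for MockHost {
    fn input_count(&self) -> usize {
        self.inputs.len()
    }

    fn input_size(&self, idx: usize) -> Result<usize, i32> {
        self.inputs.get(idx).map(|input| input.len()).ok_or(ERR_BAD_INDEX)
    }

    fn get_input(&self, buf: &mut [u8], idx: usize) -> Result<(), i32> {
        let input = self.inputs.get(idx).ok_or(ERR_BAD_INDEX)?;
        if input.len() > buf.len() {
            return Err(ERR_BUFFER_TOO_SMALL);
        }
        buf[..input.len()].copy_from_slice(input);
        Ok(())
    }
}
```

Application code written against a wrapper of this shape never handles buffer sizes itself, which is one reason a stable layer above the raw calls is attractive.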

At the time of writing, libveracruz is only provided for the Rust programming language. Veracruz programs must therefore be written in Rust or, more precisely, Rust is the only language in which it is convenient to write them without manually invoking the unstable H-call API. Support for other languages, including C and C++, is planned but is not a priority at present.

In the next subsection, we walk through a simple Veracruz program, written using libveracruz, and make reference to various notable aspects of the Veracruz programming model.

Case-study: linear regression

We will now walk through a simple Veracruz program, built using libveracruz, which expects a single data source of binary-encoded 64-bit floating point values as input. The program will then perform a linear regression on the input data set, before returning a binary-encoded answer as its result. Note that this example is both complete and executable on Veracruz, and can be found in the $VERACRUZ/sdk/examples/linear-regression directory.

#![no_std]
extern crate alloc;
extern crate veracruz_rt;
use alloc::vec::Vec;
use libveracruz::{generate_write_result, host, return_code};
use serde::Serialize;

For this example, we do not need to link against the full Rust standard library, std, but only against a subset of it called alloc, which assumes the presence of a global allocator and provides a host of useful data structures that need to allocate memory. As a result, we mark the code as #![no_std] using the attribute on the first line, above. Veracruz supports programming with or without the Rust standard library, as is appropriate. In #![no_std] contexts, we also provide a support library, veracruz_rt, which can be linked against the program, and which sets up a global allocator and performs other low-level tasks, for convenience.

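fn read_input() -> Result<Vec<(f64, f64)>, i32> {
    // This program is hardwired to expect exactly one input.
    if host::input_count() != 1 {
        return return_code::fail_invariant_failed();
    } else {
        // Safe to unwrap: we have just checked that input 0 exists.
        let dataset: Vec<u8> = host::read_input(0).unwrap();
        // Deserialize the raw bytes into pairs of 64-bit floats.
        match pinecone::from_bytes(&dataset) {
            Err(_err) => return_code::fail_bad_input(),
            Ok(dataset) => Ok(dataset),
        }
    }
}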

Above, the program uses functions exported from the host module of libveracruz to first query the number of inputs that have been provisioned by the data providers, and are therefore available to it. In this instance, the program is hardwired to expect a single input — other programs can work parametrically in the number of inputs, as appropriate — and returns a suitable error code if this is not the case.

Once this test is completed, the program grabs the raw bytes of the first input (at index 0) from the Veracruz runtime. This step should never fail, as we've already checked that an input is available, so the call to read_input(0).unwrap() is safe. In our examples and test code, we've fixed on using the Rust pinecone library as a way of serializing Rust data structures to raw bytes and deserializing them again. We therefore try to deserialize the collection of pairs of 64-bit floats, returning an error code if this fails.
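
The walkthrough above notes that other programs can work parametrically in the number of inputs. A minimal sketch of that pattern follows. The two closure parameters are hypothetical stand-ins for the input-count and input-read functions of the host module, chosen so that the example runs without the Veracruz SDK, and the error codes used in testing it are made up.

```rust
/// Collects every provisioned input, in index order.
/// `input_count` and `read_input` are hypothetical stand-ins for the
/// corresponding host functions; they are passed in as closures so that
/// this sketch does not depend on the Veracruz SDK.
pub fn read_all_inputs<C, R>(input_count: C, read_input: R) -> Result<Vec<Vec<u8>>, i32>
where
    C: Fn() -> usize,
    R: Fn(usize) -> Result<Vec<u8>, i32>,
{
    // Collecting into a Result stops at the first failing read and
    // returns its error code.
    (0..input_count()).map(read_input).collect()
}
```

Because the runtime orders the data sets according to the global policy, a program written this way can rely on index i always referring to the same data provider.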

Generally speaking, the Veracruz runtime doesn't care how inputs and outputs are serialized, as all it sees is unstructured byte data. Any structure imposed on these bytes is something the computation participants need to negotiate out-of-band, before the computation begins.
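
To illustrate what such a negotiated format might look like, here is a sketch of one very simple convention: each (x, y) pair is stored as two little-endian 64-bit floats, 16 bytes per pair. This is an assumption made purely for illustration. It is not the pinecone wire format, and the function names are hypothetical.

```rust
use std::convert::TryInto;

/// Encodes (x, y) pairs as consecutive little-endian f64 values.
pub fn encode_pairs(pairs: &[(f64, f64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * 16);
    for (x, y) in pairs {
        out.extend_from_slice(&x.to_le_bytes());
        out.extend_from_slice(&y.to_le_bytes());
    }
    out
}

/// Decodes bytes produced by `encode_pairs`.
/// Returns None if the length is not a multiple of 16 bytes.
pub fn decode_pairs(bytes: &[u8]) -> Option<Vec<(f64, f64)>> {
    if bytes.len() % 16 != 0 {
        return None;
    }
    let pairs = bytes
        .chunks_exact(16)
        .map(|chunk| {
            // Each chunk is exactly 16 bytes, so both conversions succeed.
            let x = f64::from_le_bytes(chunk[..8].try_into().unwrap());
            let y = f64::from_le_bytes(chunk[8..].try_into().unwrap());
            (x, y)
        })
        .collect();
    Some(pairs)
}
```

Whatever convention is chosen, the data provider and the program author must agree on it in advance, since the runtime will pass the bytes through unchanged.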

Below, we introduce a simple data structure that will capture the result of our linear regression algorithm, if it is successful:

#[derive(Serialize)]
struct LinearRegression {
    gradient: f64,
    intercept: f64,
}

Note that, above, we derive the Serialize trait for the LinearRegression struct. This is a standard trait from the Rust Serde library, a common framework for serializing and deserializing data structures. Earlier, we also used the pinecone library, another "off-the-shelf" Rust library. Any Rust library that can be compiled to WASM can be used when programming Veracruz computations. To a first approximation, this means any library that is written in pure Rust, does not depend on external C libraries, and does not depend on operating system-specific functionality such as devices or filesystems.

The code below is the meat of the linear regression implementation. This is straightforward, not tied to Veracruz, and is left here for completeness. We offer no comment on this code.

fn means(dataset: &[(f64, f64)]) -> (f64, f64) {
    let mut xsum: f64 = 0.0;
    let mut ysum: f64 = 0.0;
    let length = dataset.len();
    for (x, y) in dataset.iter() {
        xsum += *x;
        ysum += *y;
    }
    (xsum / length as f64, ysum / length as f64)
}

fn linear_regression(data: &[(f64, f64)]) -> LinearRegression {
    let (xmean, ymean) = means(data);
    let mut n: f64 = 0.0;
    let mut d: f64 = 0.0;
    for datum in data {
        n += (datum.0 - xmean) * (datum.1 - ymean);
        d += (datum.0 - xmean) * (datum.0 - xmean);
    }
    LinearRegression {
        gradient: n / d,
        intercept: ymean - (n / d) * xmean,
    }
}
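
As a quick sanity check of the regression code, the sketch below repeats the two functions in standalone form (without the Serde derive, so that it compiles on its own) and applies them to three points lying on the line y = 2x + 1. The checked_regression helper is a hypothetical addition, not part of the original example. It highlights that an empty data set, or one in which every x value is identical, makes the denominator zero and yields a non-finite gradient.

```rust
#[derive(Debug, PartialEq)]
pub struct LinearRegression {
    pub gradient: f64,
    pub intercept: f64,
}

/// Arithmetic means of the x and y coordinates.
pub fn means(dataset: &[(f64, f64)]) -> (f64, f64) {
    let mut xsum: f64 = 0.0;
    let mut ysum: f64 = 0.0;
    for (x, y) in dataset.iter() {
        xsum += *x;
        ysum += *y;
    }
    let length = dataset.len() as f64;
    (xsum / length, ysum / length)
}

/// Least-squares fit of a line through the data.
pub fn linear_regression(data: &[(f64, f64)]) -> LinearRegression {
    let (xmean, ymean) = means(data);
    let mut n: f64 = 0.0;
    let mut d: f64 = 0.0;
    for datum in data {
        n += (datum.0 - xmean) * (datum.1 - ymean);
        d += (datum.0 - xmean) * (datum.0 - xmean);
    }
    LinearRegression {
        gradient: n / d,
        intercept: ymean - (n / d) * xmean,
    }
}

/// Hypothetical helper: rejects fits whose parameters are NaN or infinite.
pub fn checked_regression(data: &[(f64, f64)]) -> Option<LinearRegression> {
    let fit = linear_regression(data);
    if fit.gradient.is_finite() && fit.intercept.is_finite() {
        Some(fit)
    } else {
        None
    }
}
```

For the points (1, 3), (2, 5) and (3, 7), the means are 2 and 5, the numerator is 4 and the denominator is 2, giving a gradient of 2 and an intercept of 1.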

We now come to the entry point of the Veracruz computation. Like all standard Rust and C programs, a Veracruz program uses main() as its entry point:

fn main() -> return_code::Veracruz {
    let data = read_input()?;
    let result = linear_regression(&data);
    write_result(result)
}

Our libveracruz offers a range of pre-baked error codes that can be used for signalling different kinds of failure of a Veracruz computation. We've already seen some above in read_input(). These errors are "floated up" to main(), and used as the return code of the program. The Veracruz runtime can recognize these error codes, and forward them as appropriate to participants of the computation.
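
The "floating up" of errors relies on Rust's ? operator, which returns early from a function with the first error it encounters. The sketch below shows the mechanism in isolation. The error code values and function names are made up for this illustration and do not correspond to the codes defined by libveracruz.

```rust
/// Made-up error codes for this sketch; libveracruz defines its own.
pub const ERR_INVARIANT_FAILED: i32 = -1;
pub const ERR_BAD_INPUT: i32 = -2;

/// Fails unless exactly one input is available.
pub fn check_count(count: usize) -> Result<(), i32> {
    if count == 1 {
        Ok(())
    } else {
        Err(ERR_INVARIANT_FAILED)
    }
}

/// Toy "deserializer": succeeds only on non-empty input.
pub fn parse(bytes: &[u8]) -> Result<u8, i32> {
    bytes.first().copied().ok_or(ERR_BAD_INPUT)
}

/// Plays the role of main(): each `?` returns early with the first error,
/// so the caller sees whichever code was raised deepest in the call chain.
pub fn entry_point(count: usize, bytes: &[u8]) -> Result<u8, i32> {
    check_count(count)?;
    let value = parse(bytes)?;
    Ok(value)
}
```

The code that detects a failure chooses the error code, and every intermediate function simply passes it along unchanged.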

The main function above captures the typical form of a Veracruz computation:

  1. Inputs are checked, grabbed from the runtime, and deserialized.
  2. The main algorithm is run on the deserialized inputs, with the result serialized into the correct, negotiated form.
  3. The result is written back to the Veracruz runtime. Here, this "write back" step is achieved by calling write_result() with the result. This function is synthesized by libveracruz rather than written by hand (note the generate_write_result import at the top of the program).
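
The three steps above can be sketched end to end against a mock host. Everything here is an assumption made for illustration: the Host trait, the MockHost type, the error codes, and the trivial "algorithm" (summing the input bytes) are hypothetical and stand in for the real runtime, libveracruz, and an actual computation. The mock also enforces the rule that a result may only be written once.

```rust
/// Hypothetical host interface: read an input, write a single result.
pub trait Host {
    fn read_input(&self, idx: usize) -> Result<Vec<u8>, i32>;
    fn write_result(&mut self, bytes: &[u8]) -> Result<(), i32>;
}

/// Made-up error codes for this sketch.
pub const ERR_BAD_INDEX: i32 = -1;
pub const ERR_RESULT_ALREADY_WRITTEN: i32 = -2;

/// In-memory host used only to exercise the pipeline.
pub struct MockHost {
    pub inputs: Vec<Vec<u8>>,
    pub result: Option<Vec<u8>>,
}

impl Host for MockHost {
    fn read_input(&self, idx: usize) -> Result<Vec<u8>, i32> {
        self.inputs.get(idx).cloned().ok_or(ERR_BAD_INDEX)
    }

    fn write_result(&mut self, bytes: &[u8]) -> Result<(), i32> {
        if self.result.is_some() {
            return Err(ERR_RESULT_ALREADY_WRITTEN);
        }
        self.result = Some(bytes.to_vec());
        Ok(())
    }
}

/// The typical shape of a computation.
pub fn run<H: Host>(host: &mut H) -> Result<(), i32> {
    // 1. Fetch the input; here every byte is treated as one sample.
    let input = host.read_input(0)?;
    // 2. Run the algorithm and serialize the answer as a little-endian u32.
    let sum: u32 = input.iter().map(|&b| u32::from(b)).sum();
    let encoded = sum.to_le_bytes();
    // 3. Write the serialized result back to the host.
    host.write_result(&encoded)
}
```

Keeping the host behind a trait like this also makes the algorithm easy to unit test without any runtime present.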

Veracruz programs are compiled for a custom 32-bit WASM compile target, called wasm32-arm-veracruz. A target specification file is provided as part of the Veracruz SDK.

Testing with our freestanding execution engine

It would be awkward to have to develop and debug a program, like the one above, in a distributed setting. As a result, the Veracruz SDK provides a freestanding version of the execution engine used inside the Veracruz runtime, for offline testing before deployment. This can be found in $VERACRUZ/sdk/freestanding-chihuahua. We can execute our linear regression example, above, using this as follows:

$ RUST_LOG=info \
  ./target/release/freestanding-chihuahua \
    --program ../examples/linear-regression/target/wasm32-arm-veracruz/release/linear-regression.wasm \
    --data ../datasets/melbourne-houses-distance-price.dat \
    --execution-strategy jit

The program executes successfully, returning a success code, and produces 16 bytes of pinecone-encoded data corresponding to a LinearRegression structure.

Random number generation

Many prospective Veracruz programs depend on the ability to generate random numbers, and Rust provides a series of programmer-friendly libraries for working with random number sources, generating collections of random numbers, and working with different distributions of random numbers.

As part of the Veracruz SDK, we have ported the standard Rust getrandom and rand crates to the wasm32-arm-veracruz target, making these libraries available to programmers wishing to target Veracruz. Note that our fork of getrandom ultimately calls out to the getrandom() H-call provided by the Veracruz runtime described above.
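
The layering described above, in which a library turns raw random bytes from the host into typed values, can be sketched as follows. The EntropySource trait, the two test doubles, and the error code are hypothetical names invented for this illustration. They are not the API of the getrandom or rand crates, nor of the Veracruz runtime.

```rust
/// Hypothetical stand-in for a host-provided entropy call that fills a
/// buffer with random bytes or reports an error code.
pub trait EntropySource {
    fn getrandom(&mut self, buf: &mut [u8]) -> Result<(), i32>;
}

/// Made-up error code for this sketch.
pub const ERR_NO_ENTROPY: i32 = -1;

/// Builds a u64 from eight bytes of host-provided entropy.
pub fn random_u64<E: EntropySource>(source: &mut E) -> Result<u64, i32> {
    let mut bytes = [0u8; 8];
    source.getrandom(&mut bytes)?;
    Ok(u64::from_le_bytes(bytes))
}

/// Deterministic test double. It is NOT random and exists only so that
/// the sketch can be exercised reproducibly.
pub struct CountingSource {
    pub next: u8,
}

impl EntropySource for CountingSource {
    fn getrandom(&mut self, buf: &mut [u8]) -> Result<(), i32> {
        for byte in buf.iter_mut() {
            *byte = self.next;
            self.next = self.next.wrapping_add(1);
        }
        Ok(())
    }
}

/// Test double that models a host with no usable entropy source.
pub struct FailingSource;

impl EntropySource for FailingSource {
    fn getrandom(&mut self, _buf: &mut [u8]) -> Result<(), i32> {
        Err(ERR_NO_ENTROPY)
    }
}
```

Note that the failure case must be propagated rather than papered over: silently falling back to predictable bytes would defeat the purpose of a trusted entropy source.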

Why not WASI?

The WebAssembly System Interface (WASI, henceforth) aims to provide a POSIX-style interface to system resources for WASM programs. We do not use this interface, preferring instead our own, described above, because WASI is overkill for our purposes. A Veracruz program is relatively constrained: it reads inputs, processes them, and writes outputs. By design, it cannot open files on a delegate's machine, so large chunks of the WASI API are irrelevant to Veracruz.

Whilst we could stub out the aspects of WASI that are irrelevant for our purposes, we instead take an alternative approach, beginning with a minimalist host API, and slowly adding features when there is a demonstrated need for them.