-
Notifications
You must be signed in to change notification settings - Fork 92
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Xynnn007 <[email protected]>
- Loading branch information
Showing
1 changed file
with
281 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,281 @@ | ||
# Initdata Specification | ||
|
||
The Initdata Specification defines the key data structure and algorithms | ||
to inject arbitrary data from untrusted host into TEE. To guarantee the | ||
integrity of that data, TEE evidence's hostdata ability or (v)TPM dynamic | ||
measurement ability will be leveraged. | ||
|
||
## Introduction | ||
|
||
TEE gives users an isolated execution environment to prevent untrusted | ||
hosts and external software stacks from eavesdropping and tampering with user | ||
data in use within the TEE. | ||
|
||
Remote attestation technology verifies whether the footprint of software | ||
running in the TEE meets expectations. The softwares to be measured are | ||
often provided by hardware vendor (like the firmware and tcb security version | ||
of TEE hardware) or software vendor (like guest kernel for VMs). These | ||
components are relatively static, which means they may be the same among | ||
multiple deployments | ||
|
||
In some scenarios, users would inject some other information like | ||
[policy files for kata](https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-use-the-kata-agent-policy.md), | ||
[configuration files for components running in guest](https://github.com/confidential-containers/guest-components/tree/main/confidential-data-hub#configuration-file), | ||
[identity files to specify the identity of TEE](https://github.com/keylime/rust-keylime/blob/master/keylime-agent.conf) | ||
into the TEE guest when launching. | ||
|
||
Compared with static software running in TEE (like guest firmware for TDX VM, | ||
libos for SGX enclave), these information changes dynamically between different | ||
deployments and are usually configurations | ||
We call these information or configurations _Initdata_. Initdata mechanism will | ||
provides a way to protect their integrity by remote attestation. One thing to note | ||
is that the confidentiality will not be protected by initdata mechanism because | ||
the untrusted host can see the plaintext of the data. | ||
|
||
To achieve this goal, we defined the following things | ||
- A data structure named **Initdata Metadata**. This structure is provided by the | ||
user to contain any initdata in key-value format to untrusted host to inject into | ||
the TEE when launching. We do not limit the encoding of this data structure, which | ||
means that JSON and TOML are optional. This will be introduced in [Initdata Metadata](#initdata-metadata) | ||
- A data integrity binding mechanism. It will guide the untrusted host to bind the | ||
digest of the `data` part in the Initdata Metadata to the hardware TEE-specific | ||
field in evidence. This field will be checked by the verifier during the remote | ||
attestation. This will be introduced in | ||
[Integrity Binding for Different TEEs](#integrity-binding-for-different-tees) | ||
- A data serialization method. This method is used to serialize the Initdata Metadata | ||
into a canonicalized one, which will help to get consistent cryptographic hash for | ||
different forms of a same Initdata Metadata. This will be introduced in | ||
[Data Canonicalization Algorithm](#data-canonicalization-algorithm). | ||
|
||
This spec does not define how the Initdata Metadata will be delivered into the TEE. | ||
Different projects will have its own way to do this. For Confidential Containers, | ||
we will use kata-runtime and kata-agent to collaborate to achieve this function. | ||
|
||
## Terminology | ||
|
||
This section will introduce the terminology used in this spec to avoid ambiguity. | ||
|
||
- `Initdata`: Data that needs to be injected when the TEE is started. This data | ||
requires integrity protection but does not require confidentiality protection. | ||
- `Initdata Metadata`: A data structure that includes initdata and other information | ||
that will help to calculate the initdata digest. | ||
- `Initdata digest`: Digest of the initdata. It will be used as the value of the | ||
TEE HOSTDATA field. | ||
- `HOSTDATA`: Fields that can be bound to a specific TEE instance. This field | ||
information will be included in the TEE-signed remote attestation report. Typically, | ||
Intel TDX's `mr_config_id`, AMD SNP's `hostdata` and Arm CCA's `CCA_REALM_PERSONALIZATION_VALUE`. | ||
In order to avoid confusion with the hostdata field of AMD SNP, when we do not | ||
emphasize a specific platform s.t. SNP, we are referring to the corresponding fields of | ||
various TEE platforms. | ||
|
||
|
||
## Specifications | ||
|
||
### Initdata Metadata | ||
|
||
Initdata Metadata defines a standardized initdata data format. Please note that it | ||
does not indicate the specific encoding format, but requires that the encoding format | ||
must support the expression of key-value data pairs. Typical encodings that meet | ||
this requirement include JSON, TOML and YAML, etc. | ||
|
||
An Initdata Metadata SHOULD have the following fields | ||
- `version`: The format version of the initdata metadata. Version number will provide | ||
extensibility. The definition in this spec is all `0.1.0`. | ||
- `algorithm`: The hash algorithm to calculate the value to set as `HOSTDATA`. The typical | ||
algorithms are `SHA-256`, `SHA-384`, `SHA-512`. The names are consistent with that used | ||
in [RFC 6234](https://datatracker.ietf.org/doc/rfc6234/). | ||
- `data`: a key-value map. Including the concrete content of initdata. Nested maps[^1] are | ||
also supported. | ||
- `digest`: the digest of the canonicalized `data` field using hash algorithm specified by | ||
`algorithm`. Note that the `digest` itself is not integrity protected in the current | ||
specification, so it may be tampered with. Therefore, this field itself can only be used as | ||
a reference. It is **NOT** recommended to rely on this field in the software stack in a specific TEE. | ||
|
||
[^1]: Nested maps looks like | ||
```json | ||
{ | ||
"data": { | ||
"key1": "value1", | ||
"object": { | ||
"key2": "value2" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
#### Examples for Different Encodings | ||
|
||
Suppose there is an initdata metadata with the following values | ||
- `version`: `0.1.0` | ||
- `algorithm`: `sha384` | ||
- `data`: there are two objects. The first's key name is `attestation-agent.json` and the value | ||
is a string of a JSON. The second's key name is `policy.rego` and the value is a string of a rego file. | ||
- `digest`: related digest of `data`. | ||
|
||
##### JSON version | ||
|
||
The JSON version initdata metadata looks like the following | ||
```json | ||
{ | ||
"algorithm": "sha384", | ||
"version": "0.1.0", | ||
"data": { | ||
"attestation-agent.json": "{\"aa_kbc_params\": \"cc_kbc::http://127.0.0.1:8080\"}", | ||
"policy.rego": "package agent_policy\nimport future.keywords.in\nimport future.keywords.every\nimport input\n\n# Default values, returned by OPA when rules cannot be evaluated to true.\ndefault CopyFileRequest := false\ndefault CreateContainerRequest := false\ndefault CreateSandboxRequest := true\ndefault DestroySandboxRequest := true\ndefault ExecProcessRequest := false\ndefault GetOOMEventRequest := true\ndefault GuestDetailsRequest := true\ndefault OnlineCPUMemRequest := true\ndefault PullImageRequest := true\ndefault ReadStreamRequest := false\ndefault RemoveContainerRequest := true\ndefault RemoveStaleVirtiofsShareMountsRequest := true\ndefault SignalProcessRequest := true\ndefault StartContainerRequest := true\ndefault StatsContainerRequest := true\ndefault TtyWinResizeRequest := true\ndefault UpdateEphemeralMountsRequest := true\ndefault UpdateInterfaceRequest := true\ndefault UpdateRoutesRequest := true\ndefault WaitProcessRequest := true\ndefault WriteStreamRequest := false" | ||
}, | ||
"digest": "e4744fddbb00dd4201326d16ee5a647debf86562a77f1aca176d23017c1cf88821af5c05c03b1ac9b46afa93b9a1d368" | ||
} | ||
``` | ||
|
||
it would involve a lot of escape characters. JSON is better to set simple key | ||
values, like the following | ||
```json | ||
{ | ||
"algorithm": "sha384", | ||
"version": "0.1.0", | ||
"data": { | ||
"key1": "value1", | ||
"key2": "value2" | ||
}, | ||
"digest": "0d25829cc97872f0086fbb7eec3a08c0b76899986dc135e9545434ae9eba7d48196efa7323519f00384881b659c02087" | ||
} | ||
``` | ||
|
||
##### TOML version | ||
|
||
If you want to avoid escape characters and use the plaintext of ascii file contents as initdata, | ||
TOML format will be better. A TOML version initdata metadata looks like the following | ||
```toml | ||
algorithm = "sha384" | ||
version = "0.1.0" | ||
digest = "e4744fddbb00dd4201326d16ee5a647debf86562a77f1aca176d23017c1cf88821af5c05c03b1ac9b46afa93b9a1d368" | ||
|
||
[data] | ||
"attestation-agent.json" = ''' | ||
{ | ||
"kbs_addr": "http://172.18.0.1:8080" | ||
} | ||
''' | ||
|
||
"policy.rego" = ''' | ||
package agent_policy | ||
import future.keywords.in | ||
import future.keywords.every | ||
import input | ||
# Default values, returned by OPA when rules cannot be evaluated to true. | ||
default CopyFileRequest := false | ||
default CreateContainerRequest := false | ||
default CreateSandboxRequest := true | ||
default DestroySandboxRequest := true | ||
default ExecProcessRequest := false | ||
default GetOOMEventRequest := true | ||
default GuestDetailsRequest := true | ||
default OnlineCPUMemRequest := true | ||
default PullImageRequest := true | ||
default ReadStreamRequest := false | ||
default RemoveContainerRequest := true | ||
default RemoveStaleVirtiofsShareMountsRequest := true | ||
default SignalProcessRequest := true | ||
default StartContainerRequest := true | ||
default StatsContainerRequest := true | ||
default TtyWinResizeRequest := true | ||
default UpdateEphemeralMountsRequest := true | ||
default UpdateInterfaceRequest := true | ||
default UpdateRoutesRequest := true | ||
default WaitProcessRequest := true | ||
default WriteStreamRequest := false''' | ||
``` | ||
|
||
### Integrity Binding for Different TEEs | ||
|
||
There are multiple ways to binding the integrity of initdata to the TEE evidence. | ||
Many TEE platforms supports HOSTDATA field. The HOSTDATA field could be set by | ||
the untrusted host when launching the TEE, and the field will be included in the | ||
TEE evidence for remote attestation. | ||
|
||
Platforms and corresponding field of the evidence | ||
- Intel TDX: `mr_config_id`, 48 bytes. Actually `mr_owner` and `mr_owner_config` have similiar | ||
attributes, but we select only `mr_config_id` for such use. | ||
- AMD SNP: `hostdata`, 32 bytes. | ||
- Arm CCA: `CCA_REALM_PERSONALIZATION_VALUE`, 64 bytes. | ||
- Intel SGX: `CONFIGID`, 64 bytes. | ||
|
||
When users want to deploy a TEE, they need to prepare an initdata metadata. The host | ||
(probably untrusted) SHOULD start TEE instance with metadata's `digest` as hostdata | ||
(or calculate `digest` with `algorithm` and `data`). | ||
|
||
The software outside the TEE would deliver the initdata metadata into the TEE in | ||
some way due to concrete architecture design. This spec does not define the concrete | ||
way to deliver initdata metadata, but strongly recommend that before using the received | ||
initdata, software stack inside the TEE MUST check whether the digest of the received | ||
initdata is aligned with the one inside evidence. | ||
|
||
Other platforms, such as (v)TPM based platforms, can record the initdata digest | ||
by extending the PCR before using it. This way will also accomplish the integrity binding | ||
to the TEE evidence. | ||
|
||
### Data Canonicalization Algorithm | ||
|
||
When we use JSON or Toml to represent an initdata metadata, control characters like | ||
line breaks that do not affect the semantics but will cause the original initdata | ||
metadata content to change. This fact will make it difficult for us to calculate a | ||
determined digest from different forms of a same initdata. We need a canonicalization | ||
algorithm. With this algorithm, all different forms of a same initdata will be calculated | ||
into a same digest. The difference of forms at least include the following | ||
|
||
1. The encoding of the initdata metadata, e.g. JSON and TOML. | ||
2. Initdata metadata with different expressions but the same semantics, e.g. multiple line | ||
JSON vs one line JSON like the following | ||
```json | ||
{ | ||
"algorithm": "sha384", | ||
"version": "0.1.0", | ||
"data": { | ||
"key1": "value1", | ||
"key2": "value2" | ||
}, | ||
"digest": "0d25829cc97872f0086fbb7eec3a08c0b76899986dc135e9545434ae9eba7d48196efa7323519f00384881b659c02087" | ||
} | ||
``` | ||
and | ||
```json | ||
{"algorithm":"sha384","version":"0.1.0","data":{"key1":"value1","key2":"value2"},"digest":"0d25829cc97872f0086fbb7eec3a08c0b76899986dc135e9545434ae9eba7d48196efa7323519f00384881b659c02087"} | ||
``` | ||
|
||
The algorithm has 2 steps as following | ||
|
||
1. Transform the original initdata metadata's `data` field into JSON. This step requires converting | ||
different forms of initdata metadata into JSON without changing the semantics. This is practical | ||
because JSON, TOML and YAML all support key-value, string, and nested formats. | ||
2. Canonicalize the JSON resulted in step 1. The canonicalization follows [RFC 8785](https://www.rfc-editor.org/rfc/rfc8785). This includes removing extra whitespace characters and ordering of object members. | ||
|
||
After performing the algorithm, the result string can be used to apply hash `algorithm` and get the `digest`. | ||
|
||
## Use cases | ||
|
||
### Confidential Containers | ||
|
||
In Confidential Containers, we need initdata to deliver the following configurations into the | ||
guest. | ||
- KBS public key and address | ||
- Kata agent policy | ||
- Configuration files for Attestation Agent, Confidential Data Hub and probably kata-agent in future. | ||
|
||
The initdata metadata will be inject into guest by kata-runtime via kata-agent's API. After receiving | ||
initdata metadata, kata-agent will calculate the digest it self due to the metadata, and call Attestation | ||
Agent's `CheckInitData()` api to check if the received initdata's integrity is aligned with the one | ||
inside the evidence. Next, it will use the key-value pairs recorded in the data as the file name and | ||
content, and place them in the `/run/confidential-containers/initdata` directory. | ||
|
||
The encoding format is TOML in order to facilitate viewing and modifying specific content. | ||
|
||
#### Fields of Data | ||
|
||
To prevent malicious host inject any files especially binaries into the TEE, only defined `key`s in | ||
`data` can be accepted. The defined `key`s including the following: | ||
- `attestation-agent.json`: The configuration file of Attestation Agent. | ||
- `confidential-data-hub.toml`: The configuration file of Confidential Data Hub. | ||
- `policy.rego`: The rego policy file used by kata. |