Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SecureComms: Add Support for activating using InitData #2072

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

davidhadas
Copy link
Member

@davidhadas davidhadas commented Sep 30, 2024

See:

Add apj.json to InitData
apf.josn includes an sc key used to activate secure comms (if not already activated using an agent-protocol-forwarder.service flag)

@davidhadas davidhadas requested a review from a team as a code owner September 30, 2024 10:10
@davidhadas davidhadas force-pushed the secComms_initData branch 2 times, most recently from 8630009 to 206ce0f Compare October 7, 2024 12:37
@davidhadas
Copy link
Member Author

cc: @bpradipt

davidhadas added a commit to davidhadas/cloud-api-adaptor that referenced this pull request Oct 8, 2024
Install KBS and test SecureComms with KBS
Based on confidential-containers#2072 which should be merged first

Signed-off-by: David Hadas <[email protected]>
davidhadas added a commit to davidhadas/cloud-api-adaptor that referenced this pull request Oct 8, 2024
Install KBS and test SecureComms with KBS
Based on confidential-containers#2072 which should be merged first

Signed-off-by: David Hadas <[email protected]>
davidhadas added a commit to davidhadas/cloud-api-adaptor that referenced this pull request Oct 8, 2024
Install KBS and test SecureComms with KBS
Based on confidential-containers#2072 which should be merged first

Signed-off-by: David Hadas <[email protected]>
Copy link
Member

@stevenhorsman stevenhorsman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments. It would be good to fix the typos in the commit message too.

@@ -52,6 +54,8 @@ func load(path string, obj interface{}) error {
return fmt.Errorf("failed to decode a Agent Protocol Forwarder config file file: %s: %w", path, err)
}

logger.Printf("succesful loading config from %s\n", path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to say "successfully loaded config..."?

@@ -114,7 +114,7 @@ func (cfg *daemonConfig) Setup() (cmd.Starter, error) {
flags.BoolVar(&secureComms, "secure-comms", false, "Use SSH to secure communication between cluster and peer pods")
flags.StringVar(&secureCommsInbounds, "secure-comms-inbounds", "", "Inbound tags for secure communication tunnels")
flags.StringVar(&secureCommsOutbounds, "secure-comms-outbounds", "", "Outbound tags for secure communication tunnels")
flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.kbs-operator-system:8080", "Address of a KBS Service for Secure-Comms")
flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.trustee-operator-system:8080", "Address of a KBS Service for Secure-Comms")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trustee operator namespace changes should be in a separate commit with an explanation that it's triggered by the change in the trustee-operator project

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just noticed a bunch of these change are in #2073. Is this PR supposed to depend on that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - #2073 should be merged first

Comment on lines 40 to 45
```sh
kubectl get secrets -n trustee-operator-system
NAME TYPE DATA AGE
kbs-auth-public-key Opaque 1 28h
kbs-client Opaque 1 28h
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this command being added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just seen this code is under #2065 now. What's going on with these PR and their duplication of code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight improvement to the SecureComms doc which shows the correct result after following the instructions trustee operator and following the recommendation: "Make sure to uncomment the secret generation as recommended for both public and private key (kbs-auth-public-key and kbs-client secrets). "

We can add it to #2073 if it will make things clearer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to leave this extra documentation detail as is, although it also appears in #2065 - unless someone finds this very disturbing.

kubectl -n confidential-containers-system get cm peer-pods-cm -o yaml | sed "s/SECURE_COMMS: \"false\"/SECURE_COMMS: \"true\"/"|kubectl apply -f -
```

Set InitData to point KBC services to IP address 127.0.0.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a heading like Build a podvm that enforces Secure-Comms (Optional) section does as it's an alternative to it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some expansion of the explanation of what this is doing and why would be nice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will be adapting the documentation based on this comment. Please reconsider the next version.

src/cloud-api-adaptor/docs/SecureComms.md Outdated Show resolved Hide resolved
src/cloud-api-adaptor/docs/SecureComms.md Outdated Show resolved Hide resolved
src/cloud-api-adaptor/docs/SecureComms.md Outdated Show resolved Hide resolved
src/cloud-api-adaptor/docs/SecureComms.md Show resolved Hide resolved
@@ -114,7 +114,7 @@ func (cfg *daemonConfig) Setup() (cmd.Starter, error) {
flags.BoolVar(&secureComms, "secure-comms", false, "Use SSH to secure communication between cluster and peer pods")
flags.StringVar(&secureCommsInbounds, "secure-comms-inbounds", "", "Inbound tags for secure communication tunnels")
flags.StringVar(&secureCommsOutbounds, "secure-comms-outbounds", "", "Outbound tags for secure communication tunnels")
flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.kbs-operator-system:8080", "Address of a KBS Service for Secure-Comms")
flags.StringVar(&secureCommsKbsAddr, "secure-comms-kbs", "kbs-service.trustee-operator-system:8080", "Address of a KBS Service for Secure-Comms")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just noticed a bunch of these change are in #2073. Is this PR supposed to depend on that one?

Comment on lines 40 to 45
```sh
kubectl get secrets -n trustee-operator-system
NAME TYPE DATA AGE
kbs-auth-public-key Opaque 1 28h
kbs-client Opaque 1 28h
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just seen this code is under #2065 now. What's going on with these PR and their duplication of code?

Copy link
Collaborator

@mkulke mkulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.

This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?

@davidhadas
Copy link
Member Author

@huoqifeng, can you please review this PR as it extends your work in #1895

@davidhadas
Copy link
Member Author

davidhadas commented Oct 21, 2024

@mkulke, thanks for pointing this,

I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.

This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?

This PR extends #1895 which introduced a CAA-specific extension to InitData. Here we added configuration of the APF to the already merged configuration of AA and CDH.

Regardless, it seems that the SetInitData() RPC call cannot be used under peer-pods (even without this PR) - see kata-containers/kata-containers#10163 (comment)

@mkulke
Copy link
Collaborator

mkulke commented Oct 21, 2024

@mkulke, thanks for pointing this,

I understand that kata is proceeding with an init-data approach that is passed from the runtime to the agent via a SetInitData() RPC call for the upcoming CoCo release.
This looks like a CAA-specific extension to InitData. If the project adopts kata's init-data approach, would this still work?

This PR extends #1895 which introduced a CAA-specific extension to InitData. Here we added configuration of the APF to the already merged configuration of AA and CDH.

Regardless, it seems that the SetInitData() RPC call cannot be used under peer-pods (even without this PR) - see kata-containers/kata-containers#10163 (comment)

Can you elaborate on why a SetInitData() RPC wouldn't work for CAA? I think I POC'd that approach when the init-data design was being discussed to make sure it would work and I didn't run into issues.

My understanding is that agent-protocol-forwarder, daemon.json and user-data are implementation details of CAA that are responsible for setting the stage to allow runtime <=> agent communication and kata should ideally not be concerned about those.

@stevenhorsman
Copy link
Member

Can you elaborate on why a SetInitData() RPC wouldn't work for CAA? I think I POC'd that approach when the init-data design was being discussed to make sure it would work and I didn't run into issues.

In your prototype did you keep the guest-components managed by systemd, or the kata-agent? It seems to be that in if the kata-agent isn't managing them and they are started before the kata-agent then relying on config from kata-agent endpoint doesn't really make sense as we then have this weird undefined start until the kata is up and running and receiving the setInitData request.

In fairness I'm still not sure why bare-metal doesn't adopt the approach of systemd managing the processes like the peer pod does anyway to avoid this either, so I'm obviously missing something.

@davidhadas
Copy link
Member Author

davidhadas commented Oct 21, 2024

@mkulke,

Can you elaborate on why a SetInitData() RPC wouldn't work for CAA?

To add to what @stevenhorsman indicated, we use attestation when connecting CAA to APF in SecureComms.
To do attestation, we need the measurements of the configuration.
To get the measurements, we need InitData.
We connect runtime to kata-agent only after the attestation is done and keys delivered - keys we are using to complete the establishment of the CAA to APF secure communication channel.

I would assume that allowing someone to change the InitData (using the kata agent RPC) after attestation was already done and after keys delivered to the podvm, is not what we want to do. Am I missing something?

@mkulke
Copy link
Collaborator

mkulke commented Oct 21, 2024

In your prototype did you keep the guest-components managed by systemd, or the kata-agent? It seems to be that in if the kata-agent isn't managing them and they are started before the kata-agent then relying on config from kata-agent endpoint doesn't really make sense as we then have this weird undefined start until the kata is up and running and receiving the setInitData request.

In fairness I'm still not sure why bare-metal doesn't adopt the approach of systemd managing the processes like the peer pod does anyway to avoid this either, so I'm obviously missing something.

I would have to dig a bit to find the details, but I think in the PoC code kata-agent was talking via ttRPC to attestation-agent to update the configuration, so it was indeed half-initialized. however, that endpoint is about to be removed as I understand, though. somehow AA will now have to perform a binding to the TEE itself (e.g. compare host-data, extend an init-data PCR).

@mkulke
Copy link
Collaborator

mkulke commented Oct 21, 2024

We connect runtime to kata-agent only after the attestation is done and keys delivered - keys we are using to complete the establishment of the CAA to APF secure communication channel.

I understand this applies if you use the secure communications feature, but in a default CAA installation there is no attestation ceremony before runtime <=> agent communication is established, afaik?

Add apj.json to InitData
apf.josn includes a secure-comms key used to activate secure comms (if not
already activated using an agent-protocol-forwarder.service flag)

Signed-off-by: David Hadas <[email protected]>
@davidhadas
Copy link
Member Author

@mkulke, @stevenhorsman, @bpradipt

This PR follows on the footsteps of the initData support that already exists in peer pods to-date.
While we have a discussion if to change this mechanism some time in the future to follow what will be done by Kata RPC etc., is there any impediment to continue supporting the existing mechanism that we have till now?
Is there any impediment to adjust it to also deliver the secureComms flag?

I would like to suggest that whatever implementation we will have for initData, we will still need to add this flag to enable/disable secureComms using measured initData (or whatever we choose to call it), so adding it now seems like a progress in the right direction.

The two benefits of adding it now and not waiting is that (1) it will allow e2e testing of the secureComms with a complete attestation, and key retrieval phase, (2) it will allow using/testing secureComms by users/downstream without having a dedicated podvm image as needed today.

I would prefer to move forward and add the e2e tests regardless of how the future measured initData will be implemented in peer-pods and by following the same mechanism that works today. Does this make sense?

@mkulke
Copy link
Collaborator

mkulke commented Nov 12, 2024

@mkulke, @stevenhorsman, @bpradipt

This PR follows on the footsteps of the initData support that already exists in peer pods to-date. While we have a discussion if to change this mechanism some time in the future to follow what will be done by Kata RPC etc., is there any impediment to continue supporting the existing mechanism that we have till now? Is there any impediment to adjust it to also deliver the secureComms flag?

afaiu the init-data feature will be in next release of coco and unless there will be a change of plans the agent will receive init-data via RPC then. That means whatever we are doing in terms of initdata on the daemonset and podvm is being redundant/conflicting (since a CAA release is synced with kata/guest-components/operator).

To a coco user this change should (mostly) be transparent, but internally we would have to remove the related code and AA + CDH would only run after the agent has received init-data.

in this context I would not recommend building anything on top of the initdata impl in peerpods, currently.

@huoqifeng
Copy link

@huoqifeng, can you please review this PR as it extends your work in #1895

Sorry my late, I'll have a look...


[token_configs.kbs]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As KBS endpoint is different when enabling SecureComms, shall we also add link from ./initdata.md to this file?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants