spike: uki image revocation and rollback #2429

Closed
Tracked by #1792
mudler opened this issue Apr 4, 2024 · 40 comments · Fixed by kairos-io/kairos-docs#209

@mudler
Member

mudler commented Apr 4, 2024

Is your feature request related to a problem? Please describe.
If a vulnerability is found in older images, we might want to disable access to the encrypted portion of the disk for certain images. Similarly, if encryption keys are leaked, we would like a mechanism that either updates the system to use a new key (for instance, by using the old keys to roll over to new ones), or simply invalidates the portion of the stack that is responsible for decrypting the disk.

Describe the solution you'd like
A way to use old keys to generate an upgrade image that installs the new ones. Alternatively, a mechanism that allows an upgrade image to invalidate older images.

Describe alternatives you've considered

Additional context

@mudler added the spike, uki and triage labels Apr 4, 2024
@mudler mentioned this issue Apr 4, 2024
@Itxaka
Member

Itxaka commented Apr 4, 2024

SBAT was made for this I think: https://github.com/rhboot/shim/blob/main/SBAT.md

@mudler
Member Author

mudler commented Apr 4, 2024

> SBAT was made for this I think: https://github.com/rhboot/shim/blob/main/SBAT.md

Reading it, it looks like https://0pointer.de/blog/brave-new-trusted-boot-world.html took some inspiration from it; at least, a similar concept of an incremental counter seems to be required. What is not clear is how that ties into the sealing process.

In his article, Lennart mentions that when sealing against a TPM PCR bank, not only the PCR policy but also the counter is used. I'm afraid that would make it incompatible with current installs (I also can't find a reference to it in the systemd-* tools, so it looks like something that can't be used yet?)
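To make the incremental-counter idea concrete: in SBAT, each component carries a generation number in its .sbat CSV section (component,generation,vendor,package,version,url), and the revocation data records a minimum generation per component. An image whose generation is below that minimum is refused even though its signature is valid. The sketch below only mimics that comparison on two plain CSV files; it is not how shim stores or evaluates revocations, the two-column policy format is an assumption made for the sketch, and the "kairos" component name in the example is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of SBAT-style, generation-based revocation.
# Image SBAT lines:  component,generation,vendor,package,version,url
# Policy lines:      component,minimum_generation   (format assumed for this sketch)

# sbat_is_revoked IMAGE_SBAT_CSV POLICY_CSV
# Exit status 0 = revoked (and prints why), 1 = allowed.
sbat_is_revoked() {
  local image_sbat=$1 policy=$2
  awk -F, '
    # First file (the policy): remember the minimum generation per component
    FILENAME == ARGV[1] { min[$1] = $2; next }
    # Second file (the image): flag components older than the policy minimum
    ($1 in min) && ($2 + 0 < min[$1] + 0) {
      print "revoked: " $1 " generation " $2 " < " min[$1]
      found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$policy" "$image_sbat"
}
```

Raising the policy minimum for a component from 1 to 2 revokes every image still built with generation 1, without touching any signing keys; that is the property the counter-based designs rely on.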

@jimmykarily moved this to In Progress 🏃 in 🧙Issue tracking board Apr 15, 2024
@jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board Apr 15, 2024
@jimmykarily removed the triage label Apr 15, 2024
@ci-robbot added the triage label Apr 15, 2024
@ci-robbot
Collaborator

Hello! I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for opening this issue. It seems that your issue, titled "spike: uki image revocation and rollback" (#2429) in the "kairos" repository owned by "kairos-io", is not completely clear regarding the relationship with the sealing process and the compatibility with current installations.

As you provided some additional context and potential alternatives like SBAT and the concept of using TPM counters, we can better understand the problem you're trying to address. However, more details are needed to meet the project's requirements. Please provide steps to reproduce (if it's a bug), and mention the versions of the relevant artifacts being used.

Once you provide the necessary information, we will be able to classify your issue more accurately and assign the appropriate label. Thank you for your patience and understanding, and please remember that I'm just a bot.

Best regards!

@Itxaka
Member

Itxaka commented Apr 15, 2024


@ci-robbot hello there, how do I get the sbat in python?

@ci-robbot
Collaborator

Hello! I'm a bot, an experiment of @mudler and @jimmykarily. Thank you for the response.

To get the sbat in Python, you need to follow these steps:

  1. Install systemd-cryptsetup and python3-cryptsetup

    If you are using the pip package manager, you can install systemd-python-libs with the following command:

    pip install systemd-python-libs
    

    Once you have systemd-python-libs installed, you can install sbat using pip:

    pip install sbat
    
  2. Use the sbat Python package in your code by importing it as follows:

    import sbat

Remember that this solution is valid as of the current state of the artifacts. Make sure to specify the versions of the relevant artifacts in your issue as required by the project's guidelines.

If you have any further questions or need more clarification, feel free to ask.

Best regards!


This conversation has been automatically audited by bot. The audit has identified that the information provided is sufficient to meet the project's requirements. Therefore, the issue can be labeled appropriately. The required details include a description of the issue, steps to reproduce (if it's a bug), and the versions of the relevant artifacts being used.

Since the issue is clear and meets the requirements, I will assign the following labels to the issue: [question, triage].

@jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Apr 17, 2024
@jimmykarily removed the triage label Apr 17, 2024
@jimmykarily self-assigned this Apr 17, 2024
@jimmykarily
Contributor

sbat is only understood and used by the shim (https://uapi-group.org/specifications/specs/unified_kernel_image/#uki-components). We don't use the shim so we can't rely on sbat for revocation.

@jimmykarily
Contributor

jimmykarily commented Apr 17, 2024

2 things:

  • What's the worst that can happen if we just rely on dbx to blacklist images (revocation by image hash: https://github.com/rhboot/shim/blob/main/SBAT.md)? Do we expect this to be so common that it will fill up the available dbx space? In other words, when should a Kairos image be blacklisted? Older versions of Kairos will obviously have CVEs at some point (we don't backport package updates to older versions). Does that mean that every previous version of Kairos should be blacklisted?
  • This whole SBAT flow was invented for systems where the original certs have to stay in place (e.g. the Microsoft one). We enroll our own keys, so blocking older images can happen by simply rotating the keys in the BIOS. Maybe all we need to do is clean up the db from old keys and enroll a new one. It should be possible to manipulate the db from within a trusted OS. We need to try this out. Update: documented here: https://wiki.archlinux.org/title/Unified_Extensible_Firmware_Interface/Secure_Boot#Updating_keys
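For a rough feel of the dbx-space question, assume each revoked image is added as a SHA-256 hash. An EFI_SIGNATURE_LIST then costs a 28-byte header (16-byte signature-type GUID plus three UINT32 size fields), and each EFI_SIGNATURE_DATA entry costs 48 bytes (16-byte owner GUID plus the 32-byte hash). How much variable storage a given firmware leaves for dbx is implementation-specific, so the free-space figure below is only an example input, and the helper names are made up for this sketch.

```shell
#!/usr/bin/env bash
# Rough dbx sizing for revoking images by SHA-256 hash.
# EFI_SIGNATURE_LIST header: 16-byte SignatureType GUID + 3 x UINT32 = 28 bytes
# EFI_SIGNATURE_DATA (SHA-256): 16-byte owner GUID + 32-byte hash   = 48 bytes
ESL_HEADER_BYTES=28
SHA256_ENTRY_BYTES=48

# esl_size N: bytes taken by one signature list holding N SHA-256 hashes
esl_size() {
  echo $(( ESL_HEADER_BYTES + $1 * SHA256_ENTRY_BYTES ))
}

# hashes_that_fit FREE_BYTES: hashes that fit in one new list of at most FREE_BYTES.
# The amount of free variable storage is firmware-specific; supply your own figure.
hashes_that_fit() {
  local free=$1
  if (( free < ESL_HEADER_BYTES + SHA256_ENTRY_BYTES )); then
    echo 0
  else
    echo $(( (free - ESL_HEADER_BYTES) / SHA256_ENTRY_BYTES ))
  fi
}
```

For example, esl_size 1 gives 76 bytes, and with 4096 bytes free hashes_that_fit reports 84 images, so revoking every past release by hash does eat into the space quickly.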

@Itxaka
Member

Itxaka commented Apr 17, 2024


dbx revocation would be good enough, I guess? But it means we need to generate the dbx update, or provide a way of updating the dbx (AFAIK you can update it from userspace somehow?), so people can generate those from their own built EFI files.

What happens when you don't have access to those EFI files anymore? How do you generate the hash for them?

  • Custom v1 is built and keys inserted
  • Custom v2 built, some hardware upgraded some not
  • Custom v3 built, some hardware upgraded some not

Now it turns out that v1 has a CVE and you want to block it. You release Custom v4 with the dbx updated.

  • How do you get the image hash to add it to dbx? What if you don't have the install media anymore, or can't reproduce it 1:1?
  • How do you provide the dbx update to all the machines?
  • Do you need to upgrade all of them to v4?
  • What if you can't upgrade some of the v2 or v3 ones?

I mean, it sounds good to me to use the mechanism already in place in the firmware for this, but it entails laying down exactly what the supported way of doing it is.

On my machine I had dbx updates provided directly by fwupd (https://github.com/fwupd/fwupd, https://fwupd.org/), so maybe it's possible to do this: ship the daemon and have customers provide their own update server with dbx files?

For other use cases (no internet), maybe just an upgrade to a new version is good enough. Or maybe fwupd can also apply the update from local files somehow?

@Itxaka
Member

Itxaka commented Apr 17, 2024

Yep, seems to be possible:

https://github.com/fwupd/fwupd/tree/main/plugins/uefi-dbx

> The org.linuxfoundation.dbx.*.firmware components will match against a hash of the system PK. The latest cabinet archive can also be installed into the vendor-firmware remote found in /usr/share/fwupd/remotes.d/vendor/firmware/ which allows the version-fixup to work even when offline -- although using the LVFS source is recommended for most users

@jimmykarily
Copy link
Contributor

I think I prefer my second suggestion (key rotation), which blacklists every past image by enrolling a new key. Keys can also be appended to dbx, which makes me wonder what happens if a key is in both db and dbx. I guess dbx wins and the key is rejected (?). Let's play a bit manually in QEMU before we start bricking devices :D.

@jimmykarily
Contributor

First approach to use sbctl here: Foxboron/sbctl#296

@jimmykarily
Contributor

Also suggested some preparation work here: Foxboron/sbctl#297

@jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board May 13, 2024
@jimmykarily removed their assignment May 13, 2024
@jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board May 15, 2024
@jimmykarily self-assigned this May 15, 2024
@Itxaka
Member

Itxaka commented May 21, 2024

Tested without an upgrade, just on an installed system, and I seem to hit the same thing.

  • Installed system with keys ITXAKA
  • Everything OK: system boots, unlocks partitions
  • Added a new KEK key (kairos2)
  • Cannot mount the encrypted partitions any longer
  • But the system boots, so Secure Boot is OK and the keys are in there correctly
root@localhost:/tmp/sbctl# sbverify --list /efi/EFI/kairos/active.efi 
signature 1
image signature issuers:
 - /CN=ITXAKA
image signature certificates:
 - subject: /CN=ITXAKA
   issuer:  /CN=ITXAKA
root@localhost:/tmp/sbctl# ./sbctl list-enrolled-keys
DB:
  ITXAKA
  Microsoft Corporation Third Party Marketplace Root
  Microsoft Root Certificate Authority 2010
PK:
  ITXAKA
KEK:
  ITXAKA
  Microsoft Corporation Third Party Marketplace Root
  Kairos2

If I remove the Kairos2 key, it works again.
The same thing happens if I extend the db.

There seems to be a connection between the Secure Boot certs and the measurements that we are not seeing.

EDIT: This is on Ubuntu 24.04

@Itxaka
Member

Itxaka commented May 21, 2024

Opened a ticket upstream on systemd to see if they can clarify systemd/systemd#32946

@Itxaka
Member

Itxaka commented May 22, 2024

Updating here in case the other ticket goes nowhere.

There are two ways of binding to PCRs when enrolling a LUKS partition/disk:

  • tpm-pcrs -> binds to a SINGLE set of measurements; useful for things like firmware code, machine-id and other things that should never change
  • public-key-pcrs -> binds to a POLICY that covers any measurements signed under that policy; useful for things that can change, like the system UKI (you want to upgrade, don't you?)

The docs are kind of confusing here, as they make the two look mutually exclusive, but they are not. When we bind to public-key-pcr 11, cryptenroll silently also binds to PCR 7 as a single measurement (Secure Boot state and certs).
That means we end up bound to static PCR 7 + policy PCR 11. This works great until we need to expand the certs in order to blacklist things or enroll new ones.

The idea would be to set tpm-pcrs to an empty value, so we only bind to the policy measurements. BUT there is a bug in cryptenroll: if you set tpm-pcrs to empty, it still tries to check the PCR bank to get the hash, and fails because it does not take into account that you might not want to bind to any tpm-pcrs. This is what the upstream ticket linked above is about.

So it needs to be fixed upstream before we can skip the automatic binding to PCR 7.

There is a workaround: skip querying the TPM directly and use the TPM's public SRK key to calculate the values. From systemd 255 onwards, the SRK public key is automatically extracted on boot and can be used to calculate the values that lock the LUKS device without ever talking to the TPM directly, by using --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public

This is now available in kcrypt v0.11.0 and works perfectly, but it raises the minimum systemd version to 255 (Ubuntu 24.04 and Fedora 40).
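The difference between the two bindings comes down to which flags are passed. A small dry-run helper, using only the flags and /run/systemd paths that appear in the scripts further down this thread, keeps the two variants side by side. The function name is made up for this sketch, and it deliberately prints the arguments instead of running anything.

```shell
#!/usr/bin/env bash
# Dry-run helper: assembles (does not execute) the systemd-cryptenroll arguments
# used in this thread, one argument per line.
#   cryptenroll_args DEVICE UNLOCK_KEYFILE STATIC_PCRS
#   STATIC_PCRS=""  -> bind only to the signed PCR 11 policy
#   STATIC_PCRS="7" -> bind to the PCR 11 policy AND the literal value of PCR 7
cryptenroll_args() {
  local device=$1 keyfile=$2 static_pcrs=$3
  # Policy-only enrollment uses the SRK public key extracted by systemd >= 255;
  # the thread falls back to --tpm2-device=auto when PCR 7 is included.
  local device_opt=--tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public
  if [[ -n $static_pcrs ]]; then
    device_opt=--tpm2-device=auto
  fi
  local -a args=(
    systemd-cryptenroll
    "--unlock-key-file=${keyfile}"
    --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem
    --tpm2-public-key-pcrs=11
    "--tpm2-pcrs=${static_pcrs}"
    --tpm2-signature=/run/systemd/tpm2-pcr-signature.json
    "$device_opt"
    "$device"
  )
  printf '%s\n' "${args[@]}"
}

# Example (run as root on the target):
#   mapfile -t cmd < <(cryptenroll_args /dev/vda2 /tmp/random_keyfile "")
#   "${cmd[@]}"
```

Reading the arguments back into an array avoids word-splitting surprises, and makes it easy to log exactly what would be enrolled before touching a LUKS header.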

What does this mean?

  • On 3.0.x we automatically bind to PCR 7 (fixed) + PCR 11 (policy), so we won't support upgrading or rotating the certs.
  • On 3.1.x we would be able to bind only to PCR 11 (policy), so we support rotation and blacklisting.
  • We need an upgrade path from 3.0.x to 3.1.x in which we enroll the keys again without binding to PCR 7, as long as we are on systemd 255.
  • We cannot do it directly during the upgrade, as we boot with systemd < 255, so tpm2-srk-public-key.tpm2b_public is not there.
  • Currently it's not supported to update the LUKS slot without a password or other credential: you cannot use the TPM2 measurements to update the unlock values (blocker?), so I don't see how we would update the signature in the LUKS header to stop binding to PCR 7 after locking it. https://www.freedesktop.org/software/systemd/man/latest/systemd-cryptenroll.html#Limitations
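For the 3.0.x to 3.1.x path, whatever performs the re-enrollment has to confirm first that it is running on systemd 255 or newer and that the SRK public key file actually exists. A minimal pre-flight sketch, assuming the first line of systemctl --version has the usual "systemd NNN (...)" shape; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Pre-flight check before re-enrolling with --tpm2-device-key.
SRK_PUBKEY=/run/systemd/tpm2-srk-public-key.tpm2b_public

# systemd_major "LINE": major version from a line like "systemd 255 (255.4-1ubuntu8)"
systemd_major() {
  awk '{ print $2 }' <<<"$1"
}

# can_use_srk_key "LINE" [SRK_PATH]: succeeds only for systemd >= 255
# with the SRK public key file present.
can_use_srk_key() {
  local major srk=${2:-$SRK_PUBKEY}
  major=$(systemd_major "$1")
  [[ $major =~ ^[0-9]+$ ]] && (( major >= 255 )) && [[ -e $srk ]]
}

# Example on the target machine:
#   if can_use_srk_key "$(systemctl --version | head -n 1)"; then
#     echo "safe to re-enroll without PCR 7"
#   fi
```

Checking for the file as well as the version covers the case described above: right after the upgrade the machine is still running the old systemd, so the key is only there after booting into the new image.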

@Itxaka
Member

Itxaka commented May 22, 2024

Talos has some utils to add and manage LUKS keys; maybe it's possible to unlock and add a new key via those?

@Itxaka
Member

Itxaka commented May 22, 2024

CRYPTSETUP CAN ADD NEW KEYS!!!

cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2

That seems to use the TPM2 to update the keys, without asking for a password or anything!! So we could probably leverage that to sync a new TPM key if needed. Even if it's a manual action, we could do the following in the upgrade:

  • Upgrade the UKI file
  • Extract the measurements from the newer UKI file (if it was signed with a different key, for example)
  • Run cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 and set a manual random password
  • Use systemd-cryptenroll to enroll the new TPM values, wiping out the plaintext password at the same time, like we do in kcrypt

And that may even work. If we were able to use the TPM token to update the same TPM token, it would be even better.
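The "manual random password" step (and the dd if=/dev/random ... of=/tmp/random_keyfile lines in the scripts later in this thread) leaves a raw key lying in /tmp. A slightly more careful sketch creates it through mktemp, which gives the file owner-only (0600) permissions, and overwrites it on cleanup. The helper names are hypothetical, and shred is only best effort: it gives no real guarantee on copy-on-write or journaling filesystems or on flash storage.

```shell
#!/usr/bin/env bash
# make_temp_keyfile: create a 32-byte random key file and print its path.
# mktemp creates the file with owner-only (0600) permissions before any
# secret is written into it.
make_temp_keyfile() {
  local keyfile
  keyfile=$(mktemp) || return 1
  if ! head -c 32 /dev/urandom > "$keyfile"; then
    rm -f "$keyfile"
    return 1
  fi
  printf '%s\n' "$keyfile"
}

# remove_keyfile PATH: overwrite, then delete. Best effort only: overwriting is
# not reliable on copy-on-write/journaling filesystems or flash storage.
remove_keyfile() {
  shred -u -- "$1" 2>/dev/null || rm -f -- "$1"
}

# Example (device names from the scripts in this thread):
#   key=$(make_temp_keyfile)
#   cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 "$key"
#   ...
#   remove_keyfile "$key"
```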

@jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board May 23, 2024
@jimmykarily removed their assignment May 23, 2024
@jimmykarily
Contributor

So an option would be to:

  • unbind the encryption from PCR 7 and 11
  • encrypt using a new TPM2 key
  • enroll new keys, image hashes, whatever, to dbx
  • reboot (hopefully it will decrypt using the new TPM2 key)
  • re-bind the encryption to PCR 7 and 11 (which will now have new values)
  • remove the TPM2 key that was added before.

We need to give this a try.

@Itxaka
Member

Itxaka commented May 27, 2024

> so an option would be to:
>
> * unbind the encryption from pcr 7 and 11

I guess, for more security, we could unbind the encryption from PCR 7 only and leave 11 in there. That way it still covers the UKI policy properly and only drops the binding to the Secure Boot PCR (which we already check in immucore).

@jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board May 27, 2024
@jimmykarily self-assigned this May 27, 2024
@jimmykarily
Contributor

jimmykarily commented May 27, 2024

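Reminder: cryptsetup token remove --token-id=0 /dev/vda2

unbinds LUKS from the TPM (I think). Note that it only removes the token metadata; the key slot the token referenced stays in place.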

Update: cryptenroll can do that too: systemd-cryptenroll /dev/disk --wipe-slot=SLOT (https://wiki.archlinux.org/title/systemd-cryptenroll#Erasing_keyslots)
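Before wiping anything, it helps to know which key slot each systemd-tpm2 token points to. The sketch below filters cryptsetup luksDump output for systemd-tpm2 tokens and prints "token-id keyslot" pairs. The layout it parses (a "Tokens:" section with indented "N: type" lines followed by a "Keyslot:" field) is assumed from typical LUKS2 dumps and may differ between cryptsetup versions, so compare it with your own luksDump output first; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# tpm2_token_keyslots: reads `cryptsetup luksDump DEVICE` output on stdin and
# prints "<token-id> <keyslot>" for every systemd-tpm2 token it finds.
# Assumes the usual LUKS2 dump layout; verify against your cryptsetup version.
tpm2_token_keyslots() {
  awk '
    /^Tokens:/ { in_tokens = 1; next }
    # Any non-indented line (e.g. "Digests:") ends the Tokens section
    in_tokens && /^[^ \t]/ { in_tokens = 0 }
    # Token header, e.g. "  0: systemd-tpm2"
    in_tokens && /^[ \t]+[0-9]+: / {
      token = $1
      sub(/:$/, "", token)
      is_tpm2 = ($2 == "systemd-tpm2")
      next
    }
    # Keyslot field belonging to the current token
    in_tokens && is_tpm2 && $1 == "Keyslot:" { print token, $2 }
  '
}

# Example (as root):
#   cryptsetup luksDump /dev/vda2 | tpm2_token_keyslots
#   then wipe the reported slot, e.g. with systemd-cryptenroll --wipe-slot=SLOT
```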

@jimmykarily
Contributor

jimmykarily commented May 28, 2024

I created a script (attached) to automate the whole revocation flow. I'm still doing something wrong in the final steps, though (while re-binding the encryption to the new PCR values).

The script should be run after the VM has booted into the ISO that we are going to blacklist.
It assumes that the steps described in my previous comment have already taken place, in order to create an ISO and then an upgrade container image, each signed with a different db key. It also assumes that a db-dbx.auth file has been created as described above.

I'm not sure whether the final steps, in which I run cryptsetup luksAddKey, actually bind the decryption to the new PCR values, but after reboot the disks don't get decrypted.

@Itxaka I don't quite get what you mean by "extract measurements from the newer uki file (if they were signed with a different key for example)", so maybe that's where I'm making the mistake.

#!/bin/bash

set -e

export VM_IP=192.168.122.253
export UPGRADE_IMAGE="ttl.sh/kairos-new-asdlkfj2134:48h"

sshCommand() {
  echo "Executing: $1"
  sshpass -p "kairos" ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -t "kairos@$VM_IP" "sudo $1"
}

scpFileToTmp() {
  baseName=$(basename ${1})
  sshpass -p "kairos" scp -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null $1 "kairos@$VM_IP:/tmp/$baseName"
}

copyConfig() {
  cat << EOF > /tmp/config.yaml
#cloud-config
users:
  - name: kairos
    passwd: kairos
debug: true
EOF

  scpFileToTmp /tmp/config.yaml
}

waitUntilIsUp() {
  until sshCommand "echo 'connected'"
  do sleep 1
  done
}

copyConfig
# sshCommand already prefixes every command with sudo
sshCommand "kairos-agent manual-install /tmp/config.yaml"
sshCommand "shutdown -h now"
echo "VM is now off. Change boot order to boot from disk and start it again"
waitUntilIsUp

scpFileToTmp db-dbx.auth
scpFileToTmp keys/db.auth
sshCommand 'chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*'
sshCommand "efi-updatevar -f /tmp/db-dbx.auth dbx || echo 'skip'"
sshCommand "efi-updatevar -f /tmp/db.auth db || echo 'skip'"
sshCommand "kairos-agent upgrade --source oci:$UPGRADE_IMAGE"
sshCommand "dd if=/dev/random bs=32 count=1 of=/tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 /tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda3 /tmp/random_keyfile"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda2"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda3"
echo "Done. Will now reboot to the upgraded system."
sshCommand "reboot"

@jimmykarily
Contributor

After enrolling new certs with efi-updatevar the value of PCR 7 doesn't automatically change (which makes sense now that I think again):

PCR 7 before enrolling new keys:
Executing: sudo tpm2_pcrread   | grep "7 :"
    7 : 0x184748238B13A14A6FCD5F802E3FF70781D91703B1FBC483C21D1C7CCC3BDE91
Executing: chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*
Executing: efi-updatevar -f /tmp/db-dbx.auth dbx || echo 'skip'
Executing: efi-updatevar -f /tmp/db.auth db || echo 'skip'
PCR 7 after enrolling new keys:
Executing: sudo tpm2_pcrread   | grep "7 :"
    7 : 0x184748238B13A14A6FCD5F802E3FF70781D91703B1FBC483C21D1C7CCC3BDE91
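This follows from how PCRs work: a PCR can only change through an extend operation, new = SHA256(old || digest) for the SHA-256 bank, and the Secure Boot variables (including PK, KEK, db and dbx) are measured into PCR 7 by the firmware during boot. Writing db or dbx from the running OS therefore leaves PCR 7 untouched until the next boot, when the firmware measures the new contents. The sketch reproduces only the extend arithmetic with coreutils; which events a particular firmware measures into PCR 7 is defined by the TCG PC Client firmware profile and is not modelled here. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# PCR extend for the SHA-256 bank: PCR_new = SHA256(PCR_old || digest).
# A PCR value can only move through extends like this one; updating db/dbx at
# runtime does not extend PCR 7, the firmware measures those variables at boot.

# hex_to_bin HEX -> the corresponding raw bytes on stdout (HEX without "0x")
hex_to_bin() {
  # "ab12" -> "\xab\x12", then let printf turn the escapes into bytes
  local escaped
  escaped=$(sed 's/../\\x&/g' <<<"$1")
  printf "$escaped"
}

# pcr_extend OLD_HEX DIGEST_HEX -> new PCR value as lowercase hex
pcr_extend() {
  { hex_to_bin "$1"; hex_to_bin "$2"; } | sha256sum | cut -d' ' -f1
}

# A SHA-256 PCR starts out as 32 zero bytes after a reset
ZERO_PCR=$(printf '0%.0s' {1..64})

# Example:
#   measurement=$(printf 'some event' | sha256sum | cut -d' ' -f1)
#   pcr_extend "$ZERO_PCR" "$measurement"
```

Because each extend hashes the previous value, the order of measurements matters and there is no way to set a PCR directly; that is why the re-binding to PCR 7 in the script has to wait until after the reboot.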

@jimmykarily
Contributor

I created a new version of the script (below). It now un-binds pcr 7 before reboot to the upgraded system. The reboot succeeds but the final steps to re-bind pcr 7 fail with this error: https://github.com/systemd/systemd/blob/45af01d3111eb8d542d90f641913ee3c0db19719/src/cryptenroll/cryptenroll-tpm2.c#L361

The new version of the script:

#!/bin/bash

set -e

export VM_IP=192.168.122.253
export UPGRADE_IMAGE="ttl.sh/kairos-new-asdlkfj2134:48h"

sshCommand() {
  echo "Executing: $1"
  sshpass -p "kairos" ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -t "kairos@$VM_IP" "sudo $1"
}

scpFileToTmp() {
  baseName=$(basename ${1})
  sshpass -p "kairos" scp -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null $1 "kairos@$VM_IP:/tmp/$baseName"
}

copyConfig() {
  cat << EOF > /tmp/config.yaml
#cloud-config
users:
  - name: kairos
    passwd: kairos
debug: true
EOF

  scpFileToTmp /tmp/config.yaml
}

waitUntilIsUp() {
  until sshCommand "echo 'connected'"
  do sleep 1
  done
}

copyConfig
# sshCommand already prefixes every command with sudo
sshCommand "kairos-agent manual-install /tmp/config.yaml"
sshCommand "shutdown -h now"
echo "VM is now off. Change boot order to boot from disk and start it again"
waitUntilIsUp

scpFileToTmp db-dbx.auth
scpFileToTmp keys/db.auth

echo "PCR 7 before enrolling new keys:"
sshCommand 'sudo tpm2_pcrread   | grep "7 :"'

sshCommand 'chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*'
sshCommand "efi-updatevar -f /tmp/db-dbx.auth dbx || echo 'skip'"
sshCommand "efi-updatevar -f /tmp/db.auth db || echo 'skip'"

echo "PCR 7 after enrolling new keys:"
sshCommand 'sudo tpm2_pcrread   | grep "7 :"'

echo "Upgrading Kairos to image: ${UPGRADE_IMAGE}"
sshCommand "kairos-agent upgrade --source oci:$UPGRADE_IMAGE"

echo "Generating temporary passphrase"
sshCommand "dd if=/dev/random bs=32 count=1 of=/tmp/random_keyfile"

echo "Adding password slot"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 /tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda3 /tmp/random_keyfile"

echo "Removing the tpm2 slot"
sshCommand "systemd-cryptenroll /dev/vda2 --wipe-slot=tpm2"
sshCommand "systemd-cryptenroll /dev/vda3 --wipe-slot=tpm2"

echo "Adding tpm2 slot again (pcr 11 policy only, no pcr 7)"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs= --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda2"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs= --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda3"

echo "Removing the password slot"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda2"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda3"

echo "Done. Will now reboot to the upgraded system."
sshCommand "reboot"

echo "Waiting for machine to come up again (into the new image)"
waitUntilIsUp

echo "Generating temporary passphrase"
sshCommand "dd if=/dev/random bs=32 count=1 of=/tmp/random_keyfile"

echo "Adding password slot"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 /tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda3 /tmp/random_keyfile"

echo "Removing the tpm2 slot"
sshCommand "systemd-cryptenroll /dev/vda2 --wipe-slot=tpm2"
sshCommand "systemd-cryptenroll /dev/vda3 --wipe-slot=tpm2"

echo "Adding tpm2 slot again (pcr 11 policy AND pcr 7)"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs=7 --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda2"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs=7 --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda3"

echo "Removing the password slot"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda2"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda3"

echo "Rebooting to make sure everything works"
sshCommand "reboot"

@jimmykarily
Contributor

@Itxaka suggests this is a bug in systemd. Using --tpm2-device=auto instead of --tpm2-device-key makes it work. This is the final, working version of the script:

#!/bin/bash

set -e

export VM_IP=192.168.122.253
export UPGRADE_IMAGE="ttl.sh/kairos-new-asdlkfj2134:48h"

sshCommand() {
  echo "Executing: $1"
  sshpass -p "kairos" ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -t "kairos@$VM_IP" "sudo $1"
}

scpFileToTmp() {
  baseName=$(basename ${1})
  sshpass -p "kairos" scp -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null $1 "kairos@$VM_IP:/tmp/$baseName"
}

copyConfig() {
  cat << EOF > /tmp/config.yaml
#cloud-config
users:
  - name: kairos
    passwd: kairos
debug: true
EOF

  scpFileToTmp /tmp/config.yaml
}

waitUntilIsUp() {
  until sshCommand "echo 'connected'"
  do sleep 1
  done
}

copyConfig
# sshCommand already prefixes every command with sudo
sshCommand "kairos-agent manual-install /tmp/config.yaml"
sshCommand "shutdown -h now"
echo "VM is now off. Change boot order to boot from disk and start it again"
waitUntilIsUp

scpFileToTmp db-dbx.auth
scpFileToTmp keys/db.auth

sshCommand 'chattr -i /sys/firmware/efi/efivars/{PK,KEK,db}*'
sshCommand "efi-updatevar -f /tmp/db-dbx.auth dbx || echo 'skip'"
sshCommand "efi-updatevar -f /tmp/db.auth db || echo 'skip'"

echo "Upgrading Kairos to image: ${UPGRADE_IMAGE}"
sshCommand "kairos-agent upgrade --source oci:$UPGRADE_IMAGE"

echo "Generating temporary passphrase"
sshCommand "dd if=/dev/random bs=32 count=1 of=/tmp/random_keyfile"

echo "Adding password slot"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 /tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda3 /tmp/random_keyfile"

echo "Removing the tpm2 slot"
sshCommand "systemd-cryptenroll /dev/vda2 --wipe-slot=tpm2"
sshCommand "systemd-cryptenroll /dev/vda3 --wipe-slot=tpm2"

echo "Adding tpm2 slot again (pcr 11 policy only, no pcr 7)"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs= --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda2"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs= --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device-key=/run/systemd/tpm2-srk-public-key.tpm2b_public /dev/vda3"

echo "Removing the password slot"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda2"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda3"

echo "Done. Will now reboot to the upgraded system."
sshCommand "reboot"

echo "Waiting for machine to come up again (into the new image)"
waitUntilIsUp

echo "Generating temporary passphrase"
sshCommand "dd if=/dev/random bs=32 count=1 of=/tmp/random_keyfile"

echo "Adding password slot"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda2 /tmp/random_keyfile"
sshCommand "cryptsetup luksAddKey --token-type systemd-tpm2 /dev/vda3 /tmp/random_keyfile"

echo "Removing the tpm2 slot"
sshCommand "systemd-cryptenroll /dev/vda2 --wipe-slot=tpm2"
sshCommand "systemd-cryptenroll /dev/vda3 --wipe-slot=tpm2"

echo "Adding tpm2 slot again (pcr 11 policy AND pcr 7)"
echo "There is probably a bug in systemd preventing us from using the --tpm2-device-key"
echo "so we'll opt for --tpm2-device=auto"
echo "https://github.com/kairos-io/kairos/issues/2429#issuecomment-2136728261"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs=7 --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device=auto /dev/vda2"
sshCommand "systemd-cryptenroll --unlock-key-file=/tmp/random_keyfile --tpm2-public-key=/run/systemd/tpm2-pcr-public-key.pem --tpm2-public-key-pcrs=11 --tpm2-pcrs=7 --tpm2-signature=/run/systemd/tpm2-pcr-signature.json --tpm2-device=auto /dev/vda3"

echo "Removing the password slot"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda2"
sshCommand "systemd-cryptenroll --wipe-slot=password /dev/vda3"

echo "Rebooting to make sure everything works"
sshCommand "reboot"

In the end, the old images are not bootable anymore and the system boots successfully to the upgraded Kairos image (which is signed with the new keys).

@jimmykarily
Contributor

The above is a PoC showing that it's possible to revoke a cert (and possibly an image hash, following a similar procedure). It requires a reboot, though, so automating this with suc-upgrade will be tricky: something must ensure that the additional steps run after the reboot.
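One way to make the post-reboot steps run is to persist the current phase and have a boot-time hook call the same entry point again. The sketch below only shows that bookkeeping: the state file path is an assumption (it must sit on storage that survives both the upgrade and the reboot), the phase names are made up, and the comments in each branch stand in for the two halves of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical resumable runner for the revocation flow.
# STATE_FILE is an assumed location on persistent storage; override as needed.
STATE_FILE=${STATE_FILE:-/usr/local/lib/revocation/phase}

current_phase() {
  if [[ -f $STATE_FILE ]]; then
    cat "$STATE_FILE"
  else
    echo "pending"
  fi
}

set_phase() {
  mkdir -p "$(dirname "$STATE_FILE")"
  # Write to a temp file and rename, so a crash never leaves a half-written state
  printf '%s\n' "$1" > "$STATE_FILE.tmp" && mv "$STATE_FILE.tmp" "$STATE_FILE"
}

# run_next_phase: safe to call on every boot; it only advances one step
run_next_phase() {
  case $(current_phase) in
    pending)
      # placeholder: enroll db/dbx, upgrade, re-enroll LUKS with PCR 11 policy only
      set_phase rebooting ;;
    rebooting)
      # placeholder: re-enroll LUKS with PCR 11 policy + PCR 7, drop temp password
      set_phase done ;;
    done)
      return 0 ;;
    *)
      echo "unknown phase: $(current_phase)" >&2
      return 1 ;;
  esac
}
```

Since each call advances at most one phase and "done" is a no-op, the same command can be wired into both the upgrade and a boot hook without tracking which one ran last.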

For now, we'll pause the work on this for a bit, because with all the backporting for 3.0.x patch releases the state of the master/main branches is a bit confusing (which kcrypt version do kairos-agent and immucore use?).

We can create a final script (or an e2e test, even better) and document the process after we bump everything to latest kcrypt.

@jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board May 29, 2024
@jimmykarily
Contributor

The important thing is that this process can happen remotely, not necessarily fully automated. The above PoC works remotely, so we are good. Two things are left for this ticket:

  • Try the same process but revoking an image hash (instead of a cert)
  • Document everything for final users

@jimmykarily removed their assignment Jun 10, 2024
@jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Jun 11, 2024
@jimmykarily self-assigned this Jun 11, 2024
@jimmykarily
Contributor

To blacklist an image (instead of a cert), the same instructions work, but when creating the .auth file for enrollment into dbx, the command hash-to-efi-sig-list has to be used instead of cert-to-efi-sig-list. E.g.

From the booted Kairos system:

$ hash-to-efi-sig-list /efi/EFI/kairos/active.efi active.esl

and then, from the working directory on the machine that holds the keys:

$ sign-efi-sig-list -c keys/KEK.pem -k keys/KEK.key dbx active.esl active.auth
