pRuntime: Manually encrypted worker privkey and gk master key #1370

Draft: wants to merge 3 commits into master

kvinwang (Collaborator)

After a system reboot, some workers can no longer load their private keys. We found that CPU_SVN can change after kernel updates or CPU firmware upgrades. Because the key Gramine uses for its encrypted file system is derived with the current CPU_SVN, files sealed before the change can no longer be decrypted.

This PR resolves the issue by dropping the Gramine encrypted file system and encrypting the keys manually with the MRENCLAVE sealing key. The CPU_SVN and ISV_SVN in use at sealing time are stored at the front of the encrypted file. SGX lets an enclave request a sealing key for the current CPU_SVN or any earlier one, so after a firmware upgrade the enclave can still re-derive the original sealing key from the stored CPU_SVN.
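To make the "SVNs at the front of the file" idea concrete, here is a minimal Python sketch of how such a header could be packed and parsed. It assumes a 16-byte CPU_SVN (the size of the SGX CPUSVN field) followed by a little-endian 16-bit ISV_SVN (the SGX ISVSVN width); the actual byte layout used by this PR is not specified here, and `HEADER`, `pack_sealed` and `unpack_sealed` are hypothetical names. The encryption step itself is left out.

```python
import struct

# Hypothetical header layout (assumed, not taken from the PR):
# 16-byte CPU_SVN, then a little-endian 16-bit ISV_SVN, then the ciphertext.
HEADER = struct.Struct("<16sH")


def pack_sealed(cpu_svn: bytes, isv_svn: int, ciphertext: bytes) -> bytes:
    """Prepend the SVNs that were used to derive the sealing key."""
    if len(cpu_svn) != 16:
        raise ValueError("CPU_SVN must be exactly 16 bytes")
    return HEADER.pack(cpu_svn, isv_svn) + ciphertext


def unpack_sealed(blob: bytes) -> tuple[bytes, int, bytes]:
    """Split a sealed blob back into (cpu_svn, isv_svn, ciphertext)."""
    if len(blob) < HEADER.size:
        raise ValueError("sealed blob is shorter than its header")
    cpu_svn, isv_svn = HEADER.unpack_from(blob)
    return cpu_svn, isv_svn, blob[HEADER.size:]
```

On load, the enclave would read `cpu_svn` and `isv_svn` from the header and put them into its key request in place of the current values, then decrypt the remaining bytes with the key it gets back.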

Tested:

| Saving environment | Loading environment | Can load sealed data? |
|---|---|---|
| kernel 5.15.0-79 | no changes | |
| kernel 5.15.0-71 | kernel 5.15.0-79 | |
| kernel 5.15.0-79 | kernel 5.15.0-71 | |
| pruntime with hash A | pruntime with hash B | |

@h4x3rotab @shelvenzhou This PR touches a critical part of the code (key sealing). Please review carefully.

wowvwow commented Aug 31, 2023

Is there currently any temporary way to get things working again?

kvinwang (Collaborator, Author)

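Unfortunately, no. Even once this PR is deployed, it only prevents this failure from happening in the future; we cannot recover workers that are already broken.
Rolling the Linux kernel back to the previous version may recover them, but some people have rolled back and still could not recover.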

zozyo commented Aug 31, 2023

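For a machine that would break on reboot but has not been rebooted yet: if we switch to the version that includes this PR, will it survive the next reboot?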

kvinwang (Collaborator, Author)

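The cause we have observed so far is that cpu_svn changes after a CPU firmware upgrade, which changes the key that Gramine's encrypted file system obtains from the CPU.
This PR addresses the case where a cpu_svn change causes the key to change. If the key changes for some other reason, for example if an Intel microcode upgrade changes the key-derivation algorithm, this PR cannot fix it.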

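As for the breakage on reboot that you describe, we don't yet know the exact cause. Most such cases are caused by a cpu_svn change, so in general this PR should resolve it.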
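The two cases in this reply (a CPU_SVN change versus a change in how the key is derived) can be illustrated with a toy model. The Python sketch below is not Intel's EGETKEY algorithm: `ToyPlatform` and `recovers_after_upgrade` are hypothetical, CPU_SVN is simplified to an integer, and HMAC-SHA256 stands in for the hardware key derivation. It does mirror one real SGX rule: a key request may name the current CPU_SVN or an older one, but not a newer one.

```python
import hashlib
import hmac


class ToyPlatform:
    """Toy stand-in for an SGX CPU. NOT Intel's real EGETKEY derivation."""

    def __init__(self, secret: bytes, cpu_svn: int, kdf_version: int = 1):
        self.secret = secret            # models the per-CPU root secret
        self.cpu_svn = cpu_svn          # current CPU_SVN, simplified to an int
        self.kdf_version = kdf_version  # models the vendor's derivation algorithm

    def get_key(self, mrenclave: bytes, cpu_svn: int) -> bytes:
        # Mirrors the real EGETKEY rule: an older CPU_SVN may be requested,
        # one newer than the platform's current CPU_SVN may not.
        if cpu_svn > self.cpu_svn:
            raise PermissionError("requested CPU_SVN is newer than the platform's")
        msg = (self.kdf_version.to_bytes(2, "little")
               + cpu_svn.to_bytes(2, "little")
               + mrenclave)
        # HMAC-SHA256 is only a placeholder for the hardware key derivation.
        return hmac.new(self.secret, msg, hashlib.sha256).digest()


def recovers_after_upgrade(use_stored_svn: bool, change_kdf: bool = False) -> bool:
    """Seal at CPU_SVN 5, upgrade to 6, and check whether the same key comes back."""
    mrenclave = hashlib.sha256(b"hypothetical enclave measurement").digest()
    cpu = ToyPlatform(secret=b"\x42" * 32, cpu_svn=5)
    stored_svn = cpu.cpu_svn                  # what gets written to the file header
    sealing_key = cpu.get_key(mrenclave, stored_svn)

    cpu.cpu_svn = 6                           # e.g. a microcode update raises CPU_SVN
    if change_kdf:
        cpu.kdf_version = 2                   # the derivation itself changes
    svn = stored_svn if use_stored_svn else cpu.cpu_svn
    return cpu.get_key(mrenclave, svn) == sealing_key
```

In this model, `recovers_after_upgrade(use_stored_svn=False)` corresponds to deriving with the current CPU_SVN (the failure reported here), `use_stored_svn=True` corresponds to deriving with the stored CPU_SVN, and adding `change_kdf=True` corresponds to the case that storing the CPU_SVN cannot fix.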

wowvwow commented Aug 31, 2023

Everything has been recovered.

nanometerzhu (Contributor) commented Sep 1, 2023

@wowvwow @zozyo Downgrading intel-microcode can restore the workers. We will share a detailed guide with the community soon.

wowvwow commented Sep 1, 2023

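Yes, I downgraded to the microcode from before the update and everything recovered. Will we still need to update to the latest pruntime later? Is there a release date?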

zozyo commented Sep 1, 2023

Downgrading is only a temporary fix; we will have to upgrade eventually.

nanometerzhu (Contributor)

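This change touches core key handling and is fairly important, so review and testing will probably take some extra time. Please use the workaround for now to keep mining running; we will ship an update as soon as we are done.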

kvinwang marked this pull request as draft on October 27, 2023 06:55