Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable tests in SGX HW-mode (locally) #463

Closed

Conversation

bvavala
Copy link
Member

@bvavala bvavala commented Jan 19, 2024

This PR enables tests in SGX HW-mode.

Some important points and things to be discussed:

  1. there appears to be some redundancy between (a) docker build and (b) docker-compose build in the docker Makefile. When testing in HW mode, it's not clear which build is used and there is discrepancy in using args (e.g., (a) does not pass the SGX_MODE variable). Ideally we may want to consolidate the build of the images and just use compose for running tests.

  2. this PR passes the SGX_MODE variable to all the builds (particularly those that use it). This is necessary when we build the services, and it's also nice to have since we set the relative env var in the docker file, the env var persists in the image, so later the variable can be used to distinguish between SIM/HW-mode images.

  3. environment.sh sets PDO_SGX_KEY_ROOT to a folder within the XFER folder. However, the XFER is not available during a docker build (it's docker compose that later mounts that). This is a problem during the service build, because PDO uses that folder to store the enclave signing key. Since the key is ephemeral for now, this PR moves the enclave signing key to /tmp/.

  4. The policy registration script is called in one of two different places, depending on SGX_MODE. Namely: register-with-ledger.sh in HW-MODE, and start_ccf_network.sh in SIM-mode. First, these calls should be consolidated in a single place (probably the former). Second, start_ccf_network.sh and (in particular) the CCF images only need the SGX_MODE variable to trigger this script.

  5. The policy registration script looks for the CCF keys in multiple places. We should discuss whether the XFER folder is the only reasonable place where these could be, and simplify the rest.

  6. Among the global user-facing environment variables, PDO defines PDO_LEDGER_URL. However, in multiple places now PDO is using a combination of PDO_LEDGER_ADDRESS and (hard-coded) port -- because this is what CCF needs. We should discuss whether address and port should be promoted to global envs, or rather be derived from the ledger url.

Signed-off-by: Bruno Vavala <[email protected]>
@cmickeyb
Copy link
Contributor

  1. there appears to be some redundancy between (a) docker build and (b) docker-compose build in the docker Makefile. When testing in HW mode, it's not clear which build is used and there is discrepancy in using args (e.g., (a) does not pass the SGX_MODE variable). Ideally we may want to consolidate the build of the images and just use compose for running tests.

dockerfile have the recipes. docker compose uses those recipes to build the images. the images are rebuilt with docker compose because we change the build variables in the compose files. docker compose is (currently) used exclusively for test. the compose files are not general purpose. every installation will likely customize.

RE: HW mode... it SHOULD be set in the base and carried through the composition. its docker "compose" so the ultimate set of variables is the composition. so unless you believe there is a reason (with evidence) for why the variable is not propogated i would not add more complexity.

@cmickeyb
Copy link
Contributor

3. environment.sh sets PDO_SGX_KEY_ROOT to a folder within the XFER folder. However, the XFER is not available during a docker build (it's docker compose that later mounts that). This is a problem during the service build, because PDO uses that folder to store the enclave signing key. Since the key is ephemeral for now, this PR moves the enclave signing key to /tmp/.

This is not true. Suggest you read through the scripts. the contents of xfer are copied through the build process into the directories where the files are ultimately used. you CANNOT change it afterwards without re-running the build scripts. This behavior is completely intentional.

4. The policy registration script is called in one of two different places, depending on SGX_MODE. Namely: register-with-ledger.sh in HW-MODE, and start_ccf_network.sh in SIM-mode. First, these calls should be consolidated in a single place (probably the former). Second, start_ccf_network.sh and (in particular) the CCF images only need the SGX_MODE variable to trigger this script.

this has been an outstanding issue for you and @prakash need to resolve. I would strongly suggest pulling that into a separate PR and not muddy the rest of the PR with that particular issue. To be clear... my preference is that CCF NOT register and the registration is explicit. BUT... that is an issue for you to resolve (outside the scope of this PR).

@cmickeyb
Copy link
Contributor

5. The policy registration script looks for the CCF keys in multiple places. We should discuss whether the XFER folder is the only reasonable place where these could be, and simplify the rest.

No. The build script should copy these into another location that is consistent. Again, the particular location is an issue you & @prakashngit were supposed to resolve long ago.

@cmickeyb
Copy link
Contributor

6. mong the global user-facing environment variables, PDO defines PDO_LEDGER_URL. However, in multiple places now PDO is using a combination of PDO_LEDGER_ADDRESS and (hard-coded) port -- because this is what CCF needs. We should discuss whether address and port should be promoted to global envs, or rather be derived from the ledger url.

again. this is not true. PDO_LEDGER_ADDRESS is NOT used outside the build scripts. It is used as a means to ensure that the ledger address that is used is the IP address. we explicitly set this because there are so many errors right now with the construction of the ledger url.

Copy link
Contributor

@cmickeyb cmickeyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will provide specific comments later. But I would like this split into at least two parts... The first should resolve the registration issue. The second should make docker-compose test work. The third should enable more general use of SGX mode in the services container.

@bvavala
Copy link
Member Author

bvavala commented Jan 19, 2024

  1. environment.sh sets PDO_SGX_KEY_ROOT to a folder within the XFER folder. However, the XFER is not available during a docker build (it's docker compose that later mounts that). This is a problem during the service build, because PDO uses that folder to store the enclave signing key. Since the key is ephemeral for now, this PR moves the enclave signing key to /tmp/.

This is not true. Suggest you read through the scripts. the contents of xfer are copied through the build process into the directories where the files are ultimately used. you CANNOT change it afterwards without re-running the build scripts. This behavior is completely intentional.

Not sure what is not true here.
The environment variables are set as follows: if [ ${SGX_MODE} == "HW" ]; then export PDO_SGX_KEY_ROOT=${XFER_DIR}/services/keys/sgx; export PDO_ENCLAVE_CODE_SIGN_PEM=${PDO_SGX_KEY_ROOT}/enclave_code_sign.pem. The enclave signing key is generated through openssl genrsa -3 -out ${PDO_ENCLAVE_CODE_SIGN_PEM} 3072. This fails because the folder does not exist.

  1. The policy registration script is called in one of two different places, depending on SGX_MODE. Namely: register-with-ledger.sh in HW-MODE, and start_ccf_network.sh in SIM-mode. First, these calls should be consolidated in a single place (probably the former). Second, start_ccf_network.sh and (in particular) the CCF images only need the SGX_MODE variable to trigger this script.

this has been an outstanding issue for you and @prakash need to resolve. I would strongly suggest pulling that into a separate PR and not muddy the rest of the PR with that particular issue. To be clear... my preference is that CCF NOT register and the registration is explicit. BUT... that is an issue for you to resolve (outside the scope of this PR).

I agree: separate PR, explicit registration and ... not by CCF. The registration is an application/contract level procedure, it does not have anything to do with the underlying platform (CCF).

@bvavala
Copy link
Member Author

bvavala commented Jan 19, 2024

  1. mong the global user-facing environment variables, PDO defines PDO_LEDGER_URL. However, in multiple places now PDO is using a combination of PDO_LEDGER_ADDRESS and (hard-coded) port -- because this is what CCF needs. We should discuss whether address and port should be promoted to global envs, or rather be derived from the ledger url.

again. this is not true. PDO_LEDGER_ADDRESS is NOT used outside the build scripts. It is used as a means to ensure that the ledger address that is used is the IP address. we explicitly set this because there are so many errors right now with the construction of the ledger url.

No, I have just double checked: PDO_LEDGER_ADDRESS is used in 3 start_*.sh scripts and 3 run_*_test.sh scripts.
In all of them, the flow is the following: PDO_HOSTNAME --used-to-set--> PDO_LEDGER_ADDRESS --used-to-set--> PDO_LEDGER_URL. Two problems here: (1) in setting the url, we hard-code the port; (2) the url is exposed/described in environment.md but here we are deriving it (ok for the tests, but it does not seem right for the rest).

@g2flyer g2flyer self-requested a review January 19, 2024 23:54
@cmickeyb
Copy link
Contributor

I agree: separate PR, explicit registration and ... not by CCF. The registration is an application/contract level procedure, it does not have anything to do with the underlying platform (CCF).

more than that... it has nothing to do with a running service. either ccf OR the pdo services.

@cmickeyb
Copy link
Contributor

cmickeyb commented Jan 21, 2024

Not sure what is not true here.
The environment variables are set as follows: if [ ${SGX_MODE} == "HW" ]; then export PDO_SGX_KEY_ROOT=${XFER_DIR}/services/keys/sgx; export PDO_ENCLAVE_CODE_SIGN_PEM=${PDO_SGX_KEY_ROOT}/enclave_code_sign.pem. The enclave signing key is generated through openssl genrsa -3 -out ${PDO_ENCLAVE_CODE_SIGN_PEM} 3072. This fails because the folder does not exist.

I was pretty explicit with the docker PR that HW mode was NOT supported. PDO_SGX_KEY_ROOT should be inferred. As should PDO_ENCLAVE_CODE_SIGN. (Meaning that the paths can be hardcoded like everything else and the user has to adjust to our container structure or just go build their own.) And... if you recall the comments in the PR specifically said that these issues needed to be addressed.

And environment.sh should NEVER be modified by a user. The only variables that matter are the ones that are listed as ARGS in the *.dockerfiles. Those are the only ones you should set. Everything in environment.sh is INTERNAL to the configuration. We get to decide the structure of the container.

@cmickeyb
Copy link
Contributor

  1. mong the global user-facing environment variables, PDO defines PDO_LEDGER_URL. However, in multiple places now PDO is using a combination of PDO_LEDGER_ADDRESS and (hard-coded) port -- because this is what CCF needs. We should discuss whether address and port should be promoted to global envs, or rather be derived from the ledger url.

again. this is not true. PDO_LEDGER_ADDRESS is NOT used outside the build scripts. It is used as a means to ensure that the ledger address that is used is the IP address. we explicitly set this because there are so many errors right now with the construction of the ledger url.

No, I have just double checked: PDO_LEDGER_ADDRESS is used in 3 start_*.sh scripts and 3 run_*_test.sh scripts. In all of them, the flow is the following: PDO_HOSTNAME --used-to-set--> PDO_LEDGER_ADDRESS --used-to-set--> PDO_LEDGER_URL. Two problems here: (1) in setting the url, we hard-code the port; (2) the url is exposed/described in environment.md but here we are deriving it (ok for the tests, but it does not seem right for the rest).

Of course. but again those are all internal uses. You do not build a services container with the ledger address preconfigured. It is indedendent of where the ledger is running. Likewise, you do not build a CCF container with the configuration predetermined.

The exception to that rule is the test container where WE decide where everything goes. The point is... all of those are completely hidden from EVERY user.

@g2flyer
Copy link
Contributor

g2flyer commented Jan 22, 2024

  1. there appears to be some redundancy between (a) docker build and (b) docker-compose build in the docker Makefile. When testing in HW mode, it's not clear which build is used and there is discrepancy in using args (e.g., (a) does not pass the SGX_MODE variable). Ideally we may want to consolidate the build of the images and just use compose for running tests.

Consistent build is addressed in PR#462. For second part, in addition to persisting SGX_MODE (which i think currently happens but might be overridden at runtime?) part i would suggest to revert to the conventions before the docker(-compose) revamp which included crucial build options such as HW-mode into image & container name. BTW: that old version also had test in HW mode which is currently not really supported anymore due to devices not being passed?

@g2flyer
Copy link
Contributor

g2flyer commented Jan 22, 2024

2. this PR passes the SGX_MODE variable to all the builds (particularly those that use it). This is necessary when we build the services, and it's also nice to have since we set the relative env var in the docker file, the env var persists in the image, so later the variable can be used to distinguish between SIM/HW-mode images.

See above ..

@g2flyer
Copy link
Contributor

g2flyer commented Jan 23, 2024

@cmickeyb Mic, might be worth to discuss F2F, say in TDS meeting rather than here in PR? Seems a number of approaches/solutions might be discussed in person and i think it's also useful (be it only to get me update :-)) what scenarios (and how) we all want to support with docker(-compose). The obvious ones are local development, testing (local & CI/CD, for obvious reasons also in both SGX-modes) but less clear to me is to what extent we target "real(istic) deployments" and how that would look best (e.g., some docker registry somewhere, service admin scripts along lines we have but how to orchestrate whole network is a bit less clear; presumably would need that also with variants w/w.o sgx and maybe w/w.o debug and/or different wawacka version, likely relying on image naming conventions?)

@cmickeyb
Copy link
Contributor

  1. there appears to be some redundancy between (a) docker build and (b) docker-compose build in the docker Makefile. When testing in HW mode, it's not clear which build is used and there is discrepancy in using args (e.g., (a) does not pass the SGX_MODE variable). Ideally we may want to consolidate the build of the images and just use compose for running tests.

Consistent build is addressed in PR#462. For second part, in addition to persisting SGX_MODE (which i think currently happens but might be overridden at runtime?) part i would suggest to revert to the conventions before the docker(-compose) revamp which included crucial build options such as HW-mode into image & container name. BTW: that old version also had test in HW mode which is currently not really supported anymore due to devices not being passed?

Let's be clear... docker-compose in our repository has ONE purpose: run the test scripts on a completely local network. That's it. And that's all that should ever be in our repository. The dockerfiles are complicated and generic. But how you use those to configure a service is completely unique to individuals. If you want more generic versions, put examples in the readme file.

Simplifying the compose files would be great... but i couldn't get the healthchecks to work without leveraging the "compose" part of docker-compose.

Second point on this... we should completely remove the build sections from the compose files and only build images directly from the docker files. Replace all of the build sections with a specific reference to an image. And... add the HW mode to the tagging of the images.

@cmickeyb
Copy link
Contributor

No, I have just double checked: PDO_LEDGER_ADDRESS is used in 3 start_*.sh scripts and 3 run_*_test.sh scripts.
In all of them, the flow is the following: PDO_HOSTNAME --used-to-set--> PDO_LEDGER_ADDRESS --used-to-set--> PDO_LEDGER_URL. Two problems here: (1) in setting the url, we hard-code the port; (2) the url is exposed/described in environment.md but here we are deriving it (ok for the tests, but it does not seem right for the rest).

the start scripts have nothing to do with the build. ledger_address is a runtime only variable. as is documented in the scripts, all of the start_* are for running services in a container that has already been built. the documentation in most of the scripts (and the readme) isn't perfect but it would address most of your questions.

@bvavala bvavala marked this pull request as draft February 2, 2024 16:13
@cmickeyb
Copy link
Contributor

Can we close this & reopen a new PR that takes advantage of the registration scripts from CCF?

@bvavala
Copy link
Member Author

bvavala commented Feb 27, 2024

Yes, at this point this has to be rebased on several others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants