Update Docker Images and readme provided as part of confidential-ml example #356

Open
ajay-fuji opened this issue Jul 29, 2024 · 9 comments

@ajay-fuji

ajay-fuji commented Jul 29, 2024

Hi,

The Docker image provided with the examples/confidential-ml/code_model.md example is outdated relative to the latest code (main branch).

As a result, the steps in code_model.md for running the device on top of the Arm FVP do not work.

Please provide updated images and update the link in the document as well.

Thanks!

@ajay-fuji ajay-fuji changed the title Update Docker Images provided as part of confidential-ml example Update Docker Images and readme provided as part of confidential-ml example Jul 29, 2024
@ajay-fuji
Author

In the documentation examples/confidential-ml/code_model.md, in the "How to test with Islet" section, it seems that launching the Arm FVP should come before running the three instances on the host PC.

Once FVP is started with the tap network, a new network interface is created with the IP 193.168.10.15. Only after that can certifier-service, runtime and model-provider be run with the given commands. The commands for terminal 4 and onward can be run after that.

The commands to open terminal 4 could also be added, to be used once FVP is running:

  • Connect to the FVP main terminal:
    telnet localhost 5000
  • Connect to the FVP RMM terminal:
    telnet localhost 50003

Please suggest a possible correction.
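
The ordering described above (FVP first, then the host-side services) can be enforced with a small retry helper. This is only a sketch: `wait_until` is a hypothetical helper name, and the commented usage line assumes an `nc` build that supports `-z` is installed on the host.

```shell
#!/bin/sh
# Retry a probe command once per second until it succeeds or the
# timeout (in seconds) is reached. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout_s=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (commented out so the snippet has no side effects):
# wait_until 120 nc -z localhost 5000 && echo "FVP main terminal is reachable"
```

Once the probe succeeds, certifier-service, runtime and model-provider could be started, followed by the telnet sessions.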

@jinbpark
Collaborator

First of all, thank you for trying out ISLET! You can find answers below.

Docker image provided with examples/confidential-ml/code_model.md example is outdated as per latest code

That's true, and I'm aware of it. As you said, it needs to be updated accordingly. I'll check it out and update it.
In the meantime, you can use "How to test with simulated enclave (no actual hardware TEE) on x86_64" (if this no-actual-TEE setup suffices for what you want to do).

Once FVP is started with tap network, then a new network interface is created with 193.168.10.15 IP.

This also seems to be related to the "outdated" example code and instructions. I'll check it out as well.

@ajay-fuji
Author

Thanks @jinbpark for checking this out.
We want to run the use case with Islet on FVP. We tried changing the scripts as per the latest code and running them, but we are still getting some issues with the device running on FVP.

[image: screenshot of the error]

Any temporary patch for this would be helpful until the documentation and Docker image are fixed on your end.

Thanks!

@jinbpark
Collaborator

We tried changing scripts as per latest code and run but still getting some issues with device running on FVP.

Could you describe your changes in more detail? It might help.

@jinbpark jinbpark self-assigned this Jul 30, 2024
@ajay-fuji
Author

Files changed:

certifier-service
     |- run.sh   # make HOST an input argument instead of using 0.0.0.0
runtime
     |- init.sh  # changed to the latest code from the main branch
     |- run.sh   # changed to the latest code from the main branch
model-provider
     |- init.sh  # changed to the latest code from the main branch
     |- run.sh   # changed to the latest code from the main branch
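
The certifier-service change listed above (HOST as an input argument) could look roughly like the following. This is an illustrative sketch, not the actual contents of run.sh; `resolve_host` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the host to bind to.
# Uses the first argument when it is given and non-empty,
# otherwise falls back to the previous hard-coded 0.0.0.0.
resolve_host() {
    printf '%s\n' "${1:-0.0.0.0}"
}

# Inside a script this might be used as:
# HOST=$(resolve_host "$1")
```

Keeping 0.0.0.0 as the fallback means the script would behave as before when it is called without arguments.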

Steps to run the example:

  • Start FVP with Linux.
  • Start the certifier service.
  • Start the runtime and model-provider terminals.
  • Log in to FVP using telnet localhost 5000.
  • Execute the steps given in the FVP step:
    • Update set-realm-ip.sh with the IP range of the virtual network interface created by FVP. This can be checked with the ip addr command.
    • After this, executing init_aarch64.sh and run_aarch64.sh produces the glibc library error shared above.
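
For the `ip addr` check mentioned in the steps above, a short filter can list each interface with its IPv4 address. This is a sketch only: `list_ipv4` is a hypothetical helper name, and it assumes the one-line output format of iproute2's `ip -o -4 addr show`.

```shell
#!/bin/sh
# Read `ip -o -4 addr show` output on stdin and print
# "<interface> <IPv4 address>" for every line, dropping the prefix length.
list_ipv4() {
    awk '$3 == "inet" { split($4, a, "/"); print $2, a[1] }'
}

# Example (commented out so the snippet has no side effects):
# ip -o -4 addr show | list_ipv4
```

The address reported for the interface created by FVP is the one that would go into set-realm-ip.sh.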

@ajay-fuji
Author

Do we have any idea how to resolve the GLIBC issue?

@ajay-fuji
Author

ajay-fuji commented Aug 20, 2024

We could not reproduce this exact error.
But a few things to note:

  • eth0 is not the default network interface on cloud instances, so we were not able to connect to the certifier service from the device (realm on FVP) using the ./init_aarch64 193.168.10.15 script.
  • After we created a network interface with the ip link add eth0 type veth command, we were able to connect to the certifier service.
  • We still see a certification failure error, but this seems to be caused by a realm authenticity issue rather than a network issue between the realm and the certifier service.
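
The eth0 observation above can be checked before applying the workaround. This is a minimal sketch, assuming a Linux host with sysfs mounted at /sys; `iface_exists` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if a network interface with the given name exists.
# Relies on the Linux sysfs layout (/sys/class/net/<name>).
iface_exists() {
    [ -e "/sys/class/net/$1" ]
}

# Example (commented out so the snippet has no side effects):
# iface_exists eth0 || echo "eth0 not found; workaround used in this thread: ip link add eth0 type veth"
```

Creating the interface requires root privileges, so the sketch only reports the missing interface instead of adding it.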

PS: When we start FVP, the machine's internet connection goes down. Has anyone experienced this before? Any idea how to fix it?

@jinbpark
Collaborator

PS: When we start FVP, machine internet goes down. Anyone has experienced this earlier? Any idea how to fix this?

Could you try commenting out lines 41 and 42 of configure_tap.sh?
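
Commenting out a line range can be scripted instead of edited by hand. The following is a sketch only: `comment_out_lines` is a hypothetical helper name, and the line numbers in your checkout of configure_tap.sh may differ, so check them first.

```shell
#!/bin/sh
# Prefix lines START..END of FILE with "# ", keeping the original
# as FILE.bak so the change can be reverted.
comment_out_lines() {
    file=$1
    start=$2
    end=$3
    cp "$file" "$file.bak" &&
        sed "${start},${end}s/^/# /" "$file.bak" > "$file"
}

# Example (commented out so the snippet has no side effects):
# comment_out_lines configure_tap.sh 41 42
```

Writing from the backup copy avoids `sed -i`, whose syntax differs between GNU and BSD sed.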

Do we have any idea, how to resolve the GLIBC issue?

Sorry about the late response. I won't have enough time to dig into this issue until the start of September. I'll look at it after that (maybe in the middle of September?). In the meantime, you can build the TensorFlow Lite library on your own if you really need the TensorFlow capability.

@ajay-fuji
Author

Hi @jinbpark,

Thanks for suggesting the solution, although we had already found it ourselves.
Also, since the glibc issue is not reproducible, you can skip that part for now.

Currently we are not able to run ./init_aarch64.sh and ./run_aarch64.sh; they fail with the error below:
[image: screenshot of the error]

Error for run_aarch64.sh:

# ./run_aarch64.sh 193.168.10.15 8125 code 0 -1 -1 193.168.20.10
Mon Sep 18 00:00:00 UTC 2023
ln: /lib/libtensorflowlite.so: File exists
ln: /lib/libtensorflowlite_flex.so: File exists
running as client
load_client_certs_and_key: can't translate der to X509
init_client_ssl: load_client_certs_and_key failed
Can't init client app

Here, too, the same time-violation error logs can be seen in the certifier-service terminal.

We would be grateful if you could help us navigate through this.
Thanks!
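
The run_aarch64.sh output above starts with a device date of Mon Sep 18 00:00:00 UTC 2023, and the certifier service reports time-violation errors. One thing that may be worth ruling out (an assumption, not something confirmed in this thread) is a large clock difference between the realm on FVP and the host. The sketch below compares two Unix timestamps; `clock_skew_ok` is a hypothetical helper name, and the usage line assumes a `date` that supports `+%s`.

```shell
#!/bin/sh
# Return 0 when two Unix timestamps differ by at most MAX_SKEW seconds.
clock_skew_ok() {
    device_epoch=$1
    host_epoch=$2
    max_skew=$3
    diff=$((host_epoch - device_epoch))
    # Take the absolute value of the difference.
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}

# Example (commented out): read `date +%s` on the device, then on the host run
# clock_skew_ok <device timestamp> "$(date +%s)" 300 || echo "clocks differ by more than 5 minutes"
```

If the clocks turn out to be close, the time-violation logs would need a different explanation.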
