I think the installation instructions for the FedML framework on the official website may need to be updated
Initially, I followed the tutorial on the official website to download the FedML framework:
docker pull fedml/fedml:latest-nvidia-jetson-l4t-ml-r35.1.0-py3
The FedML version shipped in this image was around 0.7.6, though I don't remember the exact version. This version runs some official examples well, such as fedcv.
However, an error occurred when I ran my own federated learning training for anomaly detection in a computer vision surveillance scenario. I asked the community for help, and someone advised me to update FedML because my version was too old. So I ran pip install -U fedml to update it.
After updating, I ran my code again and hit a new error related to the numpy architecture. This is the log of my project.
This image shows the numpy issue that occurred when my Jetson device was connected to MLOps.
The error I discovered
After updating to FedML version 0.8.17, I checked the library versions it requires and found that the torch and torchvision versions in the image are both incompatible with it.
The image pulled following the FedML official website is tagged r35.1.0-py3.
For fedml==0.8.17, the required versions of torch and torchvision are:
"torch>=1.13.1",
"torchvision>=0.14.1",
I believe the numpy error is likely caused by the torch and torchvision versions inside the image.
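To confirm a mismatch like this before upgrading, the installed versions can be compared against the minimums quoted above. The following is only a rough sketch: the helper names version_ge and check_pkg are made up for illustration, and it assumes GNU sort (for the -V version sort) and python3 are available in the container.

```shell
#!/bin/sh
# Minimum versions quoted above for fedml==0.8.17.
MIN_TORCH="1.13.1"
MIN_TORCHVISION="0.14.1"

# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_pkg NAME MINIMUM: report whether the importable package meets MINIMUM.
# (Hypothetical helper; it only prints a status line and never fails.)
check_pkg() {
    installed=$(python3 -c "import $1; print($1.__version__)" 2>/dev/null) || installed=""
    if [ -z "$installed" ]; then
        echo "$1: not importable"
        return 0
    fi
    # Versions may carry a suffix such as "+cu117"; keep only the numeric prefix.
    numeric=${installed%%[!0-9.]*}
    if version_ge "$numeric" "$2"; then
        echo "$1 $installed: OK (>= $2)"
    else
        echo "$1 $installed: too old (needs >= $2)"
    fi
}

check_pkg torch "$MIN_TORCH"
check_pkg torchvision "$MIN_TORCHVISION"
```

Because only the numeric prefix is compared, a pre-release build is treated the same as its final release, so the result is a hint rather than a strict check.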
My Solution
First, I re-flashed my NVIDIA Jetson device with the latest JetPack version, JetPack 5.1.2. Then I tried to pull the latest image with FedML preinstalled using docker pull fedml/fedml:latest-nvidia-jetson-l4t-ml-r36.2.0-py3, but the download failed with this command.
Instead, I first pulled the latest base image with the command sudo docker pull nvcr.io/nvidia/l4t-ml:r36.2.0-py3
Then I opened an interactive container by running sudo docker run -it nvcr.io/nvidia/l4t-ml:r36.2.0-py3 /bin/bash
Inside the container, I installed the latest version of the FedML framework with pip install fedml==0.8.17
Next, I saved the modified container as a new image:
Open a new terminal and run: sudo docker ps
Copy the container ID and run: sudo docker commit <container_id> <image_new_name>
To start a container from the newly saved image (for example, if the new image is named "fedml"), run: sudo docker run -t -i --runtime nvidia fedml /bin/bash
Then connect to MLOps with: fedml login $account_id
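The image-building steps above could also be scripted. This is a non-interactive variant of what I did by hand, not the exact same procedure: it names the build container so that docker ps is not needed to look up the ID. The container name fedml-build, the DRY_RUN switch, and the run helper are all assumptions made for this sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Image and package version taken from the steps above.
BASE_IMAGE="nvcr.io/nvidia/l4t-ml:r36.2.0-py3"
# Hypothetical names chosen for this sketch.
BUILD_CONTAINER="fedml-build"
NEW_IMAGE="fedml"
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: echo the command, and execute it unless in dry-run mode.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

build_fedml_image() {
    # Pull the base image, install FedML in a named container,
    # commit that (stopped) container as a new image, then clean up.
    run sudo docker pull "$BASE_IMAGE" &&
    run sudo docker run --name "$BUILD_CONTAINER" "$BASE_IMAGE" pip install fedml==0.8.17 &&
    run sudo docker commit "$BUILD_CONTAINER" "$NEW_IMAGE" &&
    run sudo docker rm "$BUILD_CONTAINER"
}

build_fedml_image
```

Running it with DRY_RUN=0 should leave an image named fedml that can be started with sudo docker run -t -i --runtime nvidia fedml /bin/bash, as in the manual steps.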
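If you encounter the error "/bin/sh: python: not found", check which interpreters exist and create a symlink:
ls -l /usr/bin/python*
ln -s /usr/bin/python3 /usr/bin/python
As far as I can tell, this happens because the image has no /usr/bin/python, only versioned interpreters such as /usr/bin/python3. The symlink makes python point to Python 3.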
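If the fix needs to be repeated across containers, a guarded version avoids the "File exists" error from ln when a python entry is already present. This is only a sketch: ensure_python_alias is a made-up helper name, and its two optional arguments exist so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# ensure_python_alias [TARGET] [LINK]
# Create LINK as a symlink to TARGET unless LINK already exists.
# Defaults match the manual fix above: /usr/bin/python -> /usr/bin/python3.
ensure_python_alias() {
    target="${1:-/usr/bin/python3}"
    link="${2:-/usr/bin/python}"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists; leaving it alone"
        return 0
    fi
    if [ ! -x "$target" ]; then
        echo "$target not found or not executable" >&2
        return 1
    fi
    ln -s "$target" "$link" && echo "linked $link -> $target"
}

# Inside the container (as root), call it with the defaults:
# ensure_python_alias
```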
Subsequent issues encountered
After updating the environment, I did not encounter any further issues with numpy. But I encountered a new problem.
My own example.
The official example.