Skip to content

Latest commit

 

History

History
213 lines (167 loc) · 6.75 KB

README.md

File metadata and controls

213 lines (167 loc) · 6.75 KB

SENECA: Push Efficient Medical Semantic Segmentation to the Edge

Setup

  • Open a command prompt and execute:
    git clone https://github.com/Xilinx/Vitis-AI.git
    cd Vitis-AI
    git checkout 1.4.1
  • Follow the Vitis-AI installation process here
    • Once the installation is completed open a terminal in the Vitis-AI directory and execute:
    git clone https://github.com/RaffaeleBerzoini/SENECA.git
    ./docker_run.sh xilinx/vitis-ai-gpu:latest

The working directory should look similar to:

SENECA   # your WRK_DIR
.
├── application
├── build
├── charts
├── preprocessing
    ├── extract_slices.py
    └── prepare_dataset.sh
├── results
├── ...
└── .py files
  • Download the dataset
    • Data wil be downloaded in a folder named OrganSegmentations. If not rename it
    • Move the OrganSegmentations folder in WRK_DIR/preprocessing. Now the workspace should look like:
SENECA   # your WRK_DIR
.
├── application
├── preprocessing
    ├── OrganSegmentations
        ├── labels-0.nii.gz
        ├── ...
        └── volume-139.nii.gz
    ├── extract_slices.py
    └── prepare_dataset.sh
├── ...
└── .py files
  • In the command prompt execute:
      Vitis-AI /workspace > conda activate vitis-ai-tensorflow2
      (vitis-ai-tensorflow2) Vitis-AI /workspace > cd SENECA
      (vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA > pip install -r requirements.txt
      (vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA > cd preprocessing
      (vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA/preprocessing > sh prepare_dataset_sh
      (vitis-ai-tensorflow2) Vitis-AI /workspace/SENECA/preprocessing > cd ..
  • Wait for the slice extraction. This could take several minutes

Training

Now you should be in the WRK_DIR with the following setup:

SENECA   # your WRK_DIR
.
├── ...
├── build
    ├── dataset
        ├── input
        └── target
├── ...
└── .py files

In the WRK_DIR execute:

python train.py --batchsize 8 --layers 4 --filters 8 --epochs 75

to train the 1 million parameters model. To test other configurations reported on the paper follow this table:

Configuration --layers --filters Parameters [x 106]
1M 4 8 ~ 1.034
2M 5 6 ~ 2.329
4M 5 8 ~ 4.136
8M 5 11 ~ 7.814
16M 5 16 ~ 16.522

During training, each time validation results improve, a float model is saved in: build/float_model/{val_loss:.4f}-f_model.h5

Quantization

You can perform Post Training Quantization (PTQ) or Fast Finetuning Quantization (FFQ) to quantize the float model. PTQ is to be preferred in terms of time and computation needs. Try FFQ if you're experiencing performance losses after PTQ

1. Post Training Quantization

In the WRK_DIR execute:

python quantize.py -m build/float_model/0.1021-f_model.h5 --evaluate --calibration 500
  • You would want to try different calibration dimensions if there is a lot of performance loss after quantization

2. Fast Finetuning Quantization

In the WRK_DIR execute:

python quantize.py -m build/float_model/0.1021-f_model.h5 --evaluate --calibration 100 --fastfinetuning --fftepochs 5
  • Modify the fast_ft_epochs as you like
  • Keep in mind that FFT requires more memory as you increase the calibration dataset dimensions and the number of FFT epochs

Note that here 0.1021-f_model.h5 is just an example. Check in your build/float_model/ directory which float models have been generated during training.

  • The quantized model is saved in build/quant_model/q_model.h5.

Compilation

To compile the q_model.h5 for the FPGA execute one of these command.

  • sh compile.sh ZCU102 for the ZCU102
  • sh compile.sh ZCU104 for the ZCU104
  • sh compile.sh vck190 for the VCK190
  • sh compile.sh u50 for the ALVEO U50

For the ZCU104 (used for this work) the compiled model is saved in build/compiled_zcu104/ directory

Deployment on the evaluation board

Set up the evaluation board (we used the ZCU104 for this work) as stated here.

In the WRK_DIR execute:

sh deployment_setup.sh 0 100 zcu104
  • The firsts two args stands for

    1. Starting image in the images directories list
    2. Number of images to be prepared
  • Change the third arg (zcu104 in our case) if your target board is different

Copy the build/target/ directory to your board with scp -r build/target/ root@[email protected]:~/. assuming that the target board IP address is 192.168.1.227 - adjust this as appropriate for your system.

You could also directly copy the folder to the board SD card

On the board execute:

root@xilinx-zcu104-2021_1:~# cd target
root@xilinx-zcu104-2021_1:~/target# python3 app_mt.py --threads 4 --model unet.xmodel --save
Command line options:
 --image_dir :  images
 --threads   :  4
 --model     :  unet.xmodel
 --save      :  True
------------------------------------
Pre-processing 100 images...
Starting 4 threads...
------------------------------------
Throughput=274.73 fps, total frames = 100, time=0.3640 seconds
Saving  100  predictions...
------------------------------------

To evaluate results:

python3 scores.py       
Command line options:
 --image_dir :  predictions
 --label_dir :  labels
------------------------------
------------------------------
Global  dice :
Mean on slices: 88.77 +- 10.02
Weighted Mean on organs: 93.04 +- 0.07
------------------------------
Organs  dice
Liver: 91.63 +- 0.09
Bladder: 79.21 +- 0.09
Lungs: 96.16 +- 0.09
Kidneys: 81.32 +- 0.08
Bones: 94.35 +- 0.03

The script prints out also other metrics for a more complete analysis.

Associated Publication

If you find this repository useful, please use the following citation:

@inproceedings{berzoini2021onhow,
  title={On How to Push Efficient Medical Semantic Segmentation to the Edge: the SENECA approach},
  author={Berzoini, Raffaele and D'Arnese, Eleonora and Conficconi, Davide},
  booktitle={2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  year={2022},
  organization={IEEE}
}