Unable to port DISK UNET from Kornia to Inf2. Compilation taking hours with no signs of progress #1039
Comments
Thank you for reaching out! To help us investigate this issue, can you please share the Neuron versions in your environment?
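A quick way to collect them is a snippet along these lines (the package names here are assumptions based on a typical torch-neuronx install and may need adjusting):

```python
# Hypothetical helper to print the installed Neuron-related package versions;
# edit the package list to match whatever is actually in your environment.
from importlib import metadata

for pkg in ("torch-neuronx", "neuronx-cc", "torch-xla", "torch", "libneuronxla"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```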
Hello @fayyadd, here's the pip extract:
Here's the code to reproduce:
Here's extractors/disk.py:
Currently this model is not well supported due to the use of bilinear upsampling. There is ongoing work to support this operation, but it will require an upcoming release.

Original issue resolution

The original issue should have been resolvable by tracing only the UNet portion of the model for Neuron rather than the included wrapper. One of the reasons this model does not work well with Neuron is the use of variable-shaped tensors in the post-processing stages of the model: https://github.com/kornia/kornia/blob/main/kornia/feature/disk/detector.py#L56

To avoid variable-shaped tensors, we can usually trace the compute-heavy (static) portion of the model and then allow the post-processing portion of the model to execute on CPU. If we supported bilinear upsampling, this is how I would have modified the script:

```python
import torch
from disk import DISK
import torch_neuronx

# Load the DISK model
conf = {
    "max_num_keypoints": 2000,
}
disk_model = DISK(conf)

# Trace only the UNet backbone with a fixed-shape example input
unet_input = torch.ones(1, 3, 1024, 768).mul(0.5)
compiler_args = '--verbose DEBUG --target trn1 --model-type unet-inference --optlevel 1'
unet = torch_neuronx.trace(disk_model.model.unet, unet_input, compiler_args=compiler_args)

# Replace the module with the traced Neuron module
disk_model.model.unet = unet

# Create the final artifact; post-processing stays on CPU
disk_input = {"image": unet_input}
neuron_disk_model_unet = torch.jit.trace(disk_model, disk_input, strict=False)
```
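Once bilinear upsampling is supported, the artifact produced above could be saved and reloaded like any other TorchScript module. A minimal usage sketch, assuming the trace above succeeded and that the DISK wrapper's forward takes a single data dict as in the trace call (the filename is just illustrative):

```python
import torch

# Save the combined artifact (Neuron-traced UNet + CPU post-processing)
torch.jit.save(neuron_disk_model_unet, "disk_neuron.pt")

# Reload and run inference with the same fixed input shape used for tracing
model = torch.jit.load("disk_neuron.pt")
image = torch.ones(1, 3, 1024, 768).mul(0.5)
with torch.no_grad():
    output = model({"image": image})
```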
Hi,
I'm trying to port some models from Kornia. I was able to port NetVlad and LightGlue.
When it comes to DISK, the trace call from torch_neuronx fails with:

Input tensor is not an XLA tensor: LazyFloatType

although I moved the tensors and the model to the XLA device. So, I started experimenting with torch.jit.trace; the compiler runs but is just stuck at this debug entry:

```
2024-11-20T19:01:27Z INFO 454853 [job.Frontend.0]: Executing: <site-packages>/neuronxcc/starfish/bin/hlo2penguin --input /tmp/ubuntu/neuroncc_compile_workdir/ab6ffbbd-bb47-4739-9ef3-fef030126a68/model.MODULE_16216335577045190367+11b4a2df.hlo_module.pb --out-dir ./ --output penguin.py --layers-per-module=1 --partition --coalesce-all-gathers=false --coalesce-reduce-scatters=false --coalesce-all-reduces=false --emit-tensor-level-dropout-ops --emit-tensor-level-rng-ops --expand-batch-norm-training --enable-native-kernel --native-kernel-auto-cast=matmult-to-bf16
```
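For reference, this is roughly what the first attempt looked like. This is only a sketch: my real code goes through the extractors/disk.py wrapper above, and kornia.feature.DISK with the "depth" checkpoint is used here purely for illustration.

```python
import torch
import torch_neuronx
import torch_xla.core.xla_model as xm
from kornia.feature import DISK

# Move both the model and the example input to the XLA device before tracing
device = xm.xla_device()
disk_model = DISK.from_pretrained("depth").to(device).eval()
image = torch.ones(1, 3, 1024, 768).mul(0.5).to(device)

# This call still raises "Input tensor is not an XLA tensor: LazyFloatType"
traced = torch_neuronx.trace(disk_model, image)
```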