Could you release small/tiny/nano versions of the detector and descriptor? #30

Open
zhongqiu1245 opened this issue Apr 22, 2024 · 12 comments

Comments

@zhongqiu1245

zhongqiu1245 commented Apr 22, 2024

Hello, thank you for your amazing work!
I'm really interested in your work and want to deploy DeDoDe on mobile devices (laptops, even CPU-only) for some self-driving work.
But I find that DeDoDeDescriptorB and DeDoDeDetectorL are too heavy to run on a mobile device.
On my computer (RTX 4060 Mobile, 8 GB), I get only 5.4 fps with 640*480 inputs (tensorrt_fp16).
Could you release small/tiny/nano versions of the detector and descriptor?
Thank you in advance!

@Parskatt
Owner

Sure, the easiest approach, I guess, would be to use VGG11 and reduce the layers further. Should be doable. Not sure how much performance will degrade.

@zhongqiu1245
Author

About 30 fps on the RTX 4060 Mobile (8 GB).

@Parskatt
Owner

@zhongqiu1245 could you try out the small detector in the branch that references this issue?

Weights can be found here: https://github.com/Parskatt/DeDoDe/releases/tag/v2

@Parskatt
Owner

It uses a VGG11 backbone and I reduced the number of layers at each scale from 8 -> 4 and cut the dimensionality in half. I think it should be about 3-4X faster than the _L detector. Could you verify?
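
As a back-of-the-envelope sketch of why halving both depth and width helps: for a stack of dense convolutions with equal input and output channels, the multiply-accumulate (MAC) count scales linearly with the number of layers and quadratically with the channel width. The layer type, the 3x3 kernel, the channel width of 256, and the 60x80 feature map below are all assumptions for illustration, not the actual DeDoDe configuration.

```python
def conv_macs(c_in, c_out, kernel, height, width):
    """MACs for one dense 2D convolution (stride 1, 'same' padding)."""
    return c_in * c_out * kernel * kernel * height * width


def stack_macs(num_layers, channels, kernel=3, height=60, width=80):
    """MACs for a stack of identical conv layers with equal in/out channels."""
    return num_layers * conv_macs(channels, channels, kernel, height, width)


# Hypothetical width: 256 is a placeholder, not the real DeDoDe channel count.
large = stack_macs(num_layers=8, channels=256)
small = stack_macs(num_layers=4, channels=128)

ratio = large / small
print(f"MAC reduction for this stack: {ratio:.0f}x")
```

Under these assumptions the ratio does not depend on the placeholder width or feature-map size. Measured end-to-end speedups are usually smaller than the MAC reduction of one part of the network, since other stages and memory traffic do not shrink by the same factor, so this does not contradict the 3-4X estimate.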

@Parskatt
Owner

Depending on your application, it might also be possible to increase the framerate by batching. Is this an option for you?

@zhongqiu1245
Author

@Parskatt
Sorry for the late reply.
I will verify this.
Thank you!

@zhongqiu1245
Author

zhongqiu1245 commented Apr 27, 2024

@Parskatt
Thank you for your DetectorS!
The fps increased a lot, but it is still below 30 fps (15.9 fps, DetectorS + DescriptorB, 640*480, tensorrt fp16).

So I reduced the image size to 320*240 and got 25 fps, almost there.
Could you release a small version of the descriptor? Like DescriptorS?
Maybe this would help DeDoDe break the 30 fps barrier.
Thank you!
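
A quick check of the figures in this comment, as a rough sketch: it compares how much the pixel count dropped with how much the frame rate rose. Only the two resolutions and the two fps values come from the comment; the interpretation is an assumption.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


full_res = (640, 480)
half_res = (320, 240)
fps_full, fps_half = 15.9, 25.0  # figures reported in this comment

# How many times fewer pixels the network sees at the lower resolution.
pixel_ratio = (full_res[0] * full_res[1]) / (half_res[0] * half_res[1])
# How many times faster the whole pipeline actually became.
speedup = fps_half / fps_full

print(f"pixels reduced by {pixel_ratio:.1f}x")
print(f"frame time: {frame_time_ms(fps_full):.1f} ms -> {frame_time_ms(fps_half):.1f} ms")
print(f"observed speedup: {speedup:.2f}x")
```

If runtime scaled purely with pixel count, the speedup would be close to the pixel ratio. The gap between the two suggests that part of the per-frame cost does not shrink with resolution, which is one reason a per-stage timing breakdown is useful.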

@Parskatt
Owner

Sure, then I think we can also reduce the descriptor size. Does 128 sound better? Is descriptor dimensionality a concern?

@zhongqiu1245
Author

Thank you for your reply!
128 sounds better.
Yes, dim is an important factor that can speed up or slow down the network's inference. The smaller the dim, the faster the inference. However, if dim is too small, matching performance will suffer. I considered dim=64 before, but I thought it might be too small. 128 may be better :)
Thank you for your generosity!
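
To make the dimension trade-off concrete, here is a minimal sketch of how descriptor storage and the cost of a dense similarity matrix scale with dim. The keypoint count of 5000 and the starting dimension of 256 are assumptions for illustration (the current descriptor size is not stated in this thread), and fp16 storage is assumed to match the trt_fp16 setup.

```python
def descriptor_bytes(num_keypoints, dim, bytes_per_value=2):
    """Memory for one image's descriptors (fp16 by default)."""
    return num_keypoints * dim * bytes_per_value


def similarity_macs(num_a, num_b, dim):
    """MACs to compute all pairwise dot products between two descriptor sets."""
    return num_a * num_b * dim


num_keypoints = 5000  # hypothetical keypoint budget per image

for dim in (256, 128, 64):  # 256 is an assumed baseline, not confirmed here
    mem_kib = descriptor_bytes(num_keypoints, dim) / 1024
    macs = similarity_macs(num_keypoints, num_keypoints, dim)
    print(f"dim={dim:3d}: {mem_kib:7.1f} KiB per image, {macs / 1e9:.2f} GMACs to match")
```

Both quantities scale linearly with dim, so each halving of the dimension halves them. This covers only storage and matching; the effect on the descriptor network's own runtime depends on its architecture.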

@zhongqiu1245
Author

zhongqiu1245 commented Apr 28, 2024

some details:
resolution: (480, 640)
preprocess: 19.606828689575195ms
detectorS: 16.09945297241211ms
descriptorB: 29.36267852783203ms
dualsoftmaxmatcher: 0.6873607635498047ms
postprocess: 0.14138221740722656ms
total: 65.89770317077637ms fps: 15.207663468720314
detectorS & descriptorB are trt_fp16
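
The breakdown above can be summarized with a short sketch that ranks the stages by share of total time and shows what the frame rate would be if one stage were removed. The stage times are copied from this comment; dropping preprocessing entirely is a hypothetical best case, not a measured result.

```python
# Per-stage times in milliseconds, copied from the comment above.
stage_ms = {
    "preprocess": 19.606828689575195,
    "detectorS": 16.09945297241211,
    "descriptorB": 29.36267852783203,
    "dualsoftmaxmatcher": 0.6873607635498047,
    "postprocess": 0.14138221740722656,
}

total_ms = sum(stage_ms.values())
fps = 1000.0 / total_ms

# Print stages from slowest to fastest with their share of the total.
for name, ms in sorted(stage_ms.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: {ms:7.2f} ms ({100.0 * ms / total_ms:4.1f}%)")
print(f"{'total':>20}: {total_ms:7.2f} ms -> {fps:.1f} fps")

# Hypothetical best case: preprocessing cost removed completely.
without_pre_ms = total_ms - stage_ms["preprocess"]
fps_without_pre = 1000.0 / without_pre_ms
print(f"without preprocess: {without_pre_ms:.2f} ms -> {fps_without_pre:.1f} fps")
```

The descriptor is the largest single stage, and preprocessing costs more than the detector, so both are candidates for optimization.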

@Parskatt
Owner

Okay, so it seems like around 20 fps is at least possible with the current sizes.

Are you able to extract the times for the encoder/decoder parts of the network? Depending on which part takes the most time, we might need to change the encoder architecture.
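
One possible way to collect such per-part timings is a small context-manager timer, sketched below. The `encoder` and `decoder` functions are hypothetical stand-ins, not the DeDoDe modules, and the `sync` hook is an assumption about how one would handle asynchronous GPU execution (for PyTorch, `torch.cuda.synchronize` could be passed in). A single fused TensorRT engine would likely need to be split or profiled with its own tooling instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings_ms = defaultdict(list)


@contextmanager
def timed(name, sync=None):
    """Record wall-clock time of the enclosed block under `name`.

    `sync` is an optional callable run before each clock read; for GPU work
    it should block until all queued kernels have finished.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        timings_ms[name].append((time.perf_counter() - start) * 1000.0)


# Hypothetical stand-ins for the two halves of a network.
def encoder(image):
    return [sum(row) for row in image]


def decoder(features):
    return [f * 0.5 for f in features]


image = [[1.0] * 640 for _ in range(480)]
for _ in range(5):  # repeat to average out noise
    with timed("encoder"):
        features = encoder(image)
    with timed("decoder"):
        scores = decoder(features)

for name, samples in timings_ms.items():
    print(f"{name}: {sum(samples) / len(samples):.3f} ms (mean of {len(samples)})")
```

Discarding the first few iterations as warm-up is usually worthwhile when timing GPU code, since early calls often include one-time setup cost.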

The final thing I guess would be to distill both networks into a single network.

@zhongqiu1245
Author

ok, I will try later.
