Orientation and descriptors are always extracted in the same Gaussian image inside one octave #52
Yes, that is a correct observation. In the earliest version of the code, I used the layer corresponding to the scale for orientations and descriptors, but in 2010 I changed that and collapsed the scales in order to gain considerably more speed. It would have been interesting to see how that affected the accuracy and how it compares to the version of today. I would like to check that later next week, if I have enough energy left after giving six lectures in seven days. It could definitely have affected the accuracy. What it should have affected most, though, are the magnitudes of the gradients, not the directions, which should limit the effect the change had on orientations and descriptors. Most important is to get the size of the window right.
While (not) preparing for a lecture this morning, I had to do some testing and went back to the pre-2010 version of the code. I made the same observations you did: the orientation is quite sensitive to the correct scale, while the descriptor is not. Maybe it's possible to blur on the fly when computing the gradients for orientation estimation, without needing more memory. As long as you don't increase memory usage and pollute the caches, the speed should hardly be affected, since that part of the code is not that time-critical. Worth noting, though, is that the best scale for computing the orientation and descriptor might not necessarily be the same as the one at which the DoG was detected. After all, they try to capture different things in the image, blobs vs edges. The most important difference with respect to a traditional implementation is the observation that the semi-group property of Gaussian filters allows you to compute all scales in parallel, without sequentially filtering from scale to scale. It's just that on a CPU it used to be faster to do it sequentially, since you could use smaller filter kernels. When I find more time, I will try to dig up some databases.
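A minimal CPU sketch of the on-the-fly idea above (all names and parameters here are illustrative, not taken from the repository): the Gaussian semi-group property gives the extra blur needed to go from the stored base scale to a keypoint's scale, and that blur can then be applied only to the small window used for orientation gradients.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Extra blur needed to go from the stored base scale to the keypoint scale,
// from the semi-group property G(s1) * G(s2) = G(sqrt(s1^2 + s2^2)).
inline float incrementalSigma(float sigmaKeypoint, float sigmaBase) {
    return std::sqrt(std::max(sigmaKeypoint * sigmaKeypoint -
                              sigmaBase * sigmaBase, 0.0f));
}

// On-the-fly blur of a (2*radius+1)^2 window around (cx, cy), followed by
// central-difference gradients for orientation estimation. Only the window
// plus its halo is read, so no extra full-size blurred image is stored.
void orientationGradients(const std::vector<float>& img, int width, int height,
                          int cx, int cy, int radius, float sigma,
                          std::vector<float>& mag, std::vector<float>& dir) {
    const int kr = std::max(1, (int)std::ceil(3.0f * sigma));  // kernel radius
    std::vector<float> kernel(2 * kr + 1);
    float sum = 0.0f;
    for (int i = -kr; i <= kr; ++i)
        sum += kernel[i + kr] = std::exp(-0.5f * i * i / (sigma * sigma));
    for (float& k : kernel) k /= sum;

    auto at = [&](int x, int y) {  // clamped read from the base image
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return img[y * width + x];
    };
    // Blurred value at an absolute image coordinate (naive 2D convolution).
    auto blurred = [&](int x, int y) {
        float v = 0.0f;
        for (int j = -kr; j <= kr; ++j)
            for (int i = -kr; i <= kr; ++i)
                v += kernel[i + kr] * kernel[j + kr] * at(x + i, y + j);
        return v;
    };

    const int size = 2 * radius + 1;
    mag.assign(size * size, 0.0f);
    dir.assign(size * size, 0.0f);
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            float dx = blurred(cx + x + 1, cy + y) - blurred(cx + x - 1, cy + y);
            float dy = blurred(cx + x, cy + y + 1) - blurred(cx + x, cy + y - 1);
            mag[(y + radius) * size + x + radius] = std::sqrt(dx * dx + dy * dy);
            dir[(y + radius) * size + x + radius] = std::atan2(dy, dx);
        }
}
```

The blur here is applied naively per pixel for clarity; a real kernel would cache the blurred window or use separable passes, but the point is that only the window plus its halo is ever touched, so memory traffic stays local to the keypoint.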
Yes, it is mostly memory-bound. For the DoG building, I used a similar fusion to yours. I see you are using one kernel per octave for the DoG building step. I was also fusing the generation of multiple Gaussian layers and DoG layers into one kernel, but not as aggressively as you did. The reason was that I was using a pretty large Gaussian window (15x15 or 17x17, with the separable optimization for consecutive Gaussian layers). With my window size configuration, the halo would be too big for fusing the whole octave (possibly 35x35). Maybe my window size setting was too conservative and can be reduced. I see you are using 9x9 for one whole octave; I'm wondering if that also has an impact on accuracy. Right, with the on-the-fly approach, we can calculate the orientation at a more accurate scale, by setting the Gaussian filter based on the keypoint size. It will cause more memory reads per keypoint, though, due to the halo. But maybe that's not a big issue.
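A small sketch of the halo trade-off being discussed (illustrative radii only, not the actual configuration of either implementation): fusing a chain of sequentially-applied blurs into one tile-based kernel makes the per-side halo the sum of the individual kernel radii, while computing every layer directly from the octave base keeps the halo at the largest single radius.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

// Per-side halo when sequentially-chained blurs (plus the DoG subtraction
// that follows) are fused into one tile-based kernel: the radii add up.
int haloSequentialFusion(const std::vector<int>& radii) {
    return std::accumulate(radii.begin(), radii.end(), 0);
}

// Per-side halo when each layer is instead computed directly from the octave
// base image (Gaussian semi-group property): only the largest radius counts.
int haloDirectFromBase(const std::vector<int>& radii) {
    return *std::max_element(radii.begin(), radii.end());
}

int main() {
    // Hypothetical numbers: five Gaussian layers per octave.
    // Sequential chain: each layer blurred from the previous one, 15x15 kernels.
    std::vector<int> rSeq(5, 7);
    // Direct from base: each layer blurred from the octave base with one larger
    // kernel whose radius grows with the incremental sigma, up to 9x9.
    std::vector<int> rBase = {2, 3, 3, 4, 4};

    int hs = haloSequentialFusion(rSeq);
    int hd = haloDirectFromBase(rBase);
    std::printf("sequential fusion : halo %d px/side, effective window %dx%d\n",
                hs, 2 * hs + 1, 2 * hs + 1);
    std::printf("direct from base  : halo %d px/side, effective window %dx%d\n",
                hd, 2 * hd + 1, 2 * hd + 1);
    return 0;
}
```

This is why the direct-from-base formulation is friendlier to whole-octave fusion: the shared-memory tile only has to be padded by the largest single kernel radius, not by the accumulated radius of the whole chain.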
You may love it:
Hi, I tried to read your code and found that you do not store the Gaussian-smoothed images. In the original SIFT paper, orientation and descriptors are extracted from the Gaussian-smoothed image whose scale is closest to the keypoint's. In your implementation, I see they are extracted from the first image of the octave (i.e. the only saved one). I understand this gives speed benefits, but did you thoroughly test scale/rotation invariance with this simplification?
Regards,
Yao
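For context, a minimal sketch (hypothetical helper, not code from this repository) of what the question above refers to: a traditional SIFT implementation keeps all Gaussian layers of an octave and picks the one whose blur level is closest to the keypoint's interpolated scale before computing orientation and descriptor.

```cpp
#include <algorithm>
#include <cmath>

// Pick the Gaussian layer within an octave whose blur is closest to the
// keypoint's scale. sigma0 is the blur of layer 0, scalesPerOctave the number
// of scales sampled per octave, nGaussianImages the stored layers per octave.
int closestGaussianLayer(float keypointSigma, float sigma0,
                         int scalesPerOctave, int nGaussianImages) {
    // Layer l has blur sigma0 * 2^(l / scalesPerOctave), so the real-valued
    // layer index is scalesPerOctave * log2(keypointSigma / sigma0).
    float l = scalesPerOctave * std::log2(keypointSigma / sigma0);
    return std::clamp((int)std::lround(l), 0, nGaussianImages - 1);
}
```

The simplification discussed in this issue skips this lookup and always reads the single saved image of the octave instead.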