Add SuperGlue model #29886
Conversation
Force-pushed from c58196c to 803a801.
@amyeroberts Alright, the PR is opened. The branch is still a bit of a mess, but the code contained in the …
Force-pushed from 803a801 to 1a0ce15.
@ydshieh Hey, I got a weird bug when I try to run …
This seems to come from …
@sbucaille Apologies for the delay in my response. I've answered your initial questions below - let me know if anything isn't clear.
This shouldn't be necessary. Doing `self.keypoint_detector = AutoModelForKeypointDetection.from_config(config.keypoint_detector_config)` is the correct way to go. If the weights are all newly initialized, then the first thing I'd suspect is that the checkpoint being used doesn't contain the SuperPoint weights.
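For context, a minimal sketch of the composite-model pattern described here (class and attribute names follow the thread at this stage of the PR; the merged implementation may differ):

```python
from transformers import AutoModelForKeypointDetection, PreTrainedModel


class SuperGlueForImageMatching(PreTrainedModel):  # simplified sketch
    def __init__(self, config):
        super().__init__(config)
        # The keypoint detector is built from the nested sub-config; its
        # weights are expected to come from the composite SuperGlue
        # checkpoint at from_pretrained time, not from a standalone
        # SuperPoint checkpoint.
        self.keypoint_detector = AutoModelForKeypointDetection.from_config(
            config.keypoint_detector_config
        )
```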
> `SuperGlueForImageMatching` taking …
In this case, are the images to be paired interleaved? E.g. the sequence goes [image_1_a, image_1_b, image_2_a, image_2_b, ...]. I think I would do something more similar to sentence similarity for text models: …
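(The elided example above isn't recoverable, but the interleaved convention being asked about can be illustrated with a hypothetical batch:)

```python
# Hypothetical illustration of the interleaved-pair layout: a flat batch
# [image_1_a, image_1_b, image_2_a, image_2_b] reshaped into pairs.
import torch

batch = torch.randn(4, 3, 480, 640)       # four images, two pairs, interleaved
pairs = batch.reshape(2, 2, 3, 480, 640)  # (num_pairs, 2, channels, height, width)
print(pairs.shape)                        # torch.Size([2, 2, 3, 480, 640])
```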
This is just an oversight, sorry, as the model was added last minute as a patch after the branch was already cut for release. I'll update the notes.
@ydshieh Nevermind, found the problem, it came from a left-out …
@amyeroberts Everything is clear. I fixed the embedded … I'll implement the new …
The original read_image function from the authors:

```python
import cv2
import numpy as np
import torch

# process_resize comes from the original SuperGlue repository's utils.


def frame2tensor(frame, device):
    # Normalize to [0, 1] and add batch and channel dimensions.
    return torch.from_numpy(frame / 255.).float()[None, None].to(device)


def read_image(path, device, resize, rotation, resize_float):
    image = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    if image is None:
        return None, None, None
    w, h = image.shape[1], image.shape[0]
    w_new, h_new = process_resize(w, h, resize)
    # Scales map resized keypoints back to the original resolution.
    scales = (float(w) / float(w_new), float(h) / float(h_new))

    if resize_float:
        image = cv2.resize(image.astype('float32'), (w_new, h_new))
    else:
        image = cv2.resize(image, (w_new, h_new)).astype('float32')

    if rotation != 0:
        image = np.rot90(image, k=rotation)
        if rotation % 2:
            scales = scales[::-1]

    inp = frame2tensor(image, device)
    return image, inp, scales
```
Yep, that's how it's done in Llava and other composite models. You can check this by inspecting the safetensors weight names on the Hub for a checkpoint: https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf/tree/main (click on this symbol: …)
None of our image processors should be using … The image processors accept … It would be interesting to know what the difference is between the two images after loading from PIL vs cv2.
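(As an aside, one quick way to inspect that PIL-vs-cv2 difference; "image.jpg" is a placeholder path:)

```python
# Compare grayscale loading between OpenCV and PIL. Both use BT.601 luma
# weights, but rounding and JPEG decoding can still differ by a few levels.
import cv2
import numpy as np
from PIL import Image

cv2_gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
pil_gray = np.array(Image.open("image.jpg").convert("L"))

diff = np.abs(cv2_gray.astype(np.int16) - pil_gray.astype(np.int16))
print(diff.max(), (diff > 0).mean())  # max per-pixel gap, fraction of differing pixels
```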
@amyeroberts Indeed, there is no such thing as reading an image in the image processors; I was mistaken.
About that, I've modified the … On additional notes:
Force-pushed from bb1f337 to 3257f00.
@amyeroberts Me again. I had time to write some tests, but I ended up with a very verbose implementation of the forward method. The reason is that, in order to cover all the tests about hidden states, we need to output two variables (…
On different tests I have issues:
The last test to write is the one related to the pretrained model, checking whether the model outputs the proper values. EDIT: problem fixed
Force-pushed from 79dcc0c to b68ae29.
Hey @amyeroberts,
Is it a known problem from the main branch (I tried with the SuperPoint tests and ended up with the same result) or something on my part? EDIT: problem fixed
@amyeroberts Hey, you can disregard the last two posts I made, as I found the origin of the problems.
@sbucaille Apologies for the delay in my response on this PR. If I've understood correctly, you're asking whether we should return the hidden states from the SuperPoint model when calling SuperGlue, is that right? If so, I'd say no, we don't need to return them, and I agree it can act as if it were a backbone. As, technically, any keypoint detection model can feed into SuperGlue, we can't assume that all of them have the same kind of hidden states.
@amyeroberts Hey! No worries at all!
All tests passed. Should we move on to a first review?
@sbucaille Ah, OK, I see. Regarding returning hidden_states and attentions: yes, the model should return these. It's fine if the dimension here varies depending on the number of points detected, as long as not all of the dimensions vary (if that makes sense?). That is, for each "block" of the model, there should be an associated hidden_states and attentions tensor which is returned if `output_hidden_states` / `output_attentions` is set. For the other points:
Could you clarify this a bit? Specifically, instantiation of what and where?
Is there any valid input to SuperGlue which involves one image? If not, then we can raise an exception when the image processor is called if the number of images isn't even (see the sketch below).
We could, although renaming models shouldn't be done lightly, as this would be a breaking change. I don't think the …
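(A minimal sketch of the even-count check suggested above; the helper name and error message are illustrative, not the merged code:)

```python
def validate_image_pairs(images: list) -> None:
    # SuperGlue matches keypoints between pairs, so the flat batch length
    # must be a positive even number.
    if len(images) == 0 or len(images) % 2 != 0:
        raise ValueError(
            f"SuperGlue expects pairs of images, so an even number of images "
            f"is required, got {len(images)}."
        )
```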
Hey @amyeroberts,
Alright, I made the changes to output the hidden states and attentions. In hidden states, only the last dimension is "masked" and equals the highest number of keypoints across all the images; same for attentions, for the last and second-to-last dimensions.
Since SuperGlue uses SuperPoint as its keypoint detector, the …
Made these changes and changed tests accordingly.
I've renamed ImageMatching occurrences into KeypointMatching. I need to make a last pass on the code to check for quality, refactoring, naming, and commenting problems, but overall I'd say we have everything.
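(For readers following along, the masking described above amounts to padding ragged per-image tensors to the batch maximum; a self-contained illustration with made-up shapes:)

```python
import torch

# Hidden states for two images with different keypoint counts: (dim, num_keypoints).
hidden_states = [torch.randn(64, 382), torch.randn(64, 512)]
max_kpts = max(h.shape[-1] for h in hidden_states)

# Pad the last dimension to the highest keypoint count and track validity.
padded = torch.zeros(len(hidden_states), 64, max_kpts)
mask = torch.zeros(len(hidden_states), max_kpts, dtype=torch.bool)
for i, h in enumerate(hidden_states):
    padded[i, :, : h.shape[-1]] = h
    mask[i, : h.shape[-1]] = True
```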
Thanks for working on adding this model!
I've just done an initial first pass - so there will be some things I'll revisit in a more in-depth review. I might have missed something, but my first thought when looking at the structure is that there's a lot of code which takes both of the images and returns e.g. their respective scores, but the scores aren't dependent on one another, e.g. in SuperGlueAttentionalGNN. I think it would be better to have these layers take just one image, and then call the layer twice and combine as needed in the final stages.
Hey @amyeroberts, …
@sbucaille No offence taken! It's a very common mistake 😄
Looking great - thanks for the continued work on this! A few general structural things, but we're pretty close to being ready 🤗
```python
matches_mask[i, 1, : _matches_1.shape[1]] = 1
keypoints[i, 0, : _keypoints_0.shape[1], :] = _keypoints_0
keypoints[i, 1, : _keypoints_1.shape[1], :] = _keypoints_1
```
Regarding batching of the outputs: we don't want to batch together all of the attentions and hidden states into one big tensor. The pattern for other models is that a tuple of tensors is returned, with each element in the tuple representing a layer or block of the model. Sorry if I wasn't clear about this earlier.
Oh, but it is not the case: we don't have one big tensor for all hidden_states and one big tensor for all attentions; it is still a tuple of hidden states and a tuple of attentions. What we are batching here are the multiple tuples of hidden states and attentions from the different image matchings.
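(To illustrate the convention under discussion, with made-up shapes: each pair produces a tuple with one tensor per layer, and pairs are batched layer-wise while keeping the tuple structure:)

```python
import torch

num_layers = 9  # illustrative
pair0_hidden = tuple(torch.randn(2, 128, 100) for _ in range(num_layers))
pair1_hidden = tuple(torch.randn(2, 128, 100) for _ in range(num_layers))

# Stack the two pairs per layer; the output stays a tuple of per-layer tensors.
batched = tuple(torch.stack(layer, dim=0) for layer in zip(pair0_hidden, pair1_hidden))
print(len(batched), batched[0].shape)  # 9 torch.Size([2, 2, 128, 100])
```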
Hey @amyeroberts, …
@qubvel done!
Hi @qubvel, @ArthurZucker, …
Thanks, @sbucaille! Arthur is on/off this week, but hopefully he will be able to review it then. Thanks for your patience 🤗
- …rocess_keypoint_matching (and docstring)
- LightGluePR suggestions:
  - use `post_process_keypoint_matching` as default docs example
  - add `post_process_keypoint_matching` in autodoc
  - add `SuperPointConfig` import under TYPE_CHECKING condition
  - format SuperGlueConfig docstring
  - add device in convert_superglue_to_hf
  - fix typo
  - fix KeypointMatchingOutput docstring
  - removed unnecessary line
  - added missing SuperGlueConfig in __init__ methods
- use batching to get keypoint detection
Hey @ArthurZucker! Gentle 🎅 bump here, happy holidays!
Hey @qubvel! Wish you all the best for this new year! Heard some good news about vision PR merges, so here is a gentle bump! 😬
A few small nits! Sorry that this took so long 🤗
Let's make sure the model / feature (keypoint matching) is easy to use, and thus add basic functionalities for it!
```python
import matplotlib.pyplot as plt
import numpy as np

# Create side by side image
merged_image = np.zeros((max(image1.height, image2.height), image1.width + image2.width, 3))
merged_image[: image1.height, : image1.width] = np.array(image1) / 255.0
merged_image[: image2.height, image1.width :] = np.array(image2) / 255.0
plt.imshow(merged_image)
plt.axis("off")

# Retrieve the keypoints and matches
output = outputs[0]
keypoints0 = output["keypoints0"]
keypoints1 = output["keypoints1"]
matching_scores = output["matching_scores"]
keypoints0_x, keypoints0_y = keypoints0[:, 0].numpy(), keypoints0[:, 1].numpy()
keypoints1_x, keypoints1_y = keypoints1[:, 0].numpy(), keypoints1[:, 1].numpy()

# Plot the matches
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
    keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, matching_scores
):
    plt.plot(
        [keypoint0_x, keypoint1_x + image1.width],
        [keypoint0_y, keypoint1_y],
        color=plt.get_cmap("RdYlGn")(matching_score.item()),
        alpha=0.9,
        linewidth=0.5,
    )
    plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
    plt.scatter(keypoint1_x + image1.width, keypoint1_y, c="black", s=2)

# Save the plot
plt.savefig("matched_image.png", dpi=300, bbox_inches="tight")
plt.close()
```
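(For this snippet to run as-is, `image1` and `image2` are assumed to be the PIL images of the pair, and `outputs` the list of dictionaries returned by the image processor's `post_process_keypoint_matching` method, whose `keypoints0`/`keypoints1`/`matching_scores` keys it reads.)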
Do you think it would make sense to add this to the image processor / processor?
Like a `plot_keypoint_matching(images, keypoint_matching_output, path)` method? Or just as a docstring?
Yeah a method sounds good
I thought a bit about this, but I think it depends on whether you want to put visualization forward in the library or not. In this example we assume only a pair of images, but as a method in the processor, should it handle multiple pairs like the other methods? If so, should we visualize the pairs individually or all together? In terms of plotting, should we force the template we have here or allow some customization?
On the other hand, I don't know your policy about that, but on SuperPoint's PR, another contributor took the opportunity of us introducing the new keypoint detection task to implement visualization in roboflow/supervision (the PR still appears to be a work in progress); maybe it could also be the case for keypoint matching?
Do we have a matplotlib dependency? Otherwise, I would rather just provide snippets in the docs and model card (as we have right now).
Should this be considered resolved?
Yeah, we could have a soft dependency as well
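(A sketch of one common soft-dependency pattern, assuming the hypothetical `plot_keypoint_matching` method discussed above; transformers has its own backend-availability utilities that a real implementation would likely use instead:)

```python
def plot_keypoint_matching(self, images, outputs, save_path):
    # Import lazily so matplotlib is only required when plotting is used.
    try:
        import matplotlib.pyplot as plt  # noqa: F401
    except ImportError:
        raise ImportError(
            "plot_keypoint_matching requires matplotlib. "
            "Install it with `pip install matplotlib`."
        )
    ...  # build the side-by-side figure as in the snippet above
```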
```python
def verify_model_outputs(model, model_name, device):
    from tests.models.superglue.test_modeling_superglue import prepare_imgs
```
It's better to just copy the function, as conversion files are supposed to be runnable and usable alone!
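(For illustration, a standalone copy might look like the following; the dataset repo id is a placeholder, not necessarily the one the tests use:)

```python
from datasets import load_dataset


def prepare_imgs():
    # Inlined from the test module so the conversion script runs on its own.
    dataset = load_dataset("hf-internal-testing/image-matching-dataset", split="train")
    return [[dataset[0]["image"], dataset[1]["image"]]]
```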
Fixed this issue
@sbucaille should we remove the import then?
Let's push this change and merge 🤗
The import got removed in a following commit, yep.
Oh, sorry, looks like I was looking at the diff for an older commit.
Thanks for the clean conversion script! 🤗
@sbucaille happy new year as well 🤗
Hi @ArthurZucker, happy new year to you too, and thanks for the review!
🤗 let's just solve the last nits and merge!
Congratulations @sbucaille on the model merged 🎉 🎉 🎉 Fantastic work! Thanks for iterating so many times to follow our standards 🤗
* Initial commit with template code generated by transformers-cli
* Multiple additions to SuperGlue implementation: - Added the SuperGlueConfig - Added the SuperGlueModel and its implementation - Added basic weight conversion script - Added new ImageMatchingOutput dataclass
* Few changes for SuperGlue
* Multiple changes: - Added keypoint detection config to SuperGlueConfig - Completed convert_superglue_to_pytorch and succesfully run inference
* Reverted unintentional change
* Multiple changes: - Added SuperGlue to a bunch of places - Divided SuperGlue into SuperGlueForImageMatching and SuperGlueModel - Added testing images
* Moved things in init files
* Added docs (to be finished depending on the final implementation)
* Added necessary imports and some doc
* Removed unnecessary import
* Fixed make fix-copies bug and ran it
* Deleted SuperGlueModel Fixed convert script
* Added SuperGlueImageProcessor
* Changed SuperGlue to support batching pairs of images and modified ImageMatchingOutput in consequences
* Changed convert_superglue_to_hf.py script to experiment different ways of reading an image and seeing its impact on performances
* Added initial tests for SuperGlueImageProcessor
* Added AutoModelForImageMatching in missing places and tests
* Fixed keypoint_detector_output instructions
* Fix style
* Adapted to latest main changes
* Added integration test
* Fixed bugs to pass tests
* Added keypoints returned by keypoint detector in the output of SuperGlue
* Added doc to SuperGlue
* SuperGlue returning all attention and hidden states for a fixed number of keypoints
* Make style
* Changed SuperGlueImageProcessor tests
* Revert "SuperGlue returning all attention and hidden states for a fixed number of keypoints" Changed tests accordingly This reverts commit 5b3b669c
* Added back hidden_states and attentions masked outputs with tests
* Renamed ImageMatching occurences into KeypointMatching
* Changed SuperGlueImageProcessor to raise error when batch_size is not even
* Added docs and clarity to hidden state and attention grouping function
* Fixed some code and done refactoring
* Fixed typo in SuperPoint output doc
* Fixed some of the formatting and variable naming problems
* Removed useless function call
* Removed AutoModelForKeypointMatching
* Fixed SuperGlueImageProcessor to only accept paris of images
* Added more fixes to SuperGlueImageProcessor
* Simplified the batching of attention and hidden states
* Simplified stack functions
* Moved attention instructions into class
* Removed unused do_batch_norm argument
* Moved weight initialization to the proper place
* Replaced deepcopy for instantiation
* Fixed small bug
* Changed from stevenbucaille to magic-leap repo
* Renamed London Bridge images to Tower Bridge
* Fixed formatting
* Renamed remaining "london" to "tower"
* Apply suggestions from code review Small changes in the docs Co-authored-by: amyeroberts <[email protected]>
* Added AutoModelForKeypointMatching
* Changed images used in example
* Several changes to image_processing_superglue and style
* Fixed resample type hint
* Changed SuperGlueImageProcessor and added test case for list of 2 images
* Changed list_of_tuples implementation
* Fix in dummy objects
* Added normalize_keypoint, log_sinkhorn_iterations and log_optimal_transport docstring
* Added missing docstring
* Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]>
* Apply suggestions from code review Co-authored-by: amyeroberts <[email protected]>
* Moved forward block at bottom
* Added docstring to forward method
* Added docstring to match_image_pair method
* Changed test_model_common_attributes to test_model_get_set_embeddings test method signature
* Removed AutoModelForKeypointMatching
* Removed image fixtures and added load_dataset
* Added padding of images in SuperGlueImageProcessor
* Cleaned up convert_superglue_to_hf script
* Added missing docs and fixed unused argument
* Fixed SuperGlueImageProcessor tests
* Transposed all hidden states from SuperGlue to reflect the standard (..., seq_len, feature_dim) shape
* Added SuperGlueForKeypointMatching back to modeling_auto
* Fixed image processor padding test
* Changed SuperGlue docs
* changes: - Abstraction to batch, concat and stack of inconsistent tensors - Changed conv1d's to linears to match standard attention implementations - Renamed all tensors to be tensor0 and not tensor_0 and be consistent - Changed match image pair to run keypoint detection on all image first, create batching tensors and then filling these tensors matches after matches - Various changes in docs, etc
* Changes to SuperGlueImageProcessor: - Reworked the input image pairs checking function and added tests accordingly - Added Copied from statements - Added do_grayscale tag (also for SuperPointImageProcessor) - Misc changes for better code
* Formatting changes
* Reverted conv1d to linear conversion because of numerical differences
* fix: changed some code to be more straightforward (e.g. filtering keypoints) and converted plot from opencv to matplotlib
* fix: removed unnecessary test
* chore: removed commented code and added back hidden states transpositions
* chore: changed from "inconsistent" to "ragged" function names as suggested Co-authored-by: amyeroberts <[email protected]>
* docs: applied suggestions Co-authored-by: amyeroberts <[email protected]>
* docs: updated to display matched output
* chore: applied suggestion for check_image_pairs_input function Co-authored-by: amyeroberts <[email protected]>
* chore: changed check_image_pairs_input function name to validate_and_format_image_pairs and used validate_preprocess_arguments function
* tests: simplified tests for image input format and shapes
* feat: converted SuperGlue's use of Conv1d with kernel_size of 1 with Linear layers. Changed tests and conversion script accordingly
* feat: several changes to address comments Conversion script: - Reverted fuse batchnorm to linear conversion - Changed all 'nn.Module' to respective SuperGlue models - Changed conversion script to use regex mapping and match other recent scripts Modeling SuperGlue: - Added batching with mask and padding to attention - Removed unnecessary concat, stack and batch ragged pairs functions - Reverted batchnorm layer - Renamed query, key, value and merge layers into q, k, v, out proj - Removed Union of different Module into nn.Module in _init_weights method typehint - Changed several method's signature to combine image0 and image1 inputs with appropriate doc changes - Updated SuperGlue's doc with torch.no_grad() Updated test to reflect changes in SuperGlue model
* refactor: changed validate_and_format_image_pairs function with clarity
* refactor: changed from one SuperGlueMLP class to a list of SuperGlueMLP class
* fix: fixed forgotten init weight change from last commit
* fix: fixed rebase mistake
* fix: removed leftover commented code
* fix: added typehint and changed some of arguments default values
* fix: fixed attribute default values for SuperGlueConfig
* feat: added SuperGlueImageProcessor post process keypoint matching method with tests
* fix: fixed SuperGlue attention and hidden state tuples aggregation
* chore: fixed mask optionality and reordered tensor reshapes to be cleaner
* chore: fixed docs and error message returned in validate_and_format_image_pairs function
* fix: fixed returned keypoints to be the ones that SuperPoint returns
* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue
* fix: fixed check on number of image sizes for post process compared to the pairs in outputs of SuperGlue (bis)
* fix: Changed SuperGlueMultiLayerPerceptron instantiation to avoid if statement
* fix: Changed convert_superglue_to_hf script to reflect latest SuperGlue changes and got rid of nn.Modules
* WIP: implement Attention from an existing class (like BERT)
* docs: Changed docs to include more appealing matching plot
* WIP: Implement Attention
* chore: minor typehint change
* chore: changed convert superglue script by removing all classes and apply conv to linear conversion in state dict + rearrange keys to comply with changes in model's layers organisation
* Revert "Fixed typo in SuperPoint output doc" This reverts commit 2120390.
* chore: added comments in SuperGlueImageProcessor
* chore: changed SuperGlue organization HF repo to magic-leap-community
* [run-slow] refactor: small change in layer instantiation
* [run-slow] chore: replaced remaining stevenbucaille org to magic-leap-community
* [run-slow] chore: make style
* chore: update image matching fixture dataset HF repository
* [run-slow] superglue
* tests: overwriting test_batching_equivalence
* [run-slow] superglue
* tests: changed test to cope with value changing depending on cuda version
* [run-slow] superglue
* tests: changed matching_threshold value
* [run-slow] superglue
* [run-slow] superglue
* tests: changed tests for integration
* [run-slow] superglue
* fix: Changed tensor view and permutations to match original implementation results
* fix: updated convert script and integration test to include last change in model
* fix: increase tolerance for CUDA variances
* Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <[email protected]>
* [run-slow] superglue
* chore: removed blank whitespaces
* [run-slow] superglue
* Revert SuperPoint image processor accident changes
* [run-slow] superglue
* refactor: reverted copy from BERT class
* tests: lower the tolerance in integration tests for SuperGlue
* [run-slow] superglue
* chore: set do_grayscale to False in SuperPoint and SuperGlue image processors
* [run-slow] superglue
* fix: fixed imports in SuperGlue files
* chore: changed do_grayscale SuperGlueImageProcessing default value to True
* docs: added typehint to post_process_keypoint_matching method in SuperGlueImageProcessor
* fix: set matching_threshold default value to 0.0 instead of 0.2
* feat: added matching_threshold to post_process_keypoint_matching method
* docs: update superglue.md to include matching_threshold parameter
* docs: updated SuperGlueConfig docstring for matching_threshold default value
* refactor: removed unnecessary parameters in SuperGlueConfig
* fix: changed from matching_threshold to threshold
* fix: re-revert changes to make SuperGlue attention classes copies of BERT
* [run-slow] superglue
* fix: added missing device argument in post_processing method
* [run-slow] superglue
* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching (and docstring)
* fix: add device to image_sizes tensor instantiation
* tests: added checks on do_grayscale test
* chore: reordered and added Optional typehint to KeypointMatchingOutput
* LightGluePR suggestions: - use `post_process_keypoint_matching` as default docs example - add `post_process_keypoint_matching` in autodoc - add `SuperPointConfig` import under TYPE_CHECKING condition - format SuperGlueConfig docstring - add device in convert_superglue_to_hf - Fix typo - Fix KeypointMatchingOutput docstring - Removed unnecessary line - Added missing SuperGlueConfig in __init__ methods
* LightGluePR suggestions: - use batching to get keypoint detection
* refactor: processing images done in 1 for loop instead of 4
* fix: use @ instead of torch.einsum for scores computation
* style: added #fmt skip to long tensor values
* refactor: rollbacked validate_and_format_image_pairs valid and invalid case to more simple ones
* refactor: prepare_imgs
* refactor: simplified `validate_and_format_image_pairs`
* docs: fixed doc

---------

Co-authored-by: steven <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Steven Bucaille <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
What does this PR do?
Fixes #25489
This PR is the next step after implementing SuperPoint: it implements image matching through keypoint matching.
Colab notebook with inference example:
https://colab.research.google.com/drive/1NhwofZFzy7IMN4irN-jC-9LZy7dx_GZ2?usp=sharing
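A minimal end-to-end sketch of the usage this PR enables (the checkpoint id is assumed; the magic-leap-community org comes from the thread, and the image paths are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SuperGlueForKeypointMatching

images = [Image.open("image_a.jpg"), Image.open("image_b.jpg")]  # one pair

processor = AutoImageProcessor.from_pretrained("magic-leap-community/superglue_outdoor")
model = SuperGlueForKeypointMatching.from_pretrained("magic-leap-community/superglue_outdoor")

inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep only confident matches via the post-processing added in this PR.
image_sizes = [[(image.height, image.width) for image in images]]
matches = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
print(matches[0]["keypoints0"].shape, matches[0]["matching_scores"].shape)
```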
Who can review?
@amyeroberts