Support exporting fpn #372
Conversation
@orionr FPN works with pytorch/pytorch#7091! Please review it when you have time :)
The "rpn_post_nms_topN" in the convert_collect_and_distribute() function should be "post_nms_topN".
@ZesongLee Yes, I have fixed it in pytorch/pytorch#7091, because it makes the names consistent. Please use the latest caffe2 code :)
(force-pushed from 2624603 to 27f5aee)
inputs = [x for x in op.input]
ret = core.CreateOperator(
    'CollectAndDistributeFpnRpnProposals',
    inputs,
Why not use list(op.input)? It seems to have the exact same output.
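For what it's worth, the two forms are interchangeable here: op.input is a repeated-string protobuf field, and both a list comprehension and list() produce the same plain Python list. A quick sketch with a stand-in tuple (the blob names are made up for illustration):

```python
# Stand-in for op.input; a repeated protobuf field behaves like a sequence.
# These blob names are hypothetical, chosen only to look like FPN RPN blobs.
op_input = ('rpn_cls_probs_fpn2', 'rpn_bbox_pred_fpn2')

# Both forms yield the same plain Python list.
assert [x for x in op_input] == list(op_input)
```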
Yes, it is from @orionr's code. I am not sure whether it is better for me to change it.
Hi @daquexian,
Thanks for writing this. While running it to convert a pkl model (FPN), I get an error at verify_model(args, [net, init_net], args.test_img), something like the one shown below:
AssertionError:
Arrays are not almost equal to 3 decimals
result_boxes and result_boxes not matched. Max diff: 5.0
(mismatch 0.606060606061%)
x: array([[3.329e+02, 1.799e+01, 5.250e+02, 3.556e+02, 9.995e-01],
[0.000e+00, 2.073e+01, 2.308e+02, 3.685e+02, 9.926e-01],
[2.459e+01, 3.320e+01, 1.330e+02, 1.092e+02, 9.909e-01],...
y: array([[3.329e+02, 1.799e+01, 5.250e+02, 3.556e+02, 9.995e-01],
[0.000e+00, 2.073e+01, 2.308e+02, 3.685e+02, 9.926e-01],
[2.459e+01, 3.320e+01, 1.330e+02, 1.092e+02, 9.909e-01],...
I get this error when I use a test_img; otherwise the model gets written in pb format. How important is it to get correct boxes on the test image?
@daquexian Sure. I can share the model, config file and test image. How would you like me to share them?
@daquexian: I was able to convert a pkl model (with FPN) to protobufs. However, when I try to run this net, I get the following error:
RuntimeError: [enforce fail at utility_ops.h:275]. Check failed: output->dims() == Input(i).dims(). Description: Input #1, input dimension: 1 256 38 60 should match output dimension: 1 256 38 59. Error from operator:
input: "fpn_inner_res4_5_sum_lateral" input: "fpn_inner_res4_5_sum_topdown" output: "fpn_inner_res4_5_sum" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 }
The error seems typical of models with FPN layers. I never encountered this kind of issue before. Any pointers to what might be causing this would be of great help.
@daquexian: Yes, there is a method blob_utils.im_list_to_blob([im]) in convert_pkl_to_pb.py. The size of the original image is (377, 590, 3). The size of the blob during data prep becomes (1, 3, 500, 782). Also, the content of im_info is array([[500., 782., 1.32626]], dtype=float32).
@rohitbhio The error you encountered is caused by an input size that is not a multiple of 32. When the size is not a multiple of 32 (for example, width 36), a certain layer produces a feature map of width 36/2/2 = 9; then, after a conv layer with stride 2 followed by an upsample2x layer, a feature map of width 8 instead of 9 is produced, and an error occurs when summing these two feature maps whose widths differ. blob_utils.im_list_to_blob pads the input so that its height and width are multiples of 32. You can print some messages to check whether blob_utils.im_list_to_blob works correctly.
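The arithmetic behind this, and the padding that im_list_to_blob applies, can be sketched as follows (a minimal illustration; the helper names here are ours, not Detectron's):

```python
import math

def lateral_and_topdown_width(w):
    """Width of the lateral feature map vs. the top-down path
    (one more stride-2 conv followed by a 2x nearest upsample)."""
    lateral = w // 2 // 2          # two stride-2 downsamplings
    topdown = (lateral // 2) * 2   # stride-2 conv, then upsample2x
    return lateral, topdown

# Width 36 is not a multiple of 32: lateral is 9 but top-down gives 8,
# so the Sum op fails on mismatched shapes.
assert lateral_and_topdown_width(36) == (9, 8)

def pad_to_stride(h, w, stride=32):
    """Round height/width up to a multiple of the coarsest FPN stride,
    as im_list_to_blob does."""
    return (int(math.ceil(h / float(stride))) * stride,
            int(math.ceil(w / float(stride))) * stride)

# A (500, 782) input is padded to (512, 800), a valid FPN input size.
assert pad_to_stride(500, 782) == (512, 800)
```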
@daquexian: Thanks for the explanation. Your observation regarding the blob_utils.im_list_to_blob method is correct. For some reason, the method was not adjusting the image blob dimensions according to COARSEST_STRIDE. Now the size of the blob during data prep becomes (1, 3, 512, 800) and I am able to make detections. Wow. Thanks.
@rohitbhio Glad to see it :) Could you please tell me why the method doesn't adjust the dimensions? Is there anything wrong in my PR?
@daquexian Your PR is awesome. I was trying to load these protobuf models without loading their corresponding config yaml file, so the blob_utils.im_list_to_blob method ended up using cfg.FPN.FPN_ON and cfg.FPN.COARSEST_STRIDE from the default config file (config.py). In the default config file, cfg.FPN.FPN_ON is set to False. For my purpose, I made a copy of the im_list_to_blob method from blob.py and modified it so that I could explicitly pass FPN_ON and COARSEST_STRIDE as input parameters, so that the method doesn't rely on global values.
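A sketch of such a variant, modeled on Detectron's blob.im_list_to_blob but with the two cfg values passed explicitly (the argument names are ours, not Detectron's):

```python
import numpy as np

def im_list_to_blob(ims, fpn_on=True, coarsest_stride=32):
    """Pack a list of HWC images into a single NCHW blob, padding to a
    multiple of the coarsest FPN stride when fpn_on is True. A sketch of
    the cfg-free variant discussed above, not Detectron's exact code."""
    max_shape = np.array([im.shape for im in ims]).max(axis=0)
    if fpn_on:
        # Pad so height and width become multiples of the coarsest stride
        stride = float(coarsest_stride)
        max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride)
        max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride)
    blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3),
                    dtype=np.float32)
    for i, im in enumerate(ims):
        blob[i, :im.shape[0], :im.shape[1], :] = im
    # NHWC -> NCHW, as Caffe2 expects
    return blob.transpose((0, 3, 1, 2))

# A (500, 782, 3) image ends up in a (1, 3, 512, 800) blob with stride 32.
blob = im_list_to_blob([np.zeros((500, 782, 3), dtype=np.float32)])
```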
Apart from the very minor change I suggested, LGTM. I tested it with a few different models and it ran without problems.
Pre-requisite: facebookresearch#372
Purpose: enable exporting all the models for CPU by exporting 2 separate nets: one for the bboxes and one for the rest of the inference.
Two main modifications:
- Refactor main(): it will call a function convert_to_pb for each sub_net
- run_model_pb: always do the inference for bbox and then call the mask or keypoint part if needed. The exact same approach is adopted.
The helper functions are only lightly modified to fit the new objective of exporting 2 pb files.
@rohitbhio It might be caused by some inconsistent implementation between the Python layer and the corresponding C++ layer, or some inherent difference between GPU and CPU. I think a max diff of 5.0 and a 0.6% mismatch don't matter. You can comment
@rbgirshick @ir413 This PR has been open for almost two months: do you plan to review it, and if so, do you know when you will have time?
detectron/modeling/FPN.py
Outdated
@@ -293,7 +293,7 @@ def add_topdown_lateral_module(
     bias_init=const_fill(0.0)
 )
 # Top-down 2x upsampling
-    td = model.net.UpsampleNearest(fpn_top, fpn_bottom + '_topdown', scale=2)
+    td = model.net.ResizeNearest(fpn_top, fpn_bottom + '_topdown', width_scale=2., height_scale=2.)
Are these ops exactly equivalent? Please verify that this change does not impact the AP of FPN model zoo models and does not impact the results when training new models.
@rbgirshick Thanks for your review. I will check it in several days.
@gadcam I haven't been able to follow all of the discussions about CPU support for FPN models. This PR looks self-contained, so I should be able to review it, though probably not until after CVPR, which is next week. I left a comment about one concern over the switch from the UpsampleNearest to the ResizeNearest op. I want someone to check that they produce equivalent results (the history is that when this was originally implemented, the C2 ResizeNearest op did not exist).
@rbgirshick I just tested the AP of e2e_faster_rcnn_R-50-FPN_1x and e2e_keypoint_rcnn_R-50-FPN_1x; the APs are all exactly the same.
Both Faster R-CNN are:
Both Keypoint R-CNN are:
@rbgirshick I have also checked the code. Because resize/upsample nearest is simple to implement, both operators are basically doing the same thing. The only difference is
@daquexian thanks for checking! Will try to review and merge in the near future.
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
tools/convert_pkl_to_pb.py
Outdated
    spatial_scale,
    (cfg.FPN.RPN_ANCHOR_START_SIZE * 2.**(lvl - cfg.FPN.RPN_MIN_LEVEL),)
) \
if lvl else get_anchors(spatial_scale, cfg.RPN.SIZES)
It may be easier to read if we write:
anchor_sizes = ....
blobs[anchor_name] = get_anchors(spatial_scale, anchor_sizes)
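Stubbed out with made-up cfg values (the real ones live in Detectron's config), the suggested shape of the logic would be something like:

```python
# Hypothetical stand-ins for cfg.FPN.RPN_ANCHOR_START_SIZE,
# cfg.FPN.RPN_MIN_LEVEL, and cfg.RPN.SIZES; the real values come from cfg.
RPN_ANCHOR_START_SIZE = 32
RPN_MIN_LEVEL = 2
RPN_SIZES = (32, 64, 128, 256, 512)

def anchor_sizes_for_level(lvl):
    """One anchor size per FPN level; a falsy lvl falls back to the
    single-scale RPN sizes, mirroring the `if lvl else` expression above."""
    if lvl:
        return (RPN_ANCHOR_START_SIZE * 2.0 ** (lvl - RPN_MIN_LEVEL),)
    return RPN_SIZES

# Each FPN level doubles the anchor size, starting from the minimum level.
assert anchor_sizes_for_level(2) == (32.0,)
assert anchor_sizes_for_level(5) == (256.0,)
assert anchor_sizes_for_level(0) == RPN_SIZES
```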
@rbgirshick The diff looks good to me.
inputs = [x for x in op.input]
ret = core.CreateOperator(
    'CollectAndDistributeFpnRpnProposals',
    inputs,
Hi, I am trying to convert a Mask R-CNN model from pkl to pb format, but I am getting the error below:
Traceback (most recent call last):
File "convert_pkl_to_pb.py", line 586, in <module>
main()
File "convert_pkl_to_pb.py", line 574, in main
[net, init_net] = convert_model_gpu(args, net, init_net)
File "convert_pkl_to_pb.py", line 300, in convert_model_gpu
ret = core.InjectDeviceCopiesAmongNets([ret_init_net, ret_net])
File "/usr/local/anaconda3/envs/caffetwo/lib/python2.7/site-packages/caffe2/python/core.py", line 2407, in InjectDeviceCopiesAmongNets
blob_remap=blob_remap,
File "/usr/local/anaconda3/envs/caffetwo/lib/python2.7/site-packages/caffe2/python/core.py", line 2298, in InjectCrossDeviceCopies
"input {} should be defined in the net.".format(input)
AssertionError: input rpn_rois should be defined in the net.
The inputs to the script are the config file e2e_mask_rcnn_R-101-FPN_1x.yaml and its pkl model.
AssertionError: input rpn_rois should be defined in the net.
Looks like you also need the #449 PR to do what you want.
Hi, I am able to convert the model from pkl to pb format with convert_pkl_to_pb.py with args.device = cpu. But when I view the network using Netron (https://github.com/lutzroeder/Netron), the mask_conv_body section is missing from the graph. I want to understand whether this is an issue with convert_pkl_to_pb.py or with Netron.
@orionr @rbgirshick Did you reach a final decision on this one?
@orionr @rbgirshick @ir413 This is the last time I annoy you with this one: do you think it is now mergeable? As @daquexian did the tests you required and @newstzpz approved the changes, it looks ready to merge, from an external viewpoint at least.
@newstzpz Hi :) I have made the changes you requested. Is it ready for landing now? Thanks!
Does this mean that it will be possible to convert both FPN and Mask models?
@ferasboulala #449 adds support for mask models.
@rbgirshick @orionr @newstzpz Could you please consider merging it? As you may have noticed, many users are eager to see it land.
@daquexian please rebase and then I will do some necessary testing to ensure that the
Another possible way is to keep UpsampleNearest unchanged in the training code but convert it to ResizeNearest when converting to pb.
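Schematically, such a conversion pass just rewrites the op type and maps the single scale argument to width_scale/height_scale. Here ops are plain dicts standing in for Caffe2 OperatorDef protobufs (a sketch of the idea only; in Detectron the actual hook for this kind of rewrite is convert_op_in_proto in tools/convert_pkl_to_pb.py):

```python
def convert_upsample_to_resize(op):
    """Rewrite an UpsampleNearest op dict into a ResizeNearest op dict.
    Ops are modeled as plain dicts for illustration, not real protobufs."""
    if op['type'] != 'UpsampleNearest':
        return op  # leave every other op untouched
    scale = float(op['args'].get('scale', 2))
    return {
        'type': 'ResizeNearest',
        'inputs': op['inputs'],
        'outputs': op['outputs'],
        # ResizeNearest takes separate per-axis scales
        'args': {'width_scale': scale, 'height_scale': scale},
    }

# Blob names taken from the Sum error discussed earlier in this thread.
op = {'type': 'UpsampleNearest',
      'inputs': ['fpn_inner_res4_5_sum_lateral'],
      'outputs': ['fpn_inner_res4_5_sum_topdown'],
      'args': {'scale': 2}}
converted = convert_upsample_to_resize(op)
# converted['type'] == 'ResizeNearest', with width_scale = height_scale = 2.0
```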
@rbgirshick @newstzpz Thanks! I will rebase in a few hours and then explore the possibility of keeping UpsampleNearest unchanged.
@newstzpz great suggestion! @daquexian if you keep
(force-pushed from fdcf219 to abaa4b9)
@rbgirshick Thanks! I have rebased and will then explore the possibility of keeping UpsampleNearest unchanged in the training code. I may not update for several days because I don't have much time these days :)
@daquexian @rbgirshick To me, it looks like this should be merged with
But as I understand it, there will be no NEON optimization and it is still a custom op, so the question would still be open after the merge.
…est only when converting pkl to pb
@rbgirshick @gadcam I have updated the code. Now UpsampleNearest is kept unchanged in the training code. It was easier than I imagined; I spent less than 20 minutes finishing it :) If pytorch/pytorch#10157 is merged, one can simply remove the corresponding convert_op_in_proto call to use the CPU version of UpsampleNearest. I have verified the new code on faster-resnet-50-fpn and the detection part of keypoint-resnet-50-fpn; the results of both models are correct.
rbgirshick has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Based on orionr's work
- [x] Solve [the problem about GenerateProposals](facebookresearch#334 (comment))
- [x] Use the existing [ResizeNearest](https://github.com/caffe2/caffe2/blob/master/caffe2/operators/resize_op.cc#L57) layer instead of UpsampleNearest. ResizeNearest has a CPU implementation and NEON optimization
- [x] Make it work (with pytorch/pytorch#7091)
With this PR, FPN is supported in cooperation with pytorch/pytorch#7091. I have verified that it works on `e2e_faster_rcnn_R-50-FPN_1x.yaml`
Pull Request resolved: facebookresearch#372
Reviewed By: newstzpz
Differential Revision: D9213242
Pulled By: rbgirshick
fbshipit-source-id: 8fc7b77e6cbf08adaafd760505dd760df59bfd79
Based on @orionr's work
With this PR, FPN is supported in cooperation with pytorch/pytorch#7091. I have verified that it works on e2e_faster_rcnn_R-50-FPN_1x.yaml.