This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Support exporting fpn #372

Closed

Conversation

@daquexian
Contributor

daquexian commented Apr 17, 2018

Based on @orionr's work

With this PR, FPN is supported in cooperation with pytorch/pytorch#7091. I have verified that it works on e2e_faster_rcnn_R-50-FPN_1x.yaml

@daquexian
Contributor Author

@orionr FPN works with pytorch/pytorch#7091! Please review it when you have time :)

@ZesongLee

The "rpn_post_nms_topN" argument in the convert_collect_and_distribute() function should be "post_nms_topN".

@daquexian
Contributor Author

@ZesongLee Yes, I fixed it in pytorch/pytorch#7091 to make the names consistent. Please use the latest Caffe2 code :)

@gadcam
Contributor

gadcam commented May 15, 2018

This PR seems ready to merge! It also moves in the direction of #24, which is dear to a lot of people.
@orionr Do you have time to check this PR? Or do you think it is better to go with #334?

inputs = [x for x in op.input]
ret = core.CreateOperator(
    'CollectAndDistributeFpnRpnProposals',
    inputs,
Contributor

Why not use list(op.input)? It seems to have exactly the same output.

Contributor Author

Yes, it is from @orionr's code. I am not sure whether it would be better for me to change it.

@rohitbhio

Hi @daquexian,

Thanks for writing this. While running it to convert a pkl model (FPN), I get an error at verify_model(args, [net, init_net], args.test_img).

Something like the output shown below:

AssertionError:
Arrays are not almost equal to 3 decimals
result_boxes and result_boxes not matched. Max diff: 5.0
(mismatch 0.606060606061%)
x: array([[3.329e+02, 1.799e+01, 5.250e+02, 3.556e+02, 9.995e-01],
[0.000e+00, 2.073e+01, 2.308e+02, 3.685e+02, 9.926e-01],
[2.459e+01, 3.320e+01, 1.330e+02, 1.092e+02, 9.909e-01],...
y: array([[3.329e+02, 1.799e+01, 5.250e+02, 3.556e+02, 9.995e-01],
[0.000e+00, 2.073e+01, 2.308e+02, 3.685e+02, 9.926e-01],
[2.459e+01, 3.320e+01, 1.330e+02, 1.092e+02, 9.909e-01],...

I get this error only when I pass a test_img; otherwise the model gets written in pb format.

How important is it to get correct boxes on the test image?

@rohitbhio

@daquexian Sure. I can share the model, config file and test image. How would you like me to share it?

@rohitbhio

@daquexian: I was able to convert the pkl model (with FPN) to protobufs. However, when I try to run this net, I get the following error:

RuntimeError: [enforce fail at utility_ops.h:275] . Check failed: output->dims() == Input(i).dims().Description: Input #1, input dimension:1 256 38 60 should match output dimension: 1 256 38 59 Error from operator:
input: "fpn_inner_res4_5_sum_lateral" input: "fpn_inner_res4_5_sum_topdown" output: "fpn_inner_res4_5_sum" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 }

The error seems typical of models with FPN layers. I have never encountered this kind of issue before. Any pointers to what might be causing it would be of great help.

@rohitbhio

@daquexian: Yes, there is a call to blob_utils.im_list_to_blob([im]) in convert_pkl_to_pb.py. The size of the original image is (377, 590, 3).

The size of the blob after data prep becomes (1, 3, 500, 782). Also, the content of im_info is array([[500. , 782. , 1.32626]], dtype=float32).

Contributor Author

@rohitbhio The error you encountered is caused by an input size that is not a multiple of 32. When the size is not a multiple of 32, say a width of 36, some layer produces a feature map of width 36/2/2 = 9; then a conv layer with stride 2 followed by an upsample2x layer produces a feature map of width 8 instead of 9, and an error occurs when summing these two feature maps because their widths differ. blob_utils.im_list_to_blob pads the input so that its height and width are multiples of 32. You can print some messages to check whether blob_utils.im_list_to_blob works correctly.
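To make the width arithmetic above concrete, here is a toy Python sketch (not code from this PR; the stride-2 conv is assumed to use floor division):

  def lateral_vs_topdown_width(im_width, lateral_stride=4):
      # Width of the lateral feature map vs. the top-down branch
      # (stride-2 conv followed by nearest-neighbor 2x upsampling).
      lateral = im_width // lateral_stride   # e.g. 36 // 4 = 9
      topdown = (lateral // 2) * 2           # stride-2 conv, then 2x upsample: 8
      return lateral, topdown

  print(lateral_vs_topdown_width(36))  # (9, 8) -> Sum fails, widths differ
  print(lateral_vs_topdown_width(32))  # (8, 8) -> OK once width is a multiple of the FPN stride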

@rohitbhio

@daquexian: Thanks for the explanation. Your observation regarding the blob_utils.im_list_to_blob method is correct. For some reason, the method was not adjusting the image blob dimensions according to COARSEST_STRIDE. Now the size of the blob after data prep becomes (1, 3, 512, 800) and I am able to make detections. Wow, thanks.

Contributor Author

@rohitbhio Glad to hear it :) Could you please tell me why the method didn't adjust the dimensions? Is there anything wrong with my PR?

@rohitbhio

@daquexian Your PR is awesome. I was trying to load these protobuf models without loading the corresponding config yaml file, so the blob_utils.im_list_to_blob method ended up using cfg.FPN.FPN_ON and cfg.FPN.COARSEST_STRIDE from the default config file (config.py), where cfg.FPN.FPN_ON is set to False. For my purpose, I made a copy of the im_list_to_blob method from blob.py and modified it so that I could explicitly pass FPN_ON and COARSEST_STRIDE as input parameters, so the method no longer relies on the global values.
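For reference, a minimal sketch of such a variant, assuming NHWC input images and the same zero-padding behavior as detectron.utils.blob.im_list_to_blob (the function name and exact padding logic here are illustrative, not the code from this PR):

  import numpy as np

  def im_list_to_blob_explicit(ims, fpn_on, coarsest_stride):
      # im_list_to_blob variant that takes FPN_ON and COARSEST_STRIDE as
      # parameters instead of reading the global Detectron cfg.
      max_shape = np.array([im.shape for im in ims]).max(axis=0)
      if fpn_on:
          # Pad height and width up to the next multiple of the coarsest FPN stride
          stride = float(coarsest_stride)
          max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride)
          max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride)
      blob = np.zeros((len(ims), max_shape[0], max_shape[1], 3), dtype=np.float32)
      for i, im in enumerate(ims):
          blob[i, :im.shape[0], :im.shape[1], :] = im
      # NHWC -> NCHW
      return blob.transpose((0, 3, 1, 2))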

Contributor

@gadcam left a comment

Apart from the very minor change I suggested, LGTM.
I tested it with a few different models and it ran without problems.

gadcam added a commit to gadcam/Detectron-1 that referenced this pull request May 28, 2018
Pre-requisite: facebookresearch#372
Purpose: enable exporting all the models for CPU by exporting two separate nets: one for the bboxes and one for the rest of the inference.

Two main modifications:
- Refactor main(): it calls a function convert_to_pb for each sub_net.
- run_model_pb: always run the inference for bbox, then call the mask or keypoint part if needed. The exact same approach is adopted.

The helper functions are only lightly modified to fit the new objective of exporting two pb files.
@daquexian
Contributor Author

@rohitbhio It might be caused by an inconsistent implementation between the Python layer and the corresponding C++ layer, or by some inherent difference between GPU and CPU. I think a max diff of 5.0 and a mismatch of 0.6% doesn't matter. You can comment out verify_model to skip verification and save the model. It would also be appreciated if you could share the model you want to convert and the test image.

@gadcam
Contributor

gadcam commented Jun 15, 2018

@rbgirshick @ir413 This PR has been open for almost two months: do you plan to review it, and if so, do you know when you will have time to do it?

@@ -293,7 +293,7 @@ def add_topdown_lateral_module(
         bias_init=const_fill(0.0)
     )
     # Top-down 2x upsampling
-    td = model.net.UpsampleNearest(fpn_top, fpn_bottom + '_topdown', scale=2)
+    td = model.net.ResizeNearest(fpn_top, fpn_bottom + '_topdown', width_scale=2., height_scale=2.)
@rbgirshick
Contributor

Are these ops exactly equivalent? Please verify that this change does not impact the AP of FPN model zoo models and does not impact the results when training new models.

Contributor Author

@rbgirshick Thanks for your review. I will check it in several days.

@rbgirshick
Contributor

@gadcam I haven't been able to follow all of the discussions about CPU support for FPN models. This PR looks self-contained, so I should be able to review it, though probably not until after CVPR, which is next week. I left a comment about one concern over the switch from the UpsampleNearest op to the ResizeNearest op. I want someone to check that they produce equivalent results (the history is that when this was originally implemented, the C2 ResizeNearest op did not exist).

@daquexian
Contributor Author

daquexian commented Jun 25, 2018

@rbgirshick I just tested the AP of e2e_faster_rcnn_R-50-FPN_1x and e2e_keypoint_rcnn_R-50-FPN_1x; the APs are exactly the same when using ResizeNearest and UpsampleNearest.

Both Faster R-CNN runs give:

INFO task_evaluation.py: 181: copypaste: Dataset: coco_2017_val
INFO task_evaluation.py: 183: copypaste: Task: box
INFO task_evaluation.py: 186: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 187: copypaste: 0.3671,0.5845,0.3962,0.2112,0.3985,0.4813

Both Keypoint R-CNN runs give:

INFO task_evaluation.py: 181: copypaste: Dataset: keypoints_coco_2017_val
INFO task_evaluation.py: 183: copypaste: Task: box
INFO task_evaluation.py: 186: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 187: copypaste: 0.5356,0.8285,0.5828,0.3649,0.6122,0.6970
INFO task_evaluation.py: 183: copypaste: Task: keypoint
INFO task_evaluation.py: 186: copypaste: AP,AP50,AP75,APm,APl
INFO task_evaluation.py: 187: copypaste: 0.6421,0.8643,0.6991,0.5854,0.7337

@daquexian
Contributor Author

daquexian commented Jun 26, 2018

@rbgirshick I have also checked the code. Because nearest-neighbor resize/upsample is simple to implement, both operators do essentially the same thing. The only differences are:

  1. There is a limit on the number of CUDA blocks in UpsampleNearest but not in ResizeNearest.

  2. UpsampleNearest can accept an input with ndim() == 3, while ResizeNearest only supports ndim() == 4.
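As a quick sanity check on that equivalence for integer scales, a toy NumPy reference (not the Caffe2 implementation) of 2x nearest-neighbor upsampling on NCHW tensors could look like this:

  import numpy as np

  def nearest_2x(x):
      # 2x nearest-neighbor upsampling of an NCHW array: every pixel is
      # duplicated along H and W, which is what both UpsampleNearest(scale=2)
      # and ResizeNearest(width_scale=2., height_scale=2.) should compute.
      return x.repeat(2, axis=2).repeat(2, axis=3)

  x = np.arange(1 * 3 * 2 * 2, dtype=np.float32).reshape(1, 3, 2, 2)
  print(nearest_2x(x).shape)  # (1, 3, 4, 4)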

@rbgirshick
Contributor

@daquexian thanks for checking! Will try to review and merge in the near future.

@rbgirshick
Contributor

@orionr @newstzpz does this diff look good to you? My understanding is that it subsumes #334, which you worked on.

@facebook-github-bot

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

spatial_scale,
(cfg.FPN.RPN_ANCHOR_START_SIZE * 2.**(lvl - cfg.FPN.RPN_MIN_LEVEL),)
) \
if lvl else get_anchors(spatial_scale, cfg.RPN.SIZES)
@newstzpz
Contributor

It may be easier to read if we write

  anchor_sizes = ....
  blobs[anchor_name] = get_anchors(spatial_scale, anchor_sizes)
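A hedged sketch of what the suggested refactor could look like, using the names visible in the snippet above (anchor_name, blobs, and the enclosing per-level loop are assumed from context rather than taken verbatim from the PR):

  if lvl:
      anchor_sizes = (
          cfg.FPN.RPN_ANCHOR_START_SIZE * 2.**(lvl - cfg.FPN.RPN_MIN_LEVEL),
      )
  else:
      anchor_sizes = cfg.RPN.SIZES
  blobs[anchor_name] = get_anchors(spatial_scale, anchor_sizes)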

@newstzpz
Contributor

newstzpz commented Jul 3, 2018

@rbgirshick The diff looks good to me.

inputs = [x for x in op.input]
ret = core.CreateOperator(
    'CollectAndDistributeFpnRpnProposals',
    inputs,


Hi, I am trying to convert a Mask R-CNN model from pkl to pb format, but I am getting the error below:
Traceback (most recent call last):
File "convert_pkl_to_pb.py", line 586, in
main()
File "convert_pkl_to_pb.py", line 574, in main
[net, init_net] = convert_model_gpu(args, net, init_net)
File "convert_pkl_to_pb.py", line 300, in convert_model_gpu
ret = core.InjectDeviceCopiesAmongNets([ret_init_net, ret_net])
File "/usr/local/anaconda3/envs/caffetwo/lib/python2.7/site-packages/caffe2/python/core.py", line 2407, in InjectDeviceCopiesAmongNets
blob_remap=blob_remap,
File "/usr/local/anaconda3/envs/caffetwo/lib/python2.7/site-packages/caffe2/python/core.py", line 2298, in InjectCrossDeviceCopies
"input {} should be defined in the net.".format(input)
AssertionError: input rpn_rois should be defined in the net.

The input to the script is the config file e2e_mask_rcnn_R-101-FPN_1x.yaml and its pkl model.

Contributor

AssertionError: input rpn_rois should be defined in the net.

Looks like you also need PR #449 to do what you want.


Hi, I am able to convert the model from pkl to pb format with convert_pkl_to_pb.py with args.device = cpu. But when I view the network using Netron (https://github.com/lutzroeder/Netron), the mask_conv_body section is missing from the graph. I want to understand whether this is an issue with convert_pkl_to_pb.py or with Netron.

@gadcam
Contributor

gadcam commented Jul 12, 2018

@orionr @rbgirshick Did you reach a final decision on this one?

@gadcam
Contributor

gadcam commented Jul 22, 2018

@orionr @rbgirshick @ir413 This is the last time I will bother you about this one: do you think it is now mergeable?

Since @daquexian ran the tests you required and @newstzpz approved the changes, it looks ready to merge, at least from an external viewpoint.

@daquexian
Contributor Author

daquexian commented Jul 23, 2018

@newstzpz Hi :) I have made the changes you requested. Is it ready for landing now? Thanks!

@ferasboulala

Does this mean that it will be possible to convert both FPN and Mask models?

@gadcam
Contributor

gadcam commented Jul 25, 2018

@ferasboulala #449 adds support for mask models.

@daquexian
Contributor Author

@rbgirshick @orionr @newstzpz Could you please consider merging it? As you may have noticed, many users are eager to see it land.

@rbgirshick
Contributor

@daquexian please rebase and then I will do some necessary testing to ensure that the UpsampleNearest -> ResizeNearest op change doesn't cause any problems.

@newstzpz
Contributor

newstzpz commented Aug 3, 2018

Another possible way is to keep UpsampleNearest unchanged in the training code but convert it to ResizeNearest when converting to pb.
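For illustration, a conversion pass along those lines might look like the sketch below, operating on a Caffe2 NetDef; the mapping from UpsampleNearest's scale argument to ResizeNearest's width_scale/height_scale is assumed, and this is not necessarily the helper the PR ends up using:

  from caffe2.python import utils

  def upsample_to_resize_nearest(net_def):
      # Rewrite every UpsampleNearest op in a Caffe2 NetDef as ResizeNearest.
      for op in net_def.op:
          if op.type != 'UpsampleNearest':
              continue
          scale = 2
          for arg in op.arg:
              if arg.name == 'scale':
                  scale = arg.i
          op.type = 'ResizeNearest'
          # Replace the 'scale' argument with ResizeNearest's width/height scales
          del op.arg[:]
          op.arg.extend([
              utils.MakeArgument('width_scale', float(scale)),
              utils.MakeArgument('height_scale', float(scale)),
          ])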

@daquexian
Contributor Author

@rbgirshick @newstzpz Thanks! I will rebase in a few hours and then explore the possibility of keeping UpsampleNearest unchanged.

@rbgirshick
Contributor

@newstzpz great suggestion!

@daquexian if you keep UpsampleNearest unchanged in the training code then I will be able to merge this trivially. It is my only concern and requires significant testing to ensure that the behavior of existing models is unchanged.

@daquexian
Contributor Author

@rbgirshick Thanks! I have rebased, and I will explore the possibility of keeping UpsampleNearest unchanged in the training code. I may not update it for several days because I don't have much time right now :)

@gadcam
Contributor

gadcam commented Aug 4, 2018

@daquexian @rbgirshick To me, it looks like this should be merged with UpsampleNearest as a first step: the PRs from @jgong5 will enable CPU support for UpsampleNearest, cf. #596 and pytorch/pytorch#10157.

But as I understand it, there will be no NEON optimization and it is still a custom op, so the question would remain open after the merge.

@daquexian
Contributor Author

daquexian commented Aug 8, 2018

@rbgirshick @gadcam I have updated the code. UpsampleNearest is now kept unchanged in the training code. It was easier than I imagined; I spent less than 20 minutes finishing it :) If pytorch/pytorch#10157 is merged, one can simply remove the corresponding convert_op_in_proto call to use the CPU version of UpsampleNearest.

I have verified the new code on faster-resnet-50-fpn and the detection part of keypoint-resnet-50-fpn; the results of both models are correct.

@facebook-github-bot left a comment

rbgirshick has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

xzhewei pushed a commit to xzhewei/Detectron that referenced this pull request Jan 10, 2019
Summary:
Based on orionr's work

 - [x] Solve [the problem about GenerateProposals](facebookresearch#334 (comment))
 - [x] Use the existing [ResizeNearest](https://github.com/caffe2/caffe2/blob/master/caffe2/operators/resize_op.cc#L57) layer instead of UpsampleNearest. ResizeNearest has cpu implementation and neon optimization
 - [x] Make it work (with pytorch/pytorch#7091)

With this PR, FPN is supported in cooperation with pytorch/pytorch#7091. I have verified that it works on `e2e_faster_rcnn_R-50-FPN_1x.yaml`
Pull Request resolved: facebookresearch#372

Reviewed By: newstzpz

Differential Revision: D9213242

Pulled By: rbgirshick

fbshipit-source-id: 8fc7b77e6cbf08adaafd760505dd760df59bfd79