Community effort to bring CPU and pure Caffe2 / C++ inference support #432

gadcam · 2018-05-15T12:54:25Z

It looks like many people are asking for CPU inference, and it seems a lot of work is needed to make it happen. I propose that we use this issue to publicly list the work that is needed, so that people eager to have this feature can easily help implement it.

@daquexian, @orionr, @rbgirshick, do you have time to share a list of the features / ops needed to convert all the models with convert_pkl_to_pb.py?

| Feature/Operator | Where do we need it? | State | Difficulty |
|---|---|---|---|
| CollectAndDistributeFpnRpnProposals | FPN | 🕔 PR #372 submitted & review needed | ? |
| ... | ... | ... | ... |

I would like to contribute to this effort, but I do not know where to begin.
If you are willing to implement a feature, do not hesitate to say so in this issue.

PS: To avoid any confusion, I am just a regular user of Detectron, and this initiative was not requested by the maintainers.

daquexian · 2018-05-15T13:10:17Z

Based on #372, models containing FPN can be correctly converted to Caffe2 .pb files (I will rebase the PR on master soon). However, only the detection net is converted, even for Mask R-CNN and Keypoint R-CNN, which also have a mask net or a keypoint net.

gadcam · 2018-05-15T14:16:54Z

@daquexian I am really sorry, but I think I failed to understand what you mean, as I do not have a deep understanding of how the Detectron repo works.

Do you mean that, once #372 is merged, if we try to convert, for example, e2e_keypoint_rcnn_R-50-FPN_1x, only the proposal part will be converted, so we could not use it on CPU?
If the answer is yes, can you help us understand what steps we need to take to achieve a complete conversion?

daquexian · 2018-05-15T14:29:52Z

@gadcam If we try to convert e2e_keypoint_rcnn_R-50-FPN_1x, we will only get bounding boxes, not keypoints, because only model.net is used here, while the mask and keypoint heads live in model.mask_net and model.keypoint_net. The solution seems straightforward because these nets contain only normal layers. But if you want to infer masks or keypoints after getting the bounding boxes (to save inference time), it seems better to save these nets in separate .pb files.

HappyKerry · 2018-05-16T12:59:15Z

@daquexian Would you like to write a detailed guide on how to convert pkl to pb? Thanks

daquexian · 2018-05-16T13:12:25Z

@HappyKerry Just fetch and checkout my branch

git remote add daquexian https://github.com/daquexian/Detectron
git fetch daquexian
git checkout daquexian/add-export-support-fpn

and run convert_pkl_to_pb.py with your configuration file and weights.

gadcam · 2018-05-18T10:44:22Z

@daquexian I ran convert_pkl_to_pb.py (with your patch) successfully on e2e_keypoint_rcnn_R-50-FPN_s1x and on MSRA's original ResNet-50 model.

For e2e_keypoint_rcnn_R-50-FPN_s1x I get no warnings.
For MSRA's original ResNet-50 model I get the following output:

Blob fpn_inner_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_b with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_w with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_b with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_w with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_b with type <class 'str'> is not supported in generating init net, skipped.

If I try to convert model.keypoint_net from e2e_keypoint_rcnn_R-50-FPN_s1x, I get:

Blob fpn_res2_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn3 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn4 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn5 with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_idx_restore_int32 with type <class 'str'> is not supported in generating init net, skipped.

So I have a few questions:

Why are there no Blob ____ is not supported warnings for the keypoint model when there are some for the ResNet?
What does it mean to get Blob ____ is not supported warnings for the ResNet? Should we implement these operators? (I thought the ResNet would be converted without trouble.)
Same question for the keypoint part of e2e_keypoint_rcnn_R-50-FPN_s1x.

daquexian · 2018-05-19T08:01:45Z

@gadcam You should use the model in Model Zoo.

gadcam · 2018-05-19T12:30:18Z

@daquexian Then that is perfect: I did use the models from the Model Zoo.
To be precise, here is what I mean by each name:
MSRA's original ResNet-50 model is https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
e2e_keypoint_rcnn_R-50-FPN_s1x is https://s3-us-west-2.amazonaws.com/detectron/37697714/12_2017_baselines/e2e_keypoint_rcnn_R-50-FPN_s1x.yaml.08_44_03.qrQ0ph6M/output/train/keypoints_coco_2014_train%3Akeypoints_coco_2014_valminusminival/generalized_rcnn/model_final.pkl (i.e. from the section End-to-End Keypoint-Only Mask R-CNN Baselines)

Why did you suspect that I tried to convert something else? Because I get some Blob ____ is not supported warnings when I should not?

daquexian · 2018-05-19T12:42:26Z

@gadcam Yes. It is expected that Blob ____ is not supported appears when you use an ImageNet-pretrained model, because the FPN, RPN and some other layers are not present in ImageNet-pretrained models.

Could you please tell me what ops not supported output means?

gadcam · 2018-05-19T13:51:22Z

@daquexian

Could you please tell me what ops not supported output means?

I meant Blob ____ is not supported; sorry for the inaccuracy (I have corrected it).

It is reasonable that Blob ____ is not supported appears when you use an ImageNet pretrained model, because fpn, rpn and some other layers are not in ImageNet pretrained models.

I am not sure I understood this part: do you mean that seeing Blob ____ is not supported means the blob needs some Detectron code to be fully defined?

So I think we are getting to the point of my issue: what should we implement to avoid it? Or can you point me to where I should dig to find out what we need to implement?

Take this example (the same applies to keypoint_rois_idx_restore_int32):

Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.

The only mention I found of keypoint_rois_fpn in the code is here

Detectron/detectron/roi_data/fast_rcnn.py (line 103 in b3c93df):

blob_names += ['keypoint_rois_fpn' + str(lvl)]

So I do not really know where to investigate to avoid this Blob ____ is not supported error.

As a side question: should we implement something like

Detectron/detectron/ops/generate_proposal_labels.py (line 30 in e5bb3a8):

class GenerateProposalLabelsOp(object):

in the Caffe2/PyTorch repo, and then add some conversion code here to get full CPU support?

daquexian · 2018-05-19T14:28:37Z

@gadcam Here, Blob ____ is not supported just indicates that the blob has no value (I don't know why its type is 'str' when it has no value; Caffe2 is strange). No additional layers need to be implemented. You can add the names of these blobs to empty_blobs, as in

Detectron/tools/convert_pkl_to_pb.py (line 558 in b3c93df):

empty_blobs = ['data', 'im_info']

('data' and 'im_info' are the inputs of model.net; 'fpn_res2_2_sum', 'keypoint_rois_fpn2' and so on are the inputs of model.keypoint_net.)
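A simplified sketch of the behaviour described above, under stated assumptions: a blob without a value shows up as a str (as in the warnings quoted earlier), and, going by our reading of the script, every name listed in empty_blobs is given an empty placeholder instead of being skipped. The function name build_init_blobs and the blob name head_conv1_w are hypothetical, and plain lists stand in for NumPy arrays.

```python
def build_init_blobs(blobs, empty_blobs):
    """Hypothetical sketch: decide which blobs end up in the init net.

    blobs: dict of blob name -> value; a str value stands for "no value",
        which is what triggers the "... is not supported ..." warning.
    empty_blobs: net inputs that get an empty placeholder instead.
    """
    blobs = dict(blobs)  # do not mutate the caller's dict
    for name in empty_blobs:
        blobs[name] = []  # stands in for an empty float32 array
    init_blobs, skipped = {}, []
    for name, value in blobs.items():
        if isinstance(value, str):
            print("Blob {} with type {} is not supported in generating "
                  "init net, skipped.".format(name, type(value)))
            skipped.append(name)
            continue
        init_blobs[name] = value
    return init_blobs, skipped


blobs = {
    "head_conv1_w": [0.1, 0.2],  # a real weight (hypothetical name/values)
    "fpn_res2_2_sum": "",        # produced by the bbox branch at run time
    "keypoint_rois_fpn2": "",    # not listed in empty_blobs below
}
init_blobs, skipped = build_init_blobs(blobs, ["fpn_res2_2_sum"])
print(sorted(init_blobs))  # ['fpn_res2_2_sum', 'head_conv1_w']
print(skipped)             # ['keypoint_rois_fpn2']
```

Listing every run-time input of model.keypoint_net in empty_blobs therefore silences the warnings, but those blobs still have to be fed with real values before the net is run.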

The converted model will crash when you try to verify it, because its inputs are not valid. Giving it proper inputs ('fpn_res2_2_sum' and so on, produced by the bbox branch, and also the "keypoint_rois_fpnX" blobs computed below) might make it run.

Detectron/detectron/core/test.py (lines 540 to 566 in b3c93df):

def im_detect_keypoints(model, im_scale, boxes):
    """Infer instance keypoint poses. This function must be called after
    im_detect_bbox as it assumes that the Caffe2 workspace is already populated
    with the necessary blobs.

    Arguments:
        model (DetectionModelHelper): the detection model to use
        im_scales (list): image blob scales as returned by im_detect_bbox
        boxes (ndarray): R x 4 array of bounding box detections (e.g., as
            returned by im_detect_bbox)

    Returns:
        pred_heatmaps (ndarray): R x J x M x M array of keypoint location
            logits (softmax inputs) for each of the J keypoint types output
            by the network (must be processed by keypoint_results to convert
            into point predictions in the original image coordinate space)
    """
    M = cfg.KRCNN.HEATMAP_SIZE
    if boxes.shape[0] == 0:
        pred_heatmaps = np.zeros((0, cfg.KRCNN.NUM_KEYPOINTS, M, M), np.float32)
        return pred_heatmaps

    inputs = {'keypoint_rois': _get_rois_blob(boxes, im_scale)}

    # Add multi-level rois for FPN
    if cfg.FPN.MULTILEVEL_ROIS:
        _add_multilevel_rois_for_test(inputs, 'keypoint_rois')
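To make these inputs concrete, here is a stdlib-only sketch of what _get_rois_blob and _add_multilevel_rois_for_test produce for the keypoint net: a keypoint_rois blob, one keypoint_rois_fpnX blob per level, and keypoint_rois_idx_restore_int32. It assumes the FPN-paper level heuristic with Detectron's default config values (canonical scale 224, canonical level 4, RoI levels 2 to 5) and the "+ 1" box width convention; the function names are hypothetical and this is not Detectron's code.

```python
import math


def get_rois_blob(boxes, im_scale):
    """Rows of [batch_idx, x1, y1, x2, y2] in scaled-image coordinates
    (batch index 0 for a single image)."""
    return [[0.0] + [c * im_scale for c in box[:4]] for box in boxes]


def map_roi_to_fpn_level(roi, k_min=2, k_max=5, s0=224.0, lvl0=4):
    """FPN heuristic: floor(lvl0 + log2(sqrt(w * h) / s0)), clipped."""
    x1, y1, x2, y2 = roi[1:5]
    s = math.sqrt((x2 - x1 + 1) * (y2 - y1 + 1))
    lvl = int(math.floor(lvl0 + math.log2(s / s0 + 1e-6)))
    return min(max(lvl, k_min), k_max)


def add_multilevel_rois(blobs, name, k_min=2, k_max=5):
    """Split blobs[name] into name_fpn{k} blobs plus the restore indices."""
    rois = blobs[name]
    lvls = [map_roi_to_fpn_level(r, k_min, k_max) for r in rois]
    order = []  # original RoI indices, in the order the levels are stacked
    for lvl in range(k_min, k_max + 1):
        idx = [i for i, l in enumerate(lvls) if l == lvl]
        blobs[name + "_fpn" + str(lvl)] = [rois[i] for i in idx]
        order.extend(idx)
    # restore[i] = position of original RoI i in the stacked blobs
    # (the inverse permutation, i.e. argsort of `order`)
    restore = [0] * len(order)
    for pos, i in enumerate(order):
        restore[i] = pos
    blobs[name + "_idx_restore_int32"] = restore
    return blobs


boxes = [[10, 10, 49, 49], [0, 0, 447, 447], [0, 0, 111, 111]]
blobs = {"keypoint_rois": get_rois_blob(boxes, 1.0)}
add_multilevel_rois(blobs, "keypoint_rois")
print(blobs["keypoint_rois_idx_restore_int32"])  # [0, 2, 1]
print(blobs["keypoint_rois_fpn4"])               # []
```

The three boxes (sides 40, 448 and 112 pixels) land on levels 2, 5 and 3, so the stacked order is [0, 2, 1] and the restore indices put them back in the original order.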

gadcam · 2018-05-21T04:36:33Z

@daquexian Thank you for your hints; with a bit of work I was able to run e2e_keypoint_rcnn_R-50-FPN_s1x on CPU!
I will tidy up my code before sharing it.
If I manage to write something clean enough, I will open a PR enabling conversion of keypoint and mask models, with a test that checks the correctness of the conversion (which will also serve as an example of how to run them).
For the moment, the main problem is that I could not select the input blobs programmatically.

daquexian · 2018-05-21T05:11:31Z

@gadcam Great! Looking forward to your PR

dongmingsun · 2018-05-22T14:26:54Z

@gadcam Hi, are we able to convert the Mask R-CNN model from .pkl to .pb now?

gadcam · 2018-05-22T14:37:08Z

@dongmingsun With @daquexian's #372 plus my (future) PR, you will be able to convert the models from the Model Zoo from .pkl into two .pb files, one for the bbox part and one for the mask or keypoint part, and you will need a helper function to run them.
What I achieved is running the model without a GPU, not obtaining a pure Caffe2 model.
I think someone more experienced than me could at least merge these two .pb files. I will quickly investigate this option.

dongmingsun · 2018-05-22T15:01:29Z

@gadcam Thank you very much, so I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

kundalee · 2018-05-23T04:09:25Z

@gadcam Hi, did you encounter this problem when running convert_pkl_to_pb.py from @daquexian's branch?

config file: configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml
model: https://s3-us-west-2.amazonaws.com/detectron/35857345/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml.01_36_30.cUF7QR7I/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl

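Cannot find operator schema for CollectAndDistributeFpnRpnProposals. Will skip schema checking.
Traceback for operator 164 in network origin_model
Traceback (most recent call last):
  File "tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 367, in compare_model
    res2 = model2_func(test_image, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 565, in _run_pb_func
    return run_model_pb(args, model_pb[0], model_pb[1], im, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 505, in run_model_pb
    workspace.CreateNet(net)
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 163, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 189, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:191] op. Cannot create operator of type 'CollectAndDistributeFpnRpnProposals' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.
Operator def: input: "rpn_rois_fpn2" input: "rpn_rois_fpn3" input: "rpn_rois_fpn4" input: "rpn_rois_fpn5" input: "rpn_rois_fpn6" input: "rpn_roi_probs_fpn2" input: "rpn_roi_probs_fpn3" input: "rpn_roi_probs_fpn4" input: "rpn_roi_probs_fpn5" input: "rpn_roi_probs_fpn6" output: "rpn_rois" output: "rois_fpn2" output: "rois_fpn3" output: "rois_fpn4" output: "rois_fpn5" output: "rois_idx_restore_int32" name: "" type: "CollectAndDistributeFpnRpnProposals" arg { name: "roi_max_level" i: 5 } arg { name: "rpn_post_nms_topN" i: 1000 } arg { name: "roi_canonical_scale" i: 224 } arg { name: "rpn_min_level" i: 2 } arg { name: "roi_canonical_level" i: 4 } arg { name: "roi_min_level" i: 2 } arg { name: "rpn_max_level" i: 6 } device_option { } engine: "" debug_info:
  File "tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "tools/convert_pkl_to_pb.py", line 607, in main
    convert_net(args, net.Proto(), blobs)
  File "tools/convert_pkl_to_pb.py", line 279, in convert_net
    convert_op_in_proto(net, convert_python)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 113, in convert_op_in_proto
    convert_op_in_ops(proto.op, func_or_list)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 102, in convert_op_in_ops
    new_ops = func(op)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 76, in wrapper
    return f(op, **params)
  File "tools/convert_pkl_to_pb.py", line 250, in convert_python
    rpn_post_nms_topN=cfg.TEST.RPN_POST_NMS_TOP_N,
  File "tools/convert_pkl_to_pb.py", line 158, in convert_collect_and_distribute
    rpn_post_nms_topN=rpn_post_nms_topN,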

daquexian · 2018-05-23T04:15:00Z

Hi @kundalee , it seems that your caffe2 version is not the latest. You might want to pull the latest code from https://github.com/pytorch/pytorch and recompile it.


HappyKerry · 2018-05-23T09:53:05Z

@gadcam @daquexian @dongmingsun I have converted the pkl model to a pb model, but how do I use the pb model in Caffe2 C++? Thanks

daquexian · 2018-05-23T16:38:53Z

@HappyKerry You can search for the Caffe2 Android demo or third-party tutorials.

dongmingsun · 2018-05-23T16:46:08Z

@HappyKerry caffe2_cpp_tutorial might help.

kundalee · 2018-05-24T10:21:09Z

Hi @daquexian, thank you very much. Thanks to your comments, the CollectAndDistributeFpnRpnProposals problem is solved, and I have successfully converted .pkl to .pb.

But when I try to load the pb files for testing on CPU, I get the problem below. Everything is fine until I call workspace.CreateNet(net).

    workspace.CreateNet(net)
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 152, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "roi_feat_shuffled" input: "rois_idx_restore_int32" output: "roi_feat" name: "" type: "BatchPermutation" device_option { } engine: ""

I noticed that verify_model, which runs after the conversion, works well and raises no error. Can someone tell me how to use a pb model in Caffe2 Python? Thanks

daquexian · 2018-05-24T15:33:19Z

@kundalee BatchPermutation lives in a separate Caffe2 module (the Detectron ops library). You need to load that module in your code, as in

Detectron/detectron/utils/c2.py (line 42 in e5bb3a8):

dyndep.InitOpsLibrary(detectron_ops_lib)

or this tutorial.

And I haven't found how to load a module in C++. No one has responded to my issue (which is quite normal :D), so as a workaround I compiled the Detectron ops into the main Caffe2 library.
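For readers wondering what the missing operator actually does: as far as we understand the Detectron op, BatchPermutation reorders the first (batch) dimension of its input, Y[i] = X[indices[i]]. In the FPN heads it takes the RoI features computed level by level (roi_feat_shuffled) and uses rois_idx_restore_int32 to put them back in the original RoI order (roi_feat). The pure-Python sketch below only illustrates those semantics under that assumption; it does not replace registering the real op, and the function name is hypothetical.

```python
def batch_permutation(x, indices):
    """Sketch of BatchPermutation semantics: Y[i] = X[indices[i]].

    x: list of per-RoI feature rows, stacked level by level.
    indices: the restore permutation (e.g. rois_idx_restore_int32).
    """
    # Extra sanity check for this sketch; the real op may not perform it
    if sorted(indices) != list(range(len(x))):
        raise ValueError("indices must be a permutation of range(len(x))")
    return [x[i] for i in indices]


# RoIs 0, 1, 2 were assigned to FPN levels 2, 5, 3, so the per-level
# features come out stacked in RoI order [0, 2, 1].
roi_feat_shuffled = [["feat_roi0"], ["feat_roi2"], ["feat_roi1"]]
rois_idx_restore_int32 = [0, 2, 1]
roi_feat = batch_permutation(roi_feat_shuffled, rois_idx_restore_int32)
print(roi_feat)  # [['feat_roi0'], ['feat_roi1'], ['feat_roi2']]
```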

HappyKerry · 2018-05-25T03:07:00Z

@daquexian I hit the same BatchPermutation problem as @kundalee. How do I compile the Detectron ops into the main Caffe2 library?

daquexian · 2018-05-25T04:43:04Z

@HappyKerry Just copy the Detectron ops into the main Caffe2 ops directory and recompile.

gadcam · 2018-05-28T03:14:02Z

@dongmingsun @daquexian

I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

I think someone more experienced than me could at least merge these two .pb files. I will quickly investigate this option.

Assuming that #372 and #449 are correct and merged, the main problem I see with either of these two things is that, although we could put all the ops in the same net, we would need to write something like this just before inference:

# Assumed imports (module paths as in the Detectron repo); _prepare_blobs
# is the helper defined in tools/convert_pkl_to_pb.py
from caffe2.python import workspace
from detectron.core.config import cfg
import detectron.core.test as test
import detectron.utils.model_convert_utils as mutils

def run_model_pb(args, models_pb, im, check_blobs):
    workspace.ResetWorkspace()
    net, init_net = models_pb
    workspace.RunNetOnce(init_net)
    mutils.create_input_blobs_for_net(net.Proto())
    workspace.CreateNet(net)

    input_blobs = _prepare_blobs(
        im,
        cfg.PIXEL_MEANS,
        cfg.TEST.SCALE, cfg.TEST.MAX_SIZE
    )
    # Placeholder: the boxes are unknown until the bbox part has run
    boxes = None
    if cfg.MODEL.MASK_ON:
        im_scale = input_blobs['im_info'][0][2]
        mask_rois = {'mask_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(mask_rois, 'mask_rois')
        input_blobs.update(mask_rois)

    if cfg.MODEL.KEYPOINTS_ON:
        im_scale = input_blobs['im_info'][0][2]
        keypoints_rois = {'keypoint_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(keypoints_rois, 'keypoint_rois')
        input_blobs.update(keypoints_rois)

But we cannot know the boxes before inference...
So do we have to run this in two steps if we want to keep exactly the same architecture, or am I missing something?
So, @dongmingsun, I think you have to do as in my PR: run the first stage, add the multi-level RoIs for FPN, run the second stage, and process the result.

As a side note, why do we keep cfg.FPN.MULTILEVEL_ROIS if it is set to True in all the config files?

AwwNaiCha · 2018-06-26T19:11:31Z

@daquexian
Hello. I am new to Caffe2 and Detectron. I trained a Detectron model and want to test it in Caffe2. Since the current Detectron branch does not support FPN conversion, I searched around and found your branch.
I tried to use your code to convert my pkl model to pb files. The model is based on Detectron's tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml and trained on my own dataset.

I tried both GPU and CPU mode and got the following errors. In CPU mode:

WARNING workspace.py: 185: Original python traceback for operator '121' in network 'detectron' in exception above (most recent call last):
Running pb model failed.
[enforce fail at upsample_nearest_op.h:39] . Not Implemented. Error from operator: 
input: "fpn_inner_res5_2_sum" output: "fpn_inner_res4_5_sum_topdown" name: "" type: "UpsampleNearest" arg { name: "scale" i: 2 } device_option { } engine: ""
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 379, in compare_model
    n1, n2, r1.shape, r2.shape)
AssertionError: Blob result_boxes and result_boxes shape mismatched: (9, 5) vs (0, 5)

Process finished with exit code 1

In GPU mode:

WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py:  59: Loading weights from: result50/model_iter19999.pkl
I0626 12:01:25.666318 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000106857 secs
I0626 12:01:25.666505 29776 net_dag.cc:46] Number of parallel execution chains 63 Number of operators = 232
I0626 12:01:25.675417 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 8.2534e-05 secs
I0626 12:01:25.675545 29776 net_dag.cc:46] Number of parallel execution chains 30 Number of operators = 188
Running the second model...
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 384, in compare_model
    n1, n2, np.amax(np.absolute(r1 - r2))))
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 963, in assert_array_almost_equal
    precision=decimal)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 3 decimals
result_boxes and result_boxes not matched. Max diff: 4.39031982422
(mismatch 11.1111111111%)
 x: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...
 y: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...

Process finished with exit code 1

Do you have any idea how I can fix these? Thanks.
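Both failures above come from compare_model, which first requires the two result blobs to have the same shape and then requires them to agree to 3 decimals. A rough stdlib sketch of that two-step check, assuming NumPy's assert_array_almost_equal criterion abs(desired - actual) < 1.5 * 10**(-decimal); the function names are hypothetical and the box values are rounded from the log above.

```python
def shape_of(rows):
    """Shape of a list of equal-length rows; (0,) for an empty list."""
    return (len(rows), len(rows[0])) if rows else (0,)


def compare_result_blobs(name, r1, r2, decimal=3):
    """Shape check, then an almost-equal check on every element.

    Returns the max absolute difference when the blobs match.
    """
    s1, s2 = shape_of(r1), shape_of(r2)
    if s1 != s2:
        raise AssertionError(
            "Blob {} shape mismatched: {} vs {}".format(name, s1, s2))
    diffs = [abs(a - b) for row1, row2 in zip(r1, r2)
             for a, b in zip(row1, row2)]
    max_diff = max(diffs) if diffs else 0.0
    if max_diff >= 1.5 * 10 ** (-decimal):
        raise AssertionError(
            "{} not matched. Max diff: {}".format(name, max_diff))
    return max_diff


boxes_pkl = [[750.3, 387.3, 809.5, 450.1, 0.9987]]  # [x1, y1, x2, y2, score]
boxes_pb = [[750.3, 387.3, 809.5, 450.1, 0.9987]]
print(compare_result_blobs("result_boxes", boxes_pkl, boxes_pb))  # 0.0
try:
    compare_result_blobs("result_boxes", boxes_pkl, [])
except AssertionError as e:
    print(e)  # Blob result_boxes shape mismatched: (1, 5) vs (0,)
```

In the CPU run the .pb model returned no boxes at all, so the shape check fails; in the GPU run the shapes agree but some coordinates differ by several pixels, which is far above the 3-decimal tolerance.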

AwwNaiCha · 2018-07-04T00:13:01Z

@daquexian
I changed the input image dimensions and everything works now! Thanks anyway.

Kongsea · 2018-11-05T02:08:06Z

When I run

python tools/convert_pkl_to_pb.py --cfg mm/noaug_2gpu_e2e_faster_rcnn_R-101-FPN.yaml --out_dir ttt --test_img 01.jpg --fuse_af 0 --device cpu

it raises the following error:

AssertionError: Blob result_boxes and result_boxes shape mismatched: (195, 5) vs (117, 5)

I trained the Faster R-CNN model starting from the ImageNet-pretrained model R-101.pkl.
Could anybody give me some advice? Thanks.

pascschoSSL · 2018-11-07T12:36:22Z

@gadcam Thank you for the effort.

@dongmingsun With @daquexian's #372 plus my (future) PR, you will be able to convert the models from the Model Zoo from .pkl into two .pb files, one for the bbox part and one for the mask or keypoint part, and you will need a helper function to run them.
What I achieved is running the model without a GPU, not obtaining a pure Caffe2 model.
I think someone more experienced than me could at least merge these two .pb files. I will quickly investigate this option.

How do I use the pb files in Python? Is there a tutorial or example somewhere? (I couldn't find anything useful yet.)

lilichu · 2018-11-08T07:12:04Z

Hi! @daquexian @HappyKerry @kundalee
I used convert_pkl_to_pb.py to convert the Detectron model to a Caffe2 model successfully. Then I wanted to use ONNX to convert the Caffe2 model to an ONNX model.
I encountered the same issue as above:

WARNING:caffe2.python.workspace:Original python traceback for operator `170` in network `detectron` in exception above (most recent call last):
Traceback (most recent call last):
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/backup/lichu/onnx_convert/caffe2_onnx.py", line 24, in <module>
    value_info,
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 332, in caffe2_net_to_onnx_model
    model = make_model(cls.caffe2_net_to_onnx_graph(*args, **kwargs),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 221, in caffe2_net_to_onnx_graph
    inputs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/helper.py", line 62, in c2_native_run_net
    ws.RunNetOnce(predict_net)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/workspace.py", line 63, in f
    return getattr(workspace, attr)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 199, in RunNetOnce
    StringifyProto(net),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "roi_feat_shuffled_1" input: "rois_idx_restore_int32_1" output: "roi_feat_1" name: "" type: "BatchPermutation" device_option { } engine: ""

Does this mean that BatchPermutation can't be found in Caffe2? What should I do? Thanks!

tomas-wood · 2018-11-27T23:21:48Z

Hey I'm running

python detectron/tools/convert_pkl_to_pb.py --out_dir /app/out --cfg /app/detectron/configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --device cpu  TEST.WEIGHTS model_final.pkl

and getting the following error.

Traceback (most recent call last):
  File "detectron/tools/convert_pkl_to_pb.py", line 654, in <module>
    main()
  File "detectron/tools/convert_pkl_to_pb.py", line 612, in main
    model, blobs = load_model(args)
  File "detectron/tools/convert_pkl_to_pb.py", line 420, in load_model
    model = test_engine.initialize_model_from_cfg(cfg.TEST.WEIGHTS)
  File "/app/detectron/detectron/core/test_engine.py", line 330, in initialize_model_from_cfg
    model, weights_file, gpu_id=gpu_id,
  File "/app/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
    src_blobs[src_name].astype(np.float32, copy=False))
  File "/app/pytorch/build/caffe2/python/workspace.py", line 317, in FeedBlob
    return _Workspace_feed_blob(ws, name, arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 654, in _Workspace_feed_blob
    return ws.create_blob(name).feed(arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 676, in _Blob_feed
    return blob._feed(arg, device_option)
RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob.

I've built Caffe2 with CPU-only support. Is this going to be a deal breaker? Should I fire up the GPU version and convert to .pb with that? It looks like GPU support is needed to convert from .pkl to .pb. Is that assumption correct?
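The traceback suggests why `--device cpu` does not help here. `load_model` calls `test_engine.initialize_model_from_cfg`, which goes through `initialize_gpu_from_weights_file(..., gpu_id=gpu_id)` before any CPU export happens. Judging by the function name and the error, the weights are most likely fed with a GPU device option, and a CPU-only build does not recognize that device type. The sketch below shows that decision in isolation. Everything in it is hypothetical and not Detectron's API. `has_gpu_support` is a plain boolean here. In a real script it would have to come from the Caffe2 build, and which attribute exposes that depends on the Caffe2 version, so treat that part as an assumption.

```python
from typing import Optional, Tuple

# Hypothetical device labels; real code would use Caffe2 DeviceOption protos.
CPU, CUDA = "CPU", "CUDA"


def weights_feed_device(gpu_id: Optional[int],
                        has_gpu_support: bool,
                        allow_cpu_fallback: bool = True) -> Tuple[str, int]:
    """Pick (device_type, device_id) for feeding loaded weights.

    A CPU-only build cannot feed blobs onto a CUDA device, which is what the
    "Unknown device type encountered in FeedBlob" error appears to reflect.
    """
    if gpu_id is None or gpu_id < 0:
        return (CPU, 0)            # caller explicitly asked for CPU
    if has_gpu_support:
        return (CUDA, gpu_id)      # GPU build: honour the requested GPU
    if allow_cpu_fallback:
        return (CPU, 0)            # CPU-only build: keep weights in host memory
    raise RuntimeError(
        "gpu_id=%d requested but this build has no GPU support" % gpu_id)
```

Falling back to CPU is a reasonable default for a conversion step, because the exported .pb is meant for CPU anyway. This thread does not confirm whether Detectron's loader can be patched this way, so converting with a GPU build, as asked above, remains the other option.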

wytcsuch · 2019-04-10T13:11:45Z

Hi, I have converted a .pkl model to a .pb model on Ubuntu 16.04, and I want to use the .pb model from C++ on Windows. Do I need to install Caffe2 on Windows following the tutorial (https://caffe2.ai/docs/get-start.html?Platform=windows&configuration=compile)?
@HappyKerry @daquexian @dongmingsun

satyajithj · 2019-05-28T20:44:25Z

@lilichu

Does it mean that BatchPermutation can't be found in caffe2? what should I do? thanks!

Refer to this comment.

satyajithj · 2019-09-23T14:51:02Z

Is this finished?

gadcam · 2019-09-28T15:39:30Z

Hello @fuzzyBatman,

To be honest, I do not know the current state of Detectron.
I closed this issue because I felt it was no longer useful, as it had not received much attention over the past few months.

songwellxie · 2019-10-08T00:02:26Z

Quoting @tomas-wood's comment above (running convert_pkl_to_pb.py with --device cpu on a CPU-only Caffe2 build fails with "RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob."):

Hello, did anyone come across this error? I hit it when I tried to run CPU-only C3D extraction.

gadcam mentioned this issue on May 23, 2018 in "Detectron in C++" #199 (open).

gadcam mentioned this issue on Jun 20, 2018 in "Training in Python, Inference in C++ on a GPU/CPU" #501 (closed).

gadcam closed this as completed Sep 22, 2019
