This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Community effort to bring CPU and pure Caffe2 / C++ inference support #432

Closed
gadcam opened this issue May 15, 2018 · 37 comments

Comments

@gadcam
Contributor

gadcam commented May 15, 2018

It looks like many people are asking for CPU inference, and it seems a lot of work is needed to make it happen. I propose that we use this issue to state publicly what work is needed, so that people eager to have this feature can easily help implement it.

@daquexian, @orionr, @rbgirshick do you have time to share a list of features / ops needed to convert all the models with convert_pkl_to_pb.py ?

| Feature/Operator | Where do we need it? | State | Difficulty |
| --- | --- | --- | --- |
| CollectAndDistributeFpnRpnProposals | FPN | 🕔 PR #372 submitted, review needed | ? |
| ... | ... | ... | ... |

I would like to contribute to this effort but I do not know where to begin.
If you are willing to implement a feature, do not hesitate to say so in this issue.

Ps: To avoid any confusion I am only a random user of the Detectron & my initiative was not solicited by the maintainers

@daquexian
Contributor

Based on #372, models containing FPN can be correctly converted to caffe2's .pb files. (I will rebase the PR on master soon.) However, only the detection net is converted, even for Mask R-CNN and Keypoint R-CNN models, which also have a mask net or a keypoint net.

@gadcam
Contributor Author

gadcam commented May 15, 2018

@daquexian I am really sorry, but I think I failed to understand properly what you mean, as I do not have a deep understanding of how the Detectron repo works.

Do you mean that, once #372 is merged, if we try to convert for example e2e_keypoint_rcnn_R-50-FPN_1x, only the proposal part would be converted, so we could not use it on CPU?
If the answer to this question is yes, can you help us understand what steps we need to take to achieve a complete conversion?

@daquexian
Contributor

daquexian commented May 15, 2018

@gadcam If we try to convert e2e_keypoint_rcnn_R-50-FPN_1x, we will get only bounding boxes, not keypoints, because only model.net is used here, whereas the mask and keypoint layers are in model.mask_net and model.keypoint_net. The solution seems straightforward because there are only normal layers in these nets. But if you want to infer masks or keypoints after getting bounding boxes (in order to save inference time), it seems better to save these nets in different .pb files.
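
A minimal sketch of the "one .pb pair per net" idea described above. The attribute names `net`, `mask_net` and `keypoint_net` come from this comment; the output file stems are hypothetical, and the `<stem>.pb` / `<stem>_init.pb` naming is an assumption about how each net and its init net would be written.

```python
def plan_exports(mask_on, keypoints_on):
    """Return (model attribute, output file stem) pairs, one per net to save.

    The file stems are illustrative only; pick whatever naming suits you.
    """
    plan = [("net", "model")]  # detection (bbox) net, always exported
    if mask_on:
        plan.append(("mask_net", "mask_net"))
    if keypoints_on:
        plan.append(("keypoint_net", "keypoint_net"))
    return plan


def output_files(plan):
    """Expand each stem into a predict-net file and an init-net file (assumed naming)."""
    files = []
    for _, stem in plan:
        files.append(stem + ".pb")
        files.append(stem + "_init.pb")
    return files


if __name__ == "__main__":
    print(output_files(plan_exports(mask_on=False, keypoints_on=True)))
```

Keeping the head in its own file means the bbox net can run first and the head only runs when there are detections to refine.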

@HappyKerry

@daquexian would you be willing to write a detailed guide on how to convert pkl to pb? Thanks

@daquexian
Contributor

@HappyKerry Just fetch and checkout my branch

git remote add daquexian https://github.com/daquexian/Detectron
git fetch daquexian
git checkout daquexian/add-export-support-fpn

and run convert_pkl_to_pb.py with your configuration files and weights
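
As a hedged illustration of that last step, the sketch below assembles the command line from Python. The flags (`--cfg`, `--out_dir`, `--device`, `--test_img`) and the trailing `TEST.WEIGHTS <file>` override are the ones that appear in commands quoted later in this thread; the helper name `build_convert_command` and the example paths are made up for the example.

```python
def build_convert_command(cfg_path, weights_path, out_dir, device="cpu", test_img=None):
    """Build the argv list for tools/convert_pkl_to_pb.py (flags as seen in this thread)."""
    cmd = [
        "python", "tools/convert_pkl_to_pb.py",
        "--cfg", cfg_path,
        "--out_dir", out_dir,
        "--device", device,
    ]
    if test_img is not None:
        # Optional image used by the script to verify the converted model.
        cmd += ["--test_img", test_img]
    # Config overrides go last, as KEY VALUE pairs.
    cmd += ["TEST.WEIGHTS", weights_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_convert_command("my_config.yaml", "model_final.pkl", "out")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the root of the Detectron checkout.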

@gadcam
Contributor Author

gadcam commented May 18, 2018

@daquexian I ran convert_pkl_to_pb.py (with your patch) successfully on e2e_keypoint_rcnn_R-50-FPN_s1x and on MSRA's original ResNet-50 model.

For e2e_keypoint_rcnn_R-50-FPN_s1x I have no warning.
For MSRA's original ResNet-50 model I have the following output

Blob fpn_inner_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res4_5_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res3_3_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_inner_res2_2_sum_lateral_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res2_2_sum_b with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob conv_rpn_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_cls_logits_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_w with type <class 'str'> is not supported in generating init net, skipped.
Blob rpn_bbox_pred_fpn2_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc6_b with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_w with type <class 'str'> is not supported in generating init net, skipped.
Blob fc7_b with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_w with type <class 'str'> is not supported in generating init net, skipped.
Blob cls_score_b with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_w with type <class 'str'> is not supported in generating init net, skipped.
Blob bbox_pred_b with type <class 'str'> is not supported in generating init net, skipped.

If I try to convert model.keypoint_net from e2e_keypoint_rcnn_R-50-FPN_s1x I get

Blob fpn_res2_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res3_3_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn3 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res4_5_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn4 with type <class 'str'> is not supported in generating init net, skipped.
Blob fpn_res5_2_sum with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_fpn5 with type <class 'str'> is not supported in generating init net, skipped.
Blob keypoint_rois_idx_restore_int32 with type <class 'str'> is not supported in generating init net, skipped.

So I have a few questions

  • Why is there no Blob ____ is not supported for the keypoint model when we have some for the ResNet ?
  • What does it mean to have some Blob ____ is not supported in the ResNet? Should we implement these operators? (I thought the ResNet would be converted without trouble.)
  • Same question for the keypoint part of e2e_keypoint_rcnn_R-50-FPN_s1x

@daquexian
Contributor

@gadcam You should use the model in Model Zoo.

@gadcam
Contributor Author

gadcam commented May 19, 2018

@daquexian Then that is perfect : I did use the models in the Model Zoo.
To be accurate what I call

Why did you suspect I tried to convert something else ? Because I have some Blob ____ is not supported when I should not ?

@daquexian
Contributor

@gadcam Yes. It is reasonable that Blob ____ is not supported appears when you use an ImageNet pretrained model, because fpn, rpn and some other layers are not in ImageNet pretrained models.

Could you please tell me what ops not supported output means?

@gadcam
Contributor Author

gadcam commented May 19, 2018

@daquexian

Could you please tell me what ops not supported output means?

I meant Blob ____ is not supported I am sorry for my inaccuracy. (I corrected it)

It is reasonable that Blob ____ is not supported appears when you use an ImageNet pretrained model, because fpn, rpn and some other layers are not in ImageNet pretrained models.

I am not sure I got this part : do you mean that when we see Blob ____ is not supported it means the Blob needs some code from the Detectron to be fully defined ?

So I think we are getting to the point of my issue : what should we implement to avoid it ? Or can you direct me where to dive to know what we need to implement ?

If we take an example (but we could say the same thing for keypoint_rois_idx_restore_int32)

Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.

The only mention I found of keypoint_rois_fpn in the code is here

blob_names += ['keypoint_rois_fpn' + str(lvl)]

So I do not really know where to investigate to avoid this Blob ____ is not supported error.

As a side question should we implement something like

class GenerateProposalLabelsOp(object):
in Caffe2/PyTorch repo and then add some conversion code here to get full CPU support ?

@daquexian
Contributor

daquexian commented May 19, 2018

@gadcam Blob ____ is not supported here just indicates that the blob doesn't have any value (I don't know why its type is 'str' when it doesn't have any value, caffe2 is strange). No more layers need to be implemented. You can add the names of these blobs to empty_blobs, like

empty_blobs = ['data', 'im_info']

('data' and 'im_info' are the inputs of model.net, 'fpn_res2_2_sum', 'keypoint_rois_fpn2' and so on are the inputs of model.keypoint_net)
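
One way to collect those names without copying them by hand is to parse the warnings themselves. This is only a sketch: the message format is taken from the log output quoted earlier in this thread, and `skipped_blob_names` is a hypothetical helper, not part of Detectron.

```python
import re

# Matches the warning lines printed during conversion, as quoted in this thread.
_SKIPPED = re.compile(
    r"^Blob (\S+) with type .* is not supported in generating init net, skipped\.$"
)


def skipped_blob_names(log_text):
    """Return the blob names from 'is not supported' warnings, in order, without duplicates."""
    names = []
    for line in log_text.splitlines():
        match = _SKIPPED.match(line.strip())
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    log = (
        "Blob fpn_res2_2_sum with type <class 'str'> is not supported in generating init net, skipped.\n"
        "Blob keypoint_rois_fpn2 with type <class 'str'> is not supported in generating init net, skipped.\n"
    )
    # Candidate empty_blobs list for the keypoint net.
    empty_blobs = skipped_blob_names(log)
    print(empty_blobs)
```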

The converted model will crash when you try to verify it. Because its inputs are not legal. Maybe giving it some proper inputs ('fpn_res2_2_sum' and so on produced by bbox branch, and also "keypoint_rois_fpnX" below) will make it run.

def im_detect_keypoints(model, im_scale, boxes):
    """Infer instance keypoint poses. This function must be called after
    im_detect_bbox as it assumes that the Caffe2 workspace is already populated
    with the necessary blobs.
    Arguments:
        model (DetectionModelHelper): the detection model to use
        im_scales (list): image blob scales as returned by im_detect_bbox
        boxes (ndarray): R x 4 array of bounding box detections (e.g., as
            returned by im_detect_bbox)
    Returns:
        pred_heatmaps (ndarray): R x J x M x M array of keypoint location
            logits (softmax inputs) for each of the J keypoint types output
            by the network (must be processed by keypoint_results to convert
            into point predictions in the original image coordinate space)
    """
    M = cfg.KRCNN.HEATMAP_SIZE
    if boxes.shape[0] == 0:
        pred_heatmaps = np.zeros((0, cfg.KRCNN.NUM_KEYPOINTS, M, M), np.float32)
        return pred_heatmaps
    inputs = {'keypoint_rois': _get_rois_blob(boxes, im_scale)}
    # Add multi-level rois for FPN
    if cfg.FPN.MULTILEVEL_ROIS:
        _add_multilevel_rois_for_test(inputs, 'keypoint_rois')
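
To make the "multi-level rois" step less opaque, here is a dependency-free sketch of what such a distribution could look like. It is written under assumptions: the level formula is the FPN paper heuristic with canonical scale 224 and canonical level 4, levels are clipped to 2..5, and box width and height use the `+ 1` pixel convention. Only the blob names (`keypoint_rois_fpn2` … `keypoint_rois_fpn5`, `keypoint_rois_idx_restore_int32`) come from this thread; the function names are illustrative, and the real Detectron helper should be preferred when it is available.

```python
import math


def map_rois_to_fpn_levels(boxes, k_min=2, k_max=5, canonical_scale=224, canonical_level=4):
    """Assign each (x1, y1, x2, y2) box to an FPN level from its size."""
    levels = []
    for x1, y1, x2, y2 in boxes:
        scale = math.sqrt((x2 - x1 + 1) * (y2 - y1 + 1))
        level = math.floor(canonical_level + math.log2(scale / canonical_scale + 1e-6))
        levels.append(min(max(level, k_min), k_max))
    return levels


def distribute_rois(rois, blob_prefix, k_min=2, k_max=5):
    """Split rois ([batch_idx, x1, y1, x2, y2] rows) into one blob per FPN level.

    Also returns the permutation that restores the original roi order after the
    per-level results have been concatenated.
    """
    levels = map_rois_to_fpn_levels([r[1:] for r in rois], k_min, k_max)
    blobs = {}
    order = []
    for lvl in range(k_min, k_max + 1):
        idx = [i for i, roi_lvl in enumerate(levels) if roi_lvl == lvl]
        blobs[blob_prefix + "_fpn" + str(lvl)] = [rois[i] for i in idx]
        order.extend(idx)
    # Equivalent to an argsort of `order`.
    blobs[blob_prefix + "_idx_restore_int32"] = sorted(range(len(order)), key=order.__getitem__)
    return blobs


if __name__ == "__main__":
    rois = [[0, 0, 0, 999, 999], [0, 0, 0, 9, 9], [0, 0, 0, 223, 223]]
    for name, value in distribute_rois(rois, "keypoint_rois").items():
        print(name, value)
```

The restore indices are what an op such as BatchPermutation would consume to put the per-level features back in the original box order.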

@gadcam
Contributor Author

gadcam commented May 21, 2018

@daquexian Thank you for your hints, with a bit of work I was able to run e2e_keypoint_rcnn_R-50-FPN_s1x on CPU !
I will tidy up my code before sharing it.
If I am able to write something clean enough, I will open a PR to enable conversion of keypoint and mask models, with a test to check the correctness of the conversion (and thus an example of how to run it).
For the moment the main problem is that I could not pick the input blobs programmatically.

@daquexian
Contributor

@gadcam Great! Looking forward to your PR

@dongmingsun

@gadcam Hi, are we able to convert the Mask R-CNN model from .pkl to .pb now?

@gadcam
Contributor Author

gadcam commented May 22, 2018

@dongmingsun With @daquexian's #372 + my (future) PR you will be able to convert the models from the Zoo from .pkl to two .pb files, one for the bbox and one for the mask or keypoints, and you would need to use some helper function to run them.
What I achieved is to run it without the need of a GPU, not to have a pure Caffe2 model.
I think someone more experienced than me would be able to merge these two .pb files at least. I will quickly look into this option.

@dongmingsun

@gadcam Thank you very much, so I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

@kundalee

kundalee commented May 23, 2018

@gadcam Hi, did you encounter this problem when you ran convert_pkl_to_pb.py from @daquexian's branch?

Cannot find operator schema for CollectAndDistributeFpnRpnProposals. Will skip schema checking.
Traceback for operator 164 in network origin_model
Traceback (most recent call last):
  File "tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/alpha/Rddd/projects/detectron0518/Detectron/detectron/utils/model_convert_utils.py", line 367, in compare_model
    res2 = model2_func(test_image, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 565, in _run_pb_func
    return run_model_pb(args, model_pb[0], model_pb[1], im, check_blobs)
  File "tools/convert_pkl_to_pb.py", line 505, in run_model_pb
    workspace.CreateNet(net)
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 163, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/caffe2/build-cpu/caffe2/python/workspace.py", line 189, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:191] op. Cannot create operator of type 'CollectAndDistributeFpnRpnProposals' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.

@daquexian
Contributor

daquexian commented May 23, 2018 via email

@HappyKerry

@gadcam @daquexian @dongmingsun I have converted the pkl model to a pb model, but how do I use the pb model in caffe2 C++? Thanks

@daquexian
Contributor

@HappyKerry you can search for the caffe2 android demo or third-party tutorials

@dongmingsun

@HappyKerry caffe2_cpp_tutorial might help.

@kundalee

Hi @daquexian, thank you very much. Thanks to your comments, the CollectAndDistributeFpnRpnProposals problem is solved. I have already converted .pkl to .pb successfully.

But when I try to load the pb files for testing on CPU, I get the problem below. Everything is fine until I call the function workspace.CreateNet(net).

    workspace.CreateNet(net)
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 152, in CreateNet
    StringifyProto(net), overwrite,
  File "/home/Rddd/data/projects/pytorch/build/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "roi_feat_shuffled" input: "rois_idx_restore_int32" output: "roi_feat" name: "" type: "BatchPermutation" device_option { } engine: ""

I noticed that the function named verify_model runs after conversion; it works well and no error occurs. Can someone tell me how to use the pb model in caffe2 python? Thanks

@daquexian
Contributor

daquexian commented May 24, 2018

@kundalee BatchPermutation is in a caffe2 module. You need to load the module in your code, like

dyndep.InitOpsLibrary(detectron_ops_lib)
or this tutorial.
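
A hedged sketch of how the library path passed to dyndep.InitOpsLibrary might be located. The only call taken from this thread is `dyndep.InitOpsLibrary`; the helper names, the `lib/` subdirectory layout and the two candidate file names are assumptions about a typical Caffe2 build, so adjust them to match your installation.

```python
import os
import sys

# Assumed library file names for GPU and CPU-only builds; verify against your build output.
CANDIDATE_NAMES = ("libcaffe2_detectron_ops_gpu.so", "libcaffe2_detectron_ops.so")


def find_ops_lib(prefixes, names=CANDIDATE_NAMES, exists=os.path.exists):
    """Return the first existing <prefix>/lib/<name> path, or None if nothing is found."""
    for prefix in prefixes:
        for name in names:
            candidate = os.path.join(prefix, "lib", name)
            if exists(candidate):
                return candidate
    return None


def load_detectron_ops(prefixes=None):
    """Locate the Detectron ops library and register its operators with Caffe2."""
    lib = find_ops_lib(prefixes if prefixes is not None else sys.path)
    if lib is None:
        raise RuntimeError("Detectron ops library not found; was Caffe2 built with it?")
    # Imported lazily so that the lookup above can be used without Caffe2 installed.
    from caffe2.python import dyndep
    dyndep.InitOpsLibrary(lib)
    return lib
```

Call `load_detectron_ops()` once, before `workspace.CreateNet`, so that operators such as BatchPermutation are registered by the time the net is created.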

I haven't found out how to load a module in C++. No one has responded to my issue (it's so normal :D), so I compiled the detectron ops into the caffe2 main library as a workaround.

@HappyKerry

@daquexian I ran into the same "BatchPermutation" problem as @kundalee. So how do I compile the detectron ops into the caffe2 main library?

@daquexian
Contributor

daquexian commented May 25, 2018

@HappyKerry Just copy detectron ops into the main caffe2 ops directory and recompile.

@gadcam
Contributor Author

gadcam commented May 28, 2018

@dongmingsun @daquexian

I still have to figure out how to feed a Detectron model to pure Caffe2 C++.

I think someone more experienced than me would be able to merge these two .pb files at least. I will quickly look into this option.

Let us assume that #372 and #449 are correct and merged.
The main problem I see with either of these two things is that we could put all the ops in the same net, but we would then need to write something like this just before inference:

def run_model_pb(args, models_pb, im, check_blobs):
    workspace.ResetWorkspace()
    net, init_net = models_pb
    workspace.RunNetOnce(init_net)
    mutils.create_input_blobs_for_net(net.Proto())
    workspace.CreateNet(net)

    input_blobs = _prepare_blobs(
        im,
        cfg.PIXEL_MEANS,
        cfg.TEST.SCALE, cfg.TEST.MAX_SIZE
    )
    boxes = ????  # not known until the first (bbox) stage has run
    if cfg.MODEL.MASK_ON:
        im_scale = input_blobs['im_info'][0][2]
        mask_rois = {'mask_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(mask_rois, 'mask_rois')
        input_blobs.update(mask_rois)

    if cfg.MODEL.KEYPOINTS_ON:
        im_scale = input_blobs['im_info'][0][2]
        keypoints_rois = {'keypoint_rois': test._get_rois_blob(boxes, im_scale)}

        # Add multi-level rois for FPN
        if cfg.FPN.MULTILEVEL_ROIS:
            test._add_multilevel_rois_for_test(keypoints_rois, 'keypoint_rois')
        input_blobs.update(keypoints_rois)

But we cannot know boxes before inference...
So do we have to run this in two steps if we want to keep the exact same architecture, or am I missing something?
So @dongmingsun, I think you have to do as in my PR: run the first stage, "Add multi-level rois for FPN", run the second stage, and process the result.
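
The two-step flow can be sketched independently of Caffe2 by passing each stage in as a callable. Everything here is illustrative: `run_two_stage` and its three parameters are hypothetical names, and in practice the callables would wrap the converted bbox net, the roi preparation (including the multi-level FPN split) and the converted mask or keypoint net.

```python
def run_two_stage(run_bbox_stage, build_head_inputs, run_head_stage, image):
    """Run detection first, then feed the resulting boxes to the second-stage head.

    run_bbox_stage(image)           -> (boxes, im_scale)
    build_head_inputs(boxes, scale) -> dict of input blobs for the head net
    run_head_stage(inputs)          -> head outputs (masks or keypoint heatmaps)
    """
    boxes, im_scale = run_bbox_stage(image)
    if len(boxes) == 0:
        # Nothing detected, so there is nothing to feed to the head net.
        return boxes, None
    head_inputs = build_head_inputs(boxes, im_scale)
    return boxes, run_head_stage(head_inputs)


if __name__ == "__main__":
    # Stub stages standing in for the real nets.
    boxes, heatmaps = run_two_stage(
        lambda image: ([[10, 10, 50, 80]], 0.5),
        lambda boxes, scale: {"keypoint_rois": [[0] + [c * scale for c in b] for b in boxes]},
        lambda inputs: ["heatmap for %d roi(s)" % len(inputs["keypoint_rois"])],
        image=None,
    )
    print(boxes, heatmaps)
```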

As a sidenote why do we keep cfg.FPN.MULTILEVEL_ROIS if it is set to TRUE in all the CFG files ?

@AwwNaiCha

@daquexian
Hello. I am new to caffe2 and Detectron. I trained a Detectron model and want to test it in caffe2. Since the current branch of Detectron does not support FPN conversion, I searched around and found your branch.
I tried to use your code to convert my pkl model to pb files. The model is based on Detectron's tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml and trained on my own dataset.

I tried both GPU and CPU mode and got the following errors. This is what I got in CPU mode:

WARNING workspace.py: 185: Original python traceback for operator '121' in network 'detectron' in exception above (most recent call last):
Running pb model failed.
[enforce fail at upsample_nearest_op.h:39] . Not Implemented. Error from operator: 
input: "fpn_inner_res5_2_sum" output: "fpn_inner_res4_5_sum_topdown" name: "" type: "UpsampleNearest" arg { name: "scale" i: 2 } device_option { } engine: ""
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 379, in compare_model
    n1, n2, r1.shape, r2.shape)
AssertionError: Blob result_boxes and result_boxes shape mismatched: (9, 5) vs (0, 5)

Process finished with exit code 1

This is what I got in GPU mode:

WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
INFO net.py:  59: Loading weights from: result50/model_iter19999.pkl
I0626 12:01:25.666318 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000106857 secs
I0626 12:01:25.666505 29776 net_dag.cc:46] Number of parallel execution chains 63 Number of operators = 232
I0626 12:01:25.675417 29776 net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 8.2534e-05 secs
I0626 12:01:25.675545 29776 net_dag.cc:46] Number of parallel execution chains 30 Number of operators = 188
Running the second model...
Checking result_boxes -> result_boxes...
Traceback (most recent call last):
  File "/detectron/tools/convert_pkl_to_pb.py", line 637, in <module>
    main()
  File "/detectron/tools/convert_pkl_to_pb.py", line 631, in main
    verify_model(args, [net, init_net], args.test_img)
  File "/detectron/tools/convert_pkl_to_pb.py", line 569, in verify_model
    _run_cfg_func, _run_pb_func, test_img, check_blobs)
  File "/detectron/detectron/utils/model_convert_utils.py", line 384, in compare_model
    n1, n2, np.amax(np.absolute(r1 - r2))))
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 963, in assert_array_almost_equal
    precision=decimal)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 3 decimals
result_boxes and result_boxes not matched. Max diff: 4.39031982422
(mismatch 11.1111111111%)
 x: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...
 y: array([[7.503e+02, 3.873e+02, 8.095e+02, 4.501e+02, 9.987e-01],
       [1.055e+03, 3.291e+02, 1.113e+03, 3.970e+02, 9.965e-01],
       [8.385e+02, 3.726e+02, 8.958e+02, 4.344e+02, 9.940e-01],...

Process finished with exit code 1

Do you have any idea how I can fix these? Thanks.
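
For anyone debugging the same two assertion failures, the check can be reproduced on plain lists to see which condition fails. This is a rough stand-in, not the Detectron code: it assumes the "almost equal to 3 decimals" test follows the NumPy rule `abs(desired - actual) < 1.5 * 10**(-decimal)`, and `compare_boxes` is a hypothetical helper name.

```python
def compare_boxes(ref, got, decimals=3):
    """Compare two lists of box rows and describe the first kind of mismatch found."""
    if len(ref) != len(got):
        return "shape mismatch: %d vs %d rows" % (len(ref), len(got))
    tolerance = 1.5 * 10 ** (-decimals)
    diffs = [abs(a - b) for r, g in zip(ref, got) for a, b in zip(r, g)]
    bad = sum(1 for d in diffs if not d < tolerance)
    if bad:
        return "max diff %g, %.1f%% of elements mismatched" % (max(diffs), 100.0 * bad / len(diffs))
    return "ok"


if __name__ == "__main__":
    reference = [[750.3, 387.3, 809.5, 450.1, 0.9987]]
    print(compare_boxes(reference, []))         # different number of detections
    print(compare_boxes(reference, reference))  # identical results
```

A row-count mismatch means the two runs kept different numbers of detections; a value mismatch with equal row counts means the same detections came out with slightly different coordinates or scores.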

@AwwNaiCha

@daquexian
I changed the input image dimensions and everything works well now! Thank you all the same.

@Kongsea

Kongsea commented Nov 5, 2018

When I run
python tools/convert_pkl_to_pb.py --cfg mm/noaug_2gpu_e2e_faster_rcnn_R-101-FPN.yaml --out_dir ttt --test_img 01.jpg --fuse_af 0 --device cpu,
it raised the following error:

AssertionError: Blob result_boxes and result_boxes shape mismatched: (195, 5) vs (117, 5)

I trained the faster-rcnn model using the pretrained imagenet model, R-101.pkl.
Could anybody give me some advice? Thanks.

@pascschoSSL

@gadcam Thank you for the effort.

@dongmingsun With @daquexian's #372 + my (future) PR you will be able to convert the models from the Zoo from .pkl to two .pb files, one for the bbox and one for the mask or keypoints, and you would need to use some helper function to run them.
What I achieved is to run it without the need of a GPU, not to have a pure Caffe2 model.
I think someone more experienced than me would be able to merge these two .pb files at least. I will quickly look into this option.

How do I use the pb files in Python? Is there a tutorial or example somewhere? (I couldn't find anything useful yet.)
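
A possible shape for such an example, sketched under assumptions. The call sequence (ResetWorkspace, RunNetOnce on the init net, CreateNet, then running the net) mirrors the run_model_pb snippet quoted earlier in this thread; the function name `run_pb` is hypothetical, and the workspace module is passed in as a parameter so that the sketch itself does not import Caffe2.

```python
def run_pb(workspace, init_net, predict_net, inputs, output_names):
    """Run a converted (init_net, predict_net) pair and fetch the requested blobs.

    With Caffe2 installed, `workspace` would be the caffe2.python.workspace module
    and both nets would be caffe2_pb2.NetDef messages parsed from the .pb files.
    """
    workspace.ResetWorkspace()
    workspace.RunNetOnce(init_net)           # loads the weights into the workspace
    for name, value in inputs.items():
        workspace.FeedBlob(name, value)      # e.g. 'data' and 'im_info'
    workspace.CreateNet(predict_net)
    workspace.RunNet(predict_net.name)
    return {name: workspace.FetchBlob(name) for name in output_names}
```

With a real installation the nets could be loaded with `net = caffe2_pb2.NetDef()` followed by `net.ParseFromString(open(path, 'rb').read())`. The Detectron ops library has to be registered through dyndep.InitOpsLibrary before CreateNet, otherwise the BatchPermutation error shown above appears.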

@lilichu

lilichu commented Nov 8, 2018

Hi! @daquexian @HappyKerry @kundalee
I used convert_pkl_to_pb.py to convert the Detectron model to a caffe2 model successfully. Then I wanted to use ONNX to convert the caffe2 model to an ONNX model.
I encountered the same issue as above:

WARNING:caffe2.python.workspace:Original python traceback for operator `170` in network `detectron` in exception above (most recent call last):
Traceback (most recent call last):
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/user/pycharm-2018.1.3/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/user/backup/lichu/onnx_convert/caffe2_onnx.py", line 24, in <module>
    value_info,
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 332, in caffe2_net_to_onnx_model
    model = make_model(cls.caffe2_net_to_onnx_graph(*args, **kwargs),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/frontend.py", line 221, in caffe2_net_to_onnx_graph
    inputs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/helper.py", line 62, in c2_native_run_net
    ws.RunNetOnce(predict_net)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/onnx/workspace.py", line 63, in f
    return getattr(workspace, attr)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 199, in RunNetOnce
    StringifyProto(net),
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:185] op. Cannot create operator of type 'BatchPermutation' on the device 'CPU'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "roi_feat_shuffled_1" input: "rois_idx_restore_int32_1" output: "roi_feat_1" name: "" type: "BatchPermutation" device_option { } engine: "" 

Does this mean that BatchPermutation can't be found in caffe2? What should I do? Thanks!

@tomas-wood

Hey I'm running

python detectron/tools/convert_pkl_to_pb.py --out_dir /app/out --cfg /app/detectron/configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --device cpu  TEST.WEIGHTS model_final.pkl

and getting the following error.

Traceback (most recent call last):
  File "detectron/tools/convert_pkl_to_pb.py", line 654, in <module>
    main()
  File "detectron/tools/convert_pkl_to_pb.py", line 612, in main
    model, blobs = load_model(args)
  File "detectron/tools/convert_pkl_to_pb.py", line 420, in load_model
    model = test_engine.initialize_model_from_cfg(cfg.TEST.WEIGHTS)
  File "/app/detectron/detectron/core/test_engine.py", line 330, in initialize_model_from_cfg
    model, weights_file, gpu_id=gpu_id,
  File "/app/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
    src_blobs[src_name].astype(np.float32, copy=False))
  File "/app/pytorch/build/caffe2/python/workspace.py", line 317, in FeedBlob
    return _Workspace_feed_blob(ws, name, arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 654, in _Workspace_feed_blob
    return ws.create_blob(name).feed(arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 676, in _Blob_feed
    return blob._feed(arg, device_option)
RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob.

I've built caffe2 with CPU-only support. Is this going to be a deal breaker? Should I fire up the GPU version and convert to PB with that? Looks like one needs GPU support to convert from pkl to pb. Is this assumption I'm making correct?

@wytcsuch

Hi, I have converted a .pkl model to a .pb model under Ubuntu 16.04, and I want to use the .pb in C++ on Windows. Do I need to install caffe2 under Windows according to the tutorial https://caffe2.ai/docs/get-start.html? Platform=windows&configuration=compile?
@HappyKerry @daquexian @dongmingsun

@satyajithj

@lilichu

Does it mean that BatchPermutation can't be found in caffe2? what should I do? thanks!

Refer to this comment.

@gadcam gadcam closed this as completed Sep 22, 2019
@satyajithj

Is this finished?

@gadcam
Contributor Author

gadcam commented Sep 28, 2019

Hello @fuzzyBatman,

To be honest, I do not know what the current state of Detectron is.
I closed this issue because I felt it was no longer useful, as it had not received much attention in recent months.

@songwellxie

Hey I'm running

python detectron/tools/convert_pkl_to_pb.py --out_dir /app/out --cfg /app/detectron/configs/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_1x.yaml --device cpu  TEST.WEIGHTS model_final.pkl

and getting the following error.

Traceback (most recent call last):
  File "detectron/tools/convert_pkl_to_pb.py", line 654, in <module>
    main()
  File "detectron/tools/convert_pkl_to_pb.py", line 612, in main
    model, blobs = load_model(args)
  File "detectron/tools/convert_pkl_to_pb.py", line 420, in load_model
    model = test_engine.initialize_model_from_cfg(cfg.TEST.WEIGHTS)
  File "/app/detectron/detectron/core/test_engine.py", line 330, in initialize_model_from_cfg
    model, weights_file, gpu_id=gpu_id,
  File "/app/detectron/detectron/utils/net.py", line 112, in initialize_gpu_from_weights_file
    src_blobs[src_name].astype(np.float32, copy=False))
  File "/app/pytorch/build/caffe2/python/workspace.py", line 317, in FeedBlob
    return _Workspace_feed_blob(ws, name, arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 654, in _Workspace_feed_blob
    return ws.create_blob(name).feed(arr, device_option)
  File "/app/pytorch/build/caffe2/python/workspace.py", line 676, in _Blob_feed
    return blob._feed(arg, device_option)
RuntimeError: [enforce fail at pybind_state.cc:348] feeder. Unknown device type encountered in FeedBlob.

I've built caffe2 with CPU-only support. Is this going to be a deal breaker? Should I fire up the GPU version and convert to PB with that? Looks like one needs GPU support to convert from pkl to pb. Is this assumption I'm making correct?

Hello, did anyone come across this error? I did when I tried to run CPU-only C3D extraction.
