
Post quantization does not utilize GPU #454

Open
ek9852 opened this issue Jul 8, 2020 · 3 comments
Labels
feature request

Comments

@ek9852

ek9852 commented Jul 8, 2020

System information

  • TensorFlow version (you are using): 2.2.0
  • Are you willing to contribute it (Yes/No): No

Motivation
During post-training quantization, the GPU is idle (confirmed via nvidia-smi), i.e. post-training quantization is not using the GPU to speed things up. It is very slow: it takes more than 60 minutes on a server-grade Xeon (for a test set of 2336 images on our model):

import os
import numpy as np
import tensorflow as tf
from PIL import Image

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  with tf.io.gfile.GFile(test_set, 'r') as f:
    test_list = f.readlines()
  for i in test_list:
    # Each line lists an image path; the first whitespace-separated token
    # is the filename. Yield a normalized float32 batch of shape (1, 120, 160, 1).
    with Image.open(os.path.join(datasetdir, i.split()[0])) as img:
      yield [np.array(img).reshape(1, 120, 160, 1).astype(np.float32) / 255.0]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
tflite_quant_model = converter.convert()
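Calibration cost scales roughly linearly with the number of representative samples, so while GPU support is unavailable, one CPU-side mitigation is to calibrate on a subset of the dataset. A minimal sketch (the helper name limit_calibration_samples and the stand-in generator are illustrative, not from this thread):

```python
from itertools import islice

def limit_calibration_samples(gen_fn, max_samples):
    """Wrap a representative-dataset generator function so the converter
    only sees the first max_samples examples during calibration."""
    def limited():
        yield from islice(gen_fn(), max_samples)
    return limited

# Stand-in generator simulating the 2336 per-image yields above.
def fake_dataset_gen():
    for i in range(2336):
        yield [i]  # placeholder for the numpy image batch

limited_gen = limit_calibration_samples(fake_dataset_gen, 100)
assert sum(1 for _ in limited_gen()) == 100
```

In the snippet above, one would then set converter.representative_dataset = limit_calibration_samples(representative_dataset_gen, 100). Fewer calibration samples trade some quantization accuracy for speed, so the subset should still cover the input distribution.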

Describe the feature
Post-training quantization should utilize the GPU to speed things up.

@ek9852 ek9852 added the feature request label Jul 8, 2020
@miaout17

miaout17 commented Jul 9, 2020

A few initial questions:

  • For "test set of 2336 on our model", does it mean 2336 images are used as representative dataset?
  • Do you know how much time it takes to invoke the model?

I don't think the post-training quantization tool supports GPU, but I'm not the expert.
I'll let @suharshs follow from here.

@ek9852
Author

ek9852 commented Jul 9, 2020

For "test set of 2336 on our model", does it mean 2336 images are used as representative dataset?
Yes

Do you know how much time does it take to invoke the model?
0.3 sec on a Google Coral Edge TPU. It should be much faster on my NVIDIA TITAN X GPU,
but post-training quantization does not currently use the GPU.

@suharshs
Contributor

suharshs commented Jul 9, 2020

TensorFlow Lite doesn't currently support non-mobile GPU kernels, and the post-training quantization tool is specific to TensorFlow Lite at the moment. As we work to unify TensorFlow and TensorFlow Lite we will keep this in mind. I will keep this issue open to give you updates as they come.

Thanks!

3 participants