Motivation
During post-training quantization, the GPU is idle (confirmed via nvidia-smi), i.e. the quantization step does not use the GPU to speed things up. It is very slow: it takes > 60 min on a server-grade Xeon (for a representative set of 2336 images with our model):
import os

import numpy as np
import tensorflow as tf
from PIL import Image

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset_gen():
    with tf.io.gfile.GFile(test_set, 'r') as f:
        test_list = f.readlines()
    for line in test_list:
        # Each line holds an image path as its first whitespace-separated field.
        image_path = os.path.join(datasetdir, line.split()[0])
        with Image.open(image_path) as img:
            # Yield one sample as a normalized float32 NHWC array.
            yield [np.array(img).reshape(1, 120, 160, 1).astype(np.float32) / 255.0]

converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8  # or tf.int8
converter.inference_output_type = tf.uint8  # or tf.int8
tflite_quant_model = converter.convert()
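Taking the reported numbers at face value, the per-image calibration cost can be estimated with simple arithmetic (the figures below come from the report above, treating ">60 min" as a 60-minute lower bound):

```python
# Back-of-the-envelope calibration cost from the reported numbers.
total_seconds = 60 * 60   # ">60 min" on the server-grade Xeon (lower bound)
num_images = 2336         # size of the representative dataset
per_image = total_seconds / num_images
print(f"~{per_image:.2f} s per image")  # roughly 1.54 s per calibration sample
```

So each calibration sample costs at least ~1.5 s of CPU time, which is the cost a GPU-backed calibration pass would need to beat.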
Describe the feature
Post-training quantization should utilize the GPU to speed things up.
For "test set of 2336 on our model", does that mean 2336 images are used as the representative dataset?
Yes
Do you know how much time it takes to invoke the model?
0.3 sec on a Google Coral Edge TPU. It should be much faster on my TITAN X NVIDIA GPU, but post-training quantization does not currently use the GPU.
TensorFlow Lite doesn't currently support non-mobile GPU kernels, and the post-training quantization tool is specific to TensorFlow Lite at the moment. As we work to unify TensorFlow and TensorFlow Lite we will keep this in mind. I will keep this issue open to give you updates as they come.
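GPU support aside, calibration time scales linearly with the number of representative samples, and in practice far fewer than 2336 images are often sufficient for calibrating activation ranges. A minimal sketch of capping the generator, assuming the `representative_dataset_gen` defined in the snippet above (the helper name `limit_samples` is illustrative, not a TensorFlow API):

```python
from itertools import islice

def limit_samples(dataset_gen, max_samples=100):
    """Wrap a representative-dataset generator so the converter only
    iterates over the first `max_samples` calibration examples."""
    def limited():
        yield from islice(dataset_gen(), max_samples)
    return limited

# Hypothetical usage with the converter configured above:
# converter.representative_dataset = limit_samples(representative_dataset_gen, 100)
```

With 100 samples at the observed ~1.5 s each, calibration would take on the order of minutes instead of an hour, at a possible cost in quantization accuracy that should be validated against the full set.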