half and mixed precision inference #442
Merged
This PR implements half and mixed precision inference by adding the following boolean parameters to the `Pipeline`:

- `half_precision_model`: Whether or not to use a half precision model. If set to `True`, the model will be cast to half precision on supported devices (`torch.float16` on `cuda` and `torch.bfloat16` on `cpu` for now, following `torch.get_autocast_dtype()`). This can reduce memory consumption and improve inference speed, but may lead to numerical instability.
- `half_precision_ops`: Whether or not to use half precision operations. If set to `True`, the model will be run with half precision operations via `torch.autocast`.
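A rough sketch of what the two flags correspond to in plain PyTorch (illustrative only; the `Linear` model here is a toy stand-in, not the actual `Pipeline` internals):

```python
import torch

# Toy stand-in for the model wrapped by the Pipeline.
model = torch.nn.Linear(4, 2)

# half_precision_model=True roughly means: cast the model to the half
# precision dtype of the device (float16 on cuda, bfloat16 on cpu).
device_type = "cuda" if torch.cuda.is_available() else "cpu"
half_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
model = model.to(device=device_type, dtype=half_dtype)

# half_precision_ops=True roughly means: run the forward pass under
# torch.autocast, which executes eligible ops in half precision and
# takes care of casting the float32 inputs to match.
x = torch.randn(3, 4, device=device_type)
with torch.no_grad(), torch.autocast(device_type=device_type):
    out = model(x)

print(out.dtype)
```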
Since half precision tensors cannot always be natively converted to `numpy` (`numpy` has no `bfloat16` dtype), this PR adds `float()` casting to several model outputs (probabilities) before calling `.numpy()` in `unbatch_output()` of several taskmodules.

Also, this fixes some spelling mistakes in the `Pipeline` documentation.

TODO:
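For illustration, the kind of cast this adds before `.numpy()` (a standalone sketch; the variable names are mine, not the taskmodule code):

```python
import torch

# Probabilities as unbatch_output() might receive them when the model
# runs in half precision (bfloat16 here, the cpu case).
probs = torch.softmax(torch.randn(2, 3), dim=-1).to(torch.bfloat16)

# Converting directly fails, since numpy has no bfloat16 dtype.
try:
    probs.numpy()
    converted_directly = True
except (TypeError, RuntimeError):
    converted_directly = False

# The fix applied in this PR: cast to float32 first, then convert.
probs_np = probs.float().numpy()
print(converted_directly, probs_np.dtype)
```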