Cannot run inference on the quantized model in int8 on my device #15

Open
deepfind opened this issue Sep 26, 2021 · 1 comment

@deepfind

I want to test the accuracy and the inference time of I-BERT in int8, so I installed transformers and tried to quantize the RoBERTa-base model to generate the weights. I set quant_mode to true and torch_dtype to int8. However, the int8 inference time of I-BERT is about the same as the RoBERTa-base model on my 1080 Ti. Is there a problem with my config.json or with my device? Here is my config.json:
{
  "_name_or_path": "./outputs/checkpoint-1150/",
  "architectures": [
    "IBertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "finetuning_task": "mrpc",
  "force_dequant": "none",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": 0,
    "1": 1
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "0": 0,
    "1": 1
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "ibert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "quant_mode": true,
  "tokenizer_class": "RobertaTokenizer",
  "torch_dtype": "int8",
  "transformers_version": "4.11.0.dev0",
  "type_vocab_size": 1,
  "vocab_size": 50265
}
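
Roughly, the latency measurement looks like the sketch below (the sample input and loop counts are placeholders, and the checkpoint path just mirrors "_name_or_path" above; accuracy is evaluated separately on the MRPC dev set):

import time

import torch
from transformers import AutoTokenizer, IBertForSequenceClassification

path = "./outputs/checkpoint-1150/"  # same checkpoint as in config.json
tokenizer = AutoTokenizer.from_pretrained(path)
model = IBertForSequenceClassification.from_pretrained(path).eval().to("cuda")

# Placeholder input; real runs use MRPC sentence pairs.
inputs = tokenizer("a placeholder sentence", return_tensors="pt").to("cuda")

with torch.no_grad():
    for _ in range(10):  # warm-up so one-time CUDA overhead is excluded
        model(**inputs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(**inputs)
    torch.cuda.synchronize()  # wait for all kernels before reading the clock
    print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")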

@NASUAS

NASUAS commented May 8, 2024

Can you tell me how you tested the accuracy and the inference time of I-BERT in int8? Would you mind sharing your test code?
