
MiniCPM-V_2_6_awq_int4 uses about 20 GB of GPU memory #7

Open
smilebetterworld opened this issue Nov 17, 2024 · 1 comment

@smilebetterworld

git clone https://www.modelscope.cn/models/linglingdan/MiniCPM-V_2_6_awq_int4
Running inference with this quantized INT4 model uses about 20 GB of GPU memory, which is essentially the same as the fp model. Could there be a problem with the quantization?

@smilebetterworld
Author

The inference code is:

```python
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
import time
import GPUtil

# List of image file paths
IMAGES = [
    "/data/1666770191808_crop_0.jpg",  # local image path
]

# Model name or path
MODEL_NAME = "/data/MiniCPM-V_2_6_awq_int4"

# Open and convert the image
image = Image.open(IMAGES[0]).convert("RGB")

# Initialize the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Initialize the language model
llm = LLM(model=MODEL_NAME,
          gpu_memory_utilization=1,  # use all GPU memory
          trust_remote_code=True,
          max_model_len=2048)  # adjust based on available memory

# Build the conversation messages
question = "extract only raw text from the given image. Don't add any information or commentary."
messages = [{'role': 'user', 'content': '(<image>./</image>)\n' + question}]

# Apply the chat template to the messages
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Set the stop token IDs
stop_tokens = ['<|im_end|>', '<|endoftext|>']
stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]

# Set the generation parameters
sampling_params = SamplingParams(
    stop_token_ids=stop_token_ids,
    max_tokens=1024,
    temperature=0,
    best_of=1)

st = time.time()

# Run the model
outputs = llm.generate({
    "prompt": prompt,
    "multi_modal_data": {
        "image": image
    }
}, sampling_params=sampling_params)

latency = time.time() - st
gpus = GPUtil.getGPUs()
for gpu in gpus:
    print(f"GPU ID: {gpu.id}, GPU load: {round(gpu.load*100, 2)}%, Memory Total: {gpu.memoryTotal}MB, Memory Used: {gpu.memoryUsed}MB, Memory Free: {gpu.memoryFree}MB")
print('latency is {} seconds'.format(latency))
```
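
For reference, one likely factor (based on standard vLLM behavior, not anything confirmed in this thread): vLLM pre-allocates GPU memory for the KV cache up to the `gpu_memory_utilization` fraction, so with `gpu_memory_utilization=1`, GPUtil will report close to the entire GPU as used regardless of how small the quantized weights are. A minimal sketch that caps the reservation instead (0.5 is an arbitrary illustrative value, not a tuned one):

```python
from vllm import LLM

# Sketch: limit vLLM's memory reservation so reported usage tracks the
# actual model footprint more closely. vLLM reserves weights + KV cache
# up to this fraction of total GPU memory; 0.5 is only an illustration.
llm = LLM(model="/data/MiniCPM-V_2_6_awq_int4",
          trust_remote_code=True,
          max_model_len=2048,
          gpu_memory_utilization=0.5)  # reserve ~half the GPU instead of all of it
```

If the reported usage drops accordingly, the 20 GB figure reflects vLLM's cache reservation rather than the AWQ weight footprint.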
