Error: Unexpected MMA layout version found #149

Open · Chinesenovels opened this issue Apr 25, 2023 · 9 comments

Comments

@Chinesenovels

python: /project/lib/Analysis/Utility.cpp:136: bool mlir::supportMMA(mlir::Value, int): Assertion `(version == 1 || version == 2) && "Unexpected MMA layout version found"' failed.

@mapledxf

Same question here.
1080 Ti

@ajz34 commented Apr 25, 2023

Hitting the same problem here on a Titan X.
Could this be because int8/int4 goes through Triton, and Triton most likely does not yet support 8-/4-bit on Pascal or older cards?
qwopqwop200/GPTQ-for-LLaMa#142
triton-lang/triton#1505 (comment)

@luokai0223

Same problem on a P40. It originates in Triton: the matmul_248_kernel function fails when it reaches `c = accumulator.to(tl.float16)`. The compute architecture is probably too old; everyone hitting this seems to be on NVIDIA cards below compute capability 7.0. Is there a way around it?
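
For context, the failing pattern looks roughly like this (a minimal sketch in the spirit of a Triton matmul tile, not the real matmul_248_kernel; pointer math and masking are simplified):
'''
import triton
import triton.language as tl

# Sketch only: tl.dot on fp16 tiles selects an MMA (tensor-core) layout,
# which Triton only implements for compute capability >= 7.0. On Pascal
# cards the compiler aborts with "Unexpected MMA layout version found",
# with the traceback pointing at the final fp16 cast.
@triton.jit
def matmul_fp16_sketch(a_ptr, b_ptr, c_ptr, M, N, K,
                       BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                       BLOCK_K: tl.constexpr):
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    accumulator = tl.zeros([BLOCK_M, BLOCK_N], dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        offs_k = k + tl.arange(0, BLOCK_K)
        a = tl.load(a_ptr + offs_m[:, None] * K + offs_k[None, :])  # fp16 tile of A
        b = tl.load(b_ptr + offs_k[:, None] * N + offs_n[None, :])  # fp16 tile of B
        accumulator += tl.dot(a, b)  # tensor-core path, needs sm >= 70
    c = accumulator.to(tl.float16)   # the cast the reporter's traceback points at
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], c)
'''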

@lieh1203

Same question here. I have the quantized version running, but it throws this error as soon as I submit a prompt.
[screenshot]
I'm on a rented cloud GPU server.
[screenshot]

@slang98 commented Apr 26, 2023

According to yesterday's update, https://github.com/openai/triton/pull/1505/files, python/triton/language/semantic.py now states that cards with compute capability below 70 do not support Float8 and Float16:

'''
if torch.version.hip is None:
    device = triton.runtime.jit.get_current_device()
    capability = triton.runtime.jit.get_device_capability(device)
    capability = capability[0] * 10 + capability[1]
    if capability < 70:
        assert (
            not rhs.dtype.is_fp16() and not rhs.dtype.is_fp8()
        ), "Float8 and Float16 types are not supported for compute capability < 70 (use Float32 or above)"
'''

The P100 and P40 are both compute capability 6.x, so for now they can only use Float32, but then there isn't enough VRAM. This urgently needs a fix.
NVIDIA V100, NVIDIA TITAN V, and newer cards are supported.
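
A quick way to check which side of the cutoff a card falls on, using PyTorch's standard API (assuming CUDA is available):
'''
import torch

# Triton's fp16 path needs compute capability >= 7.0 (Volta and newer).
# P100 reports (6, 0); P40, 1080 Ti, and Titan X (Pascal) report (6, 1);
# V100 reports (7, 0).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}, "
          f"Triton fp16 ok: {(major, minor) >= (7, 0)}")
'''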

@lieh1203

> According to yesterday's update, https://github.com/openai/triton/pull/1505/files, cards with compute capability below 70 do not support Float8 and Float16 ... P100 and P40 are both compute capability 6.x, so for now they can only use Float32, but then there isn't enough VRAM. — @slang98

Hi, is this the right way to change the P100 to Float32?
[screenshot]
It still errors out. I have 16 GB of VRAM.

@jhj8888 commented Apr 27, 2023

> Cards with compute capability below 70 do not support Float8 and Float16 ... P100 and P40 can only use Float32, but then there isn't enough VRAM. — @slang98
> Is this the right way to change the P100 to Float32? It still errors out; I have 16 GB of VRAM. — @lieh1203

Same problem on a P100. Does MOSS just not support the P100?

@slang98 commented Apr 27, 2023

> Cards with compute capability below 70 do not support Float8 and Float16 ... — @slang98
> Is this the right way to change the P100 to Float32? It still errors out. — @lieh1203
> Same problem on a P100. Does MOSS just not support the P100? — @jhj8888

Tested today: after switching to float32, the P100/P40 either runs out of VRAM or still hits "Unexpected MMA layout version found".

The Triton project says its support for fp16 quantized models is still incomplete, so older cards like the P100/P40 all throw the error above; we need to wait for them to add support for more older cards.

Also, verified that a V100 32GB can run the int4 quantized model.

(https://github.com/OpenLMLab/MOSS/issues/%E5%8F%8CP100%E6%98%BE%E5%AD%98%E4%B8%8D%E5%A4%9F — "dual P100s, not enough VRAM")

@slang98 commented Apr 28, 2023

A solution has been found:
per the latest commit #175, switch from triton to auto-gptq, which bypasses the Triton check entirely.

Tested successfully: the int4 quantized version runs on a single P40 (24 GB).

Steps:
'''
git clone https://github.com/PanQiWei/AutoGPTQ
conda create -n moss python==3.10
cd MOSS
python setup_env.py --install_auto_gptq
'''

Edit MOSS\moss_cli_demo.py line 31, changing
'''
model = load_checkpoint_and_dispatch(
    raw_model, model_path, device_map="auto",
    no_split_module_classes=["MossBlock"], dtype=torch.float16
)
'''
to
'''
model = MossForCausalLM.from_pretrained(model_path, trust_remote_code=True).half().cuda()
'''

Then run:
'''
python moss_cli_demo.py
'''
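
For reference, AutoGPTQ can also load a pre-quantized checkpoint directly; a rough sketch (the path is a placeholder, not an actual MOSS checkpoint name, and this is not the exact code in #175):
'''
from auto_gptq import AutoGPTQForCausalLM

# Sketch: load a local int4 GPTQ checkpoint without going through triton.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/moss-int4",   # placeholder for your local quantized checkpoint
    device="cuda:0",
    use_triton=False,      # stay on the CUDA kernels, avoiding the sm < 70 assert
)
'''
Keeping use_triton=False is the point of the workaround: inference stays on AutoGPTQ's CUDA kernels, so Triton's compute-capability check never runs.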
