mistral-demo $M7B_DIR issue #160

Open
chaima-bd opened this issue May 25, 2024 · 1 comment

Comments

@chaima-bd

Hi, I am using Google Colab with a T4 GPU. When I run the command "mistral-demo $M7B_DIR", I get the error below. Is there a solution for this, please?

Traceback (most recent call last):
  File "/usr/local/bin/mistral-demo", line 8, in <module>
    sys.exit(mistral_demo())
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/main.py", line 183, in mistral_demo
    fire.Fire(demo)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/main.py", line 157, in demo
    generated_tokens, _logprobs = generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/generate.py", line 49, in generate
    prelogits = model.forward(
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/model.py", line 314, in forward
    h = self.forward_partial(input_ids, seqlens, cache=cache)
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/model.py", line 296, in forward_partial
    h = layer(h, freqs_cis, cache_view)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/model.py", line 188, in forward
    r = self.attention.forward(self.attention_norm(x), freqs_cis, cache)
  File "/usr/local/lib/python3.10/dist-packages/mistral_inference/model.py", line 128, in forward
    output = memory_efficient_attention(
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py", line 268, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py", line 387, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/__init__.py", line 403, in _memory_efficient_attention_forward
    op = _dispatch_fw(inp, False)
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/dispatch.py", line 125, in _dispatch_fw
    return _run_priority_list(
  File "/usr/local/lib/python3.10/dist-packages/xformers/ops/fmha/dispatch.py", line 65, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
     query : shape=(1, 28, 32, 128) (torch.bfloat16)
     key : shape=(1, 28, 32, 128) (torch.bfloat16)
     value : shape=(1, 28, 32, 128) (torch.bfloat16)
     attn_bias : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
     p : 0.0
decoderF is not supported because:
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    bf16 is only supported on A100+ GPUs
flshattF@… is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
cutlassF is not supported because:
    bf16 is only supported on A100+ GPUs
smallkF is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 128
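
The error says that the bf16 attention kernels in xformers need compute capability (8, 0) or higher, while the Colab T4 only reports (7, 5), so every operator in the list above is rejected. A quick way to confirm what your runtime actually offers, sketched with standard PyTorch calls:

import torch

# The Colab T4 reports compute capability (7, 5); an A100 reports (8, 0),
# which is the minimum required by the bf16 kernels named in the error above.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))

# PyTorch's own check for usable bfloat16 support on the current CUDA device.
print(torch.cuda.is_bf16_supported())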

@kartikWorld

You need an NVIDIA A100 machine to fine-tune. I faced this earlier with a V100; after switching to an A100, it worked.
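
If switching to an A100 (or any compute capability 8.0+ GPU) is not an option, loading the weights in float16 instead of bfloat16 may be enough on a T4, since the cutlassF kernel in the error above is rejected only because of bf16. A minimal sketch, assuming your installed mistral_inference version exposes a dtype argument on Transformer.from_folder and that the model folder contains the v3 tokenizer file; the model path below is a placeholder, so check both against your local install:

import torch
from pathlib import Path

from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

M7B_DIR = Path("/content/mistral_7b")  # placeholder: wherever $M7B_DIR points

# Load weights as float16 so xformers can fall back to a pre-Ampere kernel
# (assumption: from_folder accepts dtype= in your installed version).
model = Transformer.from_folder(M7B_DIR, max_batch_size=1, dtype=torch.float16)

tokenizer = MistralTokenizer.from_file(str(M7B_DIR / "tokenizer.model.v3"))

# Build a small chat prompt and generate a short completion.
request = ChatCompletionRequest(messages=[UserMessage(content="Hello, how are you?")])
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=64,
    temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))

Note that this only addresses the dtype check: a 7B model in 16-bit weights still needs roughly 14 GB, which is close to the T4's 16 GB limit, so keep max_batch_size and max_tokens small.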
