FSDP + lm_head + Liger-Kernel #5941

Open · 1 task done
gotzmann opened this issue Nov 5, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

gotzmann commented Nov 5, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

stage: sft
do_train: true
finetuning_type: lora
lora_target: all
additional_target: embed_tokens,lm_head
lora_rank: 128
lora_alpha: 16
lora_dropout: 0.1
use_rslora: true
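
For context, these settings should roughly correspond to the following PEFT configuration (a sketch only; the exact mapping LLaMA-Factory performs internally and the model name are assumptions, not taken from this issue). The relevant part is that additional_target: embed_tokens,lm_head makes lm_head a fully trainable module, which appears to be the combination that triggers the crash below.

# Rough PEFT-level equivalent of the LoRA settings above (illustrative sketch).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder model name; the issue does not say which model was used.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora_config = LoraConfig(
    r=128,                           # lora_rank: 128
    lora_alpha=16,                   # lora_alpha: 16
    lora_dropout=0.1,                # lora_dropout: 0.1
    use_rslora=True,                 # rsLoRA scales by lora_alpha / sqrt(r)
    target_modules="all-linear",     # lora_target: all
    modules_to_save=["embed_tokens", "lm_head"],  # additional_target
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)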

Reproduction

[rank3]:   File "/home/git/Liger-Kernel/src/liger_kernel/ops/fused_linear_cross_entropy.py", line 59, in fused_linear_cross_entropy_forward
[rank3]:     logits_chunk = _input_chunk @ weight.t()  # chunk_size x V
[rank3]: RuntimeError: size mismatch, got input (2), mat (2x4096), vec (65667072)
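
The failing matmul can be sketched in isolation. The shapes are taken from the traceback; the interpretation that FSDP hands the kernel a flattened 1-D lm_head.weight, rather than the expected V x H matrix, is an assumption, not something stated in the log.

# Minimal sketch of the failing operation in fused_linear_cross_entropy_forward.
import torch

hidden_size = 4096
_input_chunk = torch.randn(2, hidden_size)   # chunk_size x H, as in the error
flat_weight = torch.randn(65_667_072)        # 1-D flat parameter, not (V, H)

# .t() is a no-op on a 1-D tensor, so PyTorch dispatches a matrix-vector
# product and fails on the mismatched vector length:
logits_chunk = _input_chunk @ flat_weight.t()
# RuntimeError: size mismatch, got input (2), mat (2x4096), vec (65667072)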

Expected behavior

No response

Others

Please check out the discussion of problems with lm_head fine-tuning:

linkedin/Liger-Kernel#332

The github-actions bot added the pending label on Nov 5, 2024.