Fix Transformers seq length override #848
Merged
Right now we are unable to override the base config loaded from transformers models when training. This normally isn't an issue, but it causes OOMs when doing SFT fine-tuning with the Llama 3 8B base model, since its config specifies a very large context length of 131K tokens.
This might not be the most elegant solution, but it has fixed the problem AFAIK.
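For illustration, a minimal sketch of the general approach (not the exact diff in this PR): load the pretrained config, cap its context length with a training-time override, and keep the tokenizer's limit in sync. The model name and `max_seq_len` value below are hypothetical placeholders.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # hypothetical base model
max_seq_len = 4096                         # hypothetical training-time override

# Load the base config, then cap the context length so position embeddings
# and caches are not sized for the full 131K-token window from the config.
config = AutoConfig.from_pretrained(model_name)
config.max_position_embeddings = min(config.max_position_embeddings, max_seq_len)

# Keep the tokenizer's truncation limit consistent with the override.
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=max_seq_len)

# Load the model with the overridden config instead of the stock one.
model = AutoModelForCausalLM.from_pretrained(model_name, config=config)
```

The key point is that the sequence-length override happens before the model is instantiated, rather than trusting whatever value is baked into the pretrained model's config.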