Fix 1383 Llama model on transformers=4.41 [WIP] #11280
Conversation
Tested on Max1100 and documented Llama2-7B model metrics on issue #1383; performance metrics on transformers 4.41 are similar to 4.38.
The rest LGTM.
if cache_position is not None:
    # for transformers 4.38.0
    causal_mask = attention_mask[:, :, cache_position, : kv_seq_len]
What is the reason to remove causal_mask here?
The diff is compared against the wrong place; the 4_38 path was not touched.
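For reference, a minimal runnable sketch of the two slicing styles under discussion. The shapes, values, and the assumption that the 4.41 path receives a mask already restricted to the current query rows are illustrative only, not taken from this PR.

import torch

# Illustration only: batch=1, head=1, kv_seq_len=6, two new query tokens
# at positions 4 and 5.
attention_mask = torch.zeros(1, 1, 6, 6)   # full additive causal mask
cache_position = torch.tensor([4, 5])      # positions of the current query tokens
kv_seq_len = 6

# transformers 4.38-style slicing (the branch quoted above): select the
# mask rows that correspond to the current positions.
causal_mask_438 = attention_mask[:, :, cache_position, :kv_seq_len]

# transformers 4.41-style slicing (assumption): the mask handed to the
# attention layer already holds only the query rows, so only the key
# dimension needs slicing.
presliced_mask = attention_mask[:, :, cache_position, :]
causal_mask_441 = presliced_mask[:, :, :, :kv_seq_len]

assert torch.equal(causal_mask_438, causal_mask_441)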
next_cache = next_decoder_cache if use_cache else None
if return_legacy_cache:
    next_cache = next_cache.to_legacy_cache()
Need to double-check whether next_decoder_cache is a DynamicFP8Cache.
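A minimal sketch of where the type check being asked for could sit, assuming DynamicFP8Cache is ipex-llm's own cache subclass; the function name and the isinstance guard are illustrative, not the PR's actual code.

from transformers.cache_utils import DynamicCache

def maybe_to_legacy_cache(next_decoder_cache, use_cache, return_legacy_cache):
    # Sketch only: convert to the legacy tuple format only for caches where
    # to_legacy_cache() is known to apply; a DynamicFP8Cache (assumed ipex-llm
    # subclass of DynamicCache) may need its own handling before conversion.
    next_cache = next_decoder_cache if use_cache else None
    if return_legacy_cache and isinstance(next_cache, DynamicCache):
        next_cache = next_cache.to_legacy_cache()
    return next_cache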
Description
Add llama_attention_forward_4_41 and llama_model_forward_4_41.
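A hedged sketch of how the version-specific forwards could be selected at patch time; the import path and the 4_38 function names are assumptions based on ipex-llm's existing layout, and only the 4_41 names come from this description.

import transformers
from packaging import version

def select_llama_forwards():
    # Sketch only: pick the forward implementations that match the installed
    # transformers version. The import path and 4_38 names are assumed.
    from ipex_llm.transformers.models.llama import (
        llama_attention_forward_4_38,
        llama_attention_forward_4_41,
        llama_model_forward_4_38,
        llama_model_forward_4_41,
    )

    if version.parse(transformers.__version__) >= version.parse("4.41.0"):
        return llama_attention_forward_4_41, llama_model_forward_4_41
    return llama_attention_forward_4_38, llama_model_forward_4_38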