Chatglm2 rope optimization on xpu #9350

qiuxin2012 · 2023-11-03T08:23:14Z

Description

Speed up ChatGLM2 on XPU.

1. Why the change?

Speed up ChatGLM2 on XPU with IPEX.

2. Summary of the change

Use torch.ops.torch_ipex.apply_rotary_embedding instead of apply_rotary_pos_emb when XPU is used.
Do the repeat_interleave of (sin, cos) in the model's forward when XPU is used. (The CPU path is unchanged.)
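To illustrate why (sin, cos) get a repeat_interleave before a fused kernel can consume them, here is a minimal pure-Python sketch. It does not call torch.ops.torch_ipex.apply_rotary_embedding, whose signature is not shown in this PR; the helper names (`repeat_interleave`, `rope_pairwise`, `rope_expanded`) are hypothetical, and the interleaved pair layout is an assumption about how the rotary embedding is arranged.

```python
import math

def repeat_interleave(values, repeats):
    """List analogue of torch.repeat_interleave: [a, b] -> [a, a, b, b] for repeats=2."""
    return [v for v in values for _ in range(repeats)]

def rope_pairwise(x, cos, sin):
    """Reference path: rotate each adjacent pair (x[2i], x[2i+1]) by angle i.

    cos and sin hold one value per pair, i.e. half the length of x.
    """
    out = []
    for i in range(len(x) // 2):
        a, b = x[2 * i], x[2 * i + 1]
        out.append(a * cos[i] - b * sin[i])
        out.append(b * cos[i] + a * sin[i])
    return out

def rope_expanded(x, cos_full, sin_full):
    """Same rotation with one cos/sin value per element.

    Assumption: a fused kernel wants cos/sin with the same length as x,
    which is what repeat_interleave(..., 2) produces.
    """
    out = []
    for j in range(len(x)):
        if j % 2 == 0:
            # Even slot: x[2i] * cos - x[2i+1] * sin
            out.append(x[j] * cos_full[j] - x[j + 1] * sin_full[j])
        else:
            # Odd slot: x[2i+1] * cos + x[2i] * sin
            out.append(x[j] * cos_full[j] + x[j - 1] * sin_full[j])
    return out

# Hypothetical toy input: one head with rotary dimension 4 (two pairs).
x = [1.0, 2.0, 3.0, 4.0]
angles = [0.5, 1.5]
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]

reference = rope_pairwise(x, cos, sin)
expanded = rope_expanded(x, repeat_interleave(cos, 2), repeat_interleave(sin, 2))
```

Both paths compute the same rotation; only the layout of cos and sin differs, so the expansion can be done in the model's forward before the rotary step.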

hkvision · 2023-11-06T03:24:55Z

python/llm/src/bigdl/llm/transformers/models/chatglm2.py

+    use_fuse_rope = input_ids.device.type == "xpu"
+    use_fuse_rope = use_fuse_rope and not self.training


We can combine these two lines into a single condition: input_ids.device.type == "xpu" and not self.training
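A small sketch of the suggestion, written as standalone functions so it runs without a model. The function names and the plain `device_type` / `training` arguments are hypothetical stand-ins for `input_ids.device.type` and `self.training`.

```python
def use_fuse_rope_two_step(device_type: str, training: bool) -> bool:
    """The two-line form from the diff."""
    use_fuse_rope = device_type == "xpu"
    use_fuse_rope = use_fuse_rope and not training
    return use_fuse_rope

def use_fuse_rope_combined(device_type: str, training: bool) -> bool:
    """The single-condition form proposed in the review."""
    return device_type == "xpu" and not training

# Compare both forms over every combination of device and training mode.
cases = [(d, t) for d in ("xpu", "cpu") for t in (False, True)]
results = [(d, t, use_fuse_rope_combined(d, t)) for d, t in cases]
```

The two forms agree on every input; the fused path is taken only for XPU inference.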

qiuxin2012 · 2023-11-06T03:56:50Z

Upstream output:

Once upon a time, there was a young girl named Samantha who lived with her parents in a small town. Samantha had always dreamed of traveling the world and experiencing new cultures. One day, she heard about a travel agency that offered guided tours to far-off lands.

The travel agency offered a variety

PR's output:

Once upon a time, there was a young girl named Samantha who lived with her parents in a small town. Samantha had always dreamed of traveling the world and experiencing new cultures. One day, she heard about a travel agency that offered guided tours to far-off lands.

The travel agency offered a variety
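One way to make this kind of comparison mechanical is a small helper that reports where two generated strings first diverge. This is only an illustrative sketch, not the test used by this PR; `first_mismatch` is a hypothetical name.

```python
def first_mismatch(a: str, b: str):
    """Return the index of the first differing character, or None if the strings are equal."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        # One string is a strict prefix of the other.
        return min(len(a), len(b))
    return None

upstream = "Once upon a time, there was a young girl named Samantha"
pr_output = "Once upon a time, there was a young girl named Samantha"
mismatch = first_mismatch(upstream, pr_output)
```

A result of None means the two outputs are identical character for character.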

jason-dai · 2023-11-06T05:59:00Z

Need to add test similar to #9347

qiuxin2012 · 2023-11-14T01:53:47Z

> Need to add test similar to #9347

The correctness test is in #9450.

qiuxin2012 added 13 commits October 27, 2023 10:05

rebase kai's code

b4c1ece

update

03ee5ba

update

c3f02c6

update

1a86e76

update

bfaf886

update

93aec0f

code cleanup

77be988

update

1fa88c0

chatglm2

9dc2601

fix

4fbe335

fix style

8ba2819

fix

90e82c6

fix

9b7f1db

qiuxin2012 requested a review from yangw1234 November 3, 2023 12:14

cleanup

41eee85

qiuxin2012 assigned qiuxin2012 and unassigned qiuxin2012 Nov 6, 2023

qiuxin2012 requested a review from hkvision November 6, 2023 02:34

hkvision reviewed Nov 6, 2023

python/llm/src/bigdl/llm/transformers/models/chatglm2.py (review thread resolved)

qiuxin2012 added 2 commits November 6, 2023 11:17

update

3ea9e10

change comments

fc547d2

hkvision reviewed Nov 6, 2023

hkvision approved these changes Nov 6, 2023

hkvision mentioned this pull request Nov 6, 2023

Align rope of chatglm with llama #9103

Closed

fix style check

60a4349

qiuxin2012 merged commit 0cd751f into intel-analytics:main Nov 6, 2023
23 checks passed

liu-shaojun pushed a commit that referenced this pull request Mar 25, 2024

Chatglm2 rope optimization on xpu (#9350)

1420e45
