Support pipeline parallel for glm-4-9b-chat #11463

plusbang · 2024-06-28T08:56:31Z

Description

Support pipeline parallel inference & serving for glm-4-9b-chat
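In pipeline-parallel inference the model's decoder layers are typically split into contiguous stages. Only the first stage embeds token ids; later stages receive hidden states instead. That is why the review below deals with a forward call that has inputs_embeds but no input_ids. The following is a toy, single-process sketch under those assumptions. split_layers and run_pipeline are hypothetical names, not ipex-llm APIs, and the even split is only an illustrative choice:

```python
from typing import Callable, List, Sequence

Hidden = List[float]
Layer = Callable[[Hidden], Hidden]


def split_layers(num_layers: int, num_stages: int) -> List[range]:
    """Assign contiguous layer indices to stages as evenly as possible (hypothetical)."""
    base, extra = divmod(num_layers, num_stages)
    ranges, start = [], 0
    for stage in range(num_stages):
        # The first `extra` stages each take one additional layer.
        size = base + (1 if stage < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def run_pipeline(token_ids: Sequence[int],
                 embed: Callable[[Sequence[int]], Hidden],
                 layers: Sequence[Layer],
                 num_stages: int) -> Hidden:
    """Run the stages one after another in a single process (toy model)."""
    hidden: Hidden = []
    for stage_idx, layer_range in enumerate(split_layers(len(layers), num_stages)):
        if stage_idx == 0:
            # Only the first stage sees token ids and applies the embedding.
            hidden = embed(token_ids)
        # Later stages start from the hidden states produced by the previous
        # stage; on real hardware this hand-off is a send/recv between ranks.
        for i in layer_range:
            hidden = layers[i](hidden)
    return hidden


print(split_layers(40, 3))
```

In a real deployment each stage runs on its own device, so a stage other than the first never has token ids available locally.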

2. User API changes

N/A

4. How to test?

Unit test (https://github.com/intel-analytics/ipex-llm-workflow/actions/runs/9772330989)
Local test (both serving and inference of glm4 are verified)

xiangyuT · 2024-07-02T02:59:11Z

python/llm/src/ipex_llm/transformers/models/chatglm4.py

        inputs_embeds = self.embedding(input_ids)
+    else:
+        batch_size, seq_length, _ = inputs_embeds.shape


If attention_mask is not None and not all ones (i.e. attention_mask.all() is False), self.get_masks() at line 58 needs input_ids, and it will raise an error if input_ids is still None. Maybe create an empty placeholder tensor here?

input_ids = torch.empty((batch_size, seq_length), device=inputs_embeds.device)
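A minimal sketch of the suggested pattern, assuming PyTorch. resolve_inputs is a hypothetical helper, not the actual chatglm4.py code. When only inputs_embeds is available, it derives batch_size and seq_length from the embeddings and builds a placeholder input_ids of matching shape. The values of torch.empty are uninitialized, so this is only safe if downstream code reads the shape and device but never the token values:

```python
import torch
from torch import nn


def resolve_inputs(embedding: nn.Module, input_ids=None, inputs_embeds=None):
    """Return (input_ids, inputs_embeds, batch_size, seq_length).

    Hypothetical helper: on a later pipeline stage only inputs_embeds is
    provided, so a shape-compatible placeholder input_ids is created.
    """
    if inputs_embeds is None:
        # First stage: token ids are available, embed them.
        batch_size, seq_length = input_ids.shape
        inputs_embeds = embedding(input_ids)
    else:
        batch_size, seq_length, _ = inputs_embeds.shape
        if input_ids is None:
            # Uninitialized values; only shape and device are meaningful.
            input_ids = torch.empty(
                (batch_size, seq_length), device=inputs_embeds.device
            )
    return input_ids, inputs_embeds, batch_size, seq_length


# A later stage receives hidden states but no token ids.
emb = nn.Embedding(100, 8)
hidden = torch.randn(2, 5, 8)
ids, embeds, bsz, seqlen = resolve_inputs(emb, inputs_embeds=hidden)
print(ids.shape, bsz, seqlen)
```

An alternative would be to pass batch_size and seq_length to the mask code directly, but the placeholder keeps the upstream get_masks signature untouched.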

Related code in modeling_chatglm:

def get_masks(self, input_ids, past_key_values, padding_mask=None):
    batch_size, seq_length = input_ids.shape
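The signature above shows that get_masks begins by unpacking input_ids.shape. The following simplified mask builder is in the spirit of that function. It is not the upstream modeling_chatglm implementation: get_masks_sketch and its integer past_length parameter are hypothetical, and it may omit details of the upstream function. It shows that, in this sketch, only the shape and device of input_ids are used, so an uninitialized torch.empty placeholder is sufficient:

```python
import torch


def get_masks_sketch(input_ids, past_length=0, padding_mask=None):
    """Simplified causal + padding mask (hypothetical, not upstream code).

    Reads only input_ids.shape and input_ids.device.
    Returns a bool tensor of shape (batch, 1, seq, past_length + seq);
    True marks positions that must NOT be attended to.
    """
    batch_size, seq_length = input_ids.shape
    device = input_ids.device
    # Lower-triangular ones: query i may attend to keys 0..i.
    mask = torch.ones(batch_size, seq_length, seq_length, device=device).tril()
    if past_length:
        # Every new query may attend to all cached (past) key positions.
        past = torch.ones(batch_size, seq_length, past_length, device=device)
        mask = torch.cat((past, mask), dim=-1)
    if padding_mask is not None:
        # padding_mask: (batch, past_length + seq), 0 marks padded keys.
        mask = mask * padding_mask.unsqueeze(1)
    return (mask < 0.5).unsqueeze(1)


placeholder = torch.empty((2, 4))  # values never read
print(get_masks_sketch(placeholder).shape)
```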

plusbang · Jul 2, 2024

Updated both chatglm2.py and chatglm4.py accordingly.

xiangyuT

LGTM

plusbang force-pushed the support-glm4-pp branch from 714ea50 to e0f3be7 June 28, 2024 09:10

plusbang changed the title from "[WIP] Support pipeline parallel for glm-4-9b-chat" to "Support pipeline parallel for glm-4-9b-chat" Jun 28, 2024

plusbang requested review from qiuxin2012 and glorysdj June 28, 2024 09:35

plusbang added 5 commits June 29, 2024 01:08

test · aa5a845
fix output · 542a23c
update · e0f3be7
add serving support · 67da763
fix code style · ae03661

glorysdj requested a review from xiangyuT July 2, 2024 02:45

xiangyuT reviewed Jul 2, 2024


plusbang requested a review from xiangyuT July 2, 2024 06:15

fix · 2bf7bb3

xiangyuT approved these changes Jul 3, 2024


plusbang merged commit 9274282 into intel-analytics:main Jul 3, 2024
1 check passed

RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024

Support pipeline parallel for glm-4-9b-chat (intel-analytics#11463) · 7f402ed
