Enable `use_batch_forward` Optimization on Battlemage GPU #12516

liu-shaojun · 2024-12-09T07:59:57Z

Description:

Enable use_batch_forward Optimization on Intel® Arc™ B-Series Graphics Cards (Battlemage)

Currently, `torch.xpu.get_device_name(0)` returns `Intel(R) Graphics [0xe20b]` on Intel® Arc™ B-Series graphics cards (code-named Battlemage). On Intel® Arc™ A-Series cards it returns a more specific name, such as `Intel(R) Arc(TM) A770 Graphics`.

This PR updates the device name matching logic to recognize Intel(R) Graphics [0xe20b] as an indicator for Battlemage GPUs, enabling the use_batch_forward optimization on these devices.
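
As a rough illustration of the matching described above, the sketch below maps a device-name string to a short device tag. It is not the code from `utils.py`: the function name `classify_xpu_device`, the `"other"` fallback, and the A-Series prefix check are assumptions made for this example. Only the two device-name strings and the `bmg` / `arc` tags come from this PR.

```python
def classify_xpu_device(name: str) -> str:
    """Map a device-name string to a short device tag (illustrative only)."""
    # B-Series cards currently report a raw device ID rather than a
    # marketing name, so match the exact string quoted in this PR.
    if name == "Intel(R) Graphics [0xe20b]":
        return "bmg"
    # Assumed pattern for A-Series names such as
    # "Intel(R) Arc(TM) A770 Graphics".
    if name.startswith("Intel(R) Arc(TM) A"):
        return "arc"
    # Hypothetical fallback tag for anything this sketch does not recognize.
    return "other"
```

On a machine with an XPU, the input would be the string returned by `torch.xpu.get_device_name(0)`.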

Changes:

Extend device name matching to include Intel(R) Graphics [0xe20b].

Testing:

Validate that the optimization is correctly enabled and functions as expected on B-Series (Battlemage) GPUs.
PR validation: https://github.com/intel-analytics/ipex-llm-workflow/actions/runs/12231715729

glorysdj · 2024-12-09T12:03:04Z

python/llm/src/ipex_llm/transformers/low_bit_linear.py

@@ -405,6 +405,7 @@ def use_batch_forward(x: torch.Tensor, qtype: int, output_len: int):
            or (device in ["arc", "flex"] and qtype in [SYM_INT8, FP4])
            or (device in ["arc", "flex", "mtl"] and qtype in [FP8E4])
            or (device in ["lnl"] and qtype in [SYM_INT4] and x.shape[1] % 512 == 0)
+            or (device in ["bmg"] and qtype in [SYM_INT4])
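
To make the effect of the added line easier to check in isolation, here is a minimal sketch that reproduces only the four branches visible in the hunk above. The real `use_batch_forward` has further conditions outside the hunk that are not shown here. The function name `hunk_branch_matches`, the `dim1` parameter (standing in for `x.shape[1]`), and the string values of the qtype constants are placeholders for this example, not the values used in `low_bit_linear.py`.

```python
# Placeholder qtype tags; the real constants are defined in ipex-llm.
SYM_INT4, SYM_INT8, FP4, FP8E4 = "sym_int4", "sym_int8", "fp4", "fp8_e4"


def hunk_branch_matches(device: str, qtype: str, dim1: int) -> bool:
    """Evaluate only the device/qtype branches shown in the diff hunk."""
    return (
        (device in ["arc", "flex"] and qtype in [SYM_INT8, FP4])
        or (device in ["arc", "flex", "mtl"] and qtype in [FP8E4])
        # dim1 stands in for x.shape[1] in the original condition.
        or (device in ["lnl"] and qtype in [SYM_INT4] and dim1 % 512 == 0)
        # Branch added by this PR: Battlemage with SYM_INT4.
        or (device in ["bmg"] and qtype in [SYM_INT4])
    )
```

With these branches alone, `bmg` matches for `SYM_INT4` regardless of `dim1`, whereas `lnl` matches only when `dim1` is a multiple of 512.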


To support more inference and serving requirements on BMG, should we also verify whether FP8/INT8 can benefit from use_batch_forward? @jason-dai

liu-shaojun · 2024-12-12

I collected FP8 data on BMG using the all-in-one benchmark. Models larger than 7B cause the machine to crash, while models smaller than 7B show a 1-2% improvement in next-token performance after enabling the batch kernel.

Should we enable the batch kernel for qtype=FP8E5 as well?

glorysdj

LGTM

liu-shaojun added 2 commits December 9, 2024 15:33

Update get_xpu_device_type() to support bmg

3a72f5f

enable use_batch_forward for bmg

b696335

liu-shaojun changed the title ~~Update get_xpu_device_type() to support bmg~~ Enable use_batch_forward Optimization on Intel® Arc™ B-Series Graphics Cards (Battlemage) Dec 9, 2024

liu-shaojun changed the title ~~Enable use_batch_forward Optimization on Intel® Arc™ B-Series Graphics Cards (Battlemage)~~ Enable use_batch_forward Optimization on Battlemage GPU Dec 9, 2024

liu-shaojun requested a review from MeouSker77 December 9, 2024 08:18

liu-shaojun added 2 commits December 9, 2024 16:26

Update low_bit_linear.py

56247e6

Update utils.py

478fcdf

MeouSker77 approved these changes Dec 9, 2024

liu-shaojun marked this pull request as ready for review December 9, 2024 08:59

liu-shaojun requested review from jason-dai, glorysdj and Oscilloscope98 December 9, 2024 09:00

glorysdj reviewed Dec 9, 2024

use batch kernel for fp8e5

c0225e9

glorysdj approved these changes Dec 12, 2024

liu-shaojun merged commit 2cce896 into intel-analytics:main Dec 12, 2024
1 check passed

liu-shaojun deleted the bmg branch December 12, 2024 04:44
