# Refactor code according to upstream changes #62
base: `upstream-2025-01-17`
## Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:

* Add `ready` label to the PR
* Enable auto-merge.

🚀
Force-pushed from `2e1758d` to `3aa8196`.
Still needs some work; moved back to draft.
### Sequence-level processing -> batch-level processing

In this PR the code for preparing the input tensors for the AIU is completely rewritten, based on the assumption that we have to finish the current decoding on the AIU before doing another prefill. Changes:

* [rewriting](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/cea122c220b18e3de3dce95faa5e03fe3efe0835) `sendnn_model_runner.py`, `sendnn_worker.py` and `sendnn.py` based on the above constraint (see the sketch after this list)
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/6869231d83734d3c03ffd15bc6754c1857d063cc) the class variable `self._padded_batch_size`, since another solution has been implemented
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/ff9ebf6923fd9ac6c99e64dfffc7763f6c194399) the unused `input_block_ids`, since the AIU does not support paged attention yet
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/a6d63899bf3d9fae59edde414b8bd2a3c56bc8c7) some unused function arguments in model loading
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/4527300ee9be4dd1fb76007fb6e0862b97d51676) the unused function `_get_model_architecture()` and the global variable `_SENDNN_SUPPORTED_MODELS`

The code has been tested in client/server mode for `llama 194m` and `granite 3b` on `AIU` and `CPU`.
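For context, here is a toy sketch of the scheduling constraint the rewrite is built around; the names (`BatchLevelRunner`, `step`, `finish`, `max_batch_size`) are illustrative only, not the actual `sendnn_model_runner.py` API:

```python
from typing import List


class BatchLevelRunner:
    """Toy model of the batch-level constraint: a new prefill is only
    admitted once every sequence in the current decode batch finishes."""

    def __init__(self, max_batch_size: int = 4) -> None:
        self.max_batch_size = max_batch_size
        self.active_batch: List[int] = []  # ids of sequences still decoding

    def step(self, waiting: List[int]) -> str:
        """Decide what the next AIU step does."""
        if self.active_batch:
            # Decode phase: keep stepping the in-flight batch; no new
            # prefill may start while any sequence is still decoding.
            return f"decode batch of {len(self.active_batch)}"
        if waiting:
            # Prefill phase: reached only after the previous batch drained.
            n = min(self.max_batch_size, len(waiting))
            self.active_batch = [waiting.pop(0) for _ in range(n)]
            return f"prefill batch of {n}"
        return "idle"

    def finish(self, seq_id: int) -> None:
        """Mark a sequence as finished (e.g. it emitted EOS)."""
        self.active_batch.remove(seq_id)
```

This is the opposite of continuous batching: the runner trades some throughput for a much simpler input-preparation path, which matches hardware that cannot yet interleave prefills into a running decode batch.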
All the changes make sense to me. I'll run one of the embedding benchmarks to validate the embedding-model branch of the changes.
I'm getting the same results.
LGTM
This PR reworks our code according to some important upstream changes. In particular, there is no longer any need to have a separate `SpyreExecutor` and `MultiprocessingSpyreExecutor`: upstream has added generic executor classes that work across different platforms. It actually simplifies our code quite a lot. The model runner classes now inherit from `ModelRunnerBase`, and we need to define a `ModelInputForSpyre` class accordingly (a rough sketch of such a class follows below).

This is currently passing all CPU tests, but it still needs to be tested on Spyre, and it needs careful review since it is quite a big change.

Note: the target for this PR is a branch `upstream-2025-01-17` containing the upstream changes merged into our current branch. I've done it like this so that it is easier to review the changes. If this PR is approved, we can merge the changes into `upstream-2025-01-17` and then merge that branch into main.
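As a rough illustration of the new structure, here is a minimal, self-contained sketch of what a `ModelInputForSpyre` dataclass might look like. The field names are assumptions for illustration, and the real class would derive from vLLM's model-input base class rather than standing alone:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

import torch


@dataclass(frozen=True)
class ModelInputForSpyre:
    """Bundles the tensors one forward pass on Spyre needs."""

    input_tokens: Optional[torch.Tensor] = None
    input_positions: Optional[torch.Tensor] = None
    input_masks: Optional[torch.Tensor] = None
    is_prompt: bool = False

    def as_broadcastable_tensor_dict(self) -> Dict[str, Any]:
        # Model-input classes in vLLM are flattened to a tensor dict so
        # they can be broadcast from the driver to the worker processes.
        return {
            "input_tokens": self.input_tokens,
            "input_positions": self.input_positions,
            "input_masks": self.input_masks,
            "is_prompt": self.is_prompt,
        }
```

Keeping the dataclass frozen matches the pattern of treating prepared model inputs as immutable once built, which is what lets the generic executor hand them across process boundaries safely.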