
Refactor code according to upstream changes #62

Open
wants to merge 13 commits into base: upstream-2025-01-17

Conversation

tdoublep
Member

@tdoublep tdoublep commented Jan 17, 2025

This PR reworks our code according to some important upstream changes. In particular, there is no longer any need for separate SpyreExecutor and MultiprocessingSpyreExecutor classes: upstream has added generic executor classes that work across different platforms. This actually simplifies our code quite a lot.
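For context, the pattern the generic executors rely on is that each platform just advertises its worker class, which the executor resolves and instantiates. A minimal sketch of that idea, assuming a dotted-path string like `vllm_spyre.worker.SpyreWorker` (the path and config attribute below are illustrative, not the real layout):

```python
from importlib import import_module


def resolve_worker_cls(worker_cls_path: str) -> type:
    """Resolve a dotted path such as 'vllm_spyre.worker.SpyreWorker' into a
    class, so one generic executor can drive any platform's worker without
    a platform-specific executor subclass."""
    module_name, _, cls_name = worker_cls_path.rpartition(".")
    return getattr(import_module(module_name), cls_name)


# Usage sketch (the config attribute name is an assumption, not upstream's API):
# worker_cls = resolve_worker_cls(parallel_config.worker_cls)
# worker = worker_cls(vllm_config=vllm_config)
```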

The model runner classes now inherit from ModelRunnerBase and we need to define a ModelInputForSpyre class accordingly.
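A hedged sketch of what such a class can look like, following the base-class convention of serializing to a broadcastable tensor dict; the field names below are assumptions for illustration, not the PR's actual definition:

```python
import dataclasses
from typing import Optional

import torch


@dataclasses.dataclass(frozen=True)
class ModelInputForSpyre:
    """Illustrative input container for the Spyre model runner."""

    input_tokens: Optional[torch.Tensor] = None
    input_positions: Optional[torch.Tensor] = None
    input_masks: Optional[torch.Tensor] = None
    is_prompt: Optional[bool] = None

    def as_broadcastable_tensor_dict(self) -> dict:
        # Mirror the base-class contract: only ship the fields that are set.
        return {k: v for k, v in dataclasses.asdict(self).items()
                if v is not None}
```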

This is currently passing all CPU tests, but it still needs to be tested on Spyre and needs careful review, since it is quite a big change.

Note: the target for this PR is a branch upstream-2025-01-17 containing the upstream changes merged into our current branch. I've done it this way to make the changes easier to review. If this PR is approved, we can merge it into upstream-2025-01-17 and then merge that branch into main.

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to catch errors quickly. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to be added to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@tdoublep tdoublep marked this pull request as ready for review January 17, 2025 20:06
@tdoublep tdoublep marked this pull request as draft January 17, 2025 20:36
@tdoublep
Member Author

Still needs some work; moved back to draft.

tdoublep pushed a commit that referenced this pull request Jan 20, 2025
### sequence level processing -> batch level processing
In this PR the code for preparing the input tensors for the AIU is
completely rewritten based on the assumption that we have to finish the
current decoding on the AIU before doing another prefill.
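A toy sketch of that constraint, with illustrative names rather than the PR's actual scheduler API: decode the current batch to completion, and only then admit the next prefill.

```python
from collections import deque


class BatchLevelScheduler:
    """Toy model of batch-level processing: no new prefill is scheduled
    while any sequence of the current batch is still decoding on the AIU."""

    def __init__(self) -> None:
        self.waiting: deque = deque()  # requests not yet prefilled
        self.running: list = []        # the single batch in flight

    def step(self) -> tuple[str, list]:
        # Drop sequences that finished in the previous step.
        self.running = [r for r in self.running if not r.finished]
        if self.running:
            # Decode phase: the whole batch must finish first.
            return "decode", self.running
        if self.waiting:
            # Prefill phase: admit a fresh batch only once decoding is done.
            self.running = [self.waiting.popleft()]
            return "prefill", self.running
        return "idle", []
```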

Changes:
* [rewriting](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/cea122c220b18e3de3dce95faa5e03fe3efe0835) `sendnn_model_runner.py`, `sendnn_worker.py` and `sendnn.py` based on the above constraint.
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/6869231d83734d3c03ffd15bc6754c1857d063cc) the class variable `self._padded_batch_size`, since another solution has been implemented (see the padding sketch after these commit notes).
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/ff9ebf6923fd9ac6c99e64dfffc7763f6c194399) the unused `input_block_ids`, since the AIU does not support paged attention yet.
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/a6d63899bf3d9fae59edde414b8bd2a3c56bc8c7) some unused function arguments in model loading.
* [removing](https://github.ibm.com/ai-foundation/vllm/pull/62/commits/4527300ee9be4dd1fb76007fb6e0862b97d51676) the unused function `_get_model_architecture()` and the global variable `_SENDNN_SUPPORTED_MODELS`.

The code has been tested in client/server mode for the `llama 194m` and `granite 3b` models on `AIU` and `CPU`.
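Since static-shape accelerators typically need fixed tensor shapes, input preparation along these lines usually pads both the sequence and the batch dimension. A sketch under that assumption (not the PR's actual code):

```python
import torch


def pad_batch(token_lists: list[list[int]], pad_token_id: int,
              batch_size: int, seq_len: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Right-pad each prompt to seq_len and pad the batch to batch_size,
    returning input ids plus a mask marking the real (non-pad) tokens."""
    input_ids = torch.full((batch_size, seq_len), pad_token_id, dtype=torch.long)
    mask = torch.zeros((batch_size, seq_len), dtype=torch.bool)
    for i, tokens in enumerate(token_lists[:batch_size]):
        tokens = tokens[:seq_len]
        input_ids[i, :len(tokens)] = torch.tensor(tokens, dtype=torch.long)
        mask[i, :len(tokens)] = True
    return input_ids, mask
```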
@tdoublep tdoublep changed the base branch from main to upstream-2025-01-17 January 20, 2025 10:41
@tdoublep tdoublep marked this pull request as ready for review January 21, 2025 12:54
@tdoublep tdoublep requested a review from yannicks1 January 21, 2025 14:28
@maxdebayser
Contributor

All the changes make sense to me. I'll run one of the embedding benchmarks to validate the embedding model branch of the changes.

@maxdebayser
Contributor

I'm getting the same results:

| Model | SciFact | ArguAna | FiQA2018 |
| --- | --- | --- | --- |
| sentence-transformers/all-MiniLM-L12-v2 | 0.59364 | 0.4772 | 0.31501 |
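For reference, one way such MTEB scores are typically produced with the `mteb` package; this is an assumption about the setup, not necessarily the exact commands used here:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Evaluate the same checkpoint on the three retrieval tasks listed above.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
evaluation = MTEB(tasks=["SciFact", "ArguAna", "FiQA2018"])
evaluation.run(model, output_folder="results")
```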

Contributor

@maxdebayser maxdebayser left a comment


LGTM

@tdoublep tdoublep requested a review from sducouedic January 21, 2025 21:29