
LLM: Refactor Pipeline-Parallel-FastAPI example #11319

Merged · 27 commits · Jun 25, 2024

Conversation

xiangyuT (Contributor)

Description

  • Use AutoModelForCausalLM.from_pretrained to load the pipeline-divided model
  • Add a /generate_stream endpoint
  • Support the OpenAI-formatted API
  • Add a Gradio WebUI
  • Add benchmark.py for streaming benchmarks
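As context for the /generate_stream and OpenAI-formatted API items above, here is a minimal, hedged sketch (not the PR's actual implementation) of how a streaming endpoint typically formats generated tokens as OpenAI-style server-sent-event chunks; the function names and the model name are illustrative only, and it uses only the standard library.

```python
import json


def format_sse_chunk(token: str, model: str = "demo-model") -> str:
    """Format one generated token as an OpenAI-style streaming chunk.

    The field names mirror the OpenAI chat-completion chunk schema;
    the model name and helper names here are hypothetical.
    """
    payload = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{"index": 0, "delta": {"content": token}}],
    }
    return f"data: {json.dumps(payload)}\n\n"


def stream_tokens(tokens):
    """Yield one SSE chunk per token, then the terminating sentinel.

    A FastAPI /generate_stream endpoint would typically wrap a generator
    like this in a StreamingResponse.
    """
    for tok in tokens:
        yield format_sse_chunk(tok)
    yield "data: [DONE]\n\n"


chunks = list(stream_tokens(["Hello", ",", " world"]))
```

A client consuming such a stream reads each `data:` line, JSON-decodes the payload, and concatenates the `delta.content` fields until it sees the `[DONE]` sentinel.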

@glorysdj glorysdj requested a review from plusbang June 14, 2024 07:29
@xiangyuT xiangyuT marked this pull request as ready for review June 14, 2024 07:59
@xiangyuT xiangyuT changed the title [WIP] Refactor Pipeline-Parallel-FastAPI example LLM: Refactor Pipeline-Parallel-FastAPI example Jun 17, 2024
plusbang (Contributor) left a comment:

We could merge this as the first step of refactoring PP serving. We will continue to organize the code in the next PR :)

@xiangyuT xiangyuT merged commit 8ddae22 into intel:main Jun 25, 2024
28 of 29 checks passed
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024
Initially Refactor for Pipeline-Parallel-FastAPI example
2 participants