Refactor streaming and g-sampling FSM logics #239

Merged
merged 10 commits into from
Aug 14, 2024

Collaborator

jeffreymeetkai commented on Aug 8, 2024

  • prompt_template.initialize_fsm_gen_state initializes the prompt template's FSM gen state. Both streaming and gsampling call this method at the start.
  • In streaming, the generator repeatedly calls prompt_template.stream_delta_text, which streams the specific delta text and updates the gen state at every iteration.
  • In gsampling, the monkey-patched async_llm_engine calls prompt_template.grammar_sample, which grammar-samples the tokens and updates the gen state at every iteration.
  • prompt_template.update_fsm_gen_state updates the gen state.
  • If gsampling is enabled and the request is a streaming request, grammar sampling first creates and maintains an FSM to produce the grammar-sampled token, which it yields to the wrapping generator. The wrapping streaming generator also creates and maintains a separate FSM to stream the grammar-sampled tokens.
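
For readers skimming the description, here is a minimal sketch of how the four methods could fit together. Only the method names and the gen state keys func_name and func_index come from this PR; the ToyPromptTemplate class, the method signatures, the "function"/"parameter" stage names, the curr_text key, the newline delimiter, and the chunk shapes are all assumptions made for illustration, not the project's implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

GenState = Dict[str, Any]


class ToyPromptTemplate:
    """Hypothetical stand-in for a prompt template; not the project's class."""

    def initialize_fsm_gen_state(self) -> GenState:
        # Called once at the start by both streaming and grammar sampling.
        return {"stage": "function", "curr_text": "", "func_name": None, "func_index": -1}

    def update_fsm_gen_state(self, gen_state: GenState, new_text: str) -> GenState:
        # Accumulate text; in this toy format a newline ends the function name.
        gen_state["curr_text"] += new_text
        if gen_state["stage"] == "function" and gen_state["curr_text"].endswith("\n"):
            gen_state["func_name"] = gen_state["curr_text"].strip()
            gen_state["func_index"] += 1
            gen_state["stage"] = "parameter"
            gen_state["curr_text"] = ""
        return gen_state

    def stream_delta_text(
        self, gen_state: GenState, delta_text: str
    ) -> Tuple[GenState, Optional[dict]]:
        # Streaming path: update the state, then decide what to emit.
        was_function_stage = gen_state["stage"] == "function"
        gen_state = self.update_fsm_gen_state(gen_state, delta_text)
        if was_function_stage:
            if gen_state["stage"] == "parameter":
                return gen_state, {"name": gen_state["func_name"]}
            return gen_state, None  # name still incomplete, emit nothing
        return gen_state, {"arguments": delta_text}

    def grammar_sample(
        self, gen_state: GenState, tool_names: List[str], candidates: List[str]
    ) -> Tuple[str, GenState]:
        # Grammar-sampling path: keep the first candidate the FSM allows.
        # Falls back to the top candidate if none fits (a toy simplification).
        chosen = candidates[0]
        if gen_state["stage"] == "function":
            allowed = [name + "\n" for name in tool_names]
            for token in candidates:
                prefix = gen_state["curr_text"] + token
                if any(option.startswith(prefix) for option in allowed):
                    chosen = token
                    break
        return chosen, self.update_fsm_gen_state(gen_state, chosen)


def run_demo() -> Tuple[List[str], List[dict]]:
    template = ToyPromptTemplate()
    sampling_state = template.initialize_fsm_gen_state()   # FSM owned by the sampler
    streaming_state = template.initialize_fsm_gen_state()  # separate FSM owned by the streamer
    # Each step lists the model's candidate tokens, most likely first.
    steps = [["get_time", "get_weather"], ["\n", " "], ['{"city"', "hello"], [': "Hanoi"}']]
    tokens: List[str] = []
    chunks: List[dict] = []
    for candidates in steps:
        token, sampling_state = template.grammar_sample(
            sampling_state, ["get_weather"], candidates
        )
        streaming_state, chunk = template.stream_delta_text(streaming_state, token)
        tokens.append(token)
        if chunk is not None:
            chunks.append(chunk)
    return tokens, chunks


tokens, chunks = run_demo()
```

In this sketch the sampler rejects "get_time" because it is not a prefix of any allowed function name, and the streamer holds back output until the name is complete. The two gen states are deliberately separate objects, mirroring the last bullet above.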

gen_state["func_name"] = func_name
gen_state["func_index"] += 1
gen_state["call_id"] = prompt_utils.get_random_tool_call_id()
gen_state["first_time_func"] = True
Collaborator

Oh, why is this always True?

Collaborator Author

If this is True, it means we need to stream an empty chunk before streaming the chunks containing the function name and arguments. Thereafter, we will set this to False.
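
A rough sketch of that behavior, assuming a generator-style streamer: only the gen_state keys func_name and first_time_func come from the diff above, while the stream_function_call helper and the chunk dictionaries are hypothetical shapes chosen for illustration.

```python
from typing import Any, Dict, Iterable, Iterator


def stream_function_call(
    gen_state: Dict[str, Any], argument_deltas: Iterable[str]
) -> Iterator[dict]:
    """Hypothetical streamer: emits an empty chunk once per function call."""
    for delta in argument_deltas:
        if gen_state["first_time_func"]:
            # First chunk for this function call: send an empty chunk,
            # then the chunk that carries the function name.
            yield {"content": ""}
            yield {"name": gen_state["func_name"], "arguments": ""}
            gen_state["first_time_func"] = False  # later deltas skip this branch
        yield {"arguments": delta}


state = {"func_name": "get_weather", "first_time_func": True}
chunks = list(stream_function_call(state, ['{"city"', ': "Hanoi"}']))
```

The flag is set to True every time a new function name is detected, so each function call gets its own leading empty chunk, and it is cleared as soon as that chunk has been sent.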

empty_response = prompt_utils.get_text_delta_response(
"", True, finish_reason
# Form the options for the following stages
options = []
Collaborator

I think we can create a function for getting the options; this logic is more or less duplicated in stream_delta_text and grammar_sample.

Collaborator Author

Sure, will do.
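
One way the suggested refactor might look, as a sketch only: the get_options helper, its signature, the "function" stage name, the curr_text key, and the newline delimiter are assumptions, and the two callers are simplified stand-ins for the real stream_delta_text and grammar_sample.

```python
from typing import Any, Dict, List, Optional


def get_options(gen_state: Dict[str, Any], tool_names: List[str]) -> List[str]:
    """Hypothetical shared helper: strings allowed at the current stage."""
    if gen_state["stage"] == "function":
        # Assumed format: a function name followed by a newline.
        return [name + "\n" for name in tool_names]
    return []  # no constraint at other stages in this toy version


def sampler_allows(gen_state: Dict[str, Any], tool_names: List[str], token: str) -> bool:
    # Stand-in for the grammar-sampling caller: a token is allowed if the
    # text so far plus the token is still a prefix of some option.
    options = get_options(gen_state, tool_names)
    prefix = gen_state["curr_text"] + token
    return not options or any(option.startswith(prefix) for option in options)


def streamer_match(gen_state: Dict[str, Any], tool_names: List[str]) -> Optional[str]:
    # Stand-in for the streaming caller: report which option, if any,
    # has been generated in full.
    for option in get_options(gen_state, tool_names):
        if gen_state["curr_text"] == option:
            return option.strip()
    return None


partial_state = {"stage": "function", "curr_text": "get_"}
complete_state = {"stage": "function", "curr_text": "get_weather\n"}
```

Both callers now derive their options from one place, so a change to how options are formed only has to be made once.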

jeffreymeetkai merged commit 2041dad into main on Aug 14, 2024
3 checks passed
jeffreymeetkai deleted the combine-streaming-gsampling-fsm branch on August 14, 2024 07:50
3 participants