Add (Async)StreamedResponse for multi-part responses #383

Merged · 19 commits · Nov 30, 2024

Add docs for StreamedResponse
jackmpcollins committed Nov 30, 2024
commit 36ecd8520d600d9bb47de7c63fe302e5db9be71e
44 changes: 44 additions & 0 deletions docs/streaming.md
@@ -79,3 +79,47 @@ for hero in create_superhero_team("The Food Dudes"):
# 4.03s : name='Captain Carrot' age=35 power='Super strength and agility from eating carrots' enemies=['The Sugar Squad', 'The Greasy Gang']
# 6.05s : name='Ice Cream Girl' age=25 power='Can create ice cream out of thin air' enemies=['The Hot Sauce Squad', 'The Healthy Eaters']
```

## StreamedResponse

Some LLMs have the ability to generate text output and make tool calls in the same response. This allows them to perform chain-of-thought reasoning or provide additional context to the user. In magentic, the `StreamedResponse` (or `AsyncStreamedResponse`) class can be used to request this type of output. This object is an iterable of `StreamedStr` (or `AsyncStreamedStr`) and `FunctionCall` instances.

!!! warning "Consuming StreamedStr"

    The `StreamedStr` object must be iterated over before the next item in the `StreamedResponse` is processed, otherwise the string output will be lost. This is because the `StreamedResponse` and `StreamedStr` share the same underlying generator, so advancing the `StreamedResponse` iterator skips over the remaining `StreamedStr` chunks. The `StreamedStr` has internal caching, so once it has been iterated its chunks remain available.
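
If you do not need to process the text chunk by chunk, you can instead convert the `StreamedStr` to a full string before moving on, which consumes and caches the stream so no output is lost. A minimal sketch, reusing the `response` object from the example below (and assuming, as for `StreamedStr` elsewhere in these docs, that `str()` waits for and returns the full text):

```python
for item in response:
    if isinstance(item, StreamedStr):
        # str() consumes the stream (and caches it), so the text is not lost
        print(str(item))
    if isinstance(item, FunctionCall):
        # call the function and print its result
        print(item())
```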

In the example below, we request that the LLM generate a greeting and then call a function to get the weather for two cities. The `StreamedResponse` object is iterated over, and the `StreamedStr` and `FunctionCall` items it yields are handled separately.

```python
from magentic import prompt, FunctionCall, StreamedResponse, StreamedStr


def get_weather(city: str) -> str:
    return f"The weather in {city} is 20°C."


@prompt(
    "Say hello, then get the weather for: {cities}",
    functions=[get_weather],
)
def describe_weather(cities: list[str]) -> StreamedResponse: ...


response = describe_weather(["Cape Town", "San Francisco"])
for item in response:
    if isinstance(item, StreamedStr):
        for chunk in item:
            # print the chunks as they are received
            print(chunk, sep="", end="")
        print()
    if isinstance(item, FunctionCall):
        # print the function call, then call it and print the result
        print(item)
        print(item())

# Hello! I'll get the weather for Cape Town and San Francisco for you.
# FunctionCall(<function get_weather at 0x1109825c0>, 'Cape Town')
# The weather in Cape Town is 20°C.
# FunctionCall(<function get_weather at 0x1109825c0>, 'San Francisco')
# The weather in San Francisco is 20°C.
```
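
The same pattern works asynchronously with `AsyncStreamedResponse` and `AsyncStreamedStr`. A sketch of the async variant, assuming the prompt-function is defined with `async def` (the name `describe_weather_async` is illustrative):

```python
import asyncio

from magentic import prompt, AsyncStreamedResponse, AsyncStreamedStr, FunctionCall


def get_weather(city: str) -> str:
    return f"The weather in {city} is 20°C."


@prompt(
    "Say hello, then get the weather for: {cities}",
    functions=[get_weather],
)
async def describe_weather_async(cities: list[str]) -> AsyncStreamedResponse: ...


async def main() -> None:
    response = await describe_weather_async(["Cape Town", "San Francisco"])
    async for item in response:
        if isinstance(item, AsyncStreamedStr):
            async for chunk in item:
                # print the chunks as they are received
                print(chunk, end="")
            print()
        if isinstance(item, FunctionCall):
            # print the function call, then call it and print the result
            print(item)
            print(item())


asyncio.run(main())
```
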
4 changes: 4 additions & 0 deletions docs/structured-outputs.md
@@ -148,6 +148,10 @@ print(hero_defeated)

## Chain-of-Thought Prompting

!!! warning "StreamedResponse"

    It is now recommended to use `StreamedResponse` for chain-of-thought prompting, as this uses the LLM provider's native chain-of-thought capabilities. See [StreamedResponse](streaming.md#streamedresponse) for more information.

Using a simple Python type as the return annotation may lead to poor results, as the LLM has no opportunity to arrange its thoughts before answering. To allow the LLM to work through this "chain of thought" you can instead return a pydantic model with initial fields for explaining the final response.

```python hl_lines="5-9 20"