Add (Async)StreamedResponse for multi-part responses #383

Merged · 19 commits · Nov 30, 2024

Add docs for StreamedResponse
jackmpcollins committed Nov 30, 2024
commit 36ecd8520d600d9bb47de7c63fe302e5db9be71e
44 changes: 44 additions & 0 deletions docs/streaming.md
@@ -79,3 +79,47 @@ for hero in create_superhero_team("The Food Dudes"):
# 4.03s : name='Captain Carrot' age=35 power='Super strength and agility from eating carrots' enemies=['The Sugar Squad', 'The Greasy Gang']
# 6.05s : name='Ice Cream Girl' age=25 power='Can create ice cream out of thin air' enemies=['The Hot Sauce Squad', 'The Healthy Eaters']
```

## StreamedResponse

Some LLMs have the ability to generate text output and make tool calls in the same response. This allows them to perform chain-of-thought reasoning or provide additional context to the user. In magentic, the `StreamedResponse` (or `AsyncStreamedResponse`) class can be used to request this type of output. This object is an iterable of `StreamedStr` (or `AsyncStreamedStr`) and `FunctionCall` instances.

!!! warning "Consuming StreamedStr"

    The `StreamedStr` object must be iterated over before the next item in the `StreamedResponse` is processed, otherwise the string output will be lost. This is because the `StreamedResponse` and `StreamedStr` share the same underlying generator, so advancing the `StreamedResponse` iterator skips over the remaining `StreamedStr` chunks. The `StreamedStr` has internal caching, so once it has been iterated its chunks remain available.
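
If you do not need to process the text chunk by chunk, you can instead convert the `StreamedStr` to a full string before moving on, which consumes and caches the stream so no output is lost. A minimal sketch, reusing the `response` object from the example below (and assuming, as for `StreamedStr` elsewhere in these docs, that `str()` waits for and returns the full text):

```python
for item in response:
    if isinstance(item, StreamedStr):
        # str() consumes the stream (and caches it), so the text is not lost
        print(str(item))
    if isinstance(item, FunctionCall):
        # call the function and print its result
        print(item())
```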

In the example below, we request that the LLM generate a greeting and then call a function to get the weather for two cities. The `StreamedResponse` object is iterated over, and the `StreamedStr` and `FunctionCall` items it yields are handled separately.

```python
from magentic import prompt, FunctionCall, StreamedResponse, StreamedStr


def get_weather(city: str) -> str:
    return f"The weather in {city} is 20°C."


@prompt(
    "Say hello, then get the weather for: {cities}",
    functions=[get_weather],
)
def describe_weather(cities: list[str]) -> StreamedResponse: ...


response = describe_weather(["Cape Town", "San Francisco"])
for item in response:
    if isinstance(item, StreamedStr):
        for chunk in item:
            # print the chunks as they are received
            print(chunk, sep="", end="")
        print()
    if isinstance(item, FunctionCall):
        # print the function call, then call it and print the result
        print(item)
        print(item())

# Hello! I'll get the weather for Cape Town and San Francisco for you.
# FunctionCall(<function get_weather at 0x1109825c0>, 'Cape Town')
# The weather in Cape Town is 20°C.
# FunctionCall(<function get_weather at 0x1109825c0>, 'San Francisco')
# The weather in San Francisco is 20°C.
```
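
The same pattern works asynchronously with `AsyncStreamedResponse` and `AsyncStreamedStr`. A sketch of the async variant, assuming the prompt-function is defined with `async def` (the name `describe_weather_async` is illustrative):

```python
import asyncio

from magentic import prompt, AsyncStreamedResponse, AsyncStreamedStr, FunctionCall


def get_weather(city: str) -> str:
    return f"The weather in {city} is 20°C."


@prompt(
    "Say hello, then get the weather for: {cities}",
    functions=[get_weather],
)
async def describe_weather_async(cities: list[str]) -> AsyncStreamedResponse: ...


async def main() -> None:
    response = await describe_weather_async(["Cape Town", "San Francisco"])
    async for item in response:
        if isinstance(item, AsyncStreamedStr):
            async for chunk in item:
                # print the chunks as they are received
                print(chunk, end="")
            print()
        if isinstance(item, FunctionCall):
            # print the function call, then call it and print the result
            print(item)
            print(item())


asyncio.run(main())
```
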
4 changes: 4 additions & 0 deletions docs/structured-outputs.md
@@ -148,6 +148,10 @@ print(hero_defeated)

## Chain-of-Thought Prompting

!!! warning "StreamedResponse"

    It is now recommended to use `StreamedResponse` for chain-of-thought prompting, as this uses the LLM provider's native chain-of-thought capabilities. See [StreamedResponse](streaming.md#streamedresponse) for more information.

Using a simple Python type as the return annotation may lead to poor results, as the LLM has no opportunity to arrange its thoughts before answering. To allow the LLM to work through this "chain of thought" you can instead return a pydantic model with initial fields for explaining the final response.

```python hl_lines="5-9 20"