Merge pull request #38 from gradion-ai/wip-gemini-sdk-upgrade

Upgrade to latest Gemini 2.0 Flash models

krasserm authored Feb 7, 2025
2 parents 5013be5 + a027cf6 commit 0b13d5a

Showing 17 changed files with 51 additions and 120 deletions.
8 changes: 1 addition & 7 deletions README.md
@@ -11,13 +11,11 @@

A lightweight library for code-action based agents.

## Contents

- [Introduction](#introduction)
- [Key capabilities](#key-capabilities)
- [Quickstart](#quickstart)
- [Evaluation](#evaluation)
- [Supported models](#supported-models)
- [Supported models](https://gradion-ai.github.io/freeact/models/)

## Introduction

@@ -115,7 +113,3 @@ When comparing our results with smolagents using Claude 3.5 Sonnet on [m-ric/age
[<img src="docs/eval/eval-plot-comparison.png" alt="Performance comparison" width="60%">](docs/eval/eval-plot-comparison.png)

Interestingly, these results were achieved using zero-shot prompting in `freeact`, while the smolagents implementation utilizes few-shot prompting. You can find all evaluation details [here](evaluation).

## Supported models

In addition to all [supported models](https://gradion-ai.github.io/freeact/models/), `freeact` also supports the [integration](https://gradion-ai.github.io/freeact/integration/) of new models from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including open models deployed locally with [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index), for example.
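
As a rough illustration of that OpenAI-compatibility claim (the endpoint, key, and model name below are placeholders, not part of `freeact`): any server that speaks the OpenAI chat-completions protocol, such as a local ollama instance, can be driven with the standard `openai` client.

```python
from openai import OpenAI

# Hypothetical local setup: ollama serves an OpenAI-compatible API under /v1.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # placeholder local endpoint
    api_key="ollama",  # ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # any locally deployed model
    messages=[{"role": "user", "content": "Print 'hello' in Python."}],
)
print(response.choices[0].message.content)
```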
2 changes: 1 addition & 1 deletion docker/dependencies-basic.txt
@@ -1,4 +1,4 @@
freeact-skills = {version = "0.0.7", extras = ["search-google", "search-perplexity"]}
freeact-skills = {version = "0.0.8", extras = ["search-google", "search-perplexity"]}
matplotlib = "^3.10"
numpy = "^2.2"
pandas = "^2.2"
Expand Down
2 changes: 1 addition & 1 deletion docker/dependencies-eval.txt
@@ -1,4 +1,4 @@
freeact-skills = {version = "^0.0.7", extras = ["all"]}
freeact-skills = {version = "^0.0.8", extras = ["all"]}
requests = "^2.32.0"
markdownify = "^0.14.1"
matplotlib = "^3.10.0"
Expand Down
2 changes: 1 addition & 1 deletion docker/dependencies-example.txt
@@ -1,4 +1,4 @@
freeact-skills = {version = "0.0.7", extras = ["all"]}
freeact-skills = {version = "0.0.8", extras = ["all"]}
matplotlib = "^3.10"
numpy = "^2.2"
pandas = "^2.2"
Expand Down
2 changes: 1 addition & 1 deletion docker/dependencies-minimal.txt
@@ -1 +1 @@
freeact-skills = {version = "0.0.7", extras = ["search-google", "search-perplexity"]}
freeact-skills = {version = "0.0.8", extras = ["search-google", "search-perplexity"]}
20 changes: 16 additions & 4 deletions docs/models.md
@@ -6,12 +6,14 @@ For the following models, `freeact` provides model-specific prompt templates.
|-----------------------------|------------|-----------|--------------|
| Claude 3.5 Sonnet | 2024-10-22 | ✓ | optimized |
| Claude 3.5 Haiku | 2024-10-22 | ✓ | optimized |
| Gemini 2.0 Flash | 2024-12-11 | ✓ | draft |
| Gemini 2.0 Flash | 2025-02-05 | ✓[^1] | draft |
| Gemini 2.0 Flash Thinking | 2025-02-05 || experimental |
| Qwen 2.5 Coder 32B Instruct | | ✓ | draft |
| DeepSeek V3 | | ✓ | draft |
| DeepSeek R1[^1] | | ✓ | experimental |
| DeepSeek R1[^2] | | ✓ | experimental |

[^1]: DeepSeek R1 wasn't trained on agentic tool use but demonstrates strong performance with code actions, even surpassing Claude 3.5 Sonnet on the GAIA subset in our [evaluation](evaluation.md). However, its token usage for reasoning remains significantly higher than that of other models, making it impractical for everyday use for now.
[^1]: We evaluated Gemini 2.0 Flash Experimental (`gemini-2.0-flash-exp`), released on 2024-12-11.
[^2]: DeepSeek R1 wasn't trained on agentic tool use but demonstrates strong performance with code actions, even surpassing Claude 3.5 Sonnet on the GAIA subset in our [evaluation](evaluation.md). See [this article](https://krasserm.github.io/2025/02/05/deepseek-r1-agent/) for further details.

!!! Info

@@ -56,7 +58,17 @@ python -m freeact.cli \

```bash
python -m freeact.cli \
--model-name=gemini-2.0-flash-exp \
--model-name=gemini-2.0-flash \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api \
--api-key=$GOOGLE_API_KEY
```

### Gemini 2.0 Flash Thinking

```bash
python -m freeact.cli \
--model-name=gemini-2.0-flash-thinking-exp \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api \
--api-key=$GOOGLE_API_KEY
```
2 changes: 1 addition & 1 deletion freeact/cli/__main__.py
@@ -127,7 +127,7 @@ def main(
system_extension: Annotated[Path | None, typer.Option(help="Path to a system extension file")] = None,
log_file: Annotated[Path, typer.Option(help="Path to the log file")] = Path("logs", "agent.log"),
temperature: Annotated[float, typer.Option(help="Temperature for generating model responses")] = 0.0,
max_tokens: Annotated[int, typer.Option(help="Maximum number of tokens for each model response")] = 4096,
max_tokens: Annotated[int, typer.Option(help="Maximum number of tokens for each model response")] = 8192,
show_token_usage: Annotated[bool, typer.Option(help="Include token usage data in responses")] = False,
record_conversation: Annotated[bool, typer.Option(help="Record conversation as SVG file")] = False,
record_path: Annotated[Path, typer.Option(help="Path to the SVG file")] = Path("conversation.svg"),
2 changes: 1 addition & 1 deletion freeact/examples/commands.txt
@@ -8,7 +8,7 @@ python -m freeact.cli \

# --8<-- [start:cli-basics-gemini]
python -m freeact.cli \
--model-name=gemini-2.0-flash-exp \
--model-name=gemini-2.0-flash \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:example \
--executor-key=example \
--skill-modules=freeact_skills.search.google.stream.api
68 changes: 18 additions & 50 deletions freeact/model/gemini/model/chat.py
@@ -7,10 +7,12 @@
from google.genai.types import GenerateContentConfig, ThinkingConfig

from freeact.model.base import CodeActModel, CodeActModelResponse, CodeActModelTurn, StreamRetry
from freeact.model.gemini.prompt import default, thinking
from freeact.model.gemini.prompt import EXECUTION_ERROR_TEMPLATE, EXECUTION_OUTPUT_TEMPLATE, SYSTEM_TEMPLATE

GeminiModelName = Literal[
"gemini-2.0-flash-exp",
"gemini-2.0-flash",
"gemini-2.0-flash-001",
"gemini-2.0-flash-lite-preview-02-05" "gemini-2.0-flash-exp",
"gemini-2.0-flash-thinking-exp",
"gemini-2.0-flash-thinking-exp-01-21",
]
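
For readers unfamiliar with this idiom, a standalone sketch (simplified names, not the full list above): a `Literal` alias lets a static type checker reject unsupported model names while imposing nothing at runtime.

```python
from typing import Literal

# Illustrative subset of the alias defined above.
GeminiModelName = Literal["gemini-2.0-flash", "gemini-2.0-flash-001"]


def make_model(name: GeminiModelName) -> str:
    return name


make_model("gemini-2.0-flash")  # OK
# make_model("gemini-1.5-pro")  # rejected by mypy/pyright, fine at runtime
```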
@@ -39,7 +41,7 @@ def code(self) -> str | None:

@staticmethod
def _extract_code_blocks(text: str):
pattern = r"```(?:python|tool_code)\s*(.*?)(?:\s*```|\s*$)"
pattern = r"```(?:python|tool_code|tool)\s*(.*?)(?:\s*```|\s*$)"
return re.findall(pattern, text, re.DOTALL)
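
A quick sanity check of the widened pattern (illustrative snippet, not part of the diff), presumably motivated by Gemini sometimes labeling code fences `tool_code` or `tool` instead of `python`; the pattern also tolerates a missing closing fence.

```python
import re

# Pattern from _extract_code_blocks above.
pattern = r"```(?:python|tool_code|tool)\s*(.*?)(?:\s*```|\s*$)"

text = "Running the action:\n```tool_code\nprint(1 + 1)\n```"
print(re.findall(pattern, text, re.DOTALL))  # ['print(1 + 1)']
```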


@@ -48,7 +50,6 @@ def __init__(self, chat: AsyncChat, message: str):
self.chat = chat
self.message = message

self._thoughts: str = ""
self._response: str = ""
self._stream_consumed = False

@@ -57,10 +58,10 @@ async def response(self) -> GeminiResponse:
async for _ in self.stream():
pass
# TODO: include token usage data into response object
return GeminiResponse(text=self._response, thoughts=self._thoughts, is_error=False)
return GeminiResponse(text=self._response, is_error=False)

async def stream(self, emit_retry: bool = False) -> AsyncIterator[str | StreamRetry]:
async for chunk in self.chat.send_message_stream(self.message):
async for chunk in await self.chat.send_message_stream(self.message):
text = chunk.text
if text is not None:
yield text
@@ -69,27 +70,6 @@ async def stream(self, emit_retry: bool = False) -> AsyncIterator[str | StreamRetry]:
self._stream_consumed = True


class GeminiThinkingTurn(GeminiTurn):
async def stream(self, emit_retry: bool = False) -> AsyncIterator[str | StreamRetry]:
thinking = True
yield "<thinking>\n"

async for chunk in self.chat.send_message_stream(self.message):
for part in chunk.candidates[0].content.parts:
text = part.text
if part.thought:
self._thoughts += text
yield text
else:
if thinking:
thinking = False
yield "\n</thinking>\n\n"
yield text
self._response += text

self._stream_consumed = True


class Gemini(CodeActModel):
"""A `CodeActModel` implementation based on Google's Gemini 2 chat API.
@@ -103,46 +83,34 @@

def __init__(
self,
model_name: GeminiModelName = "gemini-2.0-flash-exp",
model_name: GeminiModelName = "gemini-2.0-flash",
skill_sources: str | None = None,
temperature: float = 0.0,
max_tokens: int = 4096,
**kwargs,
):
self._model_name = model_name
self._client = genai.Client(http_options={"api_version": "v1alpha"}, **kwargs)
self._client = genai.Client(**kwargs, http_options={"api_version": "v1alpha"})
self._chat = self._client.aio.chats.create(
model=model_name,
config=GenerateContentConfig(
temperature=temperature,
max_output_tokens=max_tokens,
response_modalities=["TEXT"],
thinking_config=self.thinking_config,
system_instruction=self.system_template.format(python_modules=skill_sources or ""),
system_instruction=SYSTEM_TEMPLATE.format(python_modules=skill_sources or ""),
thinking_config=ThinkingConfig(include_thoughts=True) if self.thinking else None,
),
)

@property
def thinking(self) -> bool:
return "thinking" in self._model_name.lower()

def request(self, user_query: str, **kwargs) -> GeminiTurn:
return GeminiThinkingTurn(self._chat, user_query) if self.thinking else GeminiTurn(self._chat, user_query)
return GeminiTurn(self._chat, user_query)

def feedback(
self, feedback: str, is_error: bool, tool_use_id: str | None, tool_use_name: str | None, **kwargs
) -> GeminiTurn:
if self.thinking:
feedback_template = thinking.EXECUTION_ERROR_TEMPLATE if is_error else thinking.EXECUTION_OUTPUT_TEMPLATE
return GeminiThinkingTurn(self._chat, feedback_template.format(execution_feedback=feedback))
else:
feedback_template = default.EXECUTION_ERROR_TEMPLATE if is_error else default.EXECUTION_OUTPUT_TEMPLATE
return GeminiTurn(self._chat, feedback_template.format(execution_feedback=feedback))

@property
def system_template(self) -> str:
return thinking.SYSTEM_TEMPLATE if self.thinking else default.SYSTEM_TEMPLATE

@property
def thinking_config(self) -> ThinkingConfig | None:
return ThinkingConfig(include_thoughts=True) if self.thinking else None

@property
def thinking(self) -> bool:
return "thinking" in self._model_name.lower()
feedback_template = EXECUTION_ERROR_TEMPLATE if is_error else EXECUTION_OUTPUT_TEMPLATE
return GeminiTurn(self._chat, feedback_template.format(execution_feedback=feedback))
6 changes: 3 additions & 3 deletions freeact/model/gemini/model/live.py
@@ -6,7 +6,7 @@

from freeact.model.base import CodeActModel, CodeActModelTurn, StreamRetry
from freeact.model.gemini.model.chat import GeminiModelName, GeminiResponse
from freeact.model.gemini.prompt.default import EXECUTION_ERROR_TEMPLATE, EXECUTION_OUTPUT_TEMPLATE, SYSTEM_TEMPLATE
from freeact.model.gemini.prompt import EXECUTION_ERROR_TEMPLATE, EXECUTION_OUTPUT_TEMPLATE, SYSTEM_TEMPLATE


class GeminiLiveTurn(CodeActModelTurn):
@@ -30,7 +30,7 @@ async def stream(self, emit_retry: bool = False) -> AsyncIterator[str | StreamRetry]:

@asynccontextmanager
async def GeminiLive(
model_name: GeminiModelName = "gemini-2.0-flash-exp",
model_name: GeminiModelName = "gemini-2.0-flash",
skill_sources: str | None = None,
temperature: float = 0.0,
max_tokens: int = 4096,
@@ -48,7 +48,7 @@ async def GeminiLive(
Example:
```python
async with GeminiLive(model_name="gemini-2.0-flash-exp", skill_sources=skill_sources) as model:
async with GeminiLive(model_name="gemini-2.0-flash", skill_sources=skill_sources) as model:
# use model with active session to Gemini 2 live API
agent = CodeActAgent(model=model, ...)
```
File renamed without changes.
Empty file.
42 changes: 0 additions & 42 deletions freeact/model/gemini/prompt/thinking.py

This file was deleted.

9 changes: 4 additions & 5 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -21,7 +21,7 @@ packages = [
aioconsole = "^0.8.1"
aiofiles = "^24.1"
anthropic = "^0.43.0"
google-genai = "^0.6.0"
google-genai = "^1.0"
ipybox = "^0.3.1"
openai = "^1.59"
prompt_toolkit = "^3.0"
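
For orientation, a minimal sketch of the `google-genai` 1.x surface that the updated `chat.py` relies on (assuming `GOOGLE_API_KEY` is set in the environment; the prompt text is arbitrary):

```python
import asyncio

from google import genai
from google.genai.types import GenerateContentConfig


async def main():
    client = genai.Client()  # picks up GOOGLE_API_KEY from the environment
    chat = client.aio.chats.create(
        model="gemini-2.0-flash",
        config=GenerateContentConfig(temperature=0.0, max_output_tokens=8192),
    )
    # In the 1.x SDK, send_message_stream is awaited and yields chunks,
    # matching the `async for chunk in await ...` change in chat.py.
    async for chunk in await chat.send_message_stream("Say hello"):
        if chunk.text is not None:
            print(chunk.text, end="")


asyncio.run(main())
```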
2 changes: 1 addition & 1 deletion tests/integration/test_agent.py
@@ -58,7 +58,7 @@ def gemini(skill_sources, request):
use_skill_sources = "skill_sources" in request.node.fixturenames # check if the test requires skill sources

return Gemini(
model_name="gemini-2.0-flash-exp",
model_name="gemini-2.0-flash",
skill_sources=skill_sources if use_skill_sources else None,
temperature=0.0,
max_tokens=1024,
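
The fixture above uses a small pytest introspection trick worth spelling out (standalone sketch with made-up fixture names): `request.node.fixturenames` lists the fixtures the running test declared, so a fixture can adapt to its consumer.

```python
import pytest


@pytest.fixture
def skill_sources():
    return "def example_skill(): ..."


@pytest.fixture
def model(request):
    # True only if the requesting test also declared the skill_sources fixture.
    wants_sources = "skill_sources" in request.node.fixturenames
    return {"skill_sources": request.getfixturevalue("skill_sources") if wants_sources else None}


def test_without_sources(model):
    assert model["skill_sources"] is None
```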
2 changes: 1 addition & 1 deletion tests/integration/test_model.py
@@ -24,7 +24,7 @@ def gemini(skill_sources, request):
use_skill_sources = "skill_sources" in request.node.fixturenames # check if the test requires skill sources

return Gemini(
model_name="gemini-2.0-flash-exp",
model_name="gemini-2.0-flash",
skill_sources=skill_sources if use_skill_sources else None,
temperature=0.0,
max_tokens=1024,
