Updated changelog #1490

Merged · 9 commits · Mar 7, 2025
26 changes: 13 additions & 13 deletions .github/workflows/documentation_codeblock_tests.yml
@@ -24,20 +24,20 @@ jobs:
# Get list of changed files in docs directory
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# For pull requests, compare with base branch
echo "paths=$(
git diff --name-only origin/${{ github.base_ref }} |
grep -E '^apps/opik-documentation/documentation/docs/.*\.(md|mdx)$' |
sed 's|apps/opik-documentation/documentation/||' |
jq -R -s -c 'split("\n")[:-1]'
)" >> $GITHUB_OUTPUT
CHANGED_FILES=$(git diff --name-only origin/${{ github.base_ref }} | grep -E '^apps/opik-documentation/documentation/fern/docs/.*\.(md|mdx)$' || true)
if [ -n "$CHANGED_FILES" ]; then
echo "paths=$(echo "$CHANGED_FILES" | sed 's|apps/opik-documentation/documentation/||' | jq -R -s 'split("\n")[:-1]' -c)" >> $GITHUB_OUTPUT
else
echo "paths=[]" >> $GITHUB_OUTPUT
fi
else
# For manual runs and scheduled runs, check all files
echo "paths=$(
(
ls -d docs/*/ 2>/dev/null;
find docs -maxdepth 1 -type f -name "*.md" -o -name "*.mdx"
) | jq -R -s -c 'split("\n")[:-1]'
)" >> $GITHUB_OUTPUT
# For manual runs, get all md/mdx files
FILES=$(find fern/docs -type f \( -name "*.md" -o -name "*.mdx" \))
if [ -n "$FILES" ]; then
echo "paths=$(echo "$FILES" | jq -R -s 'split("\n")[:-1]' -c)" >> $GITHUB_OUTPUT
else
echo "paths=[]" >> $GITHUB_OUTPUT
fi
fi

test:
@@ -26,7 +26,7 @@
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet opik instructor"
"%pip install --upgrade --quiet opik instructor anthropic google-generativeai google-genai"
]
},
{
3 changes: 3 additions & 0 deletions apps/opik-documentation/documentation/fern/docs.yml
@@ -78,6 +78,9 @@ navigation:
- page: Log traces
path: docs/tracing/log_traces.mdx
slug: log_traces
- page: Log conversations
path: docs/tracing/log_chat_conversations.mdx
slug: log_chat_conversations
- page: Log agents
path: docs/tracing/log_agents.mdx
slug: log_agents
@@ -0,0 +1,18 @@
**Opik Dashboard**:

- Chat conversations can now be reviewed in the platform

<Frame>
<img src="/img/changelog/2025-03-03/chat_conversations.png" />
</Frame>

- Added the ability to leave comments on experiments
- You can now leave reasons on feedback scores; see [Annotating Traces](/tracing/annotate_traces)
- Added support for Gemini in the playground
- A thumbs up / down feedback score definition is now added to all projects by default to make it easier
to annotate traces.

**JS / TS SDK**:

- The AnswerRelevanceMetric can now be run without providing a context field
- Updated how metrics are uploaded to optimize data ingestion
@@ -46,7 +46,9 @@ import os
from opik.integrations.anthropic import track_anthropic

anthropic_client = anthropic.Anthropic()
anthropic_client = track_anthropic(anthropic_client, project_name="anthropic-integration-demo")
anthropic_client = track_anthropic(
    anthropic_client, project_name="anthropic-integration-demo"
)
```


@@ -10,7 +10,7 @@ Opik integrates with Gemini to provide a simple way to log traces for all Gemini


```python
%pip install --upgrade opik google-generativeai litellm
%pip install --upgrade opik google-genai litellm
```


@@ -28,45 +28,34 @@ First, we will set up our Gemini API keys.
```python
import os
import getpass
import google.generativeai as genai

if "GEMINI_API_KEY" not in os.environ:
genai.configure(api_key=getpass.getpass("Enter your Gemini API key: "))
if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Gemini API key: ")
```

## Configure LiteLLM
## Logging traces

Add the LiteLLM OpikTracker to log traces and steps to Opik:
Now each completion will log a separate trace to Opik:


```python
import litellm
import os
from litellm.integrations.opik.opik import OpikLogger
from google import genai
from opik import track
from opik.opik_context import get_current_span_data
from opik.integrations.genai import track_genai

os.environ["OPIK_PROJECT_NAME"] = "gemini-integration-demo"
opik_logger = OpikLogger()
litellm.callbacks = [opik_logger]
```

## Logging traces

Now each completion will log a separate trace to Opik:

client = genai.Client()
gemini_client = track_genai(client)

```python
prompt = """
Write a short two sentence story about Opik.
"""

response = litellm.completion(
model="gemini/gemini-pro",
messages=[{"role": "user", "content": prompt}],
response = gemini_client.models.generate_content(
model="gemini-2.0-flash-001", contents=prompt
)

print(response.choices[0].message.content)
print(response.text)
```

The prompt and response messages are automatically logged to Opik and can be viewed in the UI.
@@ -81,31 +70,19 @@ If you have multiple steps in your LLM pipeline, you can use the `track` decorator
```python
@track
def generate_story(prompt):
response = litellm.completion(
model="gemini/gemini-pro",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
response = gemini_client.models.generate_content(
model="gemini-2.0-flash-001", contents=prompt
)
return response.choices[0].message.content
return response.text


@track
def generate_topic():
prompt = "Generate a topic for a story about Opik."
response = litellm.completion(
model="gemini/gemini-pro",
messages=[{"role": "user", "content": prompt}],
metadata={
"opik": {
"current_span_data": get_current_span_data(),
},
},
response = gemini_client.models.generate_content(
model="gemini-2.0-flash-001", contents=prompt
)
return response.choices[0].message.content
return response.text


@track
@@ -121,5 +98,3 @@
The trace can now be viewed in the UI:

![Gemini Cookbook](https://raw.githubusercontent.com/comet-ml/opik/main/apps/opik-documentation/documentation/fern/img/cookbook/gemini_trace_decorator_cookbook.png)


@@ -10,7 +10,7 @@

```python
%pip install --upgrade --quiet opik instructor
%pip install --upgrade --quiet opik instructor anthropic google-generativeai google-genai
```


@@ -433,8 +433,6 @@ We can now use the `evaluate` method to evaluate the summaries in our dataset:
```python
from opik.evaluation import evaluate

os.environ["OPIK_PROJECT_NAME"] = "summary-evaluation-prompts"

MODEL = "gpt-4o-mini"
DENSITY_ITERATIONS = 2

@@ -490,8 +488,6 @@ Guidelines:
```python
from opik.evaluation import evaluate

os.environ["OPIK_PROJECT_NAME"] = "summary-evaluation-prompts"

MODEL = "gpt-4o-mini"
DENSITY_ITERATIONS = 2

@@ -1,4 +1,5 @@
Annotating traces is a crucial aspect of evaluating and improving your LLM-based applications. By systematically recording qualitative or quantitative feedback on specific interactions or entire conversation flows, you can:
Annotating traces is a crucial aspect of evaluating and improving your LLM-based applications. By systematically recording qualitative or quantitative
feedback on specific interactions or entire conversation flows, you can:

1. Track performance over time
2. Identify areas for improvement
@@ -10,7 +11,8 @@ Opik allows you to annotate traces through the SDK or the UI.

## Annotating Traces through the UI

To annotate traces through the UI, you can navigate to the trace you want to annotate in the traces page and click on the `Annotate` button. This will open a sidebar where you can add annotations to the trace.
To annotate traces through the UI, you can navigate to the trace you want to annotate in the traces page and click on the `Annotate` button.
This will open a sidebar where you can add annotations to the trace.

You can annotate both traces and spans through the UI; make sure you have selected the correct span in the sidebar.

@@ -19,10 +21,12 @@ You can annotate both traces and spans through the UI, make sure you have select
</Frame>

<Tip>
In order to ensure a consistent set of feedback, you will need to define feedback definitions in the `Feedback
Definitions` page which supports both numerical and categorical annotations.
Once a feedback score has been provided, you can also add a reason to explain why this particular score was given.
This is useful for adding additional context to the score.
</Tip>

You can also add comments to traces and experiments to share insights with other team members.

## Online evaluation

You don't need to manually annotate each trace to measure the performance of your LLM applications! By using Opik's [online evaluation feature](/production/rules), you can define LLM as a Judge metrics that will automatically score all, or a subset, of your production traces.
@@ -77,9 +81,10 @@ client.log_spans_feedback_scores(
)
```

:::note
The `FeedbackScoreDict` class supports an optional `reason` field that can be used to provide a human-readable explanation for the feedback score.
:::
<Note>
The `FeedbackScoreDict` class supports an optional `reason` field that can be used to provide a human-readable
explanation for the feedback score.
</Note>
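
As an illustration (not part of this diff), here is a minimal sketch of what providing a `reason` might look like, assuming the `log_spans_feedback_scores` call shown above accepts a list of `FeedbackScoreDict`-style entries; the span id and score name below are placeholders:

```python
import opik

client = opik.Opik()

# Hypothetical example: "span-id-placeholder" and "relevance" are placeholders.
client.log_spans_feedback_scores(
    scores=[
        {
            "id": "span-id-placeholder",
            "name": "relevance",
            "value": 0.9,
            "reason": "The answer directly addresses the user's question.",
        }
    ]
)
```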

### Using Opik's built-in evaluation metrics

@@ -90,7 +95,7 @@ Opik's built-in evaluation metrics are broken down into two main categories:
1. Heuristic metrics
2. LLM as a judge metrics

### Heuristic Metrics
#### Heuristic Metrics

Heuristic metrics use rule-based or statistical methods to evaluate the output of LLM models.

@@ -118,7 +123,7 @@ score = metric.score(
)
```
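
The heuristic metric snippet above is truncated by the diff view. As a minimal sketch, assuming the `Equals` metric and its `output`/`reference` parameters from the Opik SDK, a heuristic metric can be scored like this:

```python
from opik.evaluation.metrics import Equals

# Sketch only: compares the model output against an expected reference string.
metric = Equals()
score = metric.score(
    output="The capital of France is Paris.",
    reference="The capital of France is Paris.",
)
print(score.value)  # 1.0 when the strings match, 0.0 otherwise
```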

### LLM as a Judge Metrics
#### LLM as a Judge Metrics

For LLM outputs that cannot be evaluated using heuristic metrics, you can use LLM as a judge metrics. These metrics are based on the idea of using an LLM to evaluate the output of another LLM.
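
As a minimal sketch, assuming the `Hallucination` metric from the Opik SDK (the input and output strings below are placeholders), an LLM as a judge metric can be scored like this:

```python
from opik.evaluation.metrics import Hallucination

# Sketch only: an LLM judge checks whether the output is grounded in the input.
metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="The capital of France is Paris.",
)
print(score.value, score.reason)
```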

@@ -0,0 +1,78 @@
You can log chat conversations to the Opik platform and track the full conversations
your users are having with your chatbot.

<Frame>
<img src="/img/tracing/chat_conversations.png" />
</Frame>

## Logging conversations

You can log chat conversations by specifying the `thread_id` parameter when using either the low level SDK or
Python decorators:

<Tabs>
<Tab title="Python decorators" value="Python decorators">
```python
import opik
from opik import opik_context

@opik.track
def chat_message(input, thread_id):
    opik_context.update_current_trace(
        thread_id=thread_id
    )
    return "Opik is an Open Source GenAI platform"

thread_id = "f174a"
chat_message("What is Opik ?", thread_id)
chat_message("Repeat the previous message", thread_id)
```
</Tab>
<Tab title="Low level SDK" value="Low level SDK">
```python
import opik

opik_client = opik.Opik()

thread_id = "55d84"

# Log a first message
trace = opik_client.trace(
    name="chat_conversation",
    input="What is Opik?",
    output="Opik is an Open Source GenAI platform",
    thread_id=thread_id
)

# Log a second message
trace = opik_client.trace(
    name="chat_conversation",
    input="Can you track chat conversations in Opik?",
    output="Yes, of course!",
    thread_id=thread_id
)
```
</Tab>

</Tabs>

<Note>
The input to each trace will be displayed as the user message while the output will be displayed as the AI assistant
response.
</Note>

## Reviewing conversations

Conversations can be viewed at the project level in the `threads` tab. All conversations are tracked, and by clicking on a thread ID you can view the full conversation.

The thread view supports markdown, making it easier to review the content that was returned to the user. If you would like to dig deeper, you can click the `View trace` button to see exactly how the AI assistant response was generated.

By clicking on the thumbs up or thumbs down icons, you can quickly rate the AI assistant response. This feedback score will be logged and associated with the relevant trace. By switching to the trace view, you can review the full trace as well as add additional feedback scores through the annotation functionality.

<Frame>
<img src="/img/tracing/chat_conversations_actions.png" />
</Frame>