Introduce Opik.search_spans (#545)
* Added Opik.search_spans method
jverre authored Nov 4, 2024
1 parent 1c81f4f commit e53d35d
Showing 6 changed files with 275 additions and 113 deletions.
168 changes: 168 additions & 0 deletions apps/opik-documentation/documentation/docs/tracing/export_data.md
@@ -0,0 +1,168 @@
---
sidebar_label: Export Traces and Spans
toc_max_heading_level: 4
---

# Exporting Traces and Spans

When working with Opik, it is important to be able to export traces and spans so that you can use them to fine-tune your models or run deeper analysis.

You can export the traces and spans you have logged to the Opik platform in three ways:

1. Using the Opik SDK: The [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) and [`Opik.search_spans`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_spans) methods export traces and spans.
2. Using the Opik REST API: The [`/traces`](/reference/rest_api/get-traces-by-project.api.mdx) and [`/spans`](/reference/rest_api/get-spans-by-project.api.mdx) endpoints export traces and spans.
3. Using the UI: Once you have selected the traces or spans you want to export, click the `Export CSV` button in the `Actions` dropdown.

:::tip
The recommended way to export traces and spans is to use the [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) and [`Opik.search_spans`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_spans) methods in the Opik SDK.
:::

## Using the Opik SDK

### Exporting traces

The [`Opik.search_traces`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_traces) method allows you to either export all the traces in a project or search for specific traces and export only those.

#### Exporting all traces

To export all traces, you will need to specify a `max_results` value that is higher than the total number of traces in your project:

```python
import opik

client = opik.Opik()

traces = client.search_traces(project_name="Default project", max_results=1000000)
```

#### Search for specific traces

You can use the `filter_string` parameter to search for specific traces:

```python
import opik

client = opik.Opik()

traces = client.search_traces(
    project_name="Default project",
    filter_string='input contains "Opik"'
)

# Convert to Dict if required
traces = [trace.dict() for trace in traces]
```

The `filter_string` parameter should follow the format `<column> <operator> <value>` with:

1. `<column>`: The column to filter on. This can be one of:
   - `name`
   - `input`
   - `output`
   - `start_time`
   - `end_time`
   - `metadata`
   - `feedback_score`
   - `tags`
   - `usage.total_tokens`
   - `usage.prompt_tokens`
   - `usage.completion_tokens`
2. `<operator>`: The operator to use for the filter. This can be `=`, `!=`, `>`, `>=`, `<`, `<=`, `contains`, or `not_contains`. Note that not all operators are supported for all columns.
3. `<value>`: The value to filter on. If you are filtering on a string, you will need to wrap it in double quotes.
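
The format above can be assembled programmatically. The following is a minimal sketch, not part of the Opik SDK: `build_filter_string` is a hypothetical helper, and it assumes that string values never contain double quotes themselves, since no escaping rules are described here.

```python
from typing import Union

# Operators listed in the format description above.
SUPPORTED_OPERATORS = {"=", "!=", ">", ">=", "<", "<=", "contains", "not_contains"}


def build_filter_string(column: str, operator: str, value: Union[str, int, float]) -> str:
    """Hypothetical helper: format a filter as `<column> <operator> <value>`."""
    if operator not in SUPPORTED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    if isinstance(value, str):
        # String values must be wrapped in double quotes.
        value = f'"{value}"'
    return f"{column} {operator} {value}"


print(build_filter_string("input", "contains", "Opik"))      # input contains "Opik"
print(build_filter_string("usage.total_tokens", ">", 1000))  # usage.total_tokens > 1000
```

The resulting string can then be passed as the `filter_string` argument. The helper only checks the operator name; it does not know which operators a given column supports.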

Here are some additional examples of valid `filter_string` values:

```python
import opik

client = opik.Opik(
    project_name="Default project"
)

# Search for traces where the input contains text
traces = client.search_traces(
    filter_string='input contains "Opik"'
)

# Search for traces that were logged after a specific date
traces = client.search_traces(filter_string='start_time >= "2024-01-01T00:00:00Z"')

# Search for traces that have a specific tag
traces = client.search_traces(filter_string='tags contains "production"')

# Search for traces based on the number of tokens used
traces = client.search_traces(filter_string='usage.total_tokens > 1000')

# Search for traces based on the model used
traces = client.search_traces(filter_string='metadata.model = "gpt-4o"')
```
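
Since the stated goal of exporting is fine-tuning or deeper analysis, it can be useful to persist the results to disk. The sketch below writes records to a JSON Lines file. It assumes each trace has already been converted to a dictionary with `trace.dict()` as shown earlier; `write_jsonl` is a hypothetical helper, not an SDK function, and the `default=str` fallback is an assumption about how non-JSON values such as timestamps should be stored.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def write_jsonl(records: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # default=str is a fallback for values that json cannot
            # serialize natively, such as datetime objects.
            f.write(json.dumps(record, default=str) + "\n")
            count += 1
    return count
```

With the results of `search_traces` in hand, a call such as `write_jsonl((trace.dict() for trace in traces), "traces.jsonl")` would produce one trace per line.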

### Exporting spans

You can export spans using the [`Opik.search_spans`](https://www.comet.com/docs/opik/python-sdk-reference/Opik.html#opik.Opik.search_spans) method. This method allows you to search for spans based on a `trace_id` or on a filter string.

#### Exporting spans based on `trace_id`

To export all the spans associated with a specific trace, you can use the `trace_id` parameter:

```python
import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    trace_id="067092dc-e639-73ff-8000-e1c40172450f"
)
```

#### Search for specific spans

You can use the `filter_string` parameter to search for specific spans:

```python
import opik

client = opik.Opik()

spans = client.search_spans(
    project_name="Default project",
    filter_string='input contains "Opik"'
)
```

:::tip
The `filter_string` parameter should follow the same format as the `filter_string` parameter in the `Opik.search_traces` method as [defined above](#search-for-specific-traces).
:::

## Using the Opik REST API

To export traces using the Opik REST API, you can use the [`/traces`](/reference/rest_api/get-traces-by-project.api.mdx) endpoint and the [`/spans`](/reference/rest_api/get-spans-by-project.api.mdx) endpoint. These endpoints are paginated so you will need to make multiple requests to retrieve all the traces or spans you want.
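
Retrieving everything from a paginated endpoint usually means requesting pages until one comes back empty. The sketch below shows that loop in isolation. `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and the 1-based page numbering, page size of 200, and result cap are assumptions chosen for illustration.

```python
from typing import Any, Callable, List


def fetch_all(
    fetch_page: Callable[[int, int], List[Any]],
    page_size: int = 200,
    max_results: int = 1000,
) -> List[Any]:
    """Collect items page by page until a page is empty or max_results is reached."""
    items: List[Any] = []
    page = 1
    while len(items) < max_results:
        content = fetch_page(page, page_size)
        if not content:
            break
        items.extend(content)
        page += 1
    return items[:max_results]


# Stand-in for a real HTTP call: serves slices of an in-memory list.
fake_data = list(range(450))


def fake_fetch_page(page: int, size: int) -> List[int]:
    start = (page - 1) * size
    return fake_data[start:start + size]


print(len(fetch_all(fake_fetch_page)))  # 450
```

In real use, `fetch_page` would wrap a request to the `/traces` or `/spans` endpoint and return the items of that page.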

To search for specific traces or spans, you can use the `filter` parameter. While this is a string parameter, it does not follow the same format as the `filter_string` parameter in the Opik SDK. Instead, it is a list of JSON objects with the following format:

```json
[
  {
    "field": "name",
    "type": "string",
    "operator": "=",
    "value": "Opik"
  }
]
```
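
Because the parameter is a string, the list has to be serialized before it is sent. The following sketch shows one way to do that with the standard library. It assumes the serialized list is passed as a URL query parameter named `filter`; check the REST API reference for the exact parameter name and transport before relying on it.

```python
import json
from urllib.parse import urlencode

filters = [
    {"field": "name", "type": "string", "operator": "=", "value": "Opik"},
]

# Assumption: the JSON-serialized list is sent as a query-string
# parameter named `filter`. urlencode percent-encodes it for the URL.
query = urlencode({"filter": json.dumps(filters)})
print(query)
```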

:::warning
The `filter` parameter was designed to be used with the Opik UI and therefore has limited flexibility. If you need more flexibility, please raise an issue on [GitHub](https://github.com/comet-ml/opik/issues) so we can help.
:::

## Using the UI

To export traces as a CSV file from the UI, you can simply select the traces or spans you wish to export and click on `Export CSV` in the `Actions` dropdown:

![Export CSV](/img/tracing/download_traces.png)

:::tip
The UI only allows you to export up to 100 traces or spans at a time as it is linked to the page size of the traces table. If you need to export more traces or spans, we recommend using the Opik SDK.
:::
111 changes: 0 additions & 111 deletions apps/opik-documentation/documentation/docs/tracing/export_traces.md

This file was deleted.

2 changes: 1 addition & 1 deletion apps/opik-documentation/documentation/sidebars.ts
@@ -31,7 +31,7 @@ const sidebars: SidebarsConfig = {
"tracing/log_distributed_traces",
"tracing/annotate_traces",
"tracing/sdk_configuration",
- "tracing/export_traces",
+ "tracing/export_data",
{
type: "category",
label: "Integrations",
10 changes: 10 additions & 0 deletions sdks/python/examples/search_traces_and_spans.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
import opik

opik_client = opik.Opik()

spans = opik_client.search_spans(
    project_name="Demo Project",
    filter_string='input contains "How many unique albums"',
)

print(spans)
42 changes: 41 additions & 1 deletion sdks/python/src/opik/api_objects/opik_client.py
@@ -505,7 +505,7 @@ def search_traces(
Search for traces in the given project.
Args:
- project_name: The name of the project to search traces in. If not provided the project name configured when the Client was created will be used.
+ project_name: The name of the project to search traces in. If not provided, will search across the project name configured when the Client was created which defaults to the `Default Project`.
filter_string: A filter string to narrow down the search. If not provided, all traces in the project will be returned up to the limit.
max_results: The maximum number of traces to return.
"""
@@ -532,6 +532,46 @@ def search_traces(

        return traces[:max_results]

    def search_spans(
        self,
        project_name: Optional[str] = None,
        trace_id: Optional[str] = None,
        filter_string: Optional[str] = None,
        max_results: int = 1000,
    ) -> List[span_public.SpanPublic]:
        """
        Search for spans in the given trace. This allows you to search spans based on the span input, output,
        metadata, tags, etc or based on the trace ID.
        Args:
            project_name: The name of the project to search spans in. If not provided, will search across the project name configured when the Client was created which defaults to the `Default Project`.
            trace_id: The ID of the trace to search spans in. If provided, the search will be limited to the spans in the given trace.
            filter_string: A filter string to narrow down the search.
            max_results: The maximum number of spans to return.
        """
        page_size = 200
        spans: List[span_public.SpanPublic] = []

        filters = opik_query_language.OpikQueryLanguage(filter_string).parsed_filters

        page = 1
        while len(spans) < max_results:
            page_spans = self._rest_client.spans.get_spans_by_project(
                project_name=project_name or self._project_name,
                trace_id=trace_id,
                filters=filters,
                page=page,
                size=page_size,
            )

            if len(page_spans.content) == 0:
                break

            spans.extend(page_spans.content)
            page += 1

        return spans[:max_results]

    def get_trace_content(self, id: str) -> trace_public.TracePublic:
        """
        Args: