OpenTelemetry Trace Semantic Conventions for the LLM Stack #71
Closed
karthikscale3
started this conversation in
General
Replies: 1 comment
Closing this as we are migrating to https://opentelemetry.io/docs/specs/semconv/attributes-registry/gen-ai/
Hey everyone,
We are hoping to start an open discussion around trace semantic conventions for the LLM stack. The ultimate goal is to converge on a standard set of naming conventions for the span attributes generated by LLM frameworks, VectorDBs, and LLM APIs.
The vision of this project is to generate spans with attributes that provide a great developer experience for both DevOps engineers and observability client developers, enabling them to build rich interfaces and visualization tools to effectively evaluate, debug, and improve LLM-based applications.
The goals we are hoping to achieve are as follows:
Span Attribute Naming Goals:
We have shared below the current state of all the semantic conventions adopted by this project so far. The general thought process so far has been as follows:
The landscape of the LLM stack is constantly evolving, but most implementations today can be broken down into three layers: LLMs, frameworks, and VectorDBs. With this in mind, we have instrumented a good number of attributes for each layer to get this project off to a start.
LLM Attributes
We have adopted a mixed approach: standardizing certain properties with a strict schema while serializing others without applying any standardization. Based on general observations, the inputs and outputs are the most important fields. A couple of client features that will become increasingly important are:
In order to provide a great developer experience for observability client developers implementing the above features, we have decided to go with an opinionated schema for LLM prompts (inputs) and responses (outputs). The schema is described in the table below.
For the majority of the remaining fields, we serialize them directly, keeping almost the same names as the API parameters and applying no data-structure mutations. Fortunately, LLM API developers seem to be converging on similar parameter naming conventions for the most part.
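To make the two approaches concrete, here is a minimal sketch of building span attributes for one chat completion. The attribute names (`llm.prompts`, `llm.responses`, `llm.model`, etc.) follow the spirit of the conventions described here but are illustrative assumptions, not the project's authoritative schema:

```python
import json

def build_chat_attributes(messages, params, response_text):
    """Sketch: assemble span attributes for a single chat completion.

    Attribute names are assumptions modeled on the conventions above.
    """
    attrs = {
        # opinionated schema: prompts/responses are JSON-serialized
        # lists of {role, content} objects
        "llm.prompts": json.dumps(messages),
        "llm.responses": json.dumps(
            [{"role": "assistant", "content": response_text}]
        ),
    }
    # remaining fields pass through with (almost) their API parameter names
    for key in ("model", "temperature", "top_p", "max_tokens"):
        if key in params:
            attrs[f"llm.{key}"] = params[key]
    return attrs
```

A client can then `json.loads` the prompt/response attributes to render a conversation view, while the pass-through parameters remain queryable as flat keys.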
Framework Attributes
For the frameworks, we have kept it simple by directly serializing the inputs and outputs of every function call. Since frameworks today mostly try to provide a good set of constructs by abstracting away complex details, we have added a `framework.task.name` attribute that generally conveys the task the framework is performing.
VectorDB Attributes
For vector databases, we have tried to follow some of the OpenTelemetry conventions already set for other databases.
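As an illustration of what framework and VectorDB spans might carry: `db.system` and `db.operation` come from the existing OpenTelemetry database conventions, `framework.task.name` from the convention described above, and the remaining names and values are made-up examples, not the spec:

```python
# Illustrative framework span attributes (values are examples only).
framework_span_attrs = {
    "framework.task.name": "retrieval_qa.run",  # the task the framework performs
    # serialized inputs/outputs of the instrumented function call (assumed names)
    "framework.inputs": '{"query": "what is OpenTelemetry?"}',
    "framework.outputs": '{"answer": "an observability framework"}',
}

# Illustrative VectorDB span attributes, reusing standard OTel DB keys.
vectordb_span_attrs = {
    "db.system": "pinecone",   # standard OpenTelemetry database attribute
    "db.operation": "query",   # standard OpenTelemetry database attribute
}
```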
Having said this, converging on standards and guidelines for the LLM layers early on will not only help developers gain deep insights through high-cardinality telemetry data, but will also help this project thrive and support all the popular observability clients that support OpenTelemetry.
Current State
Sharing below the current set of span attribute names and descriptions, split across 4 categories.
Notes:
1. Prompts are standardized for every LLM vendor.
2. The "system" role always represents the system prompt. For example, the preamble parameter passed to the Cohere API is appended to the system prompt and captured within llm.prompts.
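Note 2 can be sketched as follows; `merge_preamble` is a hypothetical helper (not part of the project), assuming messages are simple `{role, content}` dicts:

```python
def merge_preamble(messages, preamble):
    """Fold a Cohere-style preamble into the "system" message (per note 2)."""
    msgs = [dict(m) for m in messages]  # shallow-copy so the input is untouched
    if msgs and msgs[0].get("role") == "system":
        # append to an existing system prompt
        msgs[0]["content"] = msgs[0]["content"] + "\n" + preamble
    else:
        # no system prompt yet: the preamble becomes one
        msgs.insert(0, {"role": "system", "content": preamble})
    return msgs
```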
Notes:
1. For image generation, content is an object containing a 'url' property (the URL of the image) plus any other properties attached to it by the LLM vendor.
2. For tool calling, the list includes role, content, and additional properties such as tool_id, depending on the LLM vendor.
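The two notes above might look like this in practice; the field names beyond role/content (`revised_prompt`, `tool_id` values) are vendor-specific examples, not part of the convention:

```python
# Hypothetical llm.responses entries illustrating notes 1 and 2 above.
image_generation_entry = {
    "role": "assistant",
    # note 1: content is an object with 'url' plus vendor-specific extras
    "content": {"url": "https://example.com/image.png"},
}

tool_call_entry = {
    "role": "assistant",
    "content": "",
    "tool_id": "call_123",  # note 2: vendor-specific additional property (example)
}
```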
```
{
  input_tokens: number,
  output_tokens: number,
  total_tokens: number
}
```
Notes:
1. In streaming mode, some LLM vendors such as OpenAI do not return token counts. In that case, this metric calculates the token counts for each stream chunk using the tiktoken library, so it may not be accurate.
2. For Cohere, this captures the billed units. It also captures search_units when search capabilities are used.
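A sketch of accumulating the token-count schema above across stream chunks. A real implementation would use tiktoken's encoder for the model in question; here a naive whitespace count stands in so the sketch is self-contained (and is exactly why such counts can be inaccurate):

```python
def count_tokens(text):
    # stand-in tokenizer; a real instrumentation would use tiktoken here
    return len(text.split())

def accumulate_usage(prompt, chunks):
    """Build the {input_tokens, output_tokens, total_tokens} record
    by counting tokens per stream chunk (per note 1)."""
    input_tokens = count_tokens(prompt)
    output_tokens = sum(count_tokens(chunk) for chunk in chunks)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
```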
Note:
1. For LLMs that support top_n, the argument is captured in this attribute, since top_k and top_n represent the same thing.
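The normalization in note 1 can be sketched as below; `normalize_top_k` is a hypothetical helper, not part of the project:

```python
def normalize_top_k(params):
    """Record a vendor's top_n under the standardized top_k key (per note 1),
    leaving all other parameters untouched."""
    out = dict(params)
    if "top_n" in out and "top_k" not in out:
        out["top_k"] = out.pop("top_n")
    return out
```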
- LLM
- VectorDB
- Framework
Repository
https://github.com/Scale3-Labs/langtrace-trace-attributes