OpenTelemetry Trace Semantic Conventions for the LLM Stack #71
Closed
karthikscale3
started this conversation in
General
Replies: 1 comment
Closing this as we are migrating to https://opentelemetry.io/docs/specs/semconv/attributes-registry/gen-ai/
Hey everyone,
We are hoping to start an open discussion around trace semantic conventions for the LLM stack. The ultimate goal is to converge on a standard set of naming conventions for the span attributes generated by LLM frameworks, VectorDBs, and LLM APIs.
The vision of this project is to generate spans with attributes that provide a great developer experience for both DevOps engineers and observability client developers, enabling them to build rich interfaces and visualization tools to effectively evaluate, debug, and improve LLM-based applications.
The goals we are hoping to achieve are as follows:
Span Attribute Naming Goals:
We have shared below the current state of all the semantic conventions adopted by this project so far. The general thought process so far has been as follows:
The landscape of the LLM stack is constantly evolving, but most implementations today can be broken down into three layers: LLMs, frameworks, and VectorDBs. With this in mind, we have instrumented a good number of attributes for each layer to get this project off to a start.
LLM Attributes
We have adopted a mixed approach: standardizing certain properties with a strict schema while serializing others without applying any standardization. Based on general observations, the inputs and outputs are the most important fields. A couple of client features that will become increasingly important are:
In order to provide a great developer experience for observability client developers implementing the above features, we have decided to go with an opinionated schema for LLM prompts (inputs) and responses (outputs). The schema is described in the table below.
For the majority of the remaining fields, we serialize them directly, keeping almost the same names as the API parameters and applying no data-structure mutations. Fortunately, LLM API developers seem to be converging on similar parameter naming conventions for the most part.
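To make the two approaches concrete, here is a minimal sketch of building span attributes for one chat completion. The attribute names (`llm.prompts`, `llm.responses`, `llm.model`, etc.) follow the spirit of the conventions described here but are illustrative assumptions, not the project's authoritative schema:

```python
import json

def build_chat_attributes(messages, params, response_text):
    """Sketch: assemble span attributes for a single chat completion.

    Attribute names are assumptions modeled on the conventions above.
    """
    attrs = {
        # opinionated schema: prompts/responses are JSON-serialized
        # lists of {role, content} objects
        "llm.prompts": json.dumps(messages),
        "llm.responses": json.dumps(
            [{"role": "assistant", "content": response_text}]
        ),
    }
    # remaining fields pass through with (almost) their API parameter names
    for key in ("model", "temperature", "top_p", "max_tokens"):
        if key in params:
            attrs[f"llm.{key}"] = params[key]
    return attrs
```

A client can then `json.loads` the prompt/response attributes to render a conversation view, while the pass-through parameters remain queryable as flat keys.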
Framework Attributes
For the frameworks, we have kept it simple by directly serializing the inputs and outputs of every function call. Since frameworks today mostly try to provide a good set of constructs by abstracting away complex details, we have added a `framework.task.name` attribute that generally conveys the task the framework is performing.
VectorDB Attributes
For vector databases, we have tried to follow some of the OpenTelemetry conventions already set for other databases.
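As an illustration of what framework and VectorDB spans might carry: `db.system` and `db.operation` come from the existing OpenTelemetry database conventions, `framework.task.name` from the convention described above, and the remaining names and values are made-up examples, not the spec:

```python
# Illustrative framework span attributes (values are examples only).
framework_span_attrs = {
    "framework.task.name": "retrieval_qa.run",  # the task the framework performs
    # serialized inputs/outputs of the instrumented function call (assumed names)
    "framework.inputs": '{"query": "what is OpenTelemetry?"}',
    "framework.outputs": '{"answer": "an observability framework"}',
}

# Illustrative VectorDB span attributes, reusing standard OTel DB keys.
vectordb_span_attrs = {
    "db.system": "pinecone",   # standard OpenTelemetry database attribute
    "db.operation": "query",   # standard OpenTelemetry database attribute
}
```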
Having said this, converging on standards and guidelines for the LLM layers early on will not only help developers gain deep insights through high-cardinality telemetry data, but will also help this project thrive and support all the popular observability clients that support OpenTelemetry.
Current State
Sharing below the current set of span attribute names and descriptions, split across 4 categories.
Notes:
1. Prompts are standardized for every LLM vendor.
2. The "system" role always represents the system prompt. For example, the preamble parameter passed to the Cohere API is appended to the system prompt and captured within llm.prompts.
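Note 2 can be sketched as follows; `merge_preamble` is a hypothetical helper (not part of the project), assuming messages are simple `{role, content}` dicts:

```python
def merge_preamble(messages, preamble):
    """Fold a Cohere-style preamble into the "system" message (per note 2)."""
    msgs = [dict(m) for m in messages]  # shallow-copy so the input is untouched
    if msgs and msgs[0].get("role") == "system":
        # append to an existing system prompt
        msgs[0]["content"] = msgs[0]["content"] + "\n" + preamble
    else:
        # no system prompt yet: the preamble becomes one
        msgs.insert(0, {"role": "system", "content": preamble})
    return msgs
```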
Notes:
1. For image generation, content is an object containing a 'url' property (the URL of the image) plus any other properties attached to it by the LLM vendor.
2. For tool calling, the list includes role, content, and additional properties such as tool_id, depending on the LLM vendor.
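The two notes above might look like this in practice; the field names beyond role/content (`revised_prompt`, `tool_id` values) are vendor-specific examples, not part of the convention:

```python
# Hypothetical llm.responses entries illustrating notes 1 and 2 above.
image_generation_entry = {
    "role": "assistant",
    # note 1: content is an object with 'url' plus vendor-specific extras
    "content": {"url": "https://example.com/image.png"},
}

tool_call_entry = {
    "role": "assistant",
    "content": "",
    "tool_id": "call_123",  # note 2: vendor-specific additional property (example)
}
```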
```
{
  input_tokens: number,
  output_tokens: number,
  total_tokens: number
}
```
Notes:
1. In streaming mode, some LLM vendors such as OpenAI do not return token counts. In that case, this metric calculates the token counts for each stream chunk using the tiktoken library, so it may not be accurate.
2. For Cohere, this captures the billed units. It also captures search_units when search capabilities are used.
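A sketch of accumulating the token-count schema above across stream chunks. A real implementation would use tiktoken's encoder for the model in question; here a naive whitespace count stands in so the sketch is self-contained (and is exactly why such counts can be inaccurate):

```python
def count_tokens(text):
    # stand-in tokenizer; a real instrumentation would use tiktoken here
    return len(text.split())

def accumulate_usage(prompt, chunks):
    """Build the {input_tokens, output_tokens, total_tokens} record
    by counting tokens per stream chunk (per note 1)."""
    input_tokens = count_tokens(prompt)
    output_tokens = sum(count_tokens(chunk) for chunk in chunks)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }
```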
Note:
1. For LLMs that support top_n, the argument is captured in this attribute, since top_k and top_n represent the same thing.
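The normalization in note 1 can be sketched as below; `normalize_top_k` is a hypothetical helper, not part of the project:

```python
def normalize_top_k(params):
    """Record a vendor's top_n under the standardized top_k key (per note 1),
    leaving all other parameters untouched."""
    out = dict(params)
    if "top_n" in out and "top_k" not in out:
        out["top_k"] = out.pop("top_n")
    return out
```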
- LLM
- VectorDB
- Framework
Repository
https://github.com/Scale3-Labs/langtrace-trace-attributes