Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common guidance on recording errors on spans and metrics, clarify DB conventions #1716

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .chloggen/1716.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
change_type: enhancement
component: docs, db
note: Add common guidance on recording errors on spans and metrics, clarify DB conventions.
issues: [1536, 1716]
3 changes: 2 additions & 1 deletion docs/cli/cli-spans.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@ Span kind SHOULD be `INTERNAL` when the traced program is the callee or `CLIENT`
The span name SHOULD be set to `{process.executable.name}`.
Instrumentations that have additional context about executed commands MAY use a different low-cardinality span name format and SHOULD document it.

Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0.
Span status SHOULD be set to `Error` if `{process.exit.code}` is not 0. Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
additional details on how to record span status.

<!-- TODO: context propagation https://github.com/open-telemetry/semantic-conventions/issues/1612 -->

Expand Down
3 changes: 1 addition & 2 deletions docs/database/cassandra.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,7 @@ system specific term if more applicable.

**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.

**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[6] `db.response.status_code`:** All Cassandra protocol error codes SHOULD be considered errors.

**[7] `db.response.status_code`:** If the operation failed and status code is available.

Expand Down
3 changes: 1 addition & 2 deletions docs/database/cosmosdb.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,8 +193,7 @@ additional values when introducing new operations.

**[5] `db.operation.name`:** If readily available and if there is a single operation name that describes the database call. The operation name MAY be parsed from the query text, in which case it SHOULD be the single operation name found in the query.

**[6] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[6] `db.response.status_code`:** Response codes in the 4xx and 5xx range SHOULD be considered errors.

**[7] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
Expand Down
5 changes: 2 additions & 3 deletions docs/database/couchdb.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,16 +23,15 @@ The Semantic Conventions for [CouchDB](https://couchdb.apache.org/) extend and o
|---|---|---|---|---|---|
| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the database, fully qualified within the server address and port. | `customers`; `test.users` | `Conditionally Required` If available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.operation.name`](/docs/attributes-registry/db.md) | string | The HTTP method + the target REST route. [1] | `GET /{db}/{docid}` | `Conditionally Required` If readily available. | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.response.status_code`](/docs/attributes-registry/db.md) | string | The HTTP response code returned by the Couch DB recorded as string. [2] | `200`; `201`; `429` | `Conditionally Required` [3] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`error.type`](/docs/attributes-registry/error.md) | string | Describes a class of error the operation ended with. [4] | `timeout`; `java.net.UnknownHostException`; `server_certificate_invalid`; `500` | `Conditionally Required` If and only if the operation failed. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.port`](/docs/attributes-registry/server.md) | int | Server port number. [5] | `80`; `8080`; `443` | `Conditionally Required` [6] | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [7] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`server.address`](/docs/attributes-registry/server.md) | string | Name of the database host. [8] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Recommended` | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |

**[1] `db.operation.name`:** In **CouchDB**, `db.operation.name` should be set to the HTTP method + the target REST route according to the API reference documentation. For example, when retrieving a document, `db.operation.name` would be set to (literally, i.e., without replacing the placeholders with concrete values): [`GET /{db}/{docid}`](https://docs.couchdb.org/en/stable/api/document/common.html#get--db-docid).

**[2] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[2] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.

**[3] `db.response.status_code`:** If response was received and the HTTP response code is available.

Expand Down
58 changes: 4 additions & 54 deletions docs/database/database-spans.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@ linkTitle: Client Calls

- [Name](#name)
- [Status](#status)
- [Recording exception events](#recording-exception-events)
- [Common attributes](#common-attributes)
- [Notes and well-known identifiers for `db.system`](#notes-and-well-known-identifiers-for-dbsystem)
- [Sanitization of `db.query.text`](#sanitization-of-dbquerytext)
Expand Down Expand Up @@ -89,59 +88,11 @@ For example, for an operation describing SQL query on an anonymous table like `S

## Status

[Span Status Code][SpanStatus] MUST be left unset if the operation has ended without any errors.
Refer to the [Recording Errors](/docs/general/recording-errors.md) document for
details on how to record span status.

Instrumentation SHOULD consider the operation as failed if any of the following is true:

- the `db.response.status_code` value indicates an error

> [!NOTE]
>
> The classification of status code as an error depends on the context.
> For example, a SQL STATE `02000` (`no_data`) indicates an error when the application
> expected the data to be available. However, it is not an error when the
> application is simply checking whether the data exists.
>
> Instrumentations that have additional context about a specific operation MAY use
> this context to set the span status more precisely.
> Instrumentations that don't have any additional context MUST follow the
> guidelines in this section.

- an exception is thrown by the instrumented method call
- the instrumented method returns an error in another way

When the operation ends with an error, instrumentation:

- SHOULD set the span status code to `Error`
- SHOULD set the `error.type` attribute
- SHOULD set the span status description when it has additional information
about the error which is not expected to contain sensitive details and aligns
with [Span Status Description][SpanStatus] definition.

It's NOT RECOMMENDED to duplicate `db.response.status_code` or `error.type`
in span status description.

When the operation fails with an exception, the span status description SHOULD be set to
the exception message.

### Recording exception events

**Status**: [Experimental][DocumentStatus]

When the operation fails with an exception, instrumentation SHOULD record
an [exception event](../exceptions/exceptions-spans.md) by default if, and only if,
the span being recorded is a local root span (does not have a local parent).

> [!NOTE]
>
> Exception stack traces could be very long and are expensive to capture and store.
> Exceptions which are not handled by instrumented libraries are likely to be handled
> and logged by the caller.
> Exceptions that are not handled will be recorded by the outermost (local root)
> instrumentation such as HTTP or gRPC server.

Instrumentation MAY provide a configuration option to record exceptions that
escape the surface of the instrumented API.
Semantic conventions for individual systems SHOULD specify which values of `db.response.status_code`
classify as errors.

## Common attributes

Expand Down Expand Up @@ -466,4 +417,3 @@ More specific Semantic Conventions are defined for the following database techno
* [SQL](sql.md): Semantic Conventions for *SQL* databases.

[DocumentStatus]: https://opentelemetry.io/docs/specs/otel/document-status
[SpanStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.39.0/specification/trace/api.md#set-status
3 changes: 1 addition & 2 deletions docs/database/elasticsearch.md
Original file line number Diff line number Diff line change
Expand Up @@ -82,8 +82,7 @@ When a query string value is redacted, the query string key SHOULD still be pres

**[4] `db.elasticsearch.path_parts`:** Many Elasticsearch url paths allow dynamic values. These SHOULD be recorded in span attributes in the format `db.elasticsearch.path_parts.<key>`, where `<key>` is the url path part name. The implementation SHOULD reference the [elasticsearch schema](https://raw.githubusercontent.com/elastic/elasticsearch-specification/main/output/schema/schema.json) in order to map the path part values to their names.

**[5] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[5] `db.response.status_code`:** HTTP response codes in the 4xx and 5xx range SHOULD be considered errors.

**[6] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
Expand Down
38 changes: 3 additions & 35 deletions docs/database/mariadb.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,41 +42,9 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided

It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.

**[2] `db.response.status_code`:** SQL defines [SQLSTATE](https://wikipedia.org/wiki/SQLSTATE) as a database
return code which is adopted by some database systems like PostgreSQL.
See [PostgreSQL error codes](https://www.postgresql.org/docs/current/errcodes-appendix.html)
for the details.

Other systems like MySQL, Oracle, or MS SQL Server define vendor-specific
error codes. Database SQL drivers usually provide access to both properties.
For example, in Java, the [`SQLException`](https://docs.oracle.com/javase/8/docs/api/java/sql/SQLException.html)
class reports them with `getSQLState()` and `getErrorCode()` methods.

Instrumentations SHOULD populate the `db.response.status_code` with the
the most specific code available to them.

Here's a non-exhaustive list of databases that report vendor-specific
codes with granularity higher than SQLSTATE (or don't report SQLSTATE
at all):

- [DB2 SQL codes](https://www.ibm.com/docs/db2-for-zos/12?topic=codes-sql).
- [Maria DB error codes](https://mariadb.com/kb/en/mariadb-error-code-reference/)
- [Microsoft SQL Server errors](https://docs.microsoft.com/sql/relational-databases/errors-events/database-engine-events-and-errors)
- [MySQL error codes](https://dev.mysql.com/doc/mysql-errors/9.0/en/error-reference-introduction.html)
- [Oracle error codes](https://docs.oracle.com/cd/B28359_01/server.111/b28278/toc.htm)
- [SQLite result codes](https://www.sqlite.org/rescode.html)

These systems SHOULD set the `db.response.status_code` to a
known vendor-specific error code. If only SQLSTATE is available,
it SHOULD be used.

When multiple error codes are available and specificity is unclear,
instrumentation SHOULD set the `db.response.status_code` to the
concatenated string of all codes with '/' used as a separator.

For example, generic DB instrumentation that detected an error and has
SQLSTATE `"42000"` and vendor-specific `1071` should set
`db.response.status_code` to `"42000/1071"`."
**[2] `db.response.status_code`:** MariaDB uses vendor-specific error codes on all errors and reports [SQLSTATE](https://mariadb.com/kb/en/sqlstate/) in some cases.
MariaDB error codes are more granular than SQLSTATE, so MariaDB instrumentations SHOULD set the `db.response.status_code` to this known error code.
When SQLSTATE is available, SQLSTATE of "Class 02" or higher SHOULD be considered errors. When SQLSTATE is not available, all MariaDB error codes SHOULD be considered errors.

**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
Expand Down
3 changes: 1 addition & 2 deletions docs/database/mongodb.md
Original file line number Diff line number Diff line change
Expand Up @@ -40,8 +40,7 @@ then that collection name SHOULD be used.

**[2] `db.operation.name`:** See [MongoDB database commands](https://www.mongodb.com/docs/manual/reference/command/).

**[3] `db.response.status_code`:** The status code returned by the database. Usually it represents an error code, but may also represent partial success, warning, or differentiate between various types of successful outcomes.
Semantic conventions for individual database systems SHOULD document what `db.response.status_code` means in the context of that system.
**[3] `db.response.status_code`:** All MongoDB error codes SHOULD be considered errors.

**[4] `db.response.status_code`:** If the operation failed and error code is available.

Expand Down
1 change: 1 addition & 0 deletions docs/database/mssql.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ Instrumentation SHOULD document if `db.namespace` reflects the database provided
It is RECOMMENDED to capture the value as provided by the application without attempting to do any case normalization.

**[2] `db.response.status_code`:** Microsoft SQL Server does not report SQLSTATE.
Instrumentations SHOULD use [error severity](https://learn.microsoft.com/sql/relational-databases/errors-events/database-engine-error-severities) returned along with the status code to determine the status of the span. Response codes with severity 11 or higher SHOULD be considered errors.

**[3] `error.type`:** The `error.type` SHOULD match the `db.response.status_code` returned by the database or the client library, or the canonical name of exception that occurred.
When using canonical exception type name, instrumentation SHOULD do the best effort to report the most relevant type. For example, if the original exception is wrapped into a generic one, the original exception SHOULD be preferred.
Expand Down
Loading
Loading