Sampling context improvements (#3847)
sentrivana authored Dec 5, 2024
1 parent 7c70b9c commit bcadb61
Showing 18 changed files with 221 additions and 151 deletions.
199 changes: 103 additions & 96 deletions MIGRATION_GUIDE.md
@@ -20,102 +20,109 @@ Looking to upgrade from Sentry SDK 2.x to 3.x? Here's a comprehensive list of what's changed.
- Redis integration: In Redis pipeline spans there is no `span["data"]["redis.commands"]` that contains a dict `{"count": 3, "first_ten": ["cmd1", "cmd2", ...]}` but instead `span["data"]["redis.commands.count"]` (containing `3`) and `span["data"]["redis.commands.first_ten"]` (containing `["cmd1", "cmd2", ...]`).
- clickhouse-driver integration: The query is now available under the `db.query.text` span attribute (only if `send_default_pii` is `True`).
- `sentry_sdk.init` now returns `None` instead of a context manager.
- The `sampling_context` argument of `traces_sampler` now additionally contains all span attributes known at span start.
- If you're using the Celery integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `celery_job` dictionary anymore. Instead, the individual keys are now available as:

| Dictionary keys | Sampling context key |
| ---------------------- | -------------------- |
| `celery_job["args"]` | `celery.job.args` |
| `celery_job["kwargs"]` | `celery.job.kwargs` |
| `celery_job["task"]` | `celery.job.task` |

Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.

- If you're using the AIOHTTP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ---------------- | ------------------------------- |
| `path` | `url.path` |
| `query_string` | `url.query` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `scheme` | `url.scheme` |
| full URL | `url.full` |

- If you're using the Tornado integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ---------------- | --------------------------------------------------- |
| `path` | `url.path` |
| `query` | `url.query` |
| `protocol` | `url.scheme` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `version` | `network.protocol.name`, `network.protocol.version` |
| full URL | `url.full` |

- If you're using the generic WSGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:

| Env property | Sampling context key(s) |
| ----------------- | ------------------------------------------------- |
| `PATH_INFO` | `url.path` |
| `QUERY_STRING` | `url.query` |
| `REQUEST_METHOD` | `http.request.method` |
| `SERVER_NAME` | `server.address` |
| `SERVER_PORT` | `server.port` |
| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
| `wsgi.url_scheme` | `url.scheme` |
| full URL | `url.full` |

- If you're using the generic ASGI integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:

| Scope property | Sampling context key(s) |
| -------------- | ------------------------------- |
| `type` | `network.protocol.name` |
| `scheme` | `url.scheme` |
| `path` | `url.path` |
| `query` | `url.query` |
| `http_version` | `network.protocol.version` |
| `method` | `http.request.method` |
| `server` | `server.address`, `server.port` |
| `client` | `client.address`, `client.port` |
| full URL | `url.full` |

- If you're using the RQ integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:

| RQ property | Sampling context key(s) |
| --------------- | ---------------------------- |
| `rq_job.args` | `rq.job.args` |
| `rq_job.kwargs` | `rq.job.kwargs` |
| `rq_job.func` | `rq.job.func` |
| `queue.name` | `messaging.destination.name` |
| `rq_job.id` | `messaging.message.id` |

Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.

- If you're using the AWS Lambda integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:

| AWS property | Sampling context key(s) |
| ------------------------------------------- | ----------------------- |
| `aws_event["httpMethod"]` | `http.request.method` |
| `aws_event["queryStringParameters"]` | `url.query` |
| `aws_event["path"]` | `url.path` |
| full URL | `url.full` |
| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
| `aws_event["headers"]["Host"]` | `server.address` |
| `aws_context["function_name"]` | `faas.name` |

- If you're using the GCP integration, the `sampling_context` argument of `traces_sampler` doesn't contain the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:

| Old sampling context key | New sampling context key |
| --------------------------------- | -------------------------- |
| `gcp_env["function_name"]` | `faas.name` |
| `gcp_env["function_region"]` | `faas.region` |
| `gcp_env["function_project"]` | `gcp.function.project` |
| `gcp_env["function_identity"]` | `gcp.function.identity` |
| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
| `gcp_event.method` | `http.request.method` |
| `gcp_event.query_string` | `url.query` |
- The `sampling_context` argument of `traces_sampler` and `profiles_sampler` now additionally contains all span attributes known at span start.
- The integration-specific content of the `sampling_context` argument of `traces_sampler` and `profiles_sampler` has changed as described below; see the example sampler sketch after the tables.
- The Celery integration doesn't add the `celery_job` dictionary anymore. Instead, the individual keys are now available as:

| Dictionary keys | Sampling context key | Example |
| ---------------------- | --------------------------- | ------------------------------ |
| `celery_job["args"]` | `celery.job.args.{index}` | `celery.job.args.0` |
| `celery_job["kwargs"]` | `celery.job.kwargs.{kwarg}` | `celery.job.kwargs.kwarg_name` |
| `celery_job["task"]` | `celery.job.task` | |

Note that all of these are serialized, i.e., not the original `args` and `kwargs` but rather OpenTelemetry-friendly span attributes.

- The AIOHTTP integration doesn't add the `aiohttp_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ----------------- | ------------------------------- |
| `path` | `url.path` |
| `query_string` | `url.query` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `scheme` | `url.scheme` |
| full URL | `url.full` |
| `request.headers` | `http.request.header.{header}` |

- The Tornado integration doesn't add the `tornado_request` object anymore. Instead, some of the individual properties of the request are accessible, if available, as follows:

| Request property | Sampling context key(s) |
| ----------------- | --------------------------------------------------- |
| `path` | `url.path` |
| `query` | `url.query` |
| `protocol` | `url.scheme` |
| `method` | `http.request.method` |
| `host` | `server.address`, `server.port` |
| `version` | `network.protocol.name`, `network.protocol.version` |
| full URL | `url.full` |
| `request.headers` | `http.request.header.{header}` |

- The WSGI integration doesn't add the `wsgi_environ` object anymore. Instead, the individual properties of the environment are accessible, if available, as follows:

| Env property | Sampling context key(s) |
| ----------------- | ------------------------------------------------- |
| `PATH_INFO` | `url.path` |
| `QUERY_STRING` | `url.query` |
| `REQUEST_METHOD` | `http.request.method` |
| `SERVER_NAME` | `server.address` |
| `SERVER_PORT` | `server.port` |
| `SERVER_PROTOCOL` | `server.protocol.name`, `server.protocol.version` |
| `wsgi.url_scheme` | `url.scheme` |
| full URL | `url.full` |
| `HTTP_*` | `http.request.header.{header}` |

- The ASGI integration doesn't add the `asgi_scope` object anymore. Instead, the individual properties of the scope, if available, are accessible as follows:

| Scope property | Sampling context key(s) |
| -------------- | ------------------------------- |
| `type` | `network.protocol.name` |
| `scheme` | `url.scheme` |
| `path` | `url.path` |
| `query` | `url.query` |
| `http_version` | `network.protocol.version` |
| `method` | `http.request.method` |
| `server` | `server.address`, `server.port` |
| `client` | `client.address`, `client.port` |
| full URL | `url.full` |
| `headers` | `http.request.header.{header}` |

- The RQ integration doesn't add the `rq_job` object anymore. Instead, the individual properties of the job and the queue, if available, are accessible as follows:

| RQ property | Sampling context key | Example |
| --------------- | ---------------------------- | ---------------------- |
| `rq_job.args` | `rq.job.args.{index}` | `rq.job.args.0` |
| `rq_job.kwargs` | `rq.job.kwargs.{kwarg}`      | `rq.job.kwargs.my_kwarg` |
| `rq_job.func` | `rq.job.func` | |
| `queue.name` | `messaging.destination.name` | |
| `rq_job.id` | `messaging.message.id` | |

Note that `rq.job.args`, `rq.job.kwargs`, and `rq.job.func` are serialized and not the actual objects on the job.

- The AWS Lambda integration doesn't add the `aws_event` and `aws_context` objects anymore. Instead, the following, if available, is accessible:

| AWS property | Sampling context key(s) |
| ------------------------------------------- | ------------------------------- |
| `aws_event["httpMethod"]` | `http.request.method` |
| `aws_event["queryStringParameters"]` | `url.query` |
| `aws_event["path"]` | `url.path` |
| full URL | `url.full` |
| `aws_event["headers"]["X-Forwarded-Proto"]` | `network.protocol.name` |
| `aws_event["headers"]["Host"]` | `server.address` |
| `aws_context["function_name"]` | `faas.name` |
| `aws_event["headers"]`                      | `http.request.header.{header}`  |

- The GCP integration doesn't add the `gcp_env` and `gcp_event` keys anymore. Instead, the following, if available, is accessible:

| Old sampling context key | New sampling context key |
| --------------------------------- | ------------------------------ |
| `gcp_env["function_name"]` | `faas.name` |
| `gcp_env["function_region"]` | `faas.region` |
| `gcp_env["function_project"]` | `gcp.function.project` |
| `gcp_env["function_identity"]` | `gcp.function.identity` |
| `gcp_env["function_entry_point"]` | `gcp.function.entry_point` |
| `gcp_event.method` | `http.request.method` |
| `gcp_event.query_string` | `url.query` |
| `gcp_event.headers` | `http.request.header.{header}` |
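
  For orientation, a minimal `traces_sampler` reading the new flat keys might look like the sketch below. The DSN, route, and task name are placeholders, not part of this change:

  ```python
  import sentry_sdk


  def traces_sampler(sampling_context):
      # Span attributes known at span start are exposed directly as flat,
      # OpenTelemetry-style keys instead of raw request/job objects.
      if sampling_context.get("url.path") == "/healthz":
          return 0  # drop health checks
      if sampling_context.get("celery.job.task") == "tasks.cleanup":
          return 0.01  # sample a noisy background task sparsely
      return 0.1


  sentry_sdk.init(
      dsn="...",  # placeholder DSN
      traces_sampler=traces_sampler,
  )
  ```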


### Removed
16 changes: 15 additions & 1 deletion sentry_sdk/integrations/_wsgi_common.py
@@ -3,7 +3,7 @@

import sentry_sdk
from sentry_sdk.scope import should_send_default_pii
from sentry_sdk.utils import AnnotatedValue, logger
from sentry_sdk.utils import AnnotatedValue, logger, SENSITIVE_DATA_SUBSTITUTE

try:
    from django.http.request import RawPostDataException
@@ -221,6 +221,20 @@ def _filter_headers(headers):
    }


def _request_headers_to_span_attributes(headers):
    # type: (dict[str, str]) -> dict[str, str]
    attributes = {}

    headers = _filter_headers(headers)

    for header, value in headers.items():
        if isinstance(value, AnnotatedValue):
            value = SENSITIVE_DATA_SUBSTITUTE
        attributes[f"http.request.header.{header.lower()}"] = value

    return attributes
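
For reference, a rough sketch of what the helper above produces for a made-up header dict. The exact filtering depends on `send_default_pii`; with PII sending disabled, `_filter_headers` returns sensitive values as `AnnotatedValue`s, which the helper replaces with the `[Filtered]` substitute:

```python
# Illustrative only -- not part of the diff.
_request_headers_to_span_attributes(
    {"Accept": "application/json", "Authorization": "Bearer secret-token"}
)
# => {
#     "http.request.header.accept": "application/json",
#     "http.request.header.authorization": "[Filtered]",
# }
```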


def _in_http_status_code_range(code, code_ranges):
    # type: (object, list[HttpStatusCodeRange]) -> bool
    for target in code_ranges:
7 changes: 4 additions & 3 deletions sentry_sdk/integrations/aiohttp.py
@@ -13,6 +13,7 @@
from sentry_sdk.sessions import track_session
from sentry_sdk.integrations._wsgi_common import (
    _filter_headers,
    _request_headers_to_span_attributes,
    request_body_within_bounds,
)
from sentry_sdk.tracing import (
@@ -389,11 +390,11 @@ def _prepopulate_attributes(request):
    except ValueError:
        attributes["server.address"] = request.host

    try:
    with capture_internal_exceptions():
        url = f"{request.scheme}://{request.host}{request.path}"  # noqa: E231
        if request.query_string:
            attributes["url.full"] = f"{url}?{request.query_string}"
    except Exception:
        pass

    attributes.update(_request_headers_to_span_attributes(dict(request.headers)))

    return attributes
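
Note on the pattern above: `capture_internal_exceptions()` (imported from `sentry_sdk.utils`, as visible in the ASGI diff below) replaces the bare `try`/`except Exception: pass`. Behaviorally it is roughly the sketch below: exceptions raised while prepopulating attributes are swallowed so they cannot break request handling, with the difference that the real helper also reports them as internal SDK errors instead of discarding them silently:

```python
from contextlib import contextmanager


@contextmanager
def capture_internal_exceptions_sketch():
    # Rough approximation of sentry_sdk.utils.capture_internal_exceptions:
    # never let SDK-internal bookkeeping raise into user code.
    try:
        yield
    except Exception:
        pass  # the real context manager also logs the error internally
```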
11 changes: 7 additions & 4 deletions sentry_sdk/integrations/asgi.py
@@ -21,6 +21,7 @@
)
from sentry_sdk.integrations._wsgi_common import (
    DEFAULT_HTTP_METHODS_TO_CAPTURE,
    _request_headers_to_span_attributes,
)
from sentry_sdk.sessions import track_session
from sentry_sdk.tracing import (
@@ -32,6 +33,7 @@
)
from sentry_sdk.utils import (
    ContextVar,
    capture_internal_exceptions,
    event_from_exception,
    HAS_REAL_CONTEXTVARS,
    CONTEXTVARS_ERROR_MESSAGE,
@@ -348,19 +350,20 @@ def _prepopulate_attributes(scope):
        try:
            host, port = scope[attr]
            attributes[f"{attr}.address"] = host
            attributes[f"{attr}.port"] = port
            if port is not None:
                attributes[f"{attr}.port"] = port
        except Exception:
            pass

    try:
    with capture_internal_exceptions():
        full_url = _get_url(scope)
        query = _get_query(scope)
        if query:
            attributes["url.query"] = query
            full_url = f"{full_url}?{query}"

        attributes["url.full"] = full_url
    except Exception:
        pass

    attributes.update(_request_headers_to_span_attributes(_get_headers(scope)))

    return attributes
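
For orientation, a hypothetical (abridged) ASGI scope and the attributes that, per the migration guide's ASGI table, the prepopulation above should derive from it; all values are invented:

```python
# Hypothetical scope -- not from the diff.
scope = {
    "type": "http",
    "scheme": "https",
    "path": "/api/items",
    "http_version": "1.1",
    "method": "GET",
    "server": ("example.com", 443),
    "client": ("203.0.113.7", 54321),
    "headers": [(b"accept", b"application/json")],
}
# Expected span attributes, roughly:
#   network.protocol.name="http", url.scheme="https", url.path="/api/items",
#   network.protocol.version="1.1", http.request.method="GET",
#   server.address="example.com", server.port=443,
#   client.address="203.0.113.7", client.port=54321,
#   url.full="https://example.com/api/items",
#   http.request.header.accept="application/json"
```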
15 changes: 12 additions & 3 deletions sentry_sdk/integrations/aws_lambda.py
@@ -20,7 +20,10 @@
    reraise,
)
from sentry_sdk.integrations import Integration
from sentry_sdk.integrations._wsgi_common import _filter_headers
from sentry_sdk.integrations._wsgi_common import (
    _filter_headers,
    _request_headers_to_span_attributes,
)

from typing import TYPE_CHECKING

@@ -162,7 +165,7 @@ def sentry_handler(aws_event, aws_context, *args, **kwargs):
                name=aws_context.function_name,
                source=TRANSACTION_SOURCE_COMPONENT,
                origin=AwsLambdaIntegration.origin,
                attributes=_prepopulate_attributes(aws_event, aws_context),
                attributes=_prepopulate_attributes(request_data, aws_context),
            ):
                try:
                    return handler(aws_event, aws_context, *args, **kwargs)
@@ -468,6 +471,7 @@ def _event_from_error_json(error_json):


def _prepopulate_attributes(aws_event, aws_context):
    # type: (Any, Any) -> dict[str, Any]
    attributes = {
        "cloud.provider": "aws",
    }
@@ -486,10 +490,15 @@ def _prepopulate_attributes(aws_event, aws_context):
            url += f"?{aws_event['queryStringParameters']}"
        attributes["url.full"] = url

    headers = aws_event.get("headers") or {}
    headers = {}
    if aws_event.get("headers") and isinstance(aws_event["headers"], dict):
        headers = aws_event["headers"]

    if headers.get("X-Forwarded-Proto"):
        attributes["network.protocol.name"] = headers["X-Forwarded-Proto"]
    if headers.get("Host"):
        attributes["server.address"] = headers["Host"]

    attributes.update(_request_headers_to_span_attributes(headers))

    return attributes
13 changes: 10 additions & 3 deletions sentry_sdk/integrations/celery/__init__.py
@@ -20,7 +20,6 @@
    ensure_integration_enabled,
    event_from_exception,
    reraise,
    _serialize_span_attribute,
)

from typing import TYPE_CHECKING
Expand Down Expand Up @@ -514,9 +513,17 @@ def sentry_publish(self, *args, **kwargs):


def _prepopulate_attributes(task, args, kwargs):
    # type: (Any, *Any, **Any) -> dict[str, str]
    attributes = {
        "celery.job.task": task.name,
        "celery.job.args": _serialize_span_attribute(args),
        "celery.job.kwargs": _serialize_span_attribute(kwargs),
    }

    for i, arg in enumerate(args):
        with capture_internal_exceptions():
            attributes[f"celery.job.args.{i}"] = str(arg)

    for kwarg, value in kwargs.items():
        with capture_internal_exceptions():
            attributes[f"celery.job.kwargs.{kwarg}"] = str(value)

    return attributes
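
As a concrete illustration of the flattened Celery attributes, a task registered as `my_app.my_task` (hypothetical name) and invoked as `my_task.delay(1, "two", retries=3)` would yield roughly:

```python
# Resulting span attributes (sketch); args/kwargs are stringified.
{
    "celery.job.task": "my_app.my_task",
    "celery.job.args.0": "1",
    "celery.job.args.1": "two",
    "celery.job.kwargs.retries": "3",
}
```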