How do we emit internal telemetry that works with existing Datadog Agent operational tooling? #118
Comments
One idea: metric remapping. Conceptually, specific components in Saluki map to specific components in the core Agent. For example, the DogStatsD source in ADP is the [...]. Since we should expect to only have one Datadog Metrics destination running in ADP, we could conceivably map all instances of [...]. We could likely do this fairly simply with a dedicated transform that remaps metric names, perhaps even one designed solely for remapping to Datadog Agent-equivalent metric names. The biggest downside, I think, is that we would have to maintain this mapping ourselves rather than getting the right names by default.
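To make the renaming idea concrete, here is a minimal sketch of a name-remapping transform. It is not Saluki's API: the `Metric` struct, the `RenameTransform` type, and the metric names used with it are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an internal metric (hypothetical; not Saluki's real type).
#[derive(Clone, Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f64,
}

/// Renames metrics in place using a static table of (generic name -> Agent-style name).
pub struct RenameTransform {
    mappings: HashMap<String, String>,
}

impl RenameTransform {
    /// Builds the transform from `(from, to)` name pairs.
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let mappings = pairs
            .iter()
            .map(|(from, to)| (from.to_string(), to.to_string()))
            .collect();
        Self { mappings }
    }

    /// Rewrites the name of every metric that has a mapping; others pass through untouched.
    pub fn apply(&self, metrics: &mut [Metric]) {
        for metric in metrics.iter_mut() {
            if let Some(new_name) = self.mappings.get(&metric.name) {
                metric.name = new_name.clone();
            }
        }
    }
}
```

The maintenance cost mentioned above shows up here as the mapping table: every renamed or newly added generic metric needs a matching entry, or it silently passes through under its generic name.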
Another idea: change all points where we register metrics to also register Datadog Agent-specific versions. Essentially, we would emit duplicate metrics: a generically-named one for "pure" Saluki usage, and a Datadog Agent-specific one. That way, anything using Saluki that wasn't ADP could have the more generic/flexible metric names, and ADP could still emit the Datadog Agent-specific metric names to meet our goal of being drop-in compatible. This, obviously, means emitting more telemetry than absolutely necessary. If we really didn't want to do that, we could also have a transform that filters out the generically-named metrics, leaving only the Datadog Agent-specific ones. We could also, perhaps, add a toggle for emitting either the Saluki or the Datadog Agent version... but threading that state all through Saluki would be very ugly.
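As a rough illustration of the dual-registration idea, the sketch below updates a generic counter and an Agent-style counter together, and includes the optional filtering step. The `Registry` and `DualCounter` types and the metric names are hypothetical; nothing here reflects how Saluki actually registers metrics.

```rust
use std::collections::HashMap;

/// Toy counter registry (hypothetical; stands in for the real metrics subsystem).
#[derive(Default)]
pub struct Registry {
    counters: HashMap<String, u64>,
}

impl Registry {
    /// Adds `by` to the named counter, creating it at zero if needed.
    pub fn increment(&mut self, name: &str, by: u64) {
        *self.counters.entry(name.to_string()).or_insert(0) += by;
    }

    /// Returns the current value, or `None` if the counter was never registered.
    pub fn get(&self, name: &str) -> Option<u64> {
        self.counters.get(name).copied()
    }

    /// Drops the given counters, e.g. to filter out the generically-named ones.
    pub fn remove_all(&mut self, names: &[&str]) {
        for name in names {
            self.counters.remove(*name);
        }
    }
}

/// A counter registered under two names: a generic one and an Agent-style one.
pub struct DualCounter {
    generic_name: String,
    agent_name: String,
}

impl DualCounter {
    pub fn new(generic_name: &str, agent_name: &str) -> Self {
        Self {
            generic_name: generic_name.to_string(),
            agent_name: agent_name.to_string(),
        }
    }

    /// Every update is applied to both names, so the two series stay identical.
    pub fn increment(&self, registry: &mut Registry, by: u64) {
        registry.increment(&self.generic_name, by);
        registry.increment(&self.agent_name, by);
    }
}
```

Every call site has to know both names, which is the cost of this approach compared with a single remapping component.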
We've ended up taking an approach (in #240) that blends the two ideas previously mentioned: a dedicated component that "remaps" specific configured metrics, but does so by emitting additional metrics. We chose this so that we can keep the more intentionally abstract/generic Saluki telemetry, which would let us design a more complete "Agent Data Plane Health" dashboard, while also meeting our goals around operational continuity: emitting equivalent telemetry ensures that existing Agent dashboards focused on DSD still work for ADP-based Agent deployments. We're keeping this issue open for now, however, to signify that we don't yet remap all of the internal telemetry emitted by the DogStatsD or aggregator components in the Agent.
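The blended approach described above can be sketched as a remapper that leaves each original metric alone and appends a renamed copy for every configured rule. This is only an illustration under assumptions, not the component from #240: the `Metric` and `AdditiveRemapper` types and the metric names are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an internal metric (hypothetical; not Saluki's real type).
#[derive(Clone, Debug, PartialEq)]
pub struct Metric {
    pub name: String,
    pub value: f64,
}

/// Emits an additional Agent-style metric for each configured generic metric,
/// while passing every original metric through unchanged.
pub struct AdditiveRemapper {
    rules: HashMap<String, String>,
}

impl AdditiveRemapper {
    /// Builds the remapper from `(generic name, Agent-style name)` pairs.
    pub fn new(pairs: &[(&str, &str)]) -> Self {
        let rules = pairs
            .iter()
            .map(|(from, to)| (from.to_string(), to.to_string()))
            .collect();
        Self { rules }
    }

    /// Returns the input metrics, each followed by its remapped copy when a rule matches.
    pub fn process(&self, input: Vec<Metric>) -> Vec<Metric> {
        let mut output = Vec::with_capacity(input.len());
        for metric in input {
            // Build the extra metric first, since `metric` is moved into `output` next.
            let extra = self.rules.get(&metric.name).map(|name| Metric {
                name: name.clone(),
                value: metric.value,
            });
            output.push(metric);
            output.extend(extra);
        }
        output
    }
}
```

Because the generic metrics survive, both a Saluki-oriented dashboard and existing Agent dashboards can be fed from the same stream, at the cost of emitting more series.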
At this point, we remap the majority of DogStatsD-specific telemetry that is relevant to day-to-day debugging and operations, so we'll close this issue for now and open additional issues for follow-ups around remapping additional bits of telemetry.
At a high level, both the Datadog Agent and Agent Data Plane/Saluki emit internal telemetry used for debugging performance issues and understanding their operational state. However, the naming differs substantially between the two, even for metrics that are functionally identical. This makes it challenging to use ADP, as it currently exists, as a drop-in replacement for DSD support in the core Agent.
The metric prefix we use when emitting internal metrics is configurable at the tippity top when initializing the metrics subsystem via `saluki_app::metrics::initialize_metrics`, so that's fine... but how do we line up individual metrics with their spiritual equivalent in the Datadog Agent? This is a problem we need to solve if we hope to have ADP replace DSD in the core Agent.