Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension #36240

Conversation

celian-garcia
Copy link

@celian-garcia celian-garcia commented Nov 6, 2024

Description

Currently there is a mechanism allowing to split the result by dimension, thanks to a filter param hack.
This is great that all the dimensions are collected as labels in the metrics, but for some resources types it could be unwanted. In cause some concern about cardinality, or continuity of the queries between one version to another (e.g with prometheus exporter, if one does not do a "sum by ..." and the otelcol version is updated, the query can display different results)

To mitigate that I propose to put an optout for this collection so if we want for some resource types and not for others, we can create two separate receivers for example. Or we completely opt out if we don't want additional labels.

Edit:

After a first review, we agreed on the fact that it could do a bit more like allowing us to specify a list of dimensions for a particular metric.
e.g from added documentation

receivers:
  azuremonitor:
    dimensions:
      enabled: true
      overrides:
        "Microsoft.Network/azureFirewalls":
          # Real example of an Azure limitation here:
          # Dimensions exposed are Reason, Status, Protocol,
          # but when selecting Protocol in the filters, it returns nothing.
          # Note here that the metric display name is ``Network rules hit count`` but it's programmatic value is ``NetworkRuleHit``
          # Ref: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-network-azurefirewalls-metrics
          "NetworkRuleHit": [Reason, Status]

Without this config you won't have the azure_networkrulehit_average_Count metric at all in your results
But with the config we have it with reason and status labels:

# HELP azure_networkrulehit_average_Count
# TYPE azure_networkrulehit_average_Count gauge
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Allow",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 21.875
azure_networkrulehit_average_Count{azuremonitor_resource_id="/subscriptions/<redacted>/resourceGroups/<redacted>/providers/Microsoft.Network/azureFirewalls/<redacted>",location="<redacted>",metadata_reason="<redacted>",metadata_status="Deny",name="<redacted>",resource_group="<redacted>",type="Microsoft.Network/azureFirewalls"} 10

Link to tracking issue

Fixes #36611

Testing

Documentation

@celian-garcia celian-garcia requested review from codeboten and a team as code owners November 6, 2024 15:29
Copy link

linux-foundation-easycla bot commented Nov 6, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: celian-garcia / name: Célian GARCIA (9709d07)

@github-actions github-actions bot requested a review from nslaughter November 6, 2024 15:29
@celian-garcia celian-garcia changed the title feat: Allow to not split result by dimension [receiver/azuremonitorreceiver] feat: Allow to not split result by dimension Nov 6, 2024
@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch 4 times, most recently from aa46e6e to 4f70e3d Compare November 6, 2024 16:06
@celian-garcia
Copy link
Author

Hey @codeboten, @nslaughter, @jpkrohling. Any opinion on that? I feel it's a no brainer so if you can run the pipeline and have a look, it would be cool!

@tesharp
Copy link

tesharp commented Nov 20, 2024

We have noticed that some metrics do not return any data if you include dimensions that do not have any data, are empty or nil (assumption based on results we have seen). For instance Azure Firewall Network Rule Hits have 3 dimensions Status, Reason and Protocol. If filtering on Status and Reason we get back data, but as soon as you add Protocol it returns no data at all. It returns no data if you only try to filter on Protocol too. Seen same for other metrics that have "empty" dimension values.

Would it maybe make sense to not only opt in / out but set which dimensions you would like to filter on? We would still want Reason and Status but can drop Protocol dimension

@celian-garcia
Copy link
Author

We have noticed that some metrics do not return any data if you include dimensions that do not have any data, are empty or nil (assumption based on results we have seen). For instance Azure Firewall Network Rule Hits have 3 dimensions Status, Reason and Protocol. If filtering on Status and Reason we get back data, but as soon as you add Protocol it returns no data at all. It returns no data if you only try to filter on Protocol too. Seen same for other metrics that have "empty" dimension values.

Would it maybe make sense to not only opt in / out but set which dimensions you would like to filter on? We would still want Reason and Status but can drop Protocol dimension

That's interesting. Then for that particular metric, using split_by_dimensions config field would allow you to receive data but you would lose the status and reason granularity. I just checked and indeed we have the same result even in the Azure Portal UI.

Selecting Network rules hit count and Split By = Protocol gives no result.
But for some metrics, like Application rules hits count, you have results split by http/https.
Also the SNAT Port utilization, you have results split by tcp/udp.

I believe this is a bug on the Network rules hit count definition that shouldn't propose protocol actually. I'm not sure that only this reason deserve a hack on our side, but still this feature would be interesting to me. Like it would be interesting to exclude a particular dimensions that would have too big cardinality or that we wouldn't be interested in.

Let me propose a config structure:

dimensions:
  # default to true to not introduce breaking change. This would mean that all the available dimensions will be collected, except if an exclusion exist.
  enabled: true | false 
  exclusions:
    "Microsoft.Network/azureFirewalls": # service name
        "Network rules hit count": # metric name
          - "Protocol"

We can also implement it the other way

dimensions:
  enabled: true 
  overrides:
    "Microsoft.Network/azureFirewalls":
        "Network rules hit count": [Reason, Status]

WDYT?

@tesharp
Copy link

tesharp commented Nov 20, 2024

I also think this is a bug in the api not only affecting this metric. We noticed it yesterday at 11:10 when suddenly all Postgresql flexible databases stoped reporting cpu, memory, storage etc. It had been working before that filtering on ServerName, but suddenly doesn't work anymore.

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

@celian-garcia
Copy link
Author

I also think this is a bug in the api not only affecting this metric. We noticed it yesterday at 11:10 when suddenly all Postgresql flexible databases stoped reporting cpu, memory, storage etc. It had been working before that filtering on ServerName, but suddenly doesn't work anymore.

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

Okay I will make an update.

@ahurtaud
Copy link

Nevertheless it seems like a good feature when you do not need the granularity with all dimensions. I think then the last option makes more sense to override and specify which dimensions you need.

I agree, second proposed config format is better to me

@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch 2 times, most recently from 1d2a929 to 7bd6cad Compare November 21, 2024 12:57
@celian-garcia
Copy link
Author

celian-garcia commented Nov 21, 2024

ping @tesharp I finished and pushed and I confirm that before this new overrides, the network rule hit metric was not even in the result ! Now it works well 💪

Note it's probably worth it that I move these dimensions functions out of scraper.go file. As if I make scraper_batch right after, I will reuse them..

Edit: moved to a dedicated dimensions.go files + added some unit tests

@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch from 7bd6cad to 2191959 Compare November 21, 2024 15:16
@tesharp
Copy link

tesharp commented Nov 22, 2024

Nice :) Looks good.

@MovieStoreGuy
Copy link
Contributor

Please make sure you run the linters :)

@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch from 2191959 to 72df93d Compare November 27, 2024 08:42
@celian-garcia
Copy link
Author

Please make sure you run the linters :)

Thanks @MovieStoreGuy, sorry for that. It should be fine now.

@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch 2 times, most recently from 221eded to 514efd8 Compare December 2, 2024 08:58
@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch 2 times, most recently from 438ed36 to 2fad09b Compare December 10, 2024 20:27
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 25, 2024
@celian-garcia celian-garcia force-pushed the split-dimensions-optout branch from 2fad09b to 3649244 Compare December 25, 2024 10:34
@github-actions github-actions bot removed the Stale label Dec 26, 2024
Copy link
Contributor

github-actions bot commented Jan 9, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[receiver/azuremonitorreceiver] feat: Allow to not split result by dimension
5 participants