-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
azure exporter: Enable features from upstream exporter #5707
Conversation
"ApiName", | ||
"TransactionType", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oddly, locally this is all spaced properly and renders as I would expect. I'm really confused about why it's off when pushed.
`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe? This used to default to true
so any existing configs won't break but the default behavior will change. I could flip the default to true
to match the existing behavior or add it in the breaking changes section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering the feature, I think it's better to leave the false default and do a breaking change. wdyt @tpaschalis @mattdurham ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, it would only mean existing configs could potentially start accepting more dimensions although they would be manually specified anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the scope of the change between the two? Feels like we are getting more metrics, are we talking 2x,10x,100x? Consider me very ignorant of the internal workings of azure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some additional context, if it makes sense to do a breaking change then I am onboard with it but it would need to be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattdurham I don't know much either about azure internals but I'm thinking the "break" is that wrong configs would start working. Say if someone has something like this and the agent is running, it would start working catching the default dimensions for each metric.
prometheus.exporter.azure "example" {
subscriptions = ["<...>"]
resource_type = "Microsoft.Network/applicationGateways"
included_dimensions = []
metrics = [
"BackendResponseStatus",
"Throughput",
]
timespan = "PT1M"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be more of a behavioral change vs a breaking change. It won't change any currently working configuration someone will be running. They will still get the same amount of data.
On main a config which requests this data from azure fails but runs without issue on this branch,
resource_type: ["microsoft.containerservice/managedclusters"]
metrics:
- node_cpu_usage_millicores
- node_cpu_usage_percentage
- node_disk_usage_bytes
- node_disk_usage_percentage
- node_memory_rss_bytes
- node_memory_rss_percentage
- node_memory_working_set_bytes
- node_memory_working_set_percentage
- node_network_in_bytes
- node_network_out_bytes
included_resource_tags:
- environment
included_dimensions:
- node
- nodepool
- device
The device
dimension is only valid for some of the requested metrics, https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-containerservice-managedclusters-metrics. The only valid way to do this on main was to split the config in to multiple definitions (do a self scrape in static mode or multiple exporter definitions in flow mode).
Without validation though there's no safe guards for mistyping/misconfiguration so accidentally doing something like devie
would not give you an error where it would previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable then and pretty fine with this then, maybe add the equivalent of this blurb to breaking changes just to cover our bases.
1. Set the regions to gather metrics from and get metrics for all resources across those regions | ||
1. This will make an API call per subscription reducing the number of API calls dramatically | ||
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work | ||
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the detail. What is the recourse if a user hits this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great question, I think the only option is to fall back to the resource graph API based approach which I can add in this list to be clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome I think that would help!
The exporter offers two options for gathering metrics, | ||
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics | ||
1. This will make 1 API call per resource identified | ||
1. Subscriptions with a reasonable amount of resources are liable to hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) azure enforces | ||
1. Set the regions to gather metrics from and get metrics for all resources across those regions | ||
1. This will make an API call per subscription reducing the number of API calls dramatically | ||
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work | ||
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exporter offers two options for gathering metrics, | |
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics | |
1. This will make 1 API call per resource identified | |
1. Subscriptions with a reasonable amount of resources are liable to hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) azure enforces | |
1. Set the regions to gather metrics from and get metrics for all resources across those regions | |
1. This will make an API call per subscription reducing the number of API calls dramatically | |
1. This approach does not work with all resource types and Azure's does not document which resource types do/do not work | |
1. A resource type which is not support will produce errors which look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics` | |
The exporter offers the following two options for gathering metrics. | |
1. (Default) Use an [Azure Resource Graph](https://azure.microsoft.com/en-us/get-started/azure-portal/resource-graph/#overview) query to identify resources for gathering metrics. | |
1. This query will make one API call per resource identified. | |
1. Subscriptions with a reasonable amount of resources can hit the [12000 requests per hour rate limit](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/request-limits-and-throttling#subscription-and-tenant-limits) Azure enforces. | |
1. Set the regions to gather metrics from and get metrics for all resources across those regions. | |
1. This option will make one API call per subscription, dramatically reducing the number of API calls. | |
1. This approach does not work with all resource types, and Azure does not document which resource types do or do not work. | |
1. A resource type that is not supported produces errors that look like `Resource type: microsoft.containerservice/managedclusters not enabled for Cross Resource metrics`. |
## Authentication | ||
|
||
Grafana agent must be running in an environment with access to Azure. The exporter uses the Azure SDK for go and supports [authentication](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash#2-authenticate-with-azure). | ||
|
||
The account used by Grafana Agent needs: | ||
|
||
- [Read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph) | ||
- When using an Azure Resoure Graph query, [read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When using an Azure Resoure Graph query, [read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph) | |
- [Read access to the resources that will be queried by Resource Graph](https://learn.microsoft.com/en-us/azure/governance/resource-graph/overview#permissions-in-azure-resource-graph) if you are using an Azure Resource Graph query. |
| `metric_help_template` | `string` | Description of the metric. | `"Azure metric {metric} for {type} with aggregation {aggregation} as {unit}"` | no | | ||
| Name | Type | Description | Default | Required | | ||
|-------------------------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------|----------| | ||
| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes | | |
| `subscriptions` | `list(string)` | List of subscriptions to scrape metrics from. | | yes | |
| `subscriptions` | `list(string)` | List of subscriptions to scrap metrics from. | | yes | | ||
| `resource_type` | `string` | The Azure Resource Type to scrape metrics for. | | yes | | ||
| `metrics` | `list(string)` | The metrics to scrape from resources. | | yes | | ||
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no | | |
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Can't be used if `regions` is set. | | no | |
| `resource_type` | `string` | The Azure Resource Type to scrape metrics for. | | yes | | ||
| `metrics` | `list(string)` | The metrics to scrape from resources. | | yes | | ||
| `resource_graph_query_filter` | `string` | The [Kusto query][] filter to apply when searching for resources. Cannot be used if `regions` is set. | | no | | ||
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Cannot be used if `resource_graph_query_filter` is set. | | no | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Cannot be used if `resource_graph_query_filter` is set. | | no | | |
| `regions` | `list(string)` | The list of regions for gathering metrics and enables gathering metrics for all resources in the subscription. Can't be used if `resource_graph_query_filter` is set. | | no | |
|
||
# Optional: The list of regions for gathering metrics. Enables gather metrics for all resources in the subscription. | ||
# The list of available `regions` to your subscription can be found by running the azure CLI command `az account list-locations --query '[].name'`. | ||
# Cannot be used if `resource_graph_query_filter` is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Cannot be used if `resource_graph_query_filter` is set. | |
# Can't be used if `resource_graph_query_filter` is set. |
@@ -93,7 +100,14 @@ The account used by Grafana Agent needs: | |||
|
|||
# Optional: The [kusto query](https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/) filter to apply when searching for resources | |||
# This value will be embedded in to a template query of the form `Resources | where type =~ "<resource_type>" <resource_graph_query_filter> | project id, tags` | |||
# Cannot be used if `regions` is set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Cannot be used if `regions` is set. | |
# Can't be used if `regions` is set. |
@@ -137,6 +151,10 @@ The account used by Grafana Agent needs: | |||
|
|||
# Optional: Which azure cloud environment to connect to, azurecloud, azurechinacloud, azuregovernmentcloud, or azurepprivatecloud | |||
[azure_cloud_environment: <string> | default = "azurecloud"] | |||
|
|||
# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions. | |
# Optional: Validation is disabled by default to reduce the number of Azure exporter instances required when a `resource_type` has metrics with varying dimensions. |
@@ -77,6 +90,8 @@ Tags in `included_resource_tags` will be added as labels with the name `tag_<tag | |||
|
|||
Valid values for `azure_cloud_environment` are `azurecloud`, `azurechinacloud`, `azuregovernmentcloud` and `azurepprivatecloud`. | |||
|
|||
`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`validate_dimensions` is disabled by default to reduce the number of azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need 1 exporter instance per metric + dimension combination which is more tedious to maintain. | |
`validate_dimensions` is disabled by default to reduce the number of Azure exporter instances requires when a `resource_type` has metrics with varying dimensions. When `validate_dimensions` is enabled you will need one exporter instance per metric + dimension combination which is more tedious to maintain. |
@@ -137,6 +151,10 @@ The account used by Grafana Agent needs: | |||
|
|||
# Optional: Which azure cloud environment to connect to, azurecloud, azurechinacloud, azuregovernmentcloud, or azurepprivatecloud | |||
[azure_cloud_environment: <string> | default = "azurecloud"] | |||
|
|||
# Optional: validation is disabled by default to reduce the number of azure exporter instances required when a `resource_type` has metrics with varying dimensions. | |||
# Choosing to enable `validate_dimensions` will require 1 exporter instance per metric + dimension combination which can be very tedious to maintain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Choosing to enable `validate_dimensions` will require 1 exporter instance per metric + dimension combination which can be very tedious to maintain. | |
# Choosing to enable `validate_dimensions` will require one exporter instance per metric + dimension combination which can be very tedious to maintain. |
@mattdurham and @clayton-cornell I think I covered all the PR feedback if you wanted to take another pass |
This PR has not had any activity in the past 30 days, so the |
@mattdurham This one fell off my ToDo list. There's nothing more I have to add for docs input. Over to you for a final look/approval? |
I'll happily resolve conflicts after getting a 👍 on the changes. The |
33f19e1
to
2a22f19
Compare
* Update azure exporter for ValidateDimension support * Add support for subscription scoped metric gathering * Add docs for validate_dimensions * Fix conditional for when to use subscription scope * changelog entry * Fix crow error from prom common upgrade * Code review feedback * Cleanup merge + missing review comments + fix deprecations from go.mod update
PR Description
This PR adds two features from the upstream exporter which should help reduce the complexity when pull metrics from azure,
I did as much as I could to add further context around these in the code/docs.
Which issue(s) this PR fixes
Fixes: #5411
Azure API rate limiting has come up internally but I couldn't find a corresponding issue to close. I can make one if needed
Notes to the Reviewer
PR Checklist