Access metrics from opa-envoy-plugin via HTTP metrics API #648

Open
emaincourt opened this issue Oct 4, 2021 · 13 comments
Labels
help wanted Extra attention is needed

Comments

@emaincourt
Contributor

Expected Behavior

Hi,

I'm using OPA as an Envoy authz filter with opa-envoy-plugin and can't get proper metrics from it. If I understand correctly, the plugin does not go through HTTP to run queries against the embedded OPA server, so http_request_duration_seconds_bucket is never updated for those queries. If I run curl http://localhost:8181/metrics against the sidecar container, I get metrics in the proper format but with almost no information in them:

# HELP http_request_duration_seconds A histogram of duration for requests.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1e-06"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="5e-06"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1e-05"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="5e-05"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.0001"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.0005"} 79887
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.001"} 88664
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.01"} 90436
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.1"} 90448
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1"} 90448
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="+Inf"} 90448
http_request_duration_seconds_sum{code="200",handler="health",method="get"} 30.07359942200008
http_request_duration_seconds_count{code="200",handler="health",method="get"} 90448

Basically this only gives me timing information about the health path, which is not really useful. If I'm right, there should also be metrics about the query path. Is that correct?

If so, is there a way to get access to timing metrics such as timer_rego_builtin_http_send_ns, timer_rego_query_eval_ns or timer_server_handler_ns that are actually exposed in the logs?

Thanks in advance
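For readers less familiar with the Prometheus histogram format in the output above: the `_bucket` values are cumulative, so per-bucket counts come from differencing neighbours, and the mean latency is `_sum / _count` (roughly 0.33 ms here). The following is a minimal sketch that parses exactly the lines quoted above; the helper names `parse` and `summarize` are made up for illustration and are not part of OPA or any Prometheus client.

```python
import re

# Sample lines copied from the /metrics output quoted in this comment.
EXPOSITION = """\
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1e-06"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="5e-06"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1e-05"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="5e-05"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.0001"} 0
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.0005"} 79887
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.001"} 88664
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.01"} 90436
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="0.1"} 90448
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="1"} 90448
http_request_duration_seconds_bucket{code="200",handler="health",method="get",le="+Inf"} 90448
http_request_duration_seconds_sum{code="200",handler="health",method="get"} 30.07359942200008
http_request_duration_seconds_count{code="200",handler="health",method="get"} 90448
"""

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Parse labelled samples into (name, labels, value) tuples; skips comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(labels)), float(value)))
    return samples


def summarize(samples):
    """Return ([(upper_bound, count_in_bucket), ...], mean_seconds)."""
    buckets = sorted(
        (float(labels["le"]), value)
        for name, labels, value in samples
        if name.endswith("_bucket")
    )
    total = next(v for n, _, v in samples if n.endswith("_sum"))
    count = next(v for n, _, v in samples if n.endswith("_count"))
    per_bucket, previous = [], 0.0
    for upper_bound, cumulative in buckets:
        # Buckets are cumulative, so subtract the previous bucket's value.
        per_bucket.append((upper_bound, cumulative - previous))
        previous = cumulative
    return per_bucket, total / count


per_bucket, mean_seconds = summarize(parse(EXPOSITION))
```

Here `per_bucket` shows that nearly all health checks fall in the 0.0001 to 0.0005 second bucket, which confirms that the histogram only reflects the health handler.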

@srenatus srenatus added the question Further information is requested label Oct 4, 2021
@srenatus
Collaborator

srenatus commented Oct 4, 2021

There are metrics reported in the decision log entries, like this:

{
  "decision_id": "846b21d5-df09-409d-8448-a7d48f32ab5a",
  "input": {
    "attributes": {
      "destination": {
        "address": {
          "socketAddress": {
            "address": "172.19.0.4",
            "portValue": 51051
          }
        }
      },
      "metadataContext": {},
      "request": {
        "http": {
          "headers": {
            ":authority": "127.0.0.1:51051",
            ":method": "POST",
            ":path": "/test.KitchenSink/Exchange",
            ":scheme": "http",
            "content-type": "application/grpc",
            "te": "trailers",
            "user-agent": "grpc-go/1.30.0",
            "x-envoy-auth-partial-body": "false",
            "x-forwarded-proto": "http",
            "x-request-id": "f13b639f-069e-4c99-93c0-ee009883501c"
          },
          "host": "127.0.0.1:51051",
          "id": "9349687039263147790",
          "method": "POST",
          "path": "/test.KitchenSink/Exchange",
          "protocol": "HTTP/2",
          "rawBody": "AAAAAOwiEQh7EgRhcm5vSgcIehIDYm9iKAVNpHCdP2GuR+F6FK7zP3oDasdfggHAAQoLCPrFnf4FELijzjgiAggeOjcKKmdvb2dsZWFwaXMuY29tL2dvb2dsZS5wcm90b2J1Zi5TdHJpbmdWYWx1ZRIJCgdIaXRoZXJlUg4KDAoDZm9vEgUaA2JhcmoIGgZzdHJpbmeCARsKBhoEemVybwoFGgNvbmUKChoIaW5maW5pdHmaAQUKAwAAALIBBgoEYWJjZMoBAggB4gEJCbgehetRuL4/+gEFDY/C9T2SAgIIAaoCAggCwgICCGTaAgIIZQ==",
          "scheme": "http",
          "size": "241"
        },
        "time": "2021-10-04T10:03:31.167166Z"
      },
      "source": {
        "address": {
          "socketAddress": {
            "address": "172.19.0.1",
            "portValue": 64954
          }
        }
      }
    },
    "parsed_body": {
      "neededNumA": 1.23,
      "neededNumB": 1.23,
      "opaqueId": "asdf",
      "person": {
        "id": "123",
        "name": "arno",
        "parent": {
          "id": "122",
          "name": "bob"
        }
      },
      "state": "AWAITING_INPUT",
      "wk": {
        "bigId": "101",
        "bigInt": "2",
        "bool": true,
        "bytes": "AAAA",
        "double": 0.12,
        "float": 0.12,
        "list": [
          "zero",
          "one",
          "infinity"
        ],
        "neat": {
          "@type": "googleapis.com/google.protobuf.StringValue",
          "value": "Hithere"
        },
        "now": "2020-12-02T09:48:42.118723Z",
        "object": {
          "foo": "bar"
        },
        "period": "30s",
        "smallId": 100,
        "smallInt": 1,
        "string": "abcd",
        "value": "string"
      }
    },
    "parsed_path": [
      "test.KitchenSink",
      "Exchange"
    ],
    "parsed_query": {},
    "truncated_body": false,
    "version": {
      "encoding": "protojson",
      "ext_authz": "v3"
    }
  },
  "labels": {
    "id": "e0d479ee-f794-4f0c-bf45-65d98c9e3ad7",
    "version": "0.31.0-envoy-4"
  },
  "level": "info",
  "metrics": {
    "timer_rego_query_eval_ns": 73700,
    "timer_server_handler_ns": 662300
  },
  "msg": "Decision Log",
  "path": "envoy/authz/allow",
  "requested_by": "",
  "result": false,
  "time": "2021-10-04T10:03:31Z",
  "timestamp": "2021-10-04T10:03:31.168389Z",
  "type": "openpolicyagent.org/decision_logs"
}

Note the

  "metrics": {
    "timer_rego_query_eval_ns": 73700,
    "timer_server_handler_ns": 662300
  },

which would also include the timer_rego_builtin_http_send_ns metric if http.send is used in the policy.

Does that help?
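As an illustration of consuming these entries, here is a minimal sketch that pulls the `metrics` object out of one decision log line and converts the nanosecond timers to seconds. It assumes the decision logs are available as one JSON object per line (for example from console output); the function name `extract_timers` and the `_seconds` renaming are hypothetical choices for this example, and the sample is an abbreviated copy of the entry above.

```python
import json


def extract_timers(log_line):
    """Return {metric_name: seconds} for the timer_*_ns values in one log line.

    Lines that are not decision log entries yield an empty dict.
    """
    entry = json.loads(log_line)
    if entry.get("type") != "openpolicyagent.org/decision_logs":
        return {}
    timers = {}
    for name, value in entry.get("metrics", {}).items():
        if name.startswith("timer_") and name.endswith("_ns"):
            # Strip the "_ns" suffix and convert nanoseconds to seconds.
            timers[name[:-3] + "_seconds"] = value / 1e9
    return timers


# Abbreviated copy of the decision log entry shown above.
sample = json.dumps({
    "decision_id": "846b21d5-df09-409d-8448-a7d48f32ab5a",
    "metrics": {
        "timer_rego_query_eval_ns": 73700,
        "timer_server_handler_ns": 662300,
    },
    "path": "envoy/authz/allow",
    "result": False,
    "type": "openpolicyagent.org/decision_logs",
})

timers = extract_timers(sample)
```

With the sample entry, `timers` holds the query evaluation and server handler durations in seconds, ready to be fed into whatever aggregation is used downstream.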

@emaincourt
Contributor Author

Hi @srenatus,

Thanks for your answer. I'm aware that those metrics are reported in the decision logs. However, I'm trying to get Prometheus to scrape them, which requires exposing them through an HTTP endpoint in the right format. I was expecting the /metrics endpoint from OPA to still provide them, but that does not seem to be the case.

Do you know how I could get those metrics to Prometheus?

Thanks in advance.

@ashutosh-narkar
Member

The Prometheus metrics OPA provides are for the HTTP handler. For the opa-envoy plugin, the metrics should be in the decision log, as Stephan mentioned. Have you looked into the metrics Envoy provides? It should have information like the number of allowed/denied requests, etc.

@srenatus
Collaborator

srenatus commented Oct 4, 2021

Hmm I think having an endpoint to scrape the OPA-specific metrics from an opa-envoy-plugin instance isn't an unreasonable expectation.... what would it take to expose the metrics currently pushed through the DL via the (well-known, documented) metrics HTTP endpoint? 🤔
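To make the question concrete, here is a rough stand-alone sketch of what such an endpoint has to do: accumulate timer observations into cumulative buckets and render them in the Prometheus text format. This is not how OPA or opa-envoy-plugin implements anything; the class, the metric name `opa_envoy_rego_query_eval_seconds`, and the `serve` helper are all hypothetical, and the bucket bounds are simply copied from the `http_request_duration_seconds` output earlier in the thread.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bucket upper bounds in seconds, copied from the output earlier in the thread.
BOUNDS = [1e-06, 5e-06, 1e-05, 5e-05, 0.0001, 0.0005, 0.001, 0.01, 0.1, 1.0]


class Histogram:
    """Thread-safe histogram that renders the Prometheus text format."""

    def __init__(self, name, help_text, bounds=BOUNDS):
        self.name = name
        self.help_text = help_text
        self.bounds = list(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is +Inf
        self.total = 0.0
        self.lock = threading.Lock()

    def observe(self, seconds):
        """Record one observation in the first bucket whose bound fits."""
        with self.lock:
            self.total += seconds
            for index, bound in enumerate(self.bounds):
                if seconds <= bound:
                    self.counts[index] += 1
                    return
            self.counts[-1] += 1

    def render(self):
        """Return HELP/TYPE lines plus cumulative buckets, sum and count."""
        with self.lock:
            lines = [
                f"# HELP {self.name} {self.help_text}",
                f"# TYPE {self.name} histogram",
            ]
            cumulative = 0
            for bound, count in zip(self.bounds, self.counts):
                cumulative += count
                lines.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
            cumulative += self.counts[-1]
            lines.append(f'{self.name}_bucket{{le="+Inf"}} {cumulative}')
            lines.append(f"{self.name}_sum {self.total}")
            lines.append(f"{self.name}_count {cumulative}")
            return "\n".join(lines) + "\n"


# Hypothetical metric name; nothing in OPA exports it under this name.
EVAL = Histogram(
    "opa_envoy_rego_query_eval_seconds",
    "Rego query evaluation time (illustrative only).",
)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = EVAL.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port):
    """Block forever serving /metrics on localhost (port is the caller's choice)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Each evaluation would call something like `EVAL.observe(73700 / 1e9)`, and `serve(...)` would expose the result for scraping. A real implementation would more likely register these timers with the Prometheus registry that OPA already uses rather than hand-rolling the format.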

@srenatus srenatus changed the title Access metrics from opa-envoy-plugin Access metrics from opa-envoy-plugin via HTTP metrics API Oct 5, 2021
@srenatus srenatus removed the question Further information is requested label Oct 5, 2021
@emaincourt
Contributor Author

@ashutosh-narkar Thanks for your message. Yes, that is exactly what I suspected: those metrics are only HTTP related. Regarding Envoy, the metrics exposed by the Istio sidecar in my case do not seem to include anything related to the authz filter, only the request as a whole. So we can get the allowed/denied request counts, but not the duration buckets for the Envoy filter itself. However, I'm pretty new to Envoy/Istio, so I might be wrong.

Having the whole set of OPA-related metrics exposed through HTTP, as @srenatus suggested, would be really awesome. Not relying on Envoy would also allow more granularity in what is exported, I guess.

@be-a-bee
Contributor

Hi, we just wanted to know whether this issue will be addressed in the near future. This is an important requirement for us because our Site Reliability Engineering team requires us to expose the OPA metrics before we can run it in production.

So, we would at least like to know whether it could be addressed in the coming few weeks, and we also ask you to prioritize it if it wasn't already planned.

@ashutosh-narkar
Member

@LionOnTheChase if you would like to contribute this feature, we'd be happy to guide you.

@tsandall tsandall added the help wanted Extra attention is needed label Mar 14, 2023
@stale

stale bot commented Apr 17, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

@be-a-bee
Contributor

be-a-bee commented Jun 9, 2023

We have been working around this problem by relying on decision logs for observability data. It turned out that, due to high load, the amount of decision logs being generated are consuming high disk usage.

We need to turn off the decision logs, but if we do that we will lose all observability built on them.

So, if this enhancement can be added, we can afford to switch off the decision logs and still have observability on the OPA service.

Sorry, I don't have the expertise to assist on this PR.

@ashutosh-narkar
Member

@LionOnTheChase thanks for the insight.

the amount of decision logs being generated are consuming high disk usage

Decision logs are not persisted to disk. They are buffered in memory. You can control the buffer size via the decision_logs.reporting.buffer_size_limit_bytes config parameter.

@stale

stale bot commented Jul 9, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@be-a-bee
Contributor

@LionOnTheChase if you would like to contribute this feature, we'd be happy to guide you.

Thanks @ashutosh-narkar. I will get back to you once I have done some initial research.


stale bot commented Nov 23, 2023

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

@anderseknert anderseknert transferred this issue from open-policy-agent/opa Jan 20, 2025