Support a DNSSRVNOA Option on the loadbalancing Exporter #18412
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
This looks cool; I'll be happy to review a PR adding support for this.
@jpkrohling If you can point me in the direction of the code that makes the DNS requests in the OTel Collector, I can work on delivering a PR for it.
Sure, here's a short summary:
That's pretty much it. The current DNS resolver (linked below) just uses the TCP stack to resolve the IP for a name, so it's not directly making a DNS call. You'll likely need to use a Go library to perform an SRV query. Unfortunately, I can't recommend one, as I haven't used one yet. For reference, here's the DNS resolver: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/loadbalancingexporter/resolver_dns.go
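For what it's worth, Go's standard library can already perform the SRV query; a minimal sketch with a placeholder service name, leaving out how the results would be wired into the resolver:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// With empty service and proto, LookupSRV queries the given name
	// directly instead of building the _service._proto.name form.
	name := "otel-collector-headless.observability.svc.cluster.local" // placeholder
	cname, records, err := net.LookupSRV("", "", name)
	if err != nil {
		panic(err)
	}
	fmt.Println("canonical name:", cname)
	for _, r := range records {
		// r.Target is the backend hostname the loadbalancing exporter
		// could use as an endpoint, together with r.Port.
		fmt.Printf("%s:%d (priority=%d, weight=%d)\n", r.Target, r.Port, r.Priority, r.Weight)
	}
}
```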
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments.
It looks like the service here is located in a k8s cluster, and the k8s resolver (PR #22776) should be better suited to this scenario.
While the Kubernetes service resolver would solve the problem for Kubernetes deployments, a service record can be used outside of Kubernetes as well. I'll keep this issue open to gauge interest from other folks.
**Description:** Add a k8s service resolver for exporter/loadbalancingexporter. The exporter/loadbalancingexporter component currently supports both static and dns resolvers, but does not support load balancing across pods when the collector is running in a Kubernetes environment. (Backend address discovery is achieved by monitoring Kubernetes Endpoints resources.) This PR provides that capability.
**Link to tracking Issue:** Suitable for scenarios where services are located in a k8s cluster: #18412
Signed-off-by: Yuan Fang <[email protected]>
@fyuan1316 The k8s resolver actually has the same problem: it looks at endpoint IPs rather than hostnames. The static resolver might actually be a potential solution as well. If you were to add some sort of "resolve interval" option to it, allowing for more than one resolution, that would also work.
@jpkrohling Really appreciate the guidance. I've been working through a PR and don't foresee any issues implementing it. When using an SRV record where your backends are a Deployment, though, things get trickier.
Excuse the example if it's unnecessary, but if we have a […]
Feel free to correct me here, but I think we have two problems with this. First, the hash ring will be the same, and so we won't kick off these two methods. Even if we did something like "on IP change, shuffle the endpoints so a new hash is made" (which feels like it could have its own issues), we'd still run into issues inside of […]. I don't really have a good idea on how to fix this, but I think conceptually it would take implementing some sort of restart concept: if the hash rings are the same, but a restart flag is present […]
Is this just a matter of forcing a new DNS resolution for those hosts? If so, the new resolver can keep a map of hosts and last-seen IPs, and do a DNS lookup (a simple LookupIPAddr should be sufficient) when changes are detected. This is exclusive to the business of this resolver; I don't think we need a more generic solution at the load balancer level.
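A rough sketch of that idea, assuming the list of hostnames has already come back from the SRV query; the host names and the reaction to a change are placeholders, not the exporter's actual API:

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"strings"
	"time"
)

// detectIPChanges re-resolves each known host and returns the hosts whose
// A/AAAA records changed since the previous check. lastSeen maps a host to
// the sorted, comma-joined IP list from the previous resolution.
func detectIPChanges(ctx context.Context, hosts []string, lastSeen map[string]string) []string {
	var changed []string
	for _, host := range hosts {
		addrs, err := net.DefaultResolver.LookupIPAddr(ctx, host)
		if err != nil {
			continue // a real resolver should surface or log this error
		}
		ips := make([]string, 0, len(addrs))
		for _, a := range addrs {
			ips = append(ips, a.IP.String())
		}
		sort.Strings(ips)
		current := strings.Join(ips, ",")
		if previous, ok := lastSeen[host]; ok && previous != current {
			changed = append(changed, host)
		}
		lastSeen[host] = current
	}
	return changed
}

func main() {
	hosts := []string{"backend-0.example.svc.cluster.local"} // placeholder
	lastSeen := map[string]string{}
	for {
		if changed := detectIPChanges(context.Background(), hosts, lastSeen); len(changed) > 0 {
			// placeholder: this is where the resolver would trigger a new
			// resolution / notify the load balancer about restarted backends.
			fmt.Println("IPs changed for:", changed)
		}
		time.Sleep(30 * time.Second)
	}
}
```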
I'm not sure, but I think for istio/envoy to work it would just require adding a […]
It is a matter of forcing new resolution; however, we need to be specific about how the resolution is done. I don't know if I'm using the right terminology here when I say "TCP stack" or "OS", but here is an example. As @gravufo mentioned, setting the sampler to […]
Hopefully that makes sense. What we want to send to […]. One solution I thought of, which I think avoids changing the LB, is calling the callback twice: the first time with the restarting endpoints not provided, and the second time as normal.
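For concreteness, a sketch of the double-callback idea; `onBackendChanges` here stands in for whatever change-notification hook the resolver actually uses, so treat the names and signature as assumptions:

```go
package main

import "fmt"

// notifyWithRestart publishes the endpoint list twice: first without the
// restarting backends (so their entries are dropped and connections torn
// down), then with the full list again so they are re-added fresh.
func notifyWithRestart(all, restarting []string, onBackendChanges func([]string)) {
	restartSet := make(map[string]bool, len(restarting))
	for _, r := range restarting {
		restartSet[r] = true
	}

	surviving := make([]string, 0, len(all))
	for _, endpoint := range all {
		if !restartSet[endpoint] {
			surviving = append(surviving, endpoint)
		}
	}

	onBackendChanges(surviving) // first pass: restarting endpoints omitted
	onBackendChanges(all)       // second pass: full list, as normal
}

func main() {
	all := []string{"backend-0:4317", "backend-1:4317"} // placeholders
	notifyWithRestart(all, []string{"backend-1:4317"}, func(endpoints []string) {
		fmt.Println("endpoints:", endpoints)
	})
}
```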
Here's my branch, before writing tests (or testing, for that matter).
I think there are two valid requests here in this ticket at this point:
I would treat the first as a bug; it should enable mTLS with Istio usage. The second is a feature request, for which I would love to review a PR.
@jpkrohling I'm assuming I am only responsible for #2? I'm hoping to have a PR in the next week or so.
The linked PR was closed without being merged. I believe it was useful in showing that we need to think about a more complete design for the solution as a whole.
Apologies for the delay, @jpkrohling. I haven't been able to get back to the PR, and since writing the code we're able to simply run the fork, so it's hard to prioritize. That being said, unless I missed something, I was waiting for the person who reviewed it after you to re-review. I'm okay with keeping it closed and waiting for the larger-scoped SRV implementation, as long as the feature of only doing one resolution (SRV to A, but not A to IP address) remains. Do you think that would be acceptable?
Absolutely! I feel like a refactor of the load-balancing exporter is overdue, especially after batching is added to exporters natively. We can use that opportunity to see what we need to change to support this use case more cleanly.
@jpkrohling I had one other thought on how to fix this, and before I go and ask for time in my July or August sprint to tackle it, I figured I'd run it by you. I noticed that […]
That sounds like a good idea, and without double-checking the code, that's what I would intuitively expect this resolver to do.
To summarize, and hopefully help someone who came to this thread via a Google search: currently (e.g. OTel LB collector + backend collector), if Istio is involved (e.g. injecting an Istio sidecar into the backend service with the mTLS mode set to STRICT), istio-proxy will interrupt the connection and the OTel backend collector will not be able to establish connections properly. The Istio community already has an issue tracking this: istio/istio#37431. The Istio community has a long-term solution: […]
For service invocation scenarios, there are short-term workarounds: […] Additionally, if it is possible to set mTLS to a non-STRICT mode, the above issue does not occur.
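For illustration, relaxing strict mTLS for just the backend collector can be done with an Istio PeerAuthentication in PERMISSIVE mode; all names and labels below are placeholders:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: otel-backend-permissive      # placeholder name
  namespace: observability           # placeholder namespace
spec:
  selector:
    matchLabels:
      app: otel-collector-backend    # placeholder label of the backend collector pods
  mtls:
    mode: PERMISSIVE                 # accept both plaintext and mTLS traffic
```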
Not sure if a comment is required to be marked not stale, but I submitted a PR which resolves the overall user story. Using the k8s resolver will work for most, if not all, scenarios. I also agree with the requests made on the SRV solution I worked up. This is a Kubernetes solution for a Kubernetes problem and will not impact non-k8s environments.
…35411)
**Description:** Adds an optional configuration option to the k8s resolver which allows hostnames to be returned instead of IPs. This enables certain scenarios, like using Istio in sidecar mode. Requirements have been added to the documentation.
**Link to tracking Issue:** #18412
**Testing:** Added corresponding hostname-based tests for adding backends/endpoints as well as deleting them. There were also tests missing for the k8s handler, so some were added there as well, specifically failing if you want hostnames but endpoints are returned that do not have hostnames. Aside from unit tests, also ran this in our lab cluster and deleted pods or performed rollouts to our StatefulSet. Somewhat tangential to the PR itself, but Istio now reports mTLS traffic with zero workarounds required, which was the motivation for the issue.
**Documentation:** Added documentation explaining the new option and the requirements needed to use it. Also added an additional "important" note specifically calling out that the k8s resolver needs RBAC to work.
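For reference, a sketch of what the resulting configuration looks like; the exact field names (in particular `return_hostnames`) should be checked against the exporter's README, and the service name is a placeholder:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      k8s:
        service: otel-collector-backend.observability  # placeholder <service>.<namespace>
        ports:
          - 4317
        return_hostnames: true  # option added by this PR: return endpoint hostnames instead of IPs
```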
Component(s)
exporter/loadbalancing
Is your feature request related to a problem? Please describe.
I discovered, when trying to configure an OTel collector to use the loadbalancing exporter to ship to another OTel collector deployment, that it fails when Istio is involved.
Currently, DNS requests made for loadbalancing fail to work with Istio services because it is doing an A record lookup rather than accepting an SRV record.
When trying to set up a loadbalancing exporter to talk to a k8s headless service, it makes an A record lookup, which means it uses the pod IP address as the hostname, and that fails because Istio doesn't allow routing via pod IPs.
This is true even if you use the pod hostname:
Istio proxy logs from the request being made:
Example of loadbalancing config:
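For illustration, a DNS-resolver loadbalancing config of roughly this shape (all names are placeholders) runs into the problem described above:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # placeholder: headless service in front of the backend collector Deployment
        hostname: otel-collector-headless.observability.svc.cluster.local
        port: 4317
```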
It works properly if you configure the receiving otel-collector as a stateful set and then use each pod name, because the SRV record will come back matching.
Describe the solution you'd like
Support an option similar to https://thanos.io/tip/thanos/service-discovery.md/#dns-service-discovery, where you can set +dnssrvnoa, to allow us to use Istio with a Deployment of the receiving otel-collector.
thanos-io/thanos@432785e is the Thanos code that does this.
The general ask is that the loadbalancing exporter performs the SRV resolution and then, from there, acts as if it is filling in the static resolver section.
Showing the difference:
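For illustration, the two lookups against a headless service look roughly like this (the names and example results in the comments are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"net"
)

func main() {
	ctx := context.Background()
	name := "otel-collector-headless.observability.svc.cluster.local" // placeholder

	// First: plain A/AAAA lookup. Only pod IPs come back,
	// e.g. 10.1.2.3 and 10.1.2.4, which Istio will not route to.
	ips, _ := net.DefaultResolver.LookupIPAddr(ctx, name)
	for _, ip := range ips {
		fmt.Println("A:", ip.IP)
	}

	// Second: SRV lookup (empty service/proto queries the name directly).
	// Each record carries a port and ends with a target hostname, e.g.
	//   0 50 4317 otel-collector-0.otel-collector-headless.observability.svc.cluster.local.
	_, srvs, _ := net.LookupSRV("", "", name)
	for _, srv := range srvs {
		fmt.Printf("SRV: %d %d %d %s\n", srv.Priority, srv.Weight, srv.Port, srv.Target)
	}
}
```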
Notice the SRV target on the end of the second lookup, which is what OTel should accept as an alternative to the A record.
Describe alternatives you've considered
Running as a StatefulSet works, but doesn't autoscale properly. Right now we have to make do with manually listing out the pods in the StatefulSet:
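For illustration, that static listing looks roughly like this (placeholder pod DNS names for a three-replica StatefulSet):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - otel-collector-0.otel-collector-headless.observability.svc.cluster.local:4317
          - otel-collector-1.otel-collector-headless.observability.svc.cluster.local:4317
          - otel-collector-2.otel-collector-headless.observability.svc.cluster.local:4317
```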
Additional context
No response