allow to manage multiple opentelemetry collector with the same supervisor in IoT Gateway Environments #33682

Open
cforce opened this issue Jun 20, 2024 · 6 comments


cforce commented Jun 20, 2024

Component(s)

No response

Is your feature request related to a problem? Please describe.

The Supervisor should run only once per host group or cluster, managing all connected Collectors. This would reduce overhead for clusters and hosts with multiple VMs by centralizing management to a single Supervisor. This setup would enable one-to-many upgrades by a single Supervisor and synchronous configuration updates for all Collectors.

Note: The Supervisor currently cannot launch Collectors on other hosts (e.g., via SSH). Collectors would therefore need to connect to the Supervisor on a well-known host and port. This requires either ports preconfigured on the Supervisor or multiplexing of multiple agents on the same port.

Describe the solution you'd like

Dynamic port allocation, which already exists, should be used to serve multiple connected Collectors. This benefits low-resource IoT field devices with dedicated Internet hubs/uplinks as well as Kubernetes environments, because it avoids running one Supervisor per Collector.

In addition to the current automated startup, where the Supervisor launches the Collector on the same host, the Collector should be able to bootstrap itself. The Collector should know in advance which host and port to use to reach the Supervisor, and the Supervisor should reserve those ports for its Collectors. This requires the Supervisor to listen on preconfigured ports and each Collector to connect to the known host and port on its own initiative.

Alternatively, instead of using a dedicated port per Collector, multiplexing different clients on the same port could simplify bootstrapping, since all Collectors managed by the Supervisor would use one static, well-known host and port. For security, each Collector should be authorized, using something like oauth2clientauthextension on the Collector side and oidcauthextension on the Supervisor side. Integration of OAuth into OpAMP client-server flows is also requested in #32762.
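
To illustrate the single-port alternative, here is a minimal Python sketch, not Supervisor code. It assumes each Collector sends a unique instance UID and a bearer token with every message. The `SinglePortMux` class, its `handle` method, and the static token table are hypothetical; the token comparison is only a stand-in for real OAuth2/OIDC validation.

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class SinglePortMux:
    """Routes messages arriving on one shared endpoint to per-Collector state."""

    # Hypothetical credential store: instance UID -> expected bearer token.
    tokens: dict[str, str]
    # Messages accepted so far, grouped by instance UID.
    inbox: dict[str, list[bytes]] = field(default_factory=dict)

    def handle(self, instance_uid: str, bearer_token: str, payload: bytes) -> bool:
        """Accept the payload only if the Collector is known and its token matches."""
        expected = self.tokens.get(instance_uid)
        # Constant-time comparison avoids leaking token contents via timing.
        if expected is None or not hmac.compare_digest(expected, bearer_token):
            return False  # unknown or unauthorized Collector
        self.inbox.setdefault(instance_uid, []).append(payload)
        return True


mux = SinglePortMux(tokens={"collector-a": "token-a", "collector-b": "token-b"})
accepted = mux.handle("collector-a", "token-a", b"status: healthy")
rejected = mux.handle("collector-b", "wrong-token", b"status: healthy")
```

Because routing is keyed on the instance UID rather than on the port, adding a Collector would only need a new credential entry, not a new listener.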

Describe alternatives you've considered

No response

Additional context

No response


This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.


cforce commented Sep 16, 2024

Enhanced Proposal for Extending OpAMP to Non-Kubernetes Environments

The OpAMP bridge currently used in Kubernetes should be extended to non-Kubernetes environments where the bridge serves as a gateway for managing Collectors on remote hosts.
Learn more about OpAMP in Kubernetes.

This enhanced proposal builds upon the foundational OpAMP design for Kubernetes and adapts it for non-Kubernetes environments, making it suitable for managing Collectors deployed in distributed, resource-constrained settings like IoT gateways or edge devices.

In this scenario, the bridge would act as an intermediary (or gateway) that manages communication with Collectors only reachable via the IoT gateway. This would mimic its Kubernetes role but adapted for IoT networks.


Enhanced Proposal: Adapting OpAMPBridge for Non-Kubernetes Deployments

1. OpAMPBridge for Non-Kubernetes Deployments

In Kubernetes, the OpAMPBridge resource manages the state of OpenTelemetry Collectors within a cluster. This concept can be adapted to non-Kubernetes environments where Collectors are deployed across various hosts or devices.

Key Functions of the Bridge:

  • Collector Management: The OpAMPBridge will manage the lifecycle of Collectors (start, stop, update configurations) across connected devices, ensuring configurations stay synchronized with the OpAMP server.
  • Health Monitoring: The bridge will collect and report health metrics and telemetry data from the Collectors, forwarding this to the OpAMP server for centralized monitoring.
  • Label-Based Management: Labels like opentelemetry.io/opamp-reporting and opentelemetry.io/opamp-managed can be used to manage and track Collectors in non-Kubernetes environments, just as in Kubernetes clusters.
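
As a rough illustration of label-based management outside Kubernetes, the Python sketch below filters a Collector inventory by the two labels named above. The inventory layout, the `select` helper, the host names, and the convention that a label value of "true" opts a Collector in are assumptions made for this example.

```python
REPORTING_LABEL = "opentelemetry.io/opamp-reporting"
MANAGED_LABEL = "opentelemetry.io/opamp-managed"

# Hypothetical inventory: Collector name -> labels attached to it.
inventory = {
    "edge-1": {REPORTING_LABEL: "true", MANAGED_LABEL: "true"},
    "edge-2": {REPORTING_LABEL: "true"},
    "edge-3": {},
}


def select(collectors: dict[str, dict[str, str]], label: str) -> list[str]:
    """Return the sorted names of Collectors that carry `label` set to "true"."""
    return sorted(
        name for name, labels in collectors.items() if labels.get(label) == "true"
    )


reporting = select(inventory, REPORTING_LABEL)  # bridge only reports on these
managed = select(inventory, MANAGED_LABEL)      # bridge may also reconfigure these
```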

Deployment Architecture:

  • The OpAMPBridge will be deployed on a central device (e.g., an IoT gateway) with network access, while Collectors are distributed across other hosts or devices.
  • Collectors will connect to this bridge to receive configuration updates, send telemetry, and report health status.
  • In environments with limited connectivity, the OpAMPBridge acts as an intermediary, managing communications between Collectors and the central OpAMP server.

2. Dynamic Port Allocation and Multiplexing

Dynamic port allocation can serve multiple Collectors, but rather than dedicating a port to each Collector, the bridge will:

  • Single-Port Multiplexing: Use a well-known port to handle connections from multiple Collectors.
  • Secure Authorization: OAuth2/OIDC mechanisms will authenticate Collectors before allowing connections to the bridge, ensuring security.
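
For contrast with single-port multiplexing, this is a small Python sketch of what dynamic port allocation means at the socket level: binding to port 0 asks the operating system for any free port. The `allocate_listener` helper is hypothetical and is not taken from the Supervisor or the bridge.

```python
import socket


def allocate_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Open a TCP listener; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    return sock


# One listener per Collector: each gets its own OS-assigned port.
listeners = [allocate_listener() for _ in range(3)]
ports = [s.getsockname()[1] for s in listeners]

for s in listeners:
    s.close()
```

With this approach every Collector has to learn its assigned port before it can connect, which is the bootstrapping problem that a single well-known port avoids.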

3. Supervisor Process

In non-Kubernetes setups, the Supervisor will manage the configuration of Collectors. Supervisors may either be embedded within Collectors or deployed separately.

  • Standalone Supervisor: Runs on a nearby host, managing Collectors directly.
  • Embedded Supervisor: In resource-constrained environments, the Supervisor is embedded within each Collector to reduce overhead.

4. Process Flow for Non-Kubernetes Environments

  1. Collector Initialization: Collectors are bootstrapped with the bridge’s host and port information, as well as authentication credentials.
  2. Connection to OpAMPBridge: Collectors connect to the OpAMPBridge, which forwards configuration updates to them and reports their telemetry to the OpAMP server.
  3. Health Reporting: The bridge monitors health and status of the Collectors, sending reports to the OpAMP server.
  4. Configuration Updates: The OpAMP server sends configuration changes to the bridge, which applies them to the relevant Collectors.
  5. Multiplexing: The bridge uses multiplexing to efficiently manage communication over a single port, optimizing resource use.
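
The flow above can be sketched as a small in-memory model in Python. Everything here is hypothetical (the `Bridge` and `CollectorState` classes, their method names, and the use of a SHA-256 hash to skip Collectors that already run the pushed configuration); it only shows the order of operations, not a real OpAMP implementation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CollectorState:
    healthy: bool = False
    config_hash: str = ""  # hash of the last configuration applied


@dataclass
class Bridge:
    collectors: dict[str, CollectorState] = field(default_factory=dict)

    def connect(self, uid: str) -> None:
        """Step 2: a bootstrapped Collector connects to the bridge."""
        self.collectors[uid] = CollectorState()

    def report_health(self, uid: str, healthy: bool) -> None:
        """Step 3: record the health reported by one Collector."""
        self.collectors[uid].healthy = healthy

    def push_config(self, config: bytes) -> list[str]:
        """Step 4: apply a configuration, returning the Collectors that changed."""
        new_hash = hashlib.sha256(config).hexdigest()
        updated = []
        for uid, state in self.collectors.items():
            if state.config_hash != new_hash:
                state.config_hash = new_hash
                updated.append(uid)
        return updated

    def fleet_health(self) -> dict[str, bool]:
        """Summary the bridge would forward to the OpAMP server."""
        return {uid: state.healthy for uid, state in self.collectors.items()}


bridge = Bridge()
bridge.connect("edge-1")
bridge.connect("edge-2")
bridge.report_health("edge-1", True)
first_push = bridge.push_config(b"receivers: {}")
second_push = bridge.push_config(b"receivers: {}")  # unchanged, so nothing to do
```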

5. Security Considerations

  • OAuth2/OIDC: Collectors will authenticate via OAuth2/OIDC before connecting to the bridge, ensuring only authorized devices are managed.
  • Encrypted Communication: All communication between the Collectors, bridge, and OpAMP server will be encrypted for secure data transmission.

6. Future Extensions

  • Instrumentation Management: The OpAMPBridge can be extended to manage Instrumentation resources alongside Collectors, providing comprehensive management across diverse environments.
  • Proxying OpAMP Connections: In scenarios where network reliability is an issue, the bridge could also take on the role of proxying OpAMP connections, although this introduces additional complexity.

Benefits of Extending OpAMP Beyond Kubernetes

  • Centralized Management: Collectors deployed across edge devices or IoT networks can now be managed from a single OpAMP server, simplifying updates and monitoring.
  • Scalability: By using multiplexing and dynamic port allocation, the system can scale efficiently without the need for additional Supervisors.
  • Improved Observability: The OpAMPBridge reports telemetry and status data from all connected Collectors, providing a clear, unified view of the fleet’s health.
  • Enhanced Security: OAuth2/OIDC ensures secure communication between Collectors and the OpAMPBridge, protecting against unauthorized access.

cforce changed the title from "allow to manage multiple opentelemetry collector with the same supervisor" to "allow to manage multiple opentelemetry collector with the same supervisor in IoT Gateway Environments" on Sep 16, 2024

cforce commented Oct 25, 2024

A related use case is managing the configuration of other agents that send data to Collectors, e.g., Beyla's config.yaml.
See #34321


Pinging code owners for cmd/opampsupervisor: @evan-bradley @atoulme @tigrannajaryan @BinaryFissionGames. See Adding Labels via Comments if you do not have permissions to add labels yourself.


Pinging code owners for extension/opamp: @portertech @evan-bradley @tigrannajaryan @BinaryFissionGames. See Adding Labels via Comments if you do not have permissions to add labels yourself.



github-actions bot added the Stale label on Dec 30, 2024