Logging Agent Architecture
Log shipping is done using the OpenSearch Data Prepper application. It is configured to write logs to the Opni OpenSearch endpoint using the cluster-specific indexing user. Data Prepper is set up with an HTTP input for receiving logs. Data Prepper and its configuration are managed by a custom resource and a controller that ship with the Opni Agent.
Log collection is configured by the Banzaicloud Logging Operator. This is imported as a library and run alongside the other controllers by the Opni Agent. Opni creates a ClusterOutput that points to the Data Prepper HTTP interface, a ClusterFlow that collects all pod logs, and a generic Logging resource. It also deploys a cluster-specific log scraper for control plane logs (if available).
When Logging is enabled, the Logging plugin also begins watching all Kubernetes events, which are sent to the Data Prepper HTTP interface.
If the Opni Agent disconnects from the Gateway for any reason, logging is automatically disabled until the connection is re-established.
- Architecture
- Plugin interface
- Kubernetes cluster driver
- Events cluster driver
- Scale and performance
- Security
- High availability
- Testing
The plugin interface receives SyncNow requests from the gateway. It then passes these to the cluster drivers, which take the appropriate action based on whether the status is enabled or not.
The Logging status message contains the following data along with the enabled boolean:
- OpenSearch username
- OpenSearch password
- OpenSearch external URL
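The dispatch described above can be sketched roughly as follows. This is an illustration only: `LoggingStatus`, `ClusterDriver`, `recordingDriver`, and `HandleSync` are hypothetical names chosen for this sketch, not Opni's actual types, and the URL and credentials are placeholders.

```go
package main

import "fmt"

// LoggingStatus mirrors the fields listed above. The type and field names
// are hypothetical, not Opni's actual API.
type LoggingStatus struct {
	Enabled     bool
	Username    string
	Password    string
	ExternalURL string
}

// ClusterDriver is an assumed common interface for the cluster drivers.
type ClusterDriver interface {
	Name() string
	Enable(status LoggingStatus) error
	Disable() error
}

// recordingDriver is a stand-in driver that only records the last action taken.
type recordingDriver struct {
	name   string
	action string
}

func (d *recordingDriver) Name() string { return d.name }

func (d *recordingDriver) Enable(s LoggingStatus) error {
	d.action = "enable:" + s.ExternalURL
	return nil
}

func (d *recordingDriver) Disable() error {
	d.action = "disable"
	return nil
}

// HandleSync passes the status to every driver, enabling or disabling
// each one depending on the Enabled flag.
func HandleSync(status LoggingStatus, drivers ...ClusterDriver) error {
	for _, d := range drivers {
		var err error
		if status.Enabled {
			err = d.Enable(status)
		} else {
			err = d.Disable()
		}
		if err != nil {
			return fmt.Errorf("driver %s: %w", d.Name(), err)
		}
	}
	return nil
}

func main() {
	k8s := &recordingDriver{name: "kubernetes"}
	events := &recordingDriver{name: "events"}
	status := LoggingStatus{
		Enabled:     true,
		Username:    "index-user", // placeholder credentials
		Password:    "secret",
		ExternalURL: "https://opensearch.example.com",
	}
	if err := HandleSync(status, k8s, events); err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Println(k8s.action, events.action)
}
```

Keeping the enabled check in one place means each driver only has to implement its own enable and disable behavior.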
The Kubernetes cluster driver is the internal object responsible for managing the state of the Kubernetes resources used for Logging.
If the Logging capability is enabled, the driver reconciles the Kubernetes resources. If the capability is being disabled, the resources are deleted. The resources it manages are as follows:
LogAdapter
This is a wrapper resource. It creates a Banzaicloud Logging resource and a Fluent Bit DaemonSet specific to the Kubernetes distribution.
ClusterFlow
Manages the log scraping configuration. This will exclude the Opni system namespace from log scraping.
ClusterOutput
Configures the Fluentd output to be the Opni Shipper Data Prepper.
DataPrepper
Manages the Data Prepper configuration. It is updated with the URL, username, and password sent in the sync request. The password is stored in a Secret, which is referenced in the DataPrepper resource.
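The reconcile-or-delete behavior of this driver can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Cluster` map stands in for the Kubernetes API, and `Reconcile`, `Kinds`, and the spec strings are hypothetical. The real driver works against custom resources rather than strings.

```go
package main

import (
	"fmt"
	"sort"
)

// managedResources lists the resource kinds named above.
var managedResources = []string{"LogAdapter", "ClusterFlow", "ClusterOutput", "DataPrepper"}

// Cluster is a hypothetical in-memory stand-in for the Kubernetes API.
// It maps a resource kind to a description of its desired state.
type Cluster map[string]string

// Reconcile creates or updates every managed resource when enabled,
// and deletes all of them when disabled.
func Reconcile(c Cluster, enabled bool, url, username string) {
	if !enabled {
		for _, kind := range managedResources {
			delete(c, kind)
		}
		return
	}
	for _, kind := range managedResources {
		spec := "managed by the agent"
		if kind == "DataPrepper" {
			// The password is not embedded here; the resource only
			// references the Secret that holds it.
			spec = fmt.Sprintf("url=%s user=%s passwordFrom=secret", url, username)
		}
		c[kind] = spec
	}
}

// Kinds returns the resource kinds currently present, sorted by name.
func Kinds(c Cluster) []string {
	kinds := make([]string, 0, len(c))
	for kind := range c {
		kinds = append(kinds, kind)
	}
	sort.Strings(kinds)
	return kinds
}

func main() {
	cluster := Cluster{}
	Reconcile(cluster, true, "https://opensearch.example.com", "index-user")
	fmt.Println("after enable:", Kinds(cluster))
	Reconcile(cluster, false, "", "")
	fmt.Println("after disable:", Kinds(cluster))
}
```

Because `Reconcile` always writes the full desired state, calling it repeatedly with the same inputs leaves the model unchanged.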
The events cluster driver is the internal object responsible for gathering events in the agent cluster.
If the Logging capability is enabled, the driver starts a watch on the Kubernetes events API. Events are buffered in an internal queue and sent as JSON documents to Data Prepper. If the capability is disabled, the watch is canceled and the queue is purged.
Currently, agent throughput isn't a bottleneck, so scaling and performance haven't been extensively tested. If this changes in the future, components may be scaled out using the custom resources.
Usernames and passwords are isolated to an individual cluster. The user account has write-only access, so leaked credentials cannot be used to read any data.
The system relies on eventual consistency, so HA isn't a design consideration. All components have internal buffering, so if an upstream service becomes unavailable they will retry.
All testing for this system is currently manual.