- Removed the advice `single_aws_zone`, `single_azure_zone` and `single_gcp_zone` and combined them into one advice that uses the generic attribute `k8s.label.topology.kubernetes.io/zone`. With the new advice, you are no longer required to install the cloud-provider-specific extension.
  - If you want to migrate your existing advice state (such as created experiments) and you are running on-premises, you can use the following migration script after installing the new version of the extension:

        update sb_onprem.advice
        set advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-zone',
            validation_states = replace(validation_states::text, 'com.steadybit.extension_kubernetes.single-aws-zone', 'com.steadybit.extension_kubernetes.single-zone')::jsonb
        where advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-aws-zone';

        update sb_onprem.advice
        set advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-zone',
            validation_states = replace(validation_states::text, 'com.steadybit.extension_kubernetes.single-azure-zone', 'com.steadybit.extension_kubernetes.single-zone')::jsonb
        where advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-azure-zone';

        update sb_onprem.advice
        set advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-zone',
            validation_states = replace(validation_states::text, 'com.steadybit.extension_kubernetes.single-gcp-zone', 'com.steadybit.extension_kubernetes.single-zone')::jsonb
        where advice_definition_id = 'com.steadybit.extension_kubernetes.advice.single-gcp-zone';

- Update dependencies
- Changed labels for selection templates
- Integrated support for experiment templates in Advice to ease the validation of services
- Fixed a bug for Azure and GCP where DaemonSets weren't considered in an Advice
- Avoid unnecessary enrichment rules for node labels, improving performance
- update dependencies
- Use uid instead of name for user statement in Dockerfile
- Update dependencies (go 1.23)
- Update dependencies
- Increased timeout in the experiment for the single zone advice to detect a pod as being down within 45 seconds instead of just 30 seconds
- Added the ability to install the extension with a Role instead of a ClusterRole, so that it works in only one namespace.
  Example installation:

      helm upgrade steadybit-agent --install --namespace <replace-me-with-namespace> \
        --create-namespace \
        --set agent.key="<replace-me>" \
        --set global.clusterName="<replace-me>" \
        --set extension-container.container.runtime="<replace-me>" \
        --set agent.registerUrl="<replace-me>" \
        --set rbac.roleKind="role" \
        --set agent.extensions.autodiscovery.namespace="<replace-me-with-namespace>" \
        --set extension-kubernetes.role.create=true \
        --set extension-kubernetes.roleBinding.create=true \
        --set extension-kubernetes.clusterRole.create=false \
        --set extension-kubernetes.clusterRoleBinding.create=false \
        steadybit/steadybit-agent

- Update dependencies
- Populate all k8s.node labels to host target type.
- Update dependencies
- Renamed "Pod Count Check" to "(Deployment, StatefulSet, DaemonSet) Pod Count Check"
- Pod-Targets now have a unique id. (Used by the UI to fetch details for a specific pod)
- Update dependencies
- Update dependencies (go 1.22)
- Added "Pod Count Check" for StatefulSets and DaemonSets
- Improved the advice experiment for multiple availability zones (`single-azure-zone`, `single-aws-zone`, and `single-gcp-zone`) to establish a 20s baseline at the beginning of the experiment
- Add namespace label to container, k8s-container, k8s-deployment, k8s-statefulset and k8s-daemonset
- Use FreeMarker syntax for advice templates.
- Ignore Pods not in state "Running" in all discoveries
- Fixed the advice experiment for multiple availability zones (`single-azure-zone`, `single-aws-zone`, and `single-gcp-zone`) to consistently use the same zone in every step
- Improved instruction text for advice `k8s-single-replica` to better explain how to increase replicas for deployments and HorizontalPodAutoscaler
- Update dependencies
- Remove some attributes which have been used by the old 'weakspot' feature
- Clarify the log message that is written when the extension stops listing pods, containers and hosts for deployments, statefulsets, etc. because of the `discovery.maxPodCount` configuration
- Update dependencies
- feat: add `host.domainname` attribute containing the host FQDN
- Update dependencies
- fix: update deployments if services/hpas have changes
- fix: integrate kubescore check `horizontalpodautoscaler-replicas`
- Update dependencies
- use TargetEnrichmentRule Matcher Regex for copying k8s.label.* to container (exclude k8s.label.topology.*) (needs platform version >= 2.0.0 and agent version >= 2.0.2)
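
To illustrate the kind of matching this entry describes (copy every `k8s.label.*` attribute except `k8s.label.topology.*`), here is a minimal Python sketch. It is not the extension's actual rule definition: the pattern, the function name `attributes_to_copy`, and the sample attributes are assumptions made for illustration, and the regex dialect accepted by the platform may differ.

```python
import re

# Hypothetical pattern: matches k8s.label.* but rejects k8s.label.topology.*
# by using a negative lookahead right after the "k8s.label." prefix.
LABEL_MATCHER = re.compile(r"^k8s\.label\.(?!topology\.).*")


def attributes_to_copy(attributes: dict) -> dict:
    """Return only the attributes whose name matches LABEL_MATCHER."""
    return {
        name: values
        for name, values in attributes.items()
        if LABEL_MATCHER.match(name)
    }


if __name__ == "__main__":
    # Sample attributes are invented for this illustration.
    sample = {
        "k8s.label.app": ["shop"],
        "k8s.label.topology.kubernetes.io/zone": ["zone-a"],
        "k8s.namespace": ["default"],
    }
    print(attributes_to_copy(sample))
```

A single pattern with an exclusion keeps the rule count constant no matter how many labels a workload carries, which is presumably the motivation for switching to a regex matcher.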
- Crash Loop Attack: validate specified container name with spec
- Crash Loop Attack: ignore when to be killed container is already gone
- Renamed attribute `k8s.deployment.replicas` to `k8s.specification.replicas`
- Update dependencies
- Add attributes `k8s.label.topology.kubernetes.io/zone`, `k8s.label.topology.kubernetes.io/region`, `k8s.label.node.kubernetes.io/instance-type`, `k8s.label.kubernetes.io/os` and `k8s.label.kubernetes.io/arch` to container, host, k8s-container, k8s-deployment, k8s-statefulset and k8s-daemonset
- Update extension-kit dependency to prevent a concurrent map write error
- invalid
- Update dependencies
- Discoveries added
  - pods
  - daemonsets
  - statefulsets
  - nodes
- Attack 'Delete Pod' added
  - ❗ Requires new permission `delete` for `pods` resources
- Attack 'Drain node' added
  - ❗ Requires new permission `create` for `pods/eviction` resources and `patch` for `nodes` resources
- Attack 'Taint node' added
  - ❗ Requires new permission `patch` for `nodes` resources
- Attack 'Scale Deployment' added
  - ❗ Requires new permission `get`, `update` and `patch` for `deployments/scale` resources
- Attack 'Scale StatefulSet' added
  - ❗ Requires new permission `get`, `update` and `patch` for `statefulsets/scale` resources
- Attack 'Cause Crash Loop' added
  - ❗ Requires new permission `create` for `pods/exec` resources
- Added options to the existing pod count check action to check whether the pod count increased or decreased
- Performance - Add hostnames to `kubernetes-deployment` during discovery instead of adding it via enrichment rule
- Performance - Enrich hosts via `kubernetes-node` instead of frequent enrichments via `kubernetes-container`
- Added `pprof` endpoints for debugging purposes
- Memory optimizations
- Removed the attribute `k8s.container.ready` as this causes unnecessary enrichment noise
- Added additional attributes to support advice / weakspots
  - ❗ Requires new permission `get`, `list`, and `watch` for `horizontalpodautoscalers` resources
- Possibility to exclude attributes from discovery
- fix incorrect `k8s.service.name` attribute for containers that belong to multiple services
- `kubernetes-container` are handled as enrichment data and not as targets anymore. (This requires at least agent 1.0.92 and platform 1.0.79)
- fix node count check config parsing
- update dependencies
- ignore container with label `"steadybit.com.discovery-disabled"="true"` during discovery
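
As a rough illustration of label-based exclusion, the following Python sketch filters out resources that carry the opt-out label. It is not the extension's code: the dictionary shape of a resource and the function names are hypothetical, and treating only the exact string `"true"` as opt-out is an assumption.

```python
# Label name and value taken from the changelog entry above.
DISCOVERY_DISABLED_LABEL = "steadybit.com.discovery-disabled"


def is_discovery_disabled(labels: dict) -> bool:
    """True if the labels opt the resource out of discovery.

    Assumption: only the exact value "true" disables discovery.
    """
    return labels.get(DISCOVERY_DISABLED_LABEL) == "true"


def discoverable(resources: list) -> list:
    """Keep only resources that did not opt out via the label.

    Each resource is a hypothetical dict with optional "labels".
    """
    return [
        resource
        for resource in resources
        if not is_discovery_disabled(resource.get("labels", {}))
    ]


if __name__ == "__main__":
    resources = [
        {"name": "checkout", "labels": {"app": "checkout"}},
        {"name": "debug", "labels": {DISCOVERY_DISABLED_LABEL: "true"}},
    ]
    print([r["name"] for r in discoverable(resources)])
```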
- migration to new unified steadybit actionIds and targetTypes
- ignore all labeled deployments and containers from discovery
- fix node count check config parsing
- update dependencies
- Read only file system
- Code refactorings
- Kubernetes Event Log will now listen to a stop method and send the last messages before exiting
- Kubernetes Event Log and Pod Metrics will need a cluster-selection to support multiple kubernetes clusters
- Added Discoveries for Deployments and Container
- Added Pod Count Check and Node Count check
- Added Pod Count Metrics and Event Logs
- Support creation of a TLS server through the environment variables `STEADYBIT_EXTENSION_TLS_SERVER_CERT` and `STEADYBIT_EXTENSION_TLS_SERVER_KEY`. Both environment variables must refer to files containing the certificate and key in PEM format.
- Support mutual TLS through the environment variable `STEADYBIT_EXTENSION_TLS_CLIENT_CAS`. The environment variable must refer to a comma-separated list of files containing allowed clients' CA certificates in PEM format.
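
The following Python sketch shows one way a server could interpret these three environment variables. It is an illustration only and not the extension's implementation: the `TlsSettings` type, the function names, the rule that certificate and key must be set together, and the fallback to plain HTTP when nothing is configured are all assumptions.

```python
import ssl
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class TlsSettings:
    server_cert: Optional[str] = None
    server_key: Optional[str] = None
    client_cas: List[str] = field(default_factory=list)


def tls_settings_from_env(env: Mapping[str, str]) -> TlsSettings:
    """Read the TLS-related variables from an environment mapping."""
    cert = env.get("STEADYBIT_EXTENSION_TLS_SERVER_CERT", "").strip()
    key = env.get("STEADYBIT_EXTENSION_TLS_SERVER_KEY", "").strip()
    # Assumption: a certificate without a key (or vice versa) is an error.
    if bool(cert) != bool(key):
        raise ValueError("server certificate and key must be set together")
    raw_cas = env.get("STEADYBIT_EXTENSION_TLS_CLIENT_CAS", "")
    # Comma-separated list of PEM files, as described in the entry above.
    cas = [path.strip() for path in raw_cas.split(",") if path.strip()]
    return TlsSettings(cert or None, key or None, cas)


def build_server_context(settings: TlsSettings) -> Optional[ssl.SSLContext]:
    """Build an SSLContext, or return None when TLS is not configured."""
    if settings.server_cert is None:
        return None  # assumption: serve plain HTTP without TLS settings
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(settings.server_cert, settings.server_key)
    if settings.client_cas:
        for ca_file in settings.client_cas:
            context.load_verify_locations(cafile=ca_file)
        # Mutual TLS: reject clients without a certificate from these CAs.
        context.verify_mode = ssl.CERT_REQUIRED
    return context
```

In a real process the mapping passed to `tls_settings_from_env` would be `os.environ`; passing it explicitly keeps the parsing step easy to test.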
- Support for the `STEADYBIT_LOG_FORMAT` env variable. When set to `json`, extensions will log JSON lines to stderr.
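
As a sketch of what switching the log format via an environment variable can look like, the Python example below emits one JSON object per line to stderr when `STEADYBIT_LOG_FORMAT` is `json` and plain text otherwise. The JSON field names (`level`, `time`, `message`), the logger name, and the plain-text fallback format are assumptions, not the format the extensions actually produce.

```python
import json
import logging
import os
import sys


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname.lower(),
            "time": self.formatTime(record),
            "message": record.getMessage(),
        })


def configure_logging(env=None, stream=None) -> logging.Logger:
    """Configure a logger based on STEADYBIT_LOG_FORMAT."""
    env = os.environ if env is None else env
    handler = logging.StreamHandler(sys.stderr if stream is None else stream)
    if env.get("STEADYBIT_LOG_FORMAT") == "json":
        handler.setFormatter(JsonLineFormatter())
    else:
        # Assumed human-readable fallback format.
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
    logger = logging.getLogger("extension-sketch")  # hypothetical name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    configure_logging().info("discovery finished")
```

JSON lines are convenient when a log collector parses container output, because each line can be decoded independently.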
- Rollout readiness check always fails when a timeout is specified.
- upgrade `extension-kit` to support additional debugging log output.
- Initial release