- Added `spec.pod.DNSTimeoutInSecond` to `EtcdCluster` that allows setting a maximum allowed time for the init container of the etcd pod to reverse DNS lookup its IP given the hostname.
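As a rough illustration, the field might be set as in the sketch below. The cluster name, `size`, and the timeout value are placeholders chosen for this example; the type metadata is the `etcd.database.coreos.com/v1beta2` `EtcdCluster` described elsewhere in this changelog.

```yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster   # placeholder name
spec:
  size: 3                      # placeholder cluster size
  pod:
    # Upper bound, in seconds, for the init container's DNS lookup.
    DNSTimeoutInSecond: 60     # illustrative value, not a documented default
```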
- Update Go version to 1.11.2
- Update k8s to 1.11.4
- k8s codegen updates are no longer performed via container. Go dependencies are now vendored and updates are performed locally with a shell script.
- Fixed leaking http connections while verifying backup snapshots. #1976
- Added the field `spec.pod.securityContext` to `EtcdCluster` that allows setting a specific PodSecurityContext for the etcd pods. #1949
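A minimal sketch of such a setting follows. The keys under `securityContext` are standard Kubernetes PodSecurityContext fields; the numeric IDs are arbitrary example values, not defaults taken from this changelog.

```yaml
spec:
  pod:
    securityContext:
      runAsNonRoot: true   # refuse to start containers as root
      runAsUser: 9000      # example UID (assumption)
      fsGroup: 9000        # example group for mounted volumes (assumption)
```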
- Update Go version to 1.10
- Build `gcr.io/coreos-k8s-scale-testing/etcd-operator-builder:0.4.1-2` container with Go 1.10 and dep 0.4.1
- etcd pod containers no longer run with a non-root security context by default. This setting can be configured per cluster via the PodPolicy.
- Added optional flag `--cluster-wide` to etcd-operator to allow it to manage etcd clusters across all namespaces. #1777
- Added support for annotation `etcd.database.coreos.com/scope: clusterwide` in `EtcdCluster` to allow it to be managed by a cluster wide operator. #1777
- Added the field `spec.pod.busyboxImage` to the `PodPolicy` of the `EtcdCluster` to allow overriding the default busybox image used for the etcd pod's init container. #1928
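The annotation and the image override could be combined as in this sketch. The cluster name, `size`, and the chosen busybox tag are example values; the operator itself would additionally need to be started with `--cluster-wide` for the annotation to matter.

```yaml
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster                      # placeholder name
  annotations:
    # Marks this cluster as managed by a cluster wide operator.
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3                                         # placeholder cluster size
  pod:
    # Overrides the busybox image used by the init container.
    busyboxImage: busybox:1.28.0-glibc            # example image tag
```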
- Fixed a bug where the informer watch stream would timeout after 30s of not receiving an event. #1936
Same as v0.8.4. The version is bumped to 0.9.0 because a new ABS backup API was added to etcd-backup-operator.
- Added ABS support for backup and restore
- Added tag to initContainer to enable offline deploys
- Enabled configurable backup timeout in backup operator
- Set 30s default request timeout for kube client
- Change check-dns init container image to busybox:1.28.0-glibc to fix nslookup failure in some environments.
- Removed self-hosted code
- Added the option to use PersistentVolume as non-stable storage for etcd pods. This feature is still alpha and subject to change in future releases #1861
- Changed etcd pod member names to be unique by having a random suffix instead of a sequence number. This change is backward compatible and should not affect operator upgrade.
Previously the etcd pod names would look like:

    NAME                        READY   STATUS    RESTARTS   AGE
    example-etcd-cluster-0000   1/1     Running   0          1m
    example-etcd-cluster-0001   1/1     Running   0          1m
    example-etcd-cluster-0002   1/1     Running   0          1m

After this change:

    NAME                              READY   STATUS    RESTARTS   AGE
    example-etcd-cluster-2885zjw9he   1/1     Running   0          1m
    example-etcd-cluster-gghrmbeid4   1/1     Running   0          1m
    example-etcd-cluster-w5q9sn37fd   1/1     Running   0          1m
- Fixed a bug where the restore operator would fail to restore the seed member because recreating an etcd pod with the same name as a recently deleted one would conflict as the older pod and its resources, like the DNS name, might still not be deleted. #1825
- Add support for backup and restore from custom S3 endpoint.
- All etcd pod containers now run as non-root.
- etcd-restore-operator will create a service for itself as the backup storage proxy. Delete the service in deployment yaml.
- Fix a bug where etcd-restore-operator would not report an error and would keep looping if the EtcdRestore name differs from the EtcdCluster name.
Important Changes
Both etcd backup operator and etcd restore operator have changed their CR definition. Please follow the latest backup/restore CR definition for future backup and restore.
- Add readiness probe to etcd pod. The readiness state will be reflected on `status.members.ready/unready`.
- TLS etcd cluster support in backup/restore-operator.
- Add spec validation in restore operator.
- Add BackupStorageType to EtcdRestore.RestoreSpec to indicate type of the backup storage which is used as RestoreSource and validation of BackupStorageType in restore operator.
- Add EtcdClusterRef to EtcdRestore.RestoreSpec to reference an EtcdCluster resource whose metadata and spec will be used to create the new restored EtcdCluster CR.
- Add create-crd flag to etcd backup operator allowing user to disable automatic backup CRD creation.
- Add create-crd flag to etcd restore operator allowing user to disable automatic restore CRD creation.
- Add EtcdVersion and EtcdRevision to EtcdBackup.BackupStatus.
- BackupStatus: Add detailed error when backup fails.
- Rename BackupCRStatus to BackupStatus.
- EtcdBackup: BackupSpec passes in S3BackupSource.Path as the S3 path to save the backup.
- EtcdBackup: BackupSpec uses etcd endpoints to retrieve snapshot.
- Change default etcd version to `3.2.13`.
- EtcdBackup: BackupSpec removed ClusterName field in favor of etcd endpoints.
- EtcdCluster: ClusterSpec removed deprecated BaseImage field.
Note: This is a bug fix release.
When we bumped the etcd version to 3.2, the images were only available on gcr.io. They have since been added on quay.io, so we use quay.io again to stay compatible with 3.1 versions of etcd.
- TLS etcd cluster feature for EtcdBackup
- Log collector program for collecting logs in e2e test.
- ClusterSpec: In PodPolicy, add generic `Affinity` field to substitute bool `AntiAffinity` field.
- ClusterSpec: Add `Repository` field to substitute `BaseImage` field.
- Default base image is changed to `gcr.io/etcd-development/etcd`, default etcd version is `3.2.11`.
- Migrate dependency management tooling from glide to dep.
- Containerize e2e test in a pod instead of running on a raw Jenkins slave.
- ClusterSpec: Remove `PodPolicy.AutomountServiceAccountToken` field. No etcd pod will have service account token mounted.
- Ignore Terminating pods when polling etcd pods.
- ClusterSpec: `BaseImage` is deprecated. It will be automatically converted to `Repository` in this release.
- ClusterSpec: In PodPolicy, `AntiAffinity` is deprecated. It will be automatically converted to `Affinity.PodAntiAffinity` terms with label selector on given cluster name and topology key on node in this release.
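For orientation, an explicit `Affinity` roughly equivalent to the deprecated boolean might look like the sketch below. The label key `etcd_cluster`, the use of a required (rather than preferred) term, and the `kubernetes.io/hostname` topology key are assumptions made for illustration; this changelog only states that the selector is on the cluster name and the topology key is on the node.

```yaml
spec:
  pod:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:   # assumed term type
        - labelSelector:
            matchLabels:
              etcd_cluster: example-etcd-cluster          # assumed label key, placeholder cluster name
          topologyKey: kubernetes.io/hostname             # assumed per-node topology key
```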
- All operator images now use the user `etcd-operator` by default instead of root.
Existing backup and restore features in EtcdCluster API won’t be supported after 0.7.0 release. See Decoupling Backup and Restore Logic from Etcd Operator for more detail.
If applicable then see the upgrade guide on how to upgrade from `v0.6.1` to `v0.7.0`.
- Add `ServiceName` and `ClientPort` into ClusterStatus.
- Add etcd backup operator for backing up an etcd cluster.
- Add etcd restore operator for restoring an etcd cluster.
- Remove `pv-provisioner` flag from etcd operator.
- Remove etcd cluster Backup feature from etcd operator.
- Remove etcd cluster Restore from etcd operator.
The operator will no longer create a storage class specified by `--pv-provisioner` by default. If applicable then see the upgrade guide on how to upgrade from `v0.6.0` to `v0.6.1`.
- The backup binary supports serving backups defined by backupSpec. In addition, when backupSpec is specified, the backup binary switches to a mode that serves HTTP backup requests only.
- Add operator flag `--create-crd`. By default it is `true` and operator will create EtcdCluster CRD. It can be set to `false` and operator won't create EtcdCluster CRD.
- Add operator flag `--create-storage-class`. By default it is `false` and operator won't create default storage class. It can be set to `true` and operator will create default storage class.
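As a sketch, both flags could be overridden in the container spec of the operator deployment as shown below. The container name, the image placeholder, and the `etcd-operator` binary name in `command` are assumptions for this example; only the two flags and their defaults come from the notes above.

```yaml
containers:
- name: etcd-operator                  # placeholder container name
  image: "<etcd-operator image>"       # placeholder, substitute the released image
  command:
  - etcd-operator                      # assumed binary name
  - --create-crd=false                 # default is true
  - --create-storage-class=true        # default is false
```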
- An EtcdCluster CR with an invalid spec will not be marked as failed. Any changes that result in an invalid spec will be ignored and logged by the operator.
- Fix a problem where the operator might keep failing on a version conflict when updating the CR status.
- The operator flag `--pv-provisioner` is deprecated. We recommend using a per-cluster storageClass.
BREAKING CHANGE: operator level S3 backup is removed. See upgrade on how to upgrade from 0.5.x to 0.6.0.
- Add cluster events into EtcdCluster custom resource. See `doc/user/conditions_and_events.md`.
- Redefine status.conditions. See `doc/user/conditions_and_events.md`.
- Remove operator level S3 flag.
- Remove analytics flag. Disable Google analytics.
- Expose `/metrics` endpoint at port 8080
- Add cluster S3 spec `prefix` feature. Let user choose a prefix under the bucket.
- Add `automountServiceAccountToken` to pod policy. Let users disable automounting of the Kubernetes access token into etcd-operator controlled pods.
- Cluster backups can now be saved using Azure Blob Storage (ABS).
- Deprecate operator S3 flag. Add warning note for using it in this release. The flag will be removed in 0.6.0 release.
Upgrade notice for TLS cluster users: If you are using a TLS-enabled etcd cluster, the SAN domain has been changed. See TLS docs. Before upgrading the operator, you need to rotate the certs in each secret to allow both the old and new domains. Then restart each etcd pod -- the simplest way is to "upgrade" the cluster version. Finally, it is safe to upgrade the operator. It's highly recommended to save a backup before upgrading.
- A new `StorageClass` spec field, allowing more granular control over how etcd clusters are backed up to PVs.
- Default timeout for snapshots done by backup sidecar increased from 5 seconds to 1 minute
- Fix the problem that periodFullGC was only executed once.
- [GH-1021] Use the cluster domain provided by kubelet instead of hardcoded `.cluster.local`.
BREAKING CHANGE: The cluster object will now be defined via a Custom Resource Definition (CRD) instead of a Third Party Resource (TPR). See the `Changed` section below for details.
- With k8s 1.7 and onwards TPRs have been deprecated and are replaced with CRD. See the k8s 1.7 blogpost or release notes for more details. For this release a live migration of the cluster spec from TPR to CRD is not supported. To preserve the cluster state during the upgrade you will need to create a backup of the cluster and recreate the cluster from the backup after upgrading the operator. See the upgrade guide for more detailed steps on how to do that.
- Changes in the cluster object's type metadata:
  - The `apiVersion` field has been changed from `etcd.coreos.com/v1beta1` to `etcd.database.coreos.com/v1beta2`
  - The `kind` field has been changed from `Cluster` to `EtcdCluster`
- The
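The type metadata change amounts to the following difference at the top of a cluster manifest. The cluster name is a placeholder, and the rest of each manifest is omitted here.

```yaml
# Before (TPR):
apiVersion: etcd.coreos.com/v1beta1
kind: Cluster
metadata:
  name: example-etcd-cluster   # placeholder name
---
# After (CRD):
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster   # placeholder name
```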
- [GH-1232] The operator can now log critical actions like pod creation/deletion to a user specified path via the optional flag `debug-logfile-path`. The logs will only be generated if the cluster is self hosted and the flag is set. This can be used in conjunction with a persistent volume to persist the critical actions to disk for later inspection.
- enable alpha feature "tolerate unready endpoints" on etcd client and peer service
- Fix append-hosts init-container not run on some restart cases.
This is a bug-fix release. We have done a lot of testing against k8s 1.7 and made the operator stable on 1.7.
- New self-hosted field `SkipBootMemberRemoval` allows users to skip the auto-deletion of the boot etcd member.
- Make sure etcd pod's FQDN is resolvable before running etcd commands.
BREAKING CHANGE: Re-naming of TLS spec and TLS secrets' fields.
TLS spec:
- member's `clientSecret` is changed to `serverSecret`

TLS secrets:
- member's `peerSecret`'s fields change:
  - peer-crt.pem -> peer.crt
  - peer-key.pem -> peer.key
  - peer-ca-crt.pem -> peer-ca.crt
- member's `clientSecret` is changed to `serverSecret`, its fields change:
  - client-crt.pem -> server.crt
  - client-key.pem -> server.key
  - client-ca-crt.pem -> server-ca.crt
- `operatorSecret`'s fields change:
  - etcd-crt.pem -> etcd-client.crt
  - etcd-key.pem -> etcd-client.key
  - etcd-ca-crt.pem -> etcd-client-ca.crt
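As an illustration of the new key names, a peer secret after the rename might look like the sketch below. The secret name and the placeholder values are assumptions for this example; only the three key names come from the list above. The server and operator secrets follow the same pattern with their own keys.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: etcd-peer-tls                          # placeholder secret name
type: Opaque
data:
  peer.crt: "<base64-encoded certificate>"     # was peer-crt.pem
  peer.key: "<base64-encoded private key>"     # was peer-key.pem
  peer-ca.crt: "<base64-encoded CA cert>"      # was peer-ca-crt.pem
```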
BREAKING CHANGE: Backup spec: `CleanupBackupsOnClusterDelete` field is renamed to `AutoDelete`.
A previous spec like this one

    spec:
      backup:
        storageType: "PersistentVolume"
        ...
        cleanupBackupsOnClusterDelete: true

should be changed to

    spec:
      backup:
        storageType: "PersistentVolume"
        ...
        autoDelete: true
- Adds ability for users to specify base image for etcd pods in a cluster. Default base image is `quay.io/coreos/etcd-operator`.
- [GH-1138] Fixed the operator getting stuck managing a self-hosted cluster when there are not enough nodes to start a new etcd member.
- [GH-1196] Fixed the etcd operator being unable to start the S3 backup sidecar if given a non-root user.
Bug fix release to fix self-hosted etcd issue [GH-1171].
Notes for self-hosted etcd: The newly introduced TLS feature for self-hosted etcd is a breaking change. Existing self-hosted etcd clusters MUST be recreated when updating to this release.
- Self-hosted etcd: if `etcd-hosts.checkpoint` file exists under `${datadir}/`, etcd pod will restore the hosts mapping from it before etcd bootstraps.
- Add static TLS support for self-hosted etcd mode.
- The operator will now post Kubernetes events. To allow this the necessary RBAC rule for the resource `events` must be added to the clusterrole. See the rbac guide to see how to set up RBAC rules for the operator. If the rbac rule for 'events' is not present then the operator will continue to function normally but will also print out an error message on the failure to post an event.
- Add revision field in backup status.
- Support getting a specific backup with version and revision from the backup service.
- Self-hosted etcd: use FQDN for client/peer URL.
- Updated RBAC rules for resources `thirdpartyresources` and `storageclasses` to all verbs `*`. We loosen the granularity early so that we have more flexibility to use other methods (e.g. Get) later.
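A sketch of the corresponding `rules` fragment of the operator's clusterrole is shown below. The API groups and the `create` verb for `events` are assumptions based on where these resources lived in Kubernetes at the time; this changelog only names the resources and the `*` verb. The rbac guide remains the reference for the full rule set.

```yaml
rules:
- apiGroups: ["extensions"]        # assumed API group for thirdpartyresources
  resources: ["thirdpartyresources"]
  verbs: ["*"]
- apiGroups: ["storage.k8s.io"]    # assumed API group for storageclasses
  resources: ["storageclasses"]
  verbs: ["*"]
- apiGroups: [""]                  # core API group
  resources: ["events"]
  verbs: ["create"]                # assumed verb needed to post events
```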
- Update default etcd version to 3.1.8
- [GH-1108] selfHosted: fix backup unable to talk to etcd pods
Check https://github.com/coreos/etcd-operator/blob/master/doc/user/upgrade/upgrade_guide.md#v02x-to-v03x
- Added support for the backup policy to be dynamically added or updated.
- Added per cluster policy support for S3.
- Backup sidecar deployment created with `Recreate` strategy.
- Spec.Backup.MaxBackups meaning change: 0 means unlimited backups; < 0 will be rejected.
- [GH-1068] Backup sidecar deployment stuck at upgrading
- Once the operator is upgraded, all backup-enabled clusters will go through an upgrade process that deletes the backup sidecar's ReplicaSet and creates a new Deployment for the sidecar. If upgrading fails for any reason, the cluster TPR's `status.phase` will be FAILED. In that case the cluster TPR must be recreated.
- PodPolicy provides `EtcdEnv` option to add custom env to the etcd process.
- PodPolicy provides `Labels` option to add custom labels to the etcd pod.
- TLS feature: user can now create TLS-secured cluster via operator. See TLS guide.
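The two PodPolicy options might be used as in the following sketch. The lower-camel-case key names `etcdEnv` and `labels` are assumed serialized forms of `EtcdEnv` and `Labels`, and the label and environment variable shown are arbitrary examples.

```yaml
spec:
  pod:
    labels:
      team: storage                            # example custom label
    etcdEnv:
    - name: ETCD_AUTO_COMPACTION_RETENTION     # example etcd environment variable
      value: "1"
```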
- Self-hosted etcd pod's anti-affinity label selector is changed to select `{"app": "etcd"}`. That is, no two etcd pods should sit on the same node, even if they belong to different clusters.
- Using Deployment to manage backup sidecar instead of ReplicaSet.
- S3 backup path is changed to `${BUCKET_NAME}/v1/${NAMESPACE}/${CLUSTER_NAME}/`.
- Add "none" PV provisioner option. If operator flag "pv-provisioner" is set to "none", the operator won’t create any storage class and users cannot make use of the operator’s PV backup feature.
- Add headless service `${clusterName}` which selects etcd pods of given cluster.
- Pod Tolerations.
- TLSSpec json tag changed as `omitempty`
- Time related fields in spec, i.e. TransitionTime and CreationTime, are changed to type `string`. This should be backward compatible and have no effect on operator upgrade.
- Update default etcd version to 3.1.4
- Self-hosted etcd pod is started with "--metrics extensive" flag. This is only available in etcd 3.1+.
- Change client LB service name to `${clusterName}-client`.
- Add hostname and subdomain to etcd pods, which makes them have A records formatted in `${memberName}.${clusterName}.${namespace}.svc.cluster.local`. For more info, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/. We also change PeerURL of etcd members to use such hostnames.
- Individual etcd member's services were removed. Use hostname and subdomain of etcd pod instead.
- [GH-910] Operator keeps updating status even if there is no change.
- [GH-900] Fix looping of reconcile skip due to unready members
- [GH-890] Fix a race where the cluster couldn't recover when a majority of members went down
- Fix self-hosted cluster reboot case
- Backup creation time is added in backup status.
- Total size of backups is added in backup service status.
- Cluster members that are ready and unready to serve requests are tracked via the ClusterStatus fields `Members.Ready` and `Members.Unready`
- PodPolicy `resourceRequirements` field is renamed to `resources`
- Default etcd version is changed to `3.1.2`
- Self-hosted etcd pod uses hostPath with path `/var/etcd/$ns-$member`
- [GH-851] Fixed a race that caused nil pointer access panic
- [GH-823] Fixed backup service status not shown in TPR status
- Experimental client for interacting with backup service
- The operator panics itself when it gets stuck unexpectedly. It relies on Kubernetes to get restarted.
- Add resource requirements field in `Spec.Pod`. Users can specify resource requirements for the etcd container with this new field.
- Add status endpoint to backup sidecar service.
- Service account of the etcd operator pod is passed to backup pod.
- Add backup service status into cluster status.
- Example deployments pin to the released version of the operator image
- Downward API of pod's namespace and name is required to start etcd operator pod. See `example/deployment.yaml`.
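A minimal sketch of the Downward API wiring in the operator's container spec follows. The environment variable names `MY_POD_NAMESPACE` and `MY_POD_NAME` are assumptions for this example; `example/deployment.yaml` is the authoritative source for the names the operator expects.

```yaml
env:
- name: MY_POD_NAMESPACE            # assumed variable name
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace # namespace the operator pod runs in
- name: MY_POD_NAME                 # assumed variable name
  valueFrom:
    fieldRef:
      fieldPath: metadata.name      # name of the operator pod
```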
- Drop etcd operator command line flags: "masterHost", "cert-file", "key-file", "ca-file".