
Releases: libopenstorage/stork

v24.3.3.1

12 Dec 04:48

Fixes

PB-7868 (Severity: Major)
The NFS backup location PVC failed to mount when the NFS version was not specified in the mount options.

User Impact: KDMP restores failed with an NFSv4 backup location.

Resolution: You can now restore KDMP backups taken on an NFSv4 backup location.

Affected Versions: 24.3.3 and earlier

PB-7944 (Severity: Minor)
Large resources were fetched twice during backup, increasing the backup completion time.

User Impact: Backups of large resources took longer to complete.

Resolution: Resources are now fetched only once and uploaded in the same reconciler context. Note: This fix applies only to S3 object stores.

Affected Versions: 24.3.3 and earlier

v24.3.4

12 Dec 15:13

Fixes

PB-8630 (Severity: Minor)
In IBM environments, backups or restores of Persistent Volume Claims (PVCs) fail when certain CSI provisioners (for example, IBM File) are used. These provisioners do not invoke the kubelet ownership and permission change functions, causing kopia to fail due to insufficient read permissions.

User Impact: Users with IBM File provisioners cannot perform KDMP backups because KDMP job pods lack the required read permissions.

Resolution: KDMP job pods now support the anyuid annotation to address permission issues. To enable this:
  1. Add the PROVISIONERS_TO_USE_ANYUID: openshift-storage.cephfs.csi.ceph.com,provisioner2 entry to the KDMP ConfigMap (see the example ConfigMap after this entry).
  2. Apply this ConfigMap on both the backup and restore clusters.

This ensures KDMP job pods run with the necessary permissions, resolving the backup and restore failures.

Affected Versions: 24.3.3 and earlier
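
An illustrative sketch of the ConfigMap entry described above. The ConfigMap name (kdmp-config) and namespace (kube-system) are assumptions based on a typical KDMP setup; verify them against your installation, and replace the provisioner list with the CSI provisioners used in your environment:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    # Assumed name and namespace of the KDMP ConfigMap; adjust to your deployment.
    name: kdmp-config
    namespace: kube-system
  data:
    # Comma-separated list of CSI provisioners whose KDMP job pods should run with anyuid.
    PROVISIONERS_TO_USE_ANYUID: "openshift-storage.cephfs.csi.ceph.com,provisioner2"

Apply the same ConfigMap on both the backup and restore clusters, as noted in the resolution steps.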

v24.3.3

22 Nov 04:46

Fixes

PB-8394 (Severity: Minor)
If a small pxd volume of a few MB is backed up alongside a much larger volume (for example, 500 GB), the status of the smaller backup reports a false error: Backup failed for volume: rpc error: code = Internal desc = Failed to get status of backup: Key not found.

User Impact: The backup appears to be in a failed state with the Key not found error, even though the backup was actually successful and uploaded to the cloud.

Resolution: Once a backup reaches a successful state, Stork stops checking its status.

Affected Versions: 24.3.2 and earlier

PB-7476 (Severity: Minor)
Node affinity was not set on the kopia/NFS backup and restore job pods, allowing these job pods to be scheduled on nodes where the applications to be backed up were not running. This led to backup failures.

User Impact: If certain nodes have network restrictions and the applications are not running on those nodes, the backup job pods may get scheduled on these restricted nodes, causing the backup to fail.

Resolution: Node affinity is now added to the job pods to ensure they are scheduled on the nodes where the application pods are running.

Affected Versions: 24.3.2 and earlier

Known issues (Errata)

PWX-38905 (Severity: Major)
If a StorageClass is deleted on the source cluster and an asynchronous DR operation is performed for resources (PV/PVC) that use the deleted StorageClass, the migration fails with the following error:
Error updating StorageClass on PV: StorageClass.storage.k8s.io <storageClassName> not found

Workaround: Recreate the deleted StorageClass to proceed with the DR operation (see the example StorageClass after this entry).

Affected Versions: 24.3.0, 24.3.1, 24.3.2, and 24.3.3
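
An illustrative sketch of recreating the deleted StorageClass, assuming the migrated PVs/PVCs originally used a Portworx-backed class. The class name, provisioner, and parameters below are placeholders; they must match the StorageClass that was deleted:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    # Must match the <storageClassName> reported in the migration error.
    name: px-repl2-sc                 # placeholder
  provisioner: pxd.portworx.com       # placeholder; use the provisioner of the original class
  parameters:
    repl: "2"                         # placeholder; restore the original parameters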

v24.3.2

14 Oct 23:03

Fixes

PB-4394 (Severity: Major)
KDMP restore failed when the snapshot size exceeded the PVC size.

User Impact: Users experienced failures during KDMP restores with filesystem-based storage provisioners when the PVC content was larger than the PVC size.

Resolution: The restored PVC size is now modified to match the snapshot size.

Affected Versions: 24.3.1 and earlier

PB-8316 (Severity: Major)
Backups were incorrectly marked as successful rather than partial, even when some volume backups failed.

User Impact: Users were led to assume that all PVCs were successfully backed up, even when some had failed.

Resolution: The in-memory value of failedVolCount and the backup object are now updated to accurately reflect the number of failed backups.

Affected Versions: 24.3.0 and 24.3.1

PB-8360 (Severity: Major)
Adding an IBM COS backup location failed with an UnsupportedOperation error when the bucket was unlocked.

User Impact: Users could not add an IBM COS backup location if the bucket was unlocked.

Resolution: The UnsupportedOperation error, which indicates that the bucket is not locked, is now ignored for IBM COS buckets.

Affected Versions: 24.3.1

PB-7726 (Severity: Major)
VM backups failed while executing auto exec rules if the virt-launcher pod of the VMs was not in a running state.

User Impact: VM backups failed when auto exec rules were applied.

Resolution: Auto exec rules are now executed only on running virt-launcher pods.

Affected Versions: 24.3.0 and 24.3.1

Known issues (Errata)

PWX-38905 (Severity: Major)
If a StorageClass is deleted on the source cluster and an asynchronous DR operation is performed for resources (PV/PVC) that use the deleted StorageClass, the migration fails with the following error:
Error updating StorageClass on PV: StorageClass.storage.k8s.io <storageClassName> not found

Workaround: Recreate the deleted StorageClass to proceed with the DR operation.

Affected Versions: 24.3.0, 24.3.1, and 24.3.2

v24.3.1

07 Oct 18:28

Improvements

PWX-39128
If the AWS driver initialization gets stuck for a long time and prevents Stork from starting up, you can skip the AWS driver init by adding the environment variable SKIP_AWS_DRIVER_INIT="true" to the Stork pod (see the deployment snippet below).
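
An illustrative sketch of the relevant portion of the Stork Deployment spec with this environment variable set. The deployment name and namespace (stork in kube-system) are assumptions about a typical installation; if Stork is managed by the Portworx Operator, set the variable through the operator's Stork configuration instead so it is not overwritten:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    # Assumed name and namespace of the Stork deployment; adjust to your installation.
    name: stork
    namespace: kube-system
  spec:
    template:
      spec:
        containers:
        - name: stork
          env:
          # Skips the AWS driver initialization during Stork startup.
          - name: SKIP_AWS_DRIVER_INIT
            value: "true"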

Fixes

PWX-38383 (Severity: Minor)
In certain scenarios, Kubernetes etcd was overloaded, and Stork pods went into the CrashLoopBackOff state with the following error:
Controller manager: failed to wait for snapshot-schedule-controller caches to sync: timed out waiting for cache to be synced.

User Impact: Stork failed and restarted multiple times due to the overloading of Kubernetes etcd.

Resolution: A --controller-cache-sync-timeout flag has been added so that you can tune the cache sync timeout based on your requirements. The default value is 2 minutes.
Example: --controller-cache-sync-timeout=10 sets the controller cache sync timeout to 10 minutes instead of the default 2 minutes (see the deployment snippet after this Fixes section).

Affected Versions: 24.3.0 and earlier

PWX-36167 (Severity: Minor)
The Stork health monitor was incorrectly considering stale node entries with an offline status for pod eviction.

User Impact: If a node was repaired and returned with a different IP address, pods were inadvertently evicted from this online node due to the presence of stale node entries.

Resolution: If a node entry with an 'online' storage status shares the same scheduler ID as an 'offline' node entry, the offline node entry is now disregarded when considering pod evictions. This ensures that pods are not inadvertently evicted from nodes that have been repaired and are now online.

Affected Versions: 24.3.0 and earlier
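
An illustrative sketch of the relevant portion of the Stork Deployment spec with the --controller-cache-sync-timeout flag added. The deployment name, namespace, and entrypoint shown are assumptions about a typical installation; only the flag itself comes from the fix above, and operator-managed installations should set it through the operator's Stork configuration instead:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    # Assumed name and namespace of the Stork deployment; adjust to your installation.
    name: stork
    namespace: kube-system
  spec:
    template:
      spec:
        containers:
        - name: stork
          command:
          - /stork                              # assumed entrypoint; keep your existing command and flags
          # Raise the controller cache sync timeout from the default 2 minutes to 10 minutes.
          - --controller-cache-sync-timeout=10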

v24.3.0

11 Sep 09:39

Enhancements

  • Stork now supports partial backups. If backup fails for any of the PVCs, the successful backups of other PVCs are still saved, and the status is displayed as partial success. #1716
    Note: A partial backup requires at least one successful PVC backup.
  • Updated golang, aws-iam-authenticator, google-cloud-cli, and google-cloud-sdk versions to resolve security vulnerabilities. #1804 #1807

Bug fix

  • Issue: In a Synchronous DR setup, when you perform a failover operation using the storkctl perform failover command, the witness node might be deactivated instead of the source cluster.
    User Impact: After failover, the source cluster might remain in an active state, and the PX volumes can still be mounted and used from the source cluster.
    Resolution: After failover, the source cluster is now deactivated by default, and the witness node remains unaffected. #1829

24.2.5

05 Aug 08:12

Bug fix

  • Issue: Strong hyperconvergence for pods was not working when the stork.libopenstorage.org/preferLocalNodeOnly annotation was used.
    User Impact: Pods remained in a pending state.
    Resolution: When the stork.libopenstorage.org/preferLocalNodeOnly annotation is used, the pods are now scheduled on the node where the volume replica resides, and strong hyperconvergence works as expected (see the example pod spec below). #1818
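
An illustrative sketch of applying this annotation to a pod that uses a Portworx volume. The pod name, image, and PVC name are placeholders; only the annotation key comes from the fix above, and the "true" value is an assumption about the usual way such annotations are enabled:

  apiVersion: v1
  kind: Pod
  metadata:
    name: app-pod                               # placeholder
    annotations:
      # Ask Stork to place the pod only on a node that holds a replica of its volume.
      stork.libopenstorage.org/preferLocalNodeOnly: "true"
  spec:
    schedulerName: stork
    containers:
    - name: app
      image: nginx                              # placeholder
      volumeMounts:
      - name: data
        mountPath: /data
    volumes:
    - name: data
      persistentVolumeClaim:
        claimName: px-data-pvc                  # placeholder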

24.2.4

17 Jul 01:07

Bug Fix

  • Issue: During an OCP upgrade in a 3-node cluster, the MutatingWebhookConfiguration stork-webhooks-cfg is deleted if the leader Stork pod is evicted.
    User Impact: Applications that require Stork as the scheduler will experience disruptions, and OCP upgrades will get stuck on a 3-node cluster.
    Resolution: The MutatingWebhookConfiguration is now created after the leader election, ensuring stork-webhooks-cfg is always available. #1810
    Affected Versions: All

24.2.3

02 Jul 23:58

Note: For users currently on Stork versions 24.2.0, 24.2.1, or 24.2.2, Portworx by Pure Storage recommends upgrading to Stork 24.2.3.

Bug Fix

  • Issue: If the VolumeSnapshotSchedule has more status entries than the retain policy limit, Stork may continue creating new VolumeSnapshots, ignoring the retain policy. This can happen if the retain limit was lowered or if there was an error during snapshot creation.
    User Impact: Users saw more VolumeSnapshots than their retain policy was configured to allow.
    Resolution: Upgrade to Stork version 24.2.3. #1800
    Note: This fix doesn’t clean up the snapshots that were created before the upgrade. If required, you need to delete the old snapshots manually. See the example schedule and retain policy below.
    Affected Versions: 24.2.0, 24.2.1, and 24.2.2.
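
For reference, an illustrative sketch of where the retain limit lives, assuming a typical Stork snapshot schedule. The policy name, schedule time, namespace, and PVC name are placeholders, and the field names should be verified against the SchedulePolicy and VolumeSnapshotSchedule CRDs shipped with your Stork version:

  apiVersion: stork.libopenstorage.org/v1alpha1
  kind: SchedulePolicy
  metadata:
    name: daily-policy                # placeholder
  policy:
    daily:
      time: "10:00PM"                 # placeholder
      # Retain limit: the maximum number of snapshots kept for schedules using this policy.
      retain: 5
  ---
  apiVersion: stork.libopenstorage.org/v1alpha1
  kind: VolumeSnapshotSchedule
  metadata:
    name: mysql-snapshot-schedule     # placeholder
    namespace: mysql                  # placeholder
  spec:
    schedulePolicyName: daily-policy
    template:
      spec:
        persistentVolumeClaimName: mysql-data   # placeholder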

24.2.2

14 Jun 04:32

Enhancement

  • Stork now uses the shared informer cache event handling mechanism instead of the watch API to reschedule unhealthy pods that are using Portworx volumes. #1795