Added Estimated Queue Position in the Status field of the NAB object (#…
…117)

Signed-off-by: Michal Pryc <[email protected]>
mpryc authored Nov 27, 2024
1 parent 9fdcb88 commit d05a4b2
Showing 15 changed files with 621 additions and 42 deletions.
12 changes: 12 additions & 0 deletions api/v1alpha1/nonadminbackup_types.go
@@ -103,10 +103,22 @@ type NonAdminBackupStatus struct {
	// +optional
	VeleroDeleteBackupRequest *VeleroDeleteBackupRequest `json:"veleroDeleteBackupRequest,omitempty"`

	// +optional
	QueueInfo *QueueInfo `json:"queueInfo,omitempty"`

	Phase      NonAdminBackupPhase `json:"phase,omitempty"`
	Conditions []metav1.Condition  `json:"conditions,omitempty"`
}

// QueueInfo holds the queue position for a specific VeleroBackup.
// It is used to estimate how many backups are scheduled before the given VeleroBackup in the OADP namespace.
// This number is not guaranteed to be accurate, but it should be close. It is inaccurate when the
// Velero pod is not running or is restarted after the Backup objects were created.
// It counts only VeleroBackups that are still subject to be handled by OADP/Velero.
type QueueInfo struct {
	EstimatedQueuePosition int `json:"estimatedQueuePosition"` // Estimated 1-based position in the queue (0 if already completed or not yet determined)
}
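To illustrate how the new field appears in a serialized status, here is a minimal sketch. `statusSketch` and `marshalStatus` are hypothetical names that are not part of the commit; only `QueueInfo` and its JSON tags are copied from the diff above. Because the pointer field carries `omitempty` while the inner integer does not, a nil `QueueInfo` is dropped entirely, whereas a non-nil one always emits `estimatedQueuePosition`, even when it is 0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueueInfo mirrors the type added in this commit.
type QueueInfo struct {
	EstimatedQueuePosition int `json:"estimatedQueuePosition"`
}

// statusSketch is a hypothetical, trimmed-down stand-in for NonAdminBackupStatus.
type statusSketch struct {
	QueueInfo *QueueInfo `json:"queueInfo,omitempty"`
	Phase     string     `json:"phase,omitempty"`
}

// marshalStatus returns the JSON encoding of the status as a string.
func marshalStatus(s statusSketch) string {
	out, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// No queue info yet: the queueInfo key is omitted.
	fmt.Println(marshalStatus(statusSketch{Phase: "Created"}))
	// Queue info present with position 0: the key and the zero value are both emitted.
	fmt.Println(marshalStatus(statusSketch{Phase: "Created", QueueInfo: &QueueInfo{}}))
	// Queue info present with a non-zero position.
	fmt.Println(marshalStatus(statusSketch{Phase: "Created", QueueInfo: &QueueInfo{EstimatedQueuePosition: 12}}))
}
```

This matches the CRD schema in this commit, where `estimatedQueuePosition` is listed as required inside the optional `queueInfo` object.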

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:path=nonadminbackups,shortName=nab
20 changes: 20 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

13 changes: 13 additions & 0 deletions config/crd/bases/oadp.openshift.io_nonadminbackups.yaml
@@ -619,6 +619,19 @@ spec:
- Created
- Deleting
type: string
queueInfo:
description: |-
QueueInfo holds the queue position for a specific VeleroBackup.
It is used to estimate how many backups are scheduled before the given VeleroBackup in the OADP namespace.
This number is not guaranteed to be accurate, but it should be close. It is inaccurate when the
Velero pod is not running or is restarted after the Backup objects were created.
It counts only VeleroBackups that are still subject to be handled by OADP/Velero.
properties:
estimatedQueuePosition:
type: integer
required:
- estimatedQueuePosition
type: object
veleroBackup:
description: VeleroBackup contains information of the related Velero
backup object.
2 changes: 2 additions & 0 deletions docs/design/Non_Admin_Controller_design.md
@@ -92,11 +92,13 @@ This design intends to enable non-admin users the ability to perform Backup and
- Listen to requests pertaining to Non-Admin Backup CRD
- Process requests pertaining to Non-Admin Backup CRD
- Update Non-Admin Backup CR status with the status/events from Velero Backup CR
- Update Non-Admin Backup CR status with the estimated queue position from the Velero Backup CRs
- Cascade Any actions performed on Non-Admin Backup CR to corresponding Velero backup CR
- **Non-Admin Restore (NAR) Controller:** The responsibilities of the NAR controller are:
- Listen to requests pertaining to Non-Admin Restore CRD
- Process requests pertaining to Non-Admin Restore CRD
- Update Non-Admin Restore CR status with the status/events from Velero Restore CR
- Update Non-Admin Restore CR status with the estimated queue position from the Velero Restore CRs
- Cascade Any actions performed on Non-Admin Restore CR to corresponding Velero restore CR
- **Non-Admin BackupStorageLocation (NABSL) controller:** The responsibilities of the NABSL controller are:
- Listen to requests pertaining to Non-Admin BSL CRD
100 changes: 96 additions & 4 deletions docs/design/nab_and_nar_status_update.md
@@ -6,12 +6,11 @@ This document outlines the design around updating NonAdminBackup (NAB) and NonAd

## NonAdminBackup and NonAdminRestore Status

The `status` field of NAB and NAR objects contains the following fields:
The `status` field of NAB and NAR objects contains the following fields, which are updated by NAB and NAR controllers:
- `phase`
- `conditions`
- `veleroBackup` for NAB and `veleroRestore` for NAR, which contains name, namespace and status of the related Velero object.

which are updated by NAB and NAR controllers.
- `queueInfo`, which contains `estimatedQueuePosition`, a best-effort estimate of the position of the NAB/NAR in the Velero queue.

Any reconciliation function that depends on data stored in the `status` field must ensure it operates on the most recent version of that field from the cluster before proceeding.

@@ -118,6 +117,99 @@ status:
phase: Created
```
### Queue Info
`queueInfo` contains `estimatedQueuePosition`, the estimated position of the current NonAdminBackup (NAB) or NonAdminRestore (NAR) in the Velero queue. A value of 1 means that no other unfinished Velero backups were created before it, each earlier unfinished Velero backup adds one, and 0 means that the object has already been handled. The estimate is accurate when the Velero pod runs continuously. However, it may become very inaccurate if the Velero pod was restarted, or was started after the Velero backups already existed in the cluster.
In the future, `queueInfo` may be extended with more fields that provide further information about the Velero queue, such as the time or size of the backups in the queue.

```yaml
status:
  conditions:
    - lastTransitionTime: '2024-11-25T18:34:01Z'
      message: backup accepted
      reason: BackupAccepted
      status: 'True'
      type: Accepted
    - lastTransitionTime: '2024-11-25T18:34:01Z'
      message: Created Velero Backup object
      reason: BackupScheduled
      status: 'True'
      type: Queued
  phase: Created
  queueInfo:
    estimatedQueuePosition: 12
  veleroBackup:
    nacuuid: mongo-persistent-anotherte-0d0b7b2c-ee76-412d-a867-2c23b8aa51ab
    name: mongo-persistent-anotherte-0d0b7b2c-ee76-412d-a867-2c23b8aa51ab
    namespace: openshift-adp
    status: {}
```

When the Backup is InProgress, the status will be updated with the `estimatedQueuePosition` being set to 1 and the `veleroBackup` phase being set to InProgress.

```yaml
status:
  conditions:
    - lastTransitionTime: '2024-11-27T10:47:49Z'
      message: backup accepted
      reason: BackupAccepted
      status: 'True'
      type: Accepted
    - lastTransitionTime: '2024-11-27T10:47:50Z'
      message: Created Velero Backup object
      reason: BackupScheduled
      status: 'True'
      type: Queued
  phase: Created
  queueInfo:
    estimatedQueuePosition: 1
  veleroBackup:
    nacuuid: mongo-persistent-anotherte-c95bc62d-f40c-47b8-8a28-0dd7addb4930
    name: mongo-persistent-anotherte-c95bc62d-f40c-47b8-8a28-0dd7addb4930
    namespace: openshift-adp
    status:
      expiration: '2024-12-27T10:48:45Z'
      formatVersion: 1.1.0
      phase: InProgress
      startTimestamp: '2024-11-27T10:48:45Z'
      version: 1
```
After the Backup completes successfully, the `veleroBackup` phase is set to Completed, with additional information about the backup, and `estimatedQueuePosition` is set to 0.

```yaml
status:
  conditions:
    - lastTransitionTime: '2024-11-27T10:47:49Z'
      message: backup accepted
      reason: BackupAccepted
      status: 'True'
      type: Accepted
    - lastTransitionTime: '2024-11-27T10:47:50Z'
      message: Created Velero Backup object
      reason: BackupScheduled
      status: 'True'
      type: Queued
  phase: Created
  queueInfo:
    estimatedQueuePosition: 0
  veleroBackup:
    nacuuid: mongo-persistent-anotherte-c95bc62d-f40c-47b8-8a28-0dd7addb4930
    name: mongo-persistent-anotherte-c95bc62d-f40c-47b8-8a28-0dd7addb4930
    namespace: openshift-adp
    status:
      completionTimestamp: '2024-11-27T10:48:50Z'
      expiration: '2024-12-27T10:48:45Z'
      formatVersion: 1.1.0
      hookStatus: {}
      phase: Completed
      progress:
        itemsBackedUp: 56
        totalItems: 56
      startTimestamp: '2024-11-27T10:48:45Z'
      version: 1
```
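
The three status snapshots in this section (position 12 while queued, 1 while in progress, 0 once completed) suggest how a client could interpret the value. The sketch below is one possible reading, based on those examples and on `GetBackupQueueInfo` in this commit. `describeQueuePosition` is a hypothetical helper and is not part of the controller.

```go
package main

import "fmt"

// describeQueuePosition is a hypothetical helper that turns an
// estimatedQueuePosition value into a human-readable description.
// Assumption: 0 means completed or not yet determined, 1 means first in
// line (or in progress), and n > 1 means n-1 unfinished backups are ahead.
func describeQueuePosition(pos int) string {
	switch {
	case pos <= 0:
		return "not queued: already completed or position not yet known"
	case pos == 1:
		return "first in line or currently in progress"
	default:
		return fmt.Sprintf("waiting, with an estimated %d backup(s) ahead", pos-1)
	}
}

func main() {
	// Values taken from the three YAML examples in this section.
	for _, pos := range []int{12, 1, 0} {
		fmt.Printf("%d: %s\n", pos, describeQueuePosition(pos))
	}
}
```

Since the position is only an estimate, a client should treat the text as a hint rather than a guarantee of ordering.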


## Status Update scenarios

The following graph shows the lifecycle of a NonAdminBackup.
@@ -159,7 +251,7 @@ flowchart TD
createVB -->|No| createNewVB[Create VeleroBackup]
createNewVB --> setCreatedPhase[NAB Phase: **Created**]
setCreatedPhase --> setQueuedCondition[NAB Condition:: Queued=True<br>Reason: BackupScheduled<br>Message: Created Velero Backup object]
createVB -->|Yes| updateFromVB[Update NAB Status from VeleroBackup:<br>Phase, Start Time, Completion Time,<br>Expiration, Errors, Warnings,<br>Progress, ValidationErrors]
createVB -->|Yes| updateFromVB[Update NAB Status from VeleroBackup:<br>Phase, Start Time, Completion Time,<br>Expiration, Errors, Warnings,<br>Progress, ValidationErrors<br>Queue Info: estimatedQueuePosition]
setQueuedCondition -->|Update Status if Changed<br>▶ Continue ║No Requeue║| endCreateUpdate[End Create/Update]
updateFromVB -->|Update Status if Changed<br>▶ Continue ║No Requeue║| endCreateUpdate
3 changes: 3 additions & 0 deletions internal/common/constant/constant.go
@@ -49,6 +49,9 @@ const NameDelimiter = "-"
// TrueString defines a constant for the True string
const TrueString = "True"

// NamespaceString defines a constant for the Namespace string
const NamespaceString = "Namespace"

// MaximumNacObjectNameLength represents Generated Non Admin Object Name and
// must be below 63 characters, because it's used within object Label Value
const MaximumNacObjectNameLength = validation.DNS1123LabelMaxLength
73 changes: 73 additions & 0 deletions internal/common/function/function.go
@@ -183,6 +183,79 @@ func GetVeleroBackupByLabel(ctx context.Context, clientInstance client.Client, n
}
}

// GetActiveVeleroBackupsByLabel retrieves all VeleroBackup objects that match a specified label within a given namespace
// and that do not yet have a CompletionTimestamp.
// It returns a slice of those VeleroBackup objects, or nil if none are found.
func GetActiveVeleroBackupsByLabel(ctx context.Context, clientInstance client.Client, namespace, labelKey, labelValue string) ([]velerov1.Backup, error) {
	var veleroBackupList velerov1.BackupList
	labelSelector := client.MatchingLabels{labelKey: labelValue}

	if err := clientInstance.List(ctx, &veleroBackupList, client.InNamespace(namespace), labelSelector); err != nil {
		return nil, err
	}

	// Filter out backups with a CompletionTimestamp
	var activeBackups []velerov1.Backup
	for _, backup := range veleroBackupList.Items {
		if backup.Status.CompletionTimestamp == nil {
			activeBackups = append(activeBackups, backup)
		}
	}

	if len(activeBackups) == 0 {
		return nil, nil
	}

	return activeBackups, nil
}

// GetBackupQueueInfo determines the queue position of the specified VeleroBackup.
// It calculates how many queued Backups exist in the namespace that were created before this one.
func GetBackupQueueInfo(ctx context.Context, clientInstance client.Client, namespace string, targetBackup *velerov1.Backup) (nacv1alpha1.QueueInfo, error) {
	var queueInfo nacv1alpha1.QueueInfo

	// If the target backup is nil or has no valid CreationTimestamp, it is not yet reconciled by OADP/Velero.
	// In this case, we can't determine its queue position, so we return an empty QueueInfo (position 0).
	if targetBackup == nil || targetBackup.CreationTimestamp.IsZero() {
		return queueInfo, nil
	}

	// If the target backup has a CompletionTimestamp, it has already been served.
	if targetBackup.Status.CompletionTimestamp != nil {
		queueInfo.EstimatedQueuePosition = 0
		return queueInfo, nil
	}

	// List all Backup objects in the namespace
	var backupList velerov1.BackupList
	if err := clientInstance.List(ctx, &backupList, client.InNamespace(namespace)); err != nil {
		return queueInfo, err
	}

	// Extract the target backup's creation timestamp
	targetTimestamp := targetBackup.CreationTimestamp.Time

	// The target backup is always in the queue, at least in the first position.
	// 0 is reserved for backups that are already served.
	queueInfo.EstimatedQueuePosition = 1

	// Iterate through backups and calculate position
	for i := range backupList.Items {
		backup := &backupList.Items[i]

		// Skip backups that have a CompletionTimestamp set. Velero won't process such a backup any further.
		if backup.Status.CompletionTimestamp != nil {
			continue
		}

		// Count backups created earlier than the target backup
		if backup.CreationTimestamp.Time.Before(targetTimestamp) {
			queueInfo.EstimatedQueuePosition++
		}
	}

	return queueInfo, nil
}

// GetVeleroDeleteBackupRequestByLabel retrieves a DeleteBackupRequest object based on a specified label within a given namespace.
// It returns the DeleteBackupRequest only when exactly one object is found, throws an error if multiple backups are found,
// or returns nil if no matches are found.