Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Debug Information Paths to NAB/NAR CR status #154

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,6 @@ go.work
*.swp
*.swo
*~

# artifacts generated when debugging
debug.*
1 change: 1 addition & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ RUN go mod download
COPY cmd/main.go cmd/main.go
COPY api/ api/
COPY internal/ internal/
COPY pkg/ pkg/

# Build
# the GOARCH has not a default value to allow the binary be built according to the host where the command
Expand Down
4 changes: 4 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,10 @@ vet: ## Run go vet against code.
test: manifests generate fmt vet envtest ## Run tests.
KUBEBUILDER_ASSETS="$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)" go test $$(go list ./... | grep -v /e2e) -coverprofile cover.out

# Generates kubebuilder assets dir to assist running test in IDEs which would not call `use` on envtest to create assets required for for suite_test
generate-kubebuilder_assets: envtest
$(shell $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path)

# Utilize Kind or modify the e2e tests to load the image locally, enabling compatibility with other vendors.
.PHONY: test-e2e # Run the e2e tests against a Kind k8s instance that is spun up.
test-e2e:
Expand Down
6 changes: 6 additions & 0 deletions api/v1alpha1/nonadminbackup_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,12 @@ type NonAdminBackupStatus struct {
// phase is a simple one high-level summary of the lifecycle of an NonAdminBackup.
Phase NonAdminPhase `json:"phase,omitempty"`

// logsPath is path to backup logs in the storage location
Copy link
Member

@shubham-pampattiwar shubham-pampattiwar Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make a struct here ?
like OperationInfo / DebugInfo / InfoPath / DebugFiles or something like that, and then add these paths under that struct ?'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends how big you think this is gonna get

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if there are to be that many, we just provide one base url, and then user/cli can populate the rest. its common enough that I think specifying each is wasteful of etcd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do add a struct, do we make it inline? such that it appears the same level as Phase.. or is the intent to make it a nested status field?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, we should just provide the base url (no separate url for logs and resources as well) and document the dir structure for users, wdyt ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine by me. I do wanna make sure everyone is heard tho.. perhaps next week's scrum?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, works for me.

LogsPath string `json:"logsPath,omitempty"`

// resourceListPath is path to resource list of backup in storage location
ResourceListPath string `json:"resourceListPath,omitempty"`

Conditions []metav1.Condition `json:"conditions,omitempty"`
}

Expand Down
6 changes: 6 additions & 0 deletions api/v1alpha1/nonadminrestore_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,12 @@ type NonAdminRestoreStatus struct {
// phase is a simple one high-level summary of the lifecycle of an NonAdminRestore.
Phase NonAdminPhase `json:"phase,omitempty"`

// logsPath is path to restore logs in the storage location
LogsPath string `json:"logsPath,omitempty"`

// resourceListPath is path to resource list of restore in storage location
ResourceListPath string `json:"resourceListPath,omitempty"`

Conditions []metav1.Condition `json:"conditions,omitempty"`
}

Expand Down
7 changes: 7 additions & 0 deletions config/crd/bases/oadp.openshift.io_nonadminbackups.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -601,6 +601,9 @@ spec:
- type
type: object
type: array
logsPath:
description: logsPath is path to backup logs in the storage location
type: string
phase:
description: phase is a simple one high-level summary of the lifecycle
of an NonAdminBackup.
Expand All @@ -624,6 +627,10 @@ spec:
required:
- estimatedQueuePosition
type: object
resourceListPath:
description: resourceListPath is path to resource list of backup in
storage location
type: string
veleroBackup:
description: VeleroBackup contains information of the related Velero
backup object.
Expand Down
7 changes: 7 additions & 0 deletions config/crd/bases/oadp.openshift.io_nonadminrestores.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -534,6 +534,9 @@ spec:
- type
type: object
type: array
logsPath:
description: logsPath is path to restore logs in the storage location
type: string
phase:
description: phase is a simple one high-level summary of the lifecycle
of an NonAdminRestore.
Expand All @@ -557,6 +560,10 @@ spec:
required:
- estimatedQueuePosition
type: object
resourceListPath:
description: resourceListPath is path to resource list of restore
in storage location
type: string
veleroRestore:
description: VeleroRestore contains information of the related Velero
restore object.
Expand Down
207 changes: 207 additions & 0 deletions docs/design/Debug_DownloadURL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
# Debug content Download URL Exposure Design

<!-- _Note_: The preferred style for design documents is one sentence per line.
*Do not wrap lines*.
This aids in review of the document as changes to a line are not obscured by the reflowing those changes caused and has a side effect of avoiding debate about one or two space after a period. -->

## Abstract

We want to provide a way for Non Admins (NAs) to debug their non admin backup (NAB) and/or non admin restore (NAR).

## Background

A backup/restore operation may not always succeed in part, or in full. Non Admin User would like a way to diagnose their backup or restore based on information available on object store uploaded by Velero during its backup or restore process.

## Goals

- Provide a path or url to debug information on NAB and NAR statuses, allowing credentialed object store users to access debug information.

## Non Goals

- Create a CLI
- Automatically retrieve debug information on behalf of the user

## High-Level Design

When a NAC velero backup or restore is completed, its associated NAB/NAR are updated to completion status.

During this time, NAC will also inject into status a path (ie. S3 url) to a relevant debugging file on the storage location that velero had uploaded.

## Detailed Design

[Currently](https://github.com/vmware-tanzu/velero/blob/2197cab3dbf1038b7afe363284d7b9e18c9300dc/pkg/apis/velero/v1/download_request_types.go#L32-L45) velero exposes following DownloadTargetKind that can be used in a DownloadRequest spec.

```go
DownloadTargetKindBackupLog DownloadTargetKind = "BackupLog"
DownloadTargetKindBackupContents DownloadTargetKind = "BackupContents"
DownloadTargetKindBackupVolumeSnapshots DownloadTargetKind = "BackupVolumeSnapshots"
DownloadTargetKindBackupItemOperations DownloadTargetKind = "BackupItemOperations"
DownloadTargetKindBackupResourceList DownloadTargetKind = "BackupResourceList"
DownloadTargetKindBackupResults DownloadTargetKind = "BackupResults"
DownloadTargetKindRestoreLog DownloadTargetKind = "RestoreLog"
DownloadTargetKindRestoreResults DownloadTargetKind = "RestoreResults"
DownloadTargetKindRestoreResourceList DownloadTargetKind = "RestoreResourceList"
DownloadTargetKindRestoreItemOperations DownloadTargetKind = "RestoreItemOperations"
DownloadTargetKindCSIBackupVolumeSnapshots DownloadTargetKind = "CSIBackupVolumeSnapshots"
DownloadTargetKindCSIBackupVolumeSnapshotContents DownloadTargetKind = "CSIBackupVolumeSnapshotContents"
DownloadTargetKindBackupVolumeInfos DownloadTargetKind = "BackupVolumeInfos"
DownloadTargetKindRestoreVolumeInfo DownloadTargetKind = "RestoreVolumeInfo"
```

These are the expected kinds of items we can get from object store for debugging.

The path to these items in an object store is defined in [pkg/persistence/object_store_layout.go](https://github.com/vmware-tanzu/velero/blob/0a280e57863c70495a2dfb65787615a68a0e7b03/pkg/persistence/object_store_layout.go) of velero repo.

We will use this knowledge to populate NAB/NAR status fields with path to log and resourceList.

In the future we may have a CLI that can gather resources available in DownloadTargetKind using the values added to NAB/NAR statuses from this enhancement.

### Usage examples

In a completed NAB, you see in its status.resourceListPath has a value of `s3://velero-6109f5e9711c8c58131acdd2f490f451/velero-e2e-00093375-ecd8-41d8-81fc-8ce9d4983ad6/backups/mongo-blockdevice-e2e-f26ab76c-34d1-11ef-93a5-0a580a834d36/mongo-blockdevice-e2e-f26ab76c-34d1-11ef-93a5-0a580a834d36-resource-list.json.gz`
shubham-pampattiwar marked this conversation as resolved.
Show resolved Hide resolved

Given a user have credentials to the object storage, they will be able to run following commands to get resource lists in the backup.
```
❯ aws s3 cp s3://velero-6109f5e9711c8c58131acdd2f490f451/velero-e2e-00093375-ecd8-41d8-81fc-8ce9d4983ad6/backups/mongo-blockdevice-e2e-f26ab76c-34d1-11ef-93a5-0a580a834d36/mongo-blockdevice-e2e-f26ab76c-34d1-11ef-93a5-0a580a834d36-resource-list.json.gz .

❯ gzip --uncompress --to-stdout mongo-blockdevice-e2e-f26ab76c-34d1-11ef-93a5-0a580a834d36-resource-list.json.gz | jq
{
"apps.openshift.io/v1/DeploymentConfig": [
"mongo-persistent/todolist"
],
"apps/v1/Deployment": [
"mongo-persistent/mongo"
],
"apps/v1/ReplicaSet": [
"mongo-persistent/mongo-7645dc59b7"
],
"authorization.openshift.io/v1/RoleBinding": [
"mongo-persistent/system:deployers",
"mongo-persistent/system:image-builders",
"mongo-persistent/system:image-pullers"
],
"discovery.k8s.io/v1/EndpointSlice": [
"mongo-persistent/mongo-hf89b",
"mongo-persistent/todolist-frpft"
],
"image.openshift.io/v1/ImageStream": [
"mongo-persistent/todolist-mongo-go"
],
"rbac.authorization.k8s.io/v1/RoleBinding": [
"mongo-persistent/system:deployers",
"mongo-persistent/system:image-builders",
"mongo-persistent/system:image-pullers"
],
"route.openshift.io/v1/Route": [
"mongo-persistent/todolist-route"
],
"security.openshift.io/v1/SecurityContextConstraints": [
"mongo-persistent-scc"
],
"v1/ConfigMap": [
"mongo-persistent/kube-root-ca.crt",
"mongo-persistent/openshift-service-ca.crt"
],
"v1/Endpoints": [
"mongo-persistent/mongo",
"mongo-persistent/todolist"
],
"v1/Event": [
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf31d2411b3",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf45c7ee237",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf4ecd8fb4f",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf4fc16952d",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf4fc16e3bc",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf518b65eac",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf51a288d26",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf5215c6989",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf526ddd2e3",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf527b6e869",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf5499768e5",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf550e736c1",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf556dbbde4",
"mongo-persistent/mongo-7645dc59b7-ddnpr.17dcfbf557a59c91",
"mongo-persistent/mongo-7645dc59b7.17dcfbf31d2357e0",
"mongo-persistent/mongo.17dcfbf31911596c",
"mongo-persistent/mongo.17dcfbf32101157f",
"mongo-persistent/mongo.17dcfbf36dcc9869",
"mongo-persistent/mongo.17dcfbf36dd6c3b0",
"mongo-persistent/mongo.17dcfbf439418098",
"mongo-persistent/todolist-1-deploy.17dcfbf32432b1f8",
"mongo-persistent/todolist-1-deploy.17dcfbf35206b4ea",
"mongo-persistent/todolist-1-deploy.17dcfbf353d7fdbf",
"mongo-persistent/todolist-1-deploy.17dcfbf358a11021",
"mongo-persistent/todolist-1-deploy.17dcfbf3598d5f3d",
"mongo-persistent/todolist-1-wqhpb.17dcfbf35f204990",
"mongo-persistent/todolist-1-wqhpb.17dcfbf38b531ba6",
"mongo-persistent/todolist-1-wqhpb.17dcfbf38c971e09",
"mongo-persistent/todolist-1-wqhpb.17dcfbf3919831c9",
"mongo-persistent/todolist-1-wqhpb.17dcfbf3927a3b4e",
"mongo-persistent/todolist-1-wqhpb.17dcfbfcfd3a2577",
"mongo-persistent/todolist-1-wqhpb.17dcfbfd07869b18",
"mongo-persistent/todolist-1-wqhpb.17dcfbfd0ca71c96",
"mongo-persistent/todolist-1-wqhpb.17dcfbfd0da3efdc",
"mongo-persistent/todolist-1.17dcfbf35ebded07",
"mongo-persistent/todolist.17dcfbf320766931"
],
"v1/Namespace": [
"mongo-persistent"
],
"v1/PersistentVolume": [
"pvc-2c2ff5b8-3b06-467c-8b0e-99a40cde4993"
],
"v1/PersistentVolumeClaim": [
"mongo-persistent/mongo"
],
"v1/Pod": [
"mongo-persistent/mongo-7645dc59b7-ddnpr",
"mongo-persistent/todolist-1-deploy",
"mongo-persistent/todolist-1-wqhpb"
],
"v1/ReplicationController": [
"mongo-persistent/todolist-1"
],
"v1/Secret": [
"mongo-persistent/builder-dockercfg-h7p9c",
"mongo-persistent/builder-token-l79l7",
"mongo-persistent/default-dockercfg-nnjfj",
"mongo-persistent/default-token-7d745",
"mongo-persistent/deployer-dockercfg-dvrn8",
"mongo-persistent/deployer-token-dl9m2",
"mongo-persistent/mongo-persistent-sa-dockercfg-p9zfk",
"mongo-persistent/mongo-persistent-sa-token-55qxx"
],
"v1/Service": [
"mongo-persistent/mongo",
"mongo-persistent/todolist"
],
"v1/ServiceAccount": [
"mongo-persistent/builder",
"mongo-persistent/default",
"mongo-persistent/deployer",
"mongo-persistent/mongo-persistent-sa"
]
}
```

## Alternatives Considered

Generating a short lived signed url

- While this approach do not require user to have bucket credentials, the URL retrieved will be short lived such that it will only be practical to use if programmatically used by a CLI, which is out of scope of current design.
kaovilai marked this conversation as resolved.
Show resolved Hide resolved

## Security Considerations

We will assume that user who have object store credentials are the authorized personnel to access this data path and the object store bucket as a whole.
Other users who share this bucket are also assumed to be similarly privileged and have access to all other shared users data in the object store.
kaovilai marked this conversation as resolved.
Show resolved Hide resolved

## Compatibility

A new status field(s) will need to be added to NAB/NAR to expose debug info path.
kaovilai marked this conversation as resolved.
Show resolved Hide resolved
This includes updates to the CRD within oadp-operator project as well.

## Implementation
<!-- A description of the implementation, timelines, and any resources that have agreed to contribute. -->

## Open Issues
<!-- A discussion of issues relating to this proposal for which the author does not know the solution. This section may be omitted if there are none. -->
31 changes: 26 additions & 5 deletions internal/controller/nonadminbackup_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ import (
"github.com/migtools/oadp-non-admin/internal/common/function"
"github.com/migtools/oadp-non-admin/internal/handler"
"github.com/migtools/oadp-non-admin/internal/predicate"
"github.com/migtools/oadp-non-admin/pkg/velero/storagelocation/paths"
)

// NonAdminBackupReconciler reconciles a NonAdminBackup object
Expand Down Expand Up @@ -631,6 +632,7 @@ func (r *NonAdminBackupReconciler) createVeleroBackupAndSyncWithNonAdminBackup(c
// Included Namespaces are set by the controller and can not be overridden by the user
// nor admin user
backupSpec.IncludedNamespaces = []string{nab.Namespace}
// If not empty, has to be a NaBSL, so override to a NaBSL with that name.
if backupSpec.StorageLocation != constant.EmptyString {
nonAdminBsl := &nacv1alpha1.NonAdminBackupStorageLocation{}

Expand Down Expand Up @@ -694,7 +696,11 @@ func (r *NonAdminBackupReconciler) createVeleroBackupAndSyncWithNonAdminBackup(c
// Ensure that the NonAdminBackup's NonAdminBackupStatus is in sync
// with the VeleroBackup. Any required updates to the NonAdminBackup
// Status will be applied based on the current state of the VeleroBackup.
updated := updateNonAdminBackupVeleroBackupStatus(&nab.Status, veleroBackup)
updated, err := updateNonAdminBackupVeleroBackupStatus(r.Client, &nab.Status, veleroBackup)
if err != nil {
logger.Error(err, statusUpdateError)
return false, err
}

if updated || updatedPhase || updatedCondition || updatedQueueInfo {
if err := r.Status().Update(ctx, nab); err != nil {
Expand Down Expand Up @@ -744,9 +750,9 @@ func updateNonAdminPhase(phase *nacv1alpha1.NonAdminPhase, newPhase nacv1alpha1.

// updateNonAdminBackupVeleroBackupStatus sets the VeleroBackup status field in NonAdminBackup object status and returns true
// if the VeleroBackup fields are changed by this call.
func updateNonAdminBackupVeleroBackupStatus(status *nacv1alpha1.NonAdminBackupStatus, veleroBackup *velerov1.Backup) bool {
func updateNonAdminBackupVeleroBackupStatus(c client.Client, status *nacv1alpha1.NonAdminBackupStatus, veleroBackup *velerov1.Backup) (bool, error) {
if status == nil || veleroBackup == nil {
return false
return false, nil
}

if status.VeleroBackup == nil {
Expand All @@ -758,11 +764,26 @@ func updateNonAdminBackupVeleroBackupStatus(status *nacv1alpha1.NonAdminBackupSt
}

if reflect.DeepEqual(*status.VeleroBackup.Status, veleroBackup.Status) {
return false
return false, nil
}

status.VeleroBackup.Status = veleroBackup.Status.DeepCopy()
return true

// adds logsPath and resourceListPath to status if velero backup is completed.
if status.VeleroBackup.Status.Phase == velerov1.BackupPhaseCompleted {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not seem to comply with design. Here we only add info if backup succeeds. Should not check for finished state phases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the info isn't usable until backup completes. If it is unclear in design will fix.

If backup fails, (I guess partially fail could work), these urls won't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence made me think that the goal is to add more info for debugging unsuccessful backups/restores
https://github.com/kaovilai/oadp-non-admin/blob/29286ec5cbc3021ba93f5013196d60e2fa98a611/docs/design/Debug_DownloadURL.md?plain=1#L13

maybe adding to Non Goals that backup/restore phases that will not contain the info

// logsPath is <protocol>://<bucket>/<prefix>/backups/<backup-name>/<backup-name>-logs.gz
var err error
status.LogsPath, err = paths.BackupLogs(c, veleroBackup)
if err != nil {
return false, err
}
// resourceListPath is <protocol>://<bucket>/<prefix>/backups/<backup-name>/<backup-name>-resource-list.json.gz
status.ResourceListPath, err = paths.BackupResourceList(c, veleroBackup)
if err != nil {
return false, err
}
}
return true, nil
}

// updateNonAdminBackupDeleteBackupRequestStatus sets the VeleroDeleteBackupRequest status field in NonAdminBackup object status and returns true
Expand Down
Loading
Loading