Commit 51f5433: initial commit
pisces-period committed Jul 21, 2021 (1 parent: ce0a813)
Showing 6 changed files with 557 additions and 0 deletions.
59 changes: 59 additions & 0 deletions Dockerfile
@@ -0,0 +1,59 @@
# this image is a port from https://github.com/dsever/dockerfiles-1/postgres-backup-s3
# with a few differences:
# - bump the Alpine base image to 3.13
# - comment out ENV directives
# - Python2.7 is deprecated, Python3 stack is installed instead
# - merge backup/restore Dockerfiles into a single one
#
# NOTE: This image uses pip3 to install AWS-CLI.
# Amazon does provide binaries for AWS-CLI, but none for Alpine.
# Setting up the environment for an Alpine binary build requires too many
# packages, bloats the image and defeats the purpose of using Alpine at all.
# The same reasoning applies to the ENV directives below: they are kept as
# comments and the variables are injected at container runtime instead.
#
# pull base image
FROM alpine:3.13.5
# set labels
LABEL maintainer="original: Johannes Schickling <[email protected]>, update: Yuri Neves <[email protected]>, Dubravko Sever <[email protected]>" \
app="defectdojo-postgresql-s3" \
description="Periodic PostgreSQL Backup to AWS S3" \
sourcerepo="https://github.com/dsever/dockerfiles-1"
# update APK repositories
# install Python3 and Py3-PIP (Python2.7 is deprecated)
# install AWS CLI
# install Go-Cron Linux
# set appropriate permissions to executable file and remove APK cache
RUN apk update && \
    apk add openssl postgresql curl python3 py3-pip && \
    pip3 install --upgrade pip && \
    pip3 install --upgrade awscli && \
    curl -L --insecure https://github.com/odise/go-cron/releases/download/v0.0.6/go-cron-linux.gz | zcat > /usr/local/bin/go-cron && \
    chmod u+x /usr/local/bin/go-cron && \
    apk del curl && \
    rm -rf /var/cache/apk/*
# default environment variables
# these are kept as comment for historical reasons
# these variables should be injected at container runtime
# ENV POSTGRESQL_DATABASE **None**
# ENV POSTGRESQL_HOST **None**
# ENV POSTGRESQL_PORT 5432
# ENV POSTGRESQL_USER **None**
# ENV POSTGRESQL_PASSWORD **None**
# ENV AWS_ACCESS_KEY_ID **None**
# ENV AWS_SECRET_ACCESS_KEY **None**
# ENV AWS_DEFAULT_REGION eu-central-1
# ENV S3_S3V4 no
# ENV SCHEDULE **None**
# ENV AES_KEY **None**
# ENV RESTORE_TO **None**
# ENV CMD **None**
# copy scripts over
COPY . .
# ideally, run the application as an unprivileged user;
# however, this setup has not been tested with a non-root
# user yet, so the directive below is left commented out.
# USER 5000
# run scripts
CMD ["sh", "run.sh"]
79 changes: 79 additions & 0 deletions Known Issues.md
@@ -0,0 +1,79 @@
### Restoring PostgreSQL Database Locks me Out of the Web GUI ###

#### Symptom ####

When I restore the PostgreSQL database, I can no longer log in to the Web GUI.

#### Root Cause ####

The admin password is stored in Kubernetes as a secret called `DD_ADMIN_PASSWORD`. This secret is re-created (with a different value) every time the DefectDojo instance is installed or re-installed.

Moreover, the admin password is also stored (hashed) inside the PostgreSQL database, so a restore reinstates whatever password was in effect when the dump was taken, which may no longer match the current secret.
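
If you only need to recover the current value, you can read it straight from the secret. The snippet below is a sketch: the secret name (`defectdojo`) and key (`DD_ADMIN_PASSWORD`) are assumptions based on a default Helm release and may differ in your install.

```
$ kubectl get secret defectdojo -o jsonpath='{.data.DD_ADMIN_PASSWORD}' | base64 -d
```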

#### Solution ####

Source the __kubeconfig__ file relative to the target environment and run the following commands:

```
$ kubectl get pods
NAME                                            READY   STATUS      RESTARTS   AGE
defectdojo-celery-beat-6cb4c6897c-5t281         1/1     Running     0          29d
defectdojo-celery-worker-5787df4578-4r5at       1/1     Running     0          29d
defectdojo-django-848c45f9d4-tcx87              2/2     Running     1          37d
defectdojo-initializer-2021-01-27-09-10-rmhld   0/1     Completed   0          44d
defectdojo-postgresql-0                         2/2     Running     0          44d
defectdojo-rabbitmq-0                           1/1     Running     0          30d
hostpathtest-7666c596b7-au7mx                   1/1     Running     0          37d
$ kubectl exec -it defectdojo-django-${POD_IDENTIFIER} -c uwsgi -- ./manage.py changepassword
```

Change the password and try to log in to the Web GUI again.

Consider creating a separate superuser account that remains consistent across database restores, for example as sketched below.
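
A sketch using Django's built-in `createsuperuser` management command (the pod name placeholder is the same as above; the command prompts interactively, hence `-it`):

```
$ kubectl exec -it defectdojo-django-${POD_IDENTIFIER} -c uwsgi -- ./manage.py createsuperuser
```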

### Restoring PostgreSQL Database Crashes the Web GUI ###

#### Symptom ####

When I restore the PostgreSQL database, I can log in to DefectDojo, but I'm unable to browse to any section. I get an error message.

#### Root Cause ####

Most likely, you are attempting to restore the PostgreSQL database into a newer version of DefectDojo that requires a database migration.

#### Solution ####

Source the __kubeconfig__ file relative to the target environment and run the following commands:

```
$ kubectl get pods
NAME                                            READY   STATUS      RESTARTS   AGE
defectdojo-celery-beat-6cb4c6897c-5t281         1/1     Running     0          29d
defectdojo-celery-worker-5787df4578-4r5at       1/1     Running     0          29d
defectdojo-django-848c45f9d4-tcx87              2/2     Running     1          37d
defectdojo-initializer-2021-01-27-09-10-rmhld   0/1     Completed   0          44d
defectdojo-postgresql-0                         2/2     Running     0          44d
defectdojo-rabbitmq-0                           1/1     Running     0          30d
hostpathtest-7666c596b7-au7mx                   1/1     Running     0          37d
$ kubectl exec defectdojo-django-${POD_IDENTIFIER} -c uwsgi -- ./manage.py migrate
Operations to perform:
  Apply all migrations: admin, auditlog, auth, authtoken, contenttypes, django_celery_results, dojo, sessions, sites, social_django, tagging, tastypie, watson
Running migrations:
  Applying dojo.0071_product_type_enhancement... OK
  Applying dojo.0072_composite_index... OK
  Applying dojo.0073_sheets_textfields... OK
  Applying dojo.0074_notifications_close_engagement... OK
  Applying dojo.0075_import_history... OK
  Applying dojo.0076_authorization... OK
  Applying dojo.0077_delete_dupulicates... OK
```

Log out of the application and log back in.

Always check DefectDojo [release notes](https://github.com/DefectDojo/django-DefectDojo/releases/) before restoring the database into a new major release version.
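
To check whether the restored schema is still behind the running code, one option (a sketch, not part of this tooling) is Django's `showmigrations` command; unapplied migrations are printed with an empty `[ ]` checkbox, and `grep` exits non-zero when there are none left:

```
$ kubectl exec defectdojo-django-${POD_IDENTIFIER} -c uwsgi -- ./manage.py showmigrations dojo | grep '\[ \]'
```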
186 changes: 186 additions & 0 deletions README.md
@@ -0,0 +1,186 @@
#### DefectDojo PostgreSQL & AWS S3 API Integration Tool ####

The purpose of this integration tool is to provide DefectDojo PostgreSQL database backup, restore and data retention functionality.

Note: the tool assumes an AWS S3 bucket to save and fetch database dumps. You can tweak the script to use a cloud storage option of your choosing, an NFS share, or a path on the local host.

#### Design ####

This integration is written in Bash (shellcheck-compliant), with the following software design principles in mind:

- Don't Repeat Yourself (DRY)
- Procedural Programming
- Containerization Support

#### The Challenge ####

By default, the DefectDojo Helm charts do not provide support for database management operations. This is partly because the application ships with multiple database options to choose from, each with its own best practices and procedures for backup and restore operations.

#### Our Solution ####

Our tool programmatically interacts with the DefectDojo PostgreSQL database to take point-in-time dumps and automatically store them in AWS S3 buckets.

Conversely, the tool can be configured to automatically download a point-in-time dump from S3 and restore the database to a previous state (for disaster recovery purposes).
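
Under the hood, the backup path boils down to a short pipeline. The sketch below is simplified from `backup.sh` in this repository (error handling and the dump-size sanity check are omitted, and the encryption step only runs when `AES_KEY` is set):

```
# PGPASSWORD is exported from POSTGRESQL_PASSWORD before this runs (see backup.sh)
pg_dump -h "${POSTGRESQL_HOST}" -p "${POSTGRESQL_PORT}" -U "${POSTGRESQL_USER}" "${POSTGRESQL_DATABASE}" | gzip > dump.sql.gz
openssl enc -e -aes256 -pbkdf2 -md sha256 -k "${AES_KEY}" -in dump.sql.gz -out dump.sql.gz.dat
aws s3 cp dump.sql.gz.dat s3://"${S3_BUCKET}"/"${S3_PREFIX}"/"$(date +%Y/%m/%d)"/"${POSTGRESQL_DATABASE}"_"$(date +%H:%M:%SZ)".sql.gz.dat
```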

#### Disclaimer ####

This command-line tool is based on the following product versions at the time of this build:

- DefectDojo v2.0.3
- AWS-CLI 1.19.27

API endpoints might change in future versions of these products and might break the functionality of this tool.

#### Usage ####

This tool is designed to be scheduled and executed automatically in CI/CD pipelines. However, you can also run it standalone.

The following environment variables need to be set in your shell environment (an example export snippet follows the list):

- CMD: which operation to execute (backup | restore)
- AWS_ACCESS_KEY_ID: AWS access key ID
- AWS_SECRET_ACCESS_KEY: AWS secret access key
- S3_BUCKET: DefectDojo AWS s3 bucket name
- S3_PREFIX: DefectDojo environment (dev | stage | prod)
- POSTGRESQL_DATABASE: DefectDojo database name (defaults to defectdojo)
- POSTGRESQL_USER: DefectDojo database admin user (defaults to defectdojo)
- POSTGRESQL_PASSWORD: DefectDojo database admin password (randomly generated)
- POSTGRESQL_PORT: DefectDojo database port (defaults to 5432)
- POSTGRESQL_HOST: DefectDojo database host
- AES_KEY: for backup operations, an optional passphrase used to AES-encrypt the database dump
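
For reference, a minimal export sketch with placeholder values (the bucket name and credentials below are examples, not real resources):

```
export CMD=backup
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
export S3_BUCKET="my-defectdojo-backups"
export S3_PREFIX="dev"
export POSTGRESQL_DATABASE="defectdojo"
export POSTGRESQL_USER="defectdojo"
export POSTGRESQL_PASSWORD="<database-password>"
export POSTGRESQL_PORT=5432
export POSTGRESQL_HOST="localhost"
export AES_KEY="<encryption-passphrase>"
```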

#### Ad-Hoc Backup ####

This manual assumes that you are running DefectDojo on Kubernetes. The steps are similar for Docker Compose deployments.

The process of manual PostgreSQL database backup can be generally described as follows:

* sourcing the appropriate kubeconfig file, relative to the environment you are maintaining
* port-forwarding a local port on your machine to the PostgreSQL port on the target pod(s) in the cluster
* running this container

This container image contains the necessary scripts to:

* backup the database with [pg_dump](https://www.postgresql.org/docs/9.3/app-pgdump.html)
* upload the database dump to AWS s3 with [awscli](https://aws.amazon.com/cli/)

#### Ad-Hoc Backup Checklist ####

What you need:

* kubeconfig file or similar, to access k8s cluster resources in the target namespace
* Docker daemon running on your local machine
* AWS access key ID
* AWS secret access key
* Optionally (but highly advised), an AES passphrase (`AES_KEY`) used to encrypt the database dump
* PostgreSQL password

#### Backing up the Database ####

Source the __kubeconfig__ file and run the following command to forward an arbitrary local port on your machine to the database port (5432, as of this writing):

```
$ export KUBECONFIG=${PATH/TO/KUBECONFIG}
$ kubectl port-forward pods/defectdojo-postgresql-0 3000:5432
Forwarding from 127.0.0.1:3000 -> 5432
Forwarding from [::1]:3000 -> 5432
```

Now, if you haven't yet, build the Docker image and run the following command (in a separate terminal):

```
# if you haven't yet, build the image and give it a name and tag of your liking
$ docker build -t ${IMAGE}:${TAG} .
# run the backup command
$ docker run --rm --network host \
    -e CMD=backup \
    -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
    -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
    -e S3_BUCKET="${BUCKET}" \
    -e S3_PREFIX="${PREFIX}" \
    -e POSTGRESQL_DATABASE="defectdojo" \
    -e POSTGRESQL_USER="defectdojo" \
    -e POSTGRESQL_PASSWORD="${POSTGRESQL_PASSWORD}" \
    -e POSTGRESQL_PORT=${LOCAL_PORT} \
    -e POSTGRESQL_HOST="{localhost|host.docker.internal}" \
    -e AES_KEY="${AES_KEY}" \
    ${IMAGE}:${TAG}
Creating dump of defectdojo database from host.docker.internal...
OpenSSL 1.1.1j 21 Jul 2021
SQL backup uploaded successfully
```

Replace the placeholders with appropriate values.

The database dump process can take several minutes depending on the size of the database.

If you are following this procedure on macOS, use `host.docker.internal` or `gateway.docker.internal` instead of `localhost`.

#### Validating the Backup ####

To validate that the database dump has been stored on AWS, run the following commands:

```
$ aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ********************              env
secret_key     ********************              env
    region             eu-central-1             None    AWS_DEFAULT_REGION
$ aws s3 ls s3://${S3_BUCKET}/${S3_PREFIX}/${YYYY}/${MM}/${DD}/
2021-03-05 12:14:36 2125904 defectdojo_12:14:33Z.sql.gz.dat
```

To get a shell session with awscli pre-installed (and credentials pre-configured), re-run the previous `docker run` command with the `-it --entrypoint /bin/sh` flags added.
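
For example (a sketch; pass the same credential and bucket variables you used for the backup):

```
$ docker run --rm -it --entrypoint /bin/sh \
    -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
    -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
    -e AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION}" \
    -e S3_BUCKET="${BUCKET}" -e S3_PREFIX="${PREFIX}" \
    ${IMAGE}:${TAG}
/ # aws s3 ls "s3://${S3_BUCKET}/${S3_PREFIX}/" --recursive
```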

The best way to assess the effectiveness of this process is by conducting a full backup/restore drill in development/staging environments regularly.

#### Ad-Hoc Restore ####

The process of manual PostgreSQL database restore can be generally described as follows:

* sourcing the appropriate kubeconfig file, relative to the environment you are maintaining
* port-forwarding a local port on your machine to the PostgreSQL port on the target pod(s) in the cluster
* running this container

This container image contains the necessary scripts to (a conceptual sketch follows the list):

* download the database dump from AWS s3 with [awscli](https://aws.amazon.com/cli/)
* restore the database with [psql](https://www.postgresql.org/docs/13/app-psql.html)
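
The restore script itself is not shown in this diff, but conceptually the restore path is roughly the reverse of the backup pipeline. The sketch below is an assumption based on the tools listed above; `RESTORE_TO` is the point-in-time object key (e.g. `2021/03/10/defectdojo_14:41:57Z.sql.gz.dat`), and decryption only applies when the dump was encrypted with `AES_KEY`:

```
# PGPASSWORD is expected to be exported, as in backup.sh
aws s3 cp "s3://${S3_BUCKET}/${S3_PREFIX}/${RESTORE_TO}" dump.sql.gz.dat
openssl enc -d -aes256 -pbkdf2 -md sha256 -k "${AES_KEY}" -in dump.sql.gz.dat -out dump.sql.gz
gunzip -c dump.sql.gz | psql -h "${POSTGRESQL_HOST}" -p "${POSTGRESQL_PORT}" -U "${POSTGRESQL_USER}" "${POSTGRESQL_DATABASE}"
```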

#### Manual Restore Checklist ####

What you need:

* kubeconfig file or similar, to access k8s cluster resources in the target namespace
* Docker daemon running on your local machine
* AWS access key ID
* AWS secret access key
* The AES passphrase used to decrypt the database dump (only applicable when the dump was encrypted)
* PostgreSQL password

#### Restoring the Database ####

Source the __kubeconfig__ file and run the following command to forward an arbitrary local port on your machine to the database port (5432, as of this writing):

```
$ export KUBECONFIG=${PATH/TO/KUBECONFIG}
$ kubectl port-forward pods/defectdojo-postgresql-0 3000:5432
Forwarding from 127.0.0.1:3000 -> 5432
Forwarding from [::1]:3000 -> 5432
```

Now, run the following command in a separate terminal:

```
$ docker run --rm --network host \
    -e CMD=restore \
    -e RESTORE_TO=${PIT_FILE} \
    -e AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
    -e AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
    -e S3_BUCKET="${BUCKET}" \
    -e S3_PREFIX="${PREFIX}" \
    -e POSTGRESQL_DATABASE="defectdojo" \
    -e POSTGRESQL_USER="defectdojo" \
    -e POSTGRESQL_PASSWORD="${POSTGRESQL_PASSWORD}" \
    -e POSTGRESQL_PORT=${LOCAL_PORT} \
    -e POSTGRESQL_HOST="{localhost|host.docker.internal}" \
    -e AES_KEY="${AES_KEY}" \
    ${IMAGE}:${TAG}
Requesting backup file 2021/03/10/defectdojo_14:41:57Z.sql.gz.dat
Fetching defectdojo_14:41:57Z.sql.gz.dat from S3
Restoring defectdojo_14:41:57Z.sql.gz.dat
Restore complete
```

Replace the placeholders with appropriate values. Use the same image and tag that you built before the backup operation.

The database restore process can take several minutes depending on the size of the database.

If you are following this procedure on macOS, use `host.docker.internal` or `gateway.docker.internal` instead of `localhost`.

#### Validating the Restore ####

The best way to validate that the database has been restored is to head over to the DefectDojo GUI and check that your products, engagements, users and other objects are there.
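
If you also want a rough quantitative check, one option (a sketch; the model names are assumptions based on DefectDojo's `dojo` Django app) is to count a couple of well-known objects through the Django shell:

```
$ kubectl exec defectdojo-django-${POD_IDENTIFIER} -c uwsgi -- ./manage.py shell -c "from dojo.models import Product, Finding; print(Product.objects.count(), Finding.objects.count())"
```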
72 changes: 72 additions & 0 deletions backup.sh
@@ -0,0 +1,72 @@
#!/bin/sh

set -e

# check if necessary environment variables are in place

if [ -z "${AWS_ACCESS_KEY_ID}" ]; then
echo "You need to set the AWS_ACCESS_KEY_ID environment variable."
exit 1
fi

if [ -z "${AWS_SECRET_ACCESS_KEY}" ]; then
echo "You need to set the AWS_SECRET_ACCESS_KEY environment variable."
exit 1
fi

if [ -z "${S3_BUCKET}" ]; then
echo "You need to set the S3_BUCKET environment variable."
exit 1
fi

if [ -z "${POSTGRESQL_DATABASE}" ]; then
echo "You need to set the POSTGRESQL_DATABASE environment variable."
exit 1
fi

if [ -z "${POSTGRESQL_HOST}" ]; then
echo "You need to set the POSTGRESQL_HOST environment variable."
exit 1
fi

if [ -z "${POSTGRESQL_USER}" ]; then
echo "You need to set the POSTGRESQL_USER environment variable."
exit 1
fi

if [ -z "${POSTGRESQL_PASSWORD}" ]; then
echo "You need to set the POSTGRESQL_PASSWORD environment variable."
exit 1
fi

if [ -z "${POSTGRESQL_PORT}" ]; then
POSTGRESQL_PORT=5432
fi

export PGPASSWORD=${POSTGRESQL_PASSWORD}

echo "Creating dump of ${POSTGRESQL_DATABASE} database from ${POSTGRESQL_HOST}..."
# print OpenSSL version
openssl version
# online dump the PostgreSQL database
pg_dump -h "${POSTGRESQL_HOST}" -p "${POSTGRESQL_PORT}" -U "${POSTGRESQL_USER}" "${POSTGRESQL_DATABASE}" | gzip > dump.sql.gz
# if pg_dump fails, the pipeline above still succeeds because gzip exits 0,
# leaving a tiny (or empty) archive behind instead of a real dump.
# as a sanity check, treat any dump smaller than roughly 1 KB as an error.
DUMPSIZE=$(stat -c %s "dump.sql.gz")
echo "Dump size: ${DUMPSIZE} bytes"
if [ "${DUMPSIZE}" -le 1000 ]; then
  echo "Database dump less than 1K in size, assuming an error"
  exit 1
fi
# if AES_KEY does not exist, upload unencrypted database dump to s3
if [ -z "${AES_KEY}" ]; then
  echo "Uploading dump to ${S3_BUCKET}"
  (aws s3 cp - s3://"${S3_BUCKET}"/"${S3_PREFIX}"/"$(date +"%Y")"/"$(date +"%m")"/"$(date +"%d")"/"${POSTGRESQL_DATABASE}"_"$(date +"%H:%M:%SZ")".sql.gz < dump.sql.gz) || exit 2
# if AES_KEY exists, upload encrypted database dump to s3
else
  openssl enc -in dump.sql.gz -out dump.sql.gz.dat -e -aes256 -pbkdf2 -md sha256 -k "${AES_KEY}"
  (aws s3 cp - s3://"${S3_BUCKET}"/"${S3_PREFIX}"/"$(date +"%Y")"/"$(date +"%m")"/"$(date +"%d")"/"${POSTGRESQL_DATABASE}"_"$(date +"%H:%M:%SZ")".sql.gz.dat < dump.sql.gz.dat) || exit 2
fi
# print success message
echo "SQL backup uploaded successfully"