Docs/general update nov2023 (#67)
blankdots authored Nov 16, 2023
2 parents 610900b + dfb3354 commit a0532d8
Showing 15 changed files with 75 additions and 7,434 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/auto-merge-update.yml
@@ -8,14 +8,15 @@ permissions:
jobs:
dependabot:
runs-on: ubuntu-latest
if: ${{ github.actor == 'neicnordic' && github.head_ref == 'create-pull-request/patch' }}
if: ${{ github.head_ref == 'create-pull-request/patch' }}
steps:
- name: Wait other jobs are passed or failed
uses: kachick/wait-other-jobs@v2
timeout-minutes: 30
with:
github-token: "${{ secrets.GITHUB_TOKEN }}"
- name: Enable auto-merge for docs update PRs
if: contains('Update from neicnordic/sensitive-data-archive', github.event.head_commit.message)
run: gh pr review --approve "$PR_URL" && gh pr merge --auto --merge "$PR_URL"
env:
PR_URL: ${{github.event.pull_request.html_url}}
12 changes: 6 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

Recommended provisioning methods provided for production are:

* on a [Kubernetes cluster](https://github.com/neicnordic/sda-helm/), using `kubernetes` and `helm` charts;
* on a [Kubernetes cluster](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts), using `kubernetes` and `helm` charts;
* on a [Docker Swarm cluster](https://github.com/neicnordic/LocalEGA-deploy-swarm), using `gradle` and `docker swarm`.

## Architecture
@@ -11,30 +11,30 @@ SDA is divided into several components, which can be deployed either for Federat

### Core Components

Source code for core components (unless specified otherwise) is available at: https://github.com/neicnordic/sda-pipeline
Source code for core components is available at: https://github.com/neicnordic/sensitive-data-archive

| Component | Role |
|---------------|------|
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via ELIXIR AAI. https://github.com/neicnordic/sda-s3proxy/ or https://github.com/neicnordic/sda-inbox-sftp |
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via LifeScience AAI. [s3inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/cmd/s3inbox/s3inbox.md) or [sftp-inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-sftp-inbox/README.md) |
| intercept | The intercept service relays message between the queue provided from the federated service and local queues. **(Required for Federated EGA use case)** |
| ingest | Split the Crypt4GH header and move the remainder to the storage backend. No cryptographic task, nor access to the decryption keys. |
| verify | Decrypt the stored files and checksum them against their embedded checksum. |
| archive | Storage backend: as a regular file system or as a S3 object store. |
| finalize | Handle the so-called _Accession ID_ to filename mappings from CentralEGA. |
| mapper | The mapper service register mapping of accessionIDs (IDs for files) to datasetIDs. |
| data out API | Provides a download/data access API for streaming archived data either in encrypted or decrypted format - source at: https://github.com/neicnordic/sda-doa |
| download | Provides a download/data access API for streaming (decrypted) archived data - source at: [https://github.com/neicnordic/sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-download/README.md) |

### Associated components

| Component | Role |
|---------------|------|
| db | A Postgres database with appropriate schemas and isolations https://github.com/neicnordic/sda-db/ |
| mq | A (local) RabbitMQ message broker with appropriate accounts, exchanges, queues and bindings, connected to the CentralEGA counter-part. https://github.com/neicnordic/sda-mq/ |
| db | A [Postgres database](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql) with appropriate schemas and isolations |
| mq | A [(local) RabbitMQ](https://github.com/neicnordic/sensitive-data-archive/tree/main/rabbitmq) message broker with appropriate accounts, exchanges, queues and bindings, connected to the CentralEGA counter-part. |


### Stand-alone components

| Component | Role |
|---------------|------|
| metadata | Component used in standalone version of SDA. Provides an interface and backend to submit Metadata and associated with a file in the Archive. https://github.com/neicnordic/sda-metadata-mirror/ with UI https://github.com/neicnordic/FormSubmission_UI |
| orchestrate | Component that automates ingestion in stand-alone deployments of SDA Pipeline https://github.com/neicnordic/sda-orchestration |
19 changes: 8 additions & 11 deletions docs/connection.md
@@ -10,16 +10,15 @@ The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to Central EGA
message broker.

We call `CEGAMQ` and `LocalMQ` (Local Message Broker, also known as
`sda-mq`), the RabbitMQ message brokers of, respectively, `Central EGA`
and `SDA`/`LocalEGA`.
We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes known as `sda-mq`),
the RabbitMQ message brokers of, respectively, `Central EGA` and `SDA`/`LocalEGA`.

Local Message Broker
--------------------

> NOTE:
> Source code repository for MQ component is available at:
> [https://github.com/neicnordic/sda-mq](https://github.com/neicnordic/sda-mq)
> [sensitive-data-archive RabbitMQ](https://github.com/neicnordic/sensitive-data-archive/tree/main/rabbitmq)

### Configuration
@@ -85,7 +84,6 @@ following queues, in the default `vhost`:
Name | Purpose
:----------------|:---------------------------------------
archived | Archived files.
backup | Signal files to backup
completed | Files are backed up
error | User-related errors
files | Receive notification for ingestion from `CEGAMQ` or Orchestrator
@@ -147,7 +145,7 @@ Central EGA and any Local EGAs. Central EGA's messages are
JSON-formatted.

The JSON schemas can be found in:
<https://github.com/neicnordic/sda-pipeline/tree/master/schemas>
<https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/schemas>

When a `Submission Inbox` sends an `upload` message to CentralEGA it contains the
following:
@@ -210,7 +208,7 @@ of messages:
> sha256 checksum will be calculated by `Ingest` service.
The message received from Central EGA to start ingestion at a Federated EGA node.
Processed by the sda-pipeline `ingest` service.
Processed by the `ingest` service.

```javascript
{
Expand Down Expand Up @@ -258,9 +256,8 @@ adding the [Accession ID]{.title-ref}.
```

`Finalize` service should receive the message below and assign the
`Accession ID` to the corresponding file and send a message to `backup`
queue for the backup services or in case there is no backup service to
the `completed` queue.
`Accession ID` to the corresponding file and send a message to the `completed` queue
when the `accession ID` has been set (in case of Federated EGA this also means backup copy has been done).

```javascript
{
@@ -275,7 +272,7 @@ the `completed` queue.
}
```

The message sent from the sda-pipeline `finalize` service to the `backup` service via `completed` queue.
The message sent from the `finalize` service to the `completed` queue.

```javascript
{
2 changes: 1 addition & 1 deletion docs/dataout.md
@@ -154,7 +154,7 @@ SDA-download
Recommended provisioning method for production is:

- on a `kubernetes cluster` using the [helm
chart](https://github.com/neicnordic/sda-helm/).
chart](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts).

`sda-download` focuses on enabling deployment of a stand-alone version
of SDA, with features such as:
10 changes: 5 additions & 5 deletions docs/db.md
@@ -1,21 +1,21 @@
Database Setup
==============

We use a Postgres database (version 13+ ) to store intermediate data, in
We use a Postgres database (version 15+ ) to store intermediate data, in
order to track progress in file ingestion. The `lega` database schema is
documented below.

> NOTE:
> Source code repository for DB component is available at:
> <https://github.com/neicnordic/sda-db>
> <https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql>
The database container will initialize and create the necessary database
structure and functions if started with an empty area. Procedures for
*backing up the database* are important but considered out of scope for
the secure data archive project.

Look at [the SQL
definitions](https://github.com/neicnordic/sda-db/tree/master/initdb.d)
definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
if you are also interested in the database triggers.

Configuration
@@ -82,8 +82,8 @@ both the database initialization scripts (and bumping the bootstrapped
schema version) as well as creating the corresponding migration script
to perform the changes on a database in use.

Migration scripts should be placed in `/migratedb.d/` in the sda-db repo
(<https://github.com/neicnordic/sda-db>). We recommend naming them
Migration scripts should be placed in `/migratedb.d/` in the *sensitive-data-archive* repo
(<https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql>). We recommend naming them
corresponding to the schema version they provide migration to. There is
an "empty" migration script (`01.sql`) that can be used as a
template.
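
The naming recommendation above can be sketched as a small helper that suggests the file name for the next migration script. This is only an illustration: the function name `next_migration` is hypothetical, and it assumes scripts are named with two zero-padded digits like the `01.sql` template, which the text does not state explicitly.

```bash
# Print the file name for the next migration script in a directory.
# Assumption: scripts are named NN.sql (two digits, zero-padded), like 01.sql.
next_migration() {
    dir=$1
    last=0
    for f in "$dir"/[0-9][0-9].sql; do
        [ -e "$f" ] || continue   # the glob matched nothing
        n=$(basename "$f" .sql)
        n=${n#0}                  # drop one leading zero so "08" is not read as octal
        if [ "$n" -gt "$last" ]; then
            last=$n
        fi
    done
    printf '%02d.sql\n' "$((last + 1))"
}
```

Called with the directory that holds the migration scripts, the helper prints the next free name in the sequence. The schema version bump in the initialization scripts still has to be done by hand.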
23 changes: 8 additions & 15 deletions docs/deploy.md
@@ -9,22 +9,15 @@ Swarm](https://docs.docker.com/engine/swarm/) for production.

The production deployment repositories are:

- [Kubernetes Helm charts](https://github.com/neicnordic/sda-helm/);
- [Kubernetes Helm charts](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts);
- [Docker Swarm
deployment](https://github.com/neicnordic/LocalEGA-deploy-swarm/).

The following container images are used in the deployments:
`neicnordic/sensitive-data-archive`, provides the SDA services as well as PostgreSQL and RabbitMQ configuration. The following container image is used in the deployments where the tag separates between services:

- `neicnordic/sda-pipeline`, provides the LocalEGA services (minimal
container with static binary and support files).
- `neicnordic/sda-mq`, provides the broker (mq) service (based on
*rabbitmq:3.8.16-management-alpine*;
- `neicnordic/sda-db`, provides the database service (based on
*postgres:13-alpine3.14*);
- `neicnordic/sda-inbox-sftp`, provides the inbox service via sftp
(based on Apache Mina, container base
*openjdk:13-alpine*);
- `neicnordic/sda-doa`, provides the data out service (Data Out API);
- `neicnordic/sda-s3-proxy`, provides the inbox service via a s3 proxy
(S3 proxy inbox, minimal container with static binary and support
files).
- `ghcr.io/neicnordic/sensitive-data-archive:<version>-postgres` - PostgreSQL database
- `ghcr.io/neicnordic/sensitive-data-archive:<version>-rabbitmq` - RabbitMQ message broker
- `ghcr.io/neicnordic/sensitive-data-archive:<version>-sftp-inbox` - sftp inbox
- `ghcr.io/neicnordic/sensitive-data-archive:<version>-auth` - authentication service
- `ghcr.io/neicnordic/sensitive-data-archive:<version>-download` - download service
- `ghcr.io/neicnordic/sensitive-data-archive:<version>` - all other services such as: `finalize`, `ingest`, `intercept`, `verify`, `mapper` and `s3inbox`
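
The tag scheme listed above can be sketched as a shell function that builds the image reference for a given service. The function name `sda_image` is hypothetical, and `<version>` is kept as a literal placeholder rather than a real release tag.

```bash
# Build the image reference for an SDA service, following the tag scheme above.
sda_image() {
    version=$1
    service=$2
    repo="ghcr.io/neicnordic/sensitive-data-archive"
    case "$service" in
        postgres|rabbitmq|sftp-inbox|auth|download)
            # These services have their own image, marked by a tag suffix.
            printf '%s:%s-%s\n' "$repo" "$version" "$service"
            ;;
        *)
            # finalize, ingest, intercept, verify, mapper and s3inbox share one image.
            printf '%s:%s\n' "$repo" "$version"
            ;;
    esac
}

sda_image "<version>" postgres
sda_image "<version>" ingest
```

The two calls print the suffixed PostgreSQL image and the shared image used by the remaining services.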
44 changes: 23 additions & 21 deletions docs/dictionary/wordlist.txt
@@ -1,20 +1,3 @@
BACKUPPUBKEY
CHUNKSIZE
CLIENTKEY
GetArchived
GetHeader
GetHeaderForStableId
GetInboxPath
UpdateDatasetEvent
InsertFile
Keyfile
LIBPQ
MapFilesToDataset
MarkCompleted
MarkReady
SSLMODE
SetArchived
StoreHeader
aaf
aai
aaiconnectprofile
@@ -33,10 +16,12 @@ apis
atitle
auth
backend
backuppubkey
bbug
blockquote
bmi
bugfix
buildvcs
cacert
ccacd
cega
@@ -47,8 +32,9 @@ cgktxeg
chacha
checksumed
checksums
chunksize
clientcert
clinetkey
clientkey
cmd
cn
conffile
@@ -75,6 +61,7 @@ decryptedchecksums
decryptor
dev
discoverable
dns
doi
dsn
ebi
@@ -93,7 +80,12 @@ filepath
filesystem
fjddcmrvlawqmvrbly
formsubmission
getarchived
getheader
getheaderforstableid
getinboxpath
gh
ghcr
golang
gradle
hostname
@@ -103,7 +95,10 @@ htslib
https
ietf
img
init
initd
initdb
insertfile
isolations
jks
jku
@@ -121,9 +116,13 @@ kubernetes
latters
lega
libpq
lifescience
localega
localmq
logstash
mapfilestodataset
markcompleted
markready
microservice
microservices
migratedb
@@ -164,6 +163,7 @@ posix
postgres
postgresql
pre
prefetchcount
prepended
publickey
rabbitmq
@@ -187,14 +187,18 @@ schemas
sda
sda's
secretkey
setaccessionid
setarchived
sftp
sha
smth
somedir
src
sshd
ssl
sslmode
startcoordinate
storeheader
svg
sysdevs
tada
@@ -210,6 +214,7 @@ ui
uio
unencrypted
unioslo
updatedatasetevent
uppsala
useif
userinfo
@@ -226,6 +231,3 @@ wyenrumyh
yaml
yihkqimti
yml
PREFETCHCOUNT
SetAccessionID
DNS
2 changes: 1 addition & 1 deletion docs/index.md
@@ -19,7 +19,7 @@ Overall architecture

The main components and interaction partners of the NeIC Sensitive Data Archive deployment in a Federated EGA setup, are illustrated in the figure below. The different colored backgrounds represent different zones of separation in the federated deployment.

![](https://docs.google.com/drawings/d/e/2PACX-1vSCqC49WJkBduQ5AJ1VdwFq-FJDDcMRVLaWQmvRBLy7YihKQImTi41WyeNruMyH1DdFqevQ9cgKtXEg/pub?w=1440&amp;h=810)
![](https://docs.google.com/drawings/d/e/2PACX-1vSCqC49WJkBduQ5AJ1VdwFq-FJDDcMRVLaWQmvRBLy7YihKQImTi41WyeNruMyH1DdFqevQ9cgKtXEg/pub?w=960&h=540)

The components illustrated can be classified by which archive sub-process they take part in:

21 changes: 15 additions & 6 deletions docs/setup.md
@@ -2,19 +2,28 @@ Installation
============

The sources for SDA can be downloaded and installed from the [NeIC
Github repo](https://github.com/neicnordic/sda-pipeline).
Github repo](https://github.com/neicnordic/sensitive-data-archive).

In order to build binaries:
```bash
$ git clone https://github.com/neicnordic/sda-pipeline.git
$ go build
$ git clone https://github.com/neicnordic/sensitive-data-archive.git
$ cd sda
$ for p in cmd/*; do go build -buildvcs=false -o "${p/cmd\//sda-}" "./$p"; done
```
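
The build loop names each binary by replacing the leading `cmd/` of the package path with `sda-`. A minimal sketch of that mapping follows, using the portable `${p#cmd/}` expansion, which gives the same result as bash's `${p/cmd\//sda-}` for paths of the form `cmd/<service>`. The function name `binary_name` is hypothetical, and the directory names are examples taken from the services named in these docs, not a listing of the repository.

```bash
# Map a cmd/<service> package path to the binary name the build loop produces.
binary_name() {
    # Strip the leading "cmd/" and prepend "sda-".
    printf 'sda-%s\n' "${1#cmd/}"
}

# Example paths only; the real set is whatever exists under sda/cmd/.
for p in cmd/s3inbox cmd/ingest cmd/verify; do
    printf '%s -> %s\n' "$p" "$(binary_name "$p")"
done
```

Run from the `sda` directory of the clone, the actual loop would therefore leave binaries such as `sda-s3inbox` next to the `cmd` directory.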

To be able to develop the source code
```bash
$ git clone https://github.com/neicnordic/sensitive-data-archive.git
$ go work init
$ go work use ./sda
$ cd sda
```

The recommended method is however to use one of our deployment
strategies:

- [Kubernetes Helm charts](https://github.com/neicnordic/sda-helm/);
- [Docker
Swarm](https://github.com/neicnordic/LocalEGA-deploy-swarm/).
- [Kubernetes Helm charts](https://github.com/neicnordic/sensitive-data-archive/tree/main/charts);
- [Docker Swarm](https://github.com/neicnordic/LocalEGA-deploy-swarm/).

Configuration
-------------
1 change: 0 additions & 1 deletion docs/static/components.svg

This file was deleted.
