Docs/bugs rephrasing hacakathon #79

Merged
merged 9 commits into from
Dec 1, 2023
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/new-question.md
@@ -1,6 +1,6 @@
---
name: Support Issue
about: Ask for support on running and/or developing LocalEGA
about: Ask for support on running and/or developing FederatedEGA
labels: Support

---
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@ Source code for core components is available at: https://github.com/neicnordic/s

| Component | Role |
|---------------|------|
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via LifeScience AAI. [s3inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/cmd/s3inbox/s3inbox.md) or [sftp-inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-sftp-inbox/README.md) |
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via [Life Science AAI](https://lifescience-ri.eu/). [s3inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/cmd/s3inbox/s3inbox.md) or [sftp-inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-sftp-inbox/README.md) |
| intercept | The intercept service relays message between the queue provided from the federated service and local queues. **(Required for Federated EGA use case)** |
| ingest | Split the Crypt4GH header and move the remainder to the storage backend. No cryptographic task, nor access to the decryption keys. |
| verify | Decrypt the stored files and checksum them against their embedded checksum. |
30 changes: 15 additions & 15 deletions docs/connection.md
@@ -1,17 +1,17 @@
Interfacing with CEGA ⇌ SDA
===========================

All Local EGA instances are connected to Central EGA using
All `FederatedEGA` instances are connected to `CentralEGA` using
[RabbitMQ](http://www.rabbitmq.com), a Message Broker, that allows the
components to send and receive messages, which are queued, not lost, and
resent on network failure or connection problems.

The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to Central EGA
components with the necessary credentials to connect to `CentralEGA`
message broker.

We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes known as `sda-mq`),
the RabbitMQ message brokers of, respectively, `Central EGA` and `SDA`/`LocalEGA`.
the RabbitMQ message brokers of, respectively, `CentralEGA` and `SDA`/`FederatedEGA`.

Local Message Broker
--------------------
@@ -49,7 +49,7 @@ The following environment variables can be used to configure the broker:
> would need to be set up to send and receive messages between other
> services.

Central EGA connection
CentralEGA connection
----------------------

`CEGAMQ` declares a `vhost` for each SDA instance. It also creates the
@@ -102,7 +102,7 @@ Service will wait for messages to arrive.

> NOTE:
> More information can be found also at
> [localEGA](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega).
> [localEGA repository](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega) - repository that provides functionality for `FederatedEGA` use case.

`CEGAMQ` receives notifications from `LocalMQ` using a *shovel*.
Everything that is published to its `to_cega` exchange gets forwarded to
@@ -118,30 +118,30 @@ workflow to CentralEGA, using the following routing keys:
| files.verified | For files ready to request accessionID |

Note that we do not need at the moment a queue to store the completed
message, nor the errors, as we forward them to Central EGA.
message, nor the errors, as we forward them to `CentralEGA`.

![RabbitMQ setup](./static/CEGA-LEGA.png)

Connecting SDA to Central EGA
Connecting SDA to CentralEGA
-----------------------------

Central EGA only has to prepare a user/password pair along with a
`CentralEGA` only has to prepare a user/password pair along with a
`vhost` in their RabbitMQ.

When Central EGA has communicated these details to the given Local EGA
instance, the latter can contact Central EGA using the federated queue
When `CentralEGA` has communicated these details to the given `FederatedEGA`
instance, the latter can contact `CentralEGA` using the federated queue
and the shovel mechanism in their local broker.

CentralEGA should then see 2 incoming connections from that new LocalEGA
`CentralEGA` should then see 2 incoming connections from that new `FederatedEGA`
instance, on the given `vhost`.

The exchanges and routing keys will be the same as all the other
LocalEGA instances, since the clustering is done per `vhost`.
`FederatedEGA` instances, since the clustering is done per `vhost`.

### Message Format

It is necessary to agree on the format of the messages exchanged between
Central EGA and any Local EGAs. Central EGA's messages are
`CentralEGA` and any `FederatedEGA`s. `CentralEGA`'s messages are
JSON-formatted.

The JSON schemas can be found in:
@@ -200,14 +200,14 @@ of messages:
- `type=cancel`: an ingestion cancellation
- `type=accession`: contains an accession id
- `type=mapping`: contains a dataset to accession ids mapping
- `type=heartbeat`: A mean to check if the Local EGA instance is
- `type=heartbeat`: A mean to check if the `FederatedEGA` instance is
"alive"

> IMPORTANT:
> The `encrypted_checksums` key is optional. If the key is not present the
> sha256 checksum will be calculated by `Ingest` service.
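
The fallback described in the note above can be sketched as follows. This is a minimal Python illustration, not the `Ingest` service's actual code: the helper name `resolve_sha256` is hypothetical, and both the shape of `encrypted_checksums` (a list of `{"type": ..., "value": ...}` entries) and the choice of hashing the file as uploaded are assumptions made for the example.

```python
import hashlib


def resolve_sha256(message: dict, uploaded_bytes: bytes) -> str:
    """Return a sha256 hex digest, preferring the one supplied in the message."""
    # Assumed shape: [{"type": "sha256", "value": "<hex digest>"}, ...]
    for entry in message.get("encrypted_checksums", []):
        if entry.get("type") == "sha256":
            return entry["value"]
    # Key absent (or no sha256 entry present): compute the checksum locally.
    return hashlib.sha256(uploaded_bytes).hexdigest()
```

A message without the key simply falls through to the local computation, so senders that omit `encrypted_checksums` need no special handling.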

The message received from Central EGA to start ingestion at a Federated EGA node.
The message received from `CentralEGA` to start ingestion at a Federated EGA node.
Processed by the `ingest` service.

```javascript
4 changes: 1 addition & 3 deletions docs/dataout.md
@@ -84,9 +84,7 @@ and can't expose REST API (but still can receive RabbitMQ messages).
Handling Permissions
--------------------

Data Out API can be run with connection to an AAI or without. In the
case connection to an AAI provider is not possible the
`PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be
Data Out API can be run with connection to an AAI or without. If connection to an AAI provider is not possible, the `PASSPORT_PUBLIC_KEY_PATH` and `CRYPT4GH_PRIVATE_KEY_PATH` need to be
set.

> NOTE:
5 changes: 2 additions & 3 deletions docs/db.md
@@ -10,8 +10,7 @@ documented below.
> <https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql>

The database container will initialize and create the necessary database
structure and functions if started with an empty area. Procedures for
*backing up the database* are important but considered out of scope for
structure and functions if started with an empty area. Procedures for *backing up the database* are important, however considered out of scope for
the secure data archive project.

Look at [the SQL
@@ -65,7 +64,7 @@ changes are required that risk being time consuming on large databases,
it may be best to split that work in small chunks.

Doing so helps in both demonstrating progress as well as avoiding
rollbacks of the entire process (and thus working needing to be done) if
rollbacks of the entire process, in case that
something fails. Each schema migration is done in a transaction.

Schema versions are integers. There is no strong coupling between
3 changes: 2 additions & 1 deletion docs/dictionary/wordlist.txt
@@ -43,7 +43,6 @@ confpath
controlledaccessgrants
copyheader
creds
cryptograhy
cryptographic
cscfi
dac
@@ -58,6 +57,7 @@ decrypt
decryptable
decrypted
decryptedchecksums
decrypting
decryptor
dev
discoverable
@@ -74,6 +74,7 @@ egas
endcoordinate
envs
exportrequests
federatedega
fega
fileid
filepath
19 changes: 8 additions & 11 deletions docs/encryption.md
@@ -5,27 +5,24 @@ The secure data archive uses public key cryptography extensively for
maintaining data privacy throughout the various stages.

- Files uploaded to the secure data archive are pre-encrypted (on the
user side) with public key based cryptograhy (this is in addition to
user side) with public key based Cryptography (this is in addition to
any transport encryption provided for the connection, e.g. TLS or
the encryption provided by ssh for the sftp inbox service).
- During the ingestion process, the files are decrypted and
re-encrypted with another key to provide for the archiving.
the encryption provided by SSH for the SFTP inbox service).
- During the ingestion process, the files are decrypted and checksum is computed;
- Finally, if the data is requested, it is again decrypted and
possibly reencrypted with a suitable key for the user (again, in
addition to any transport encryption).
possibly re-encrypted with a suitable key for the user (again, in addition to any transport encryption).

Files submitted are in the `Crypt4GH` file format, which provides the
Files submitted should be in the `Crypt4GH` file format, which provides the
ability to decrypt parts of encrypted files without having to start
decrypt all data up to the desired area (useful for e.g. streaming).
decrypting all data up to the desired area (useful for e.g. streaming).

The details of the file format used are provided at
[Crypt4GH file format](http://samtools.github.io/hts-specs/crypt4gh.pdf), and summarized below.

A random session key (of 256 bits) is generated to seed a ChaCha20
engine, with Poly1305 authentication mode. For each segment of at most
64kB of data, a nonce is randomly generated and prepended to the
segment. Using the two latters, the original file is segmented and each
segment is encrypted.
segment. Using the latter two, the original file is segmented and each segment is encrypted.
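
The segment layout described above is what makes partial decryption possible: every full segment has the same encrypted size, so the segments covering a byte range can be located by arithmetic alone. The sketch below performs no encryption. It assumes the 12-byte nonce and 16-byte authentication tag of the ChaCha20-Poly1305 construction, and all function names are illustrative rather than taken from any Crypt4GH tool.

```python
SEGMENT_SIZE = 64 * 1024  # plaintext bytes per segment (at most)
NONCE_SIZE = 12           # nonce prepended to each encrypted segment
MAC_SIZE = 16             # Poly1305 authentication tag appended to each segment
CIPHER_SEGMENT_SIZE = NONCE_SIZE + SEGMENT_SIZE + MAC_SIZE


def encrypted_body_size(plain_size: int) -> int:
    """Size of the encrypted segments (header excluded) for plain_size bytes."""
    full, rest = divmod(plain_size, SEGMENT_SIZE)
    size = full * CIPHER_SEGMENT_SIZE
    if rest:
        # The final, shorter segment still carries its own nonce and tag.
        size += NONCE_SIZE + rest + MAC_SIZE
    return size


def segments_for_range(start: int, length: int) -> range:
    """Indices of the segments covering plaintext bytes [start, start + length)."""
    if length <= 0:
        return range(0)
    first = start // SEGMENT_SIZE
    last = (start + length - 1) // SEGMENT_SIZE
    return range(first, last + 1)


def segment_offset(index: int) -> int:
    """Byte offset of a segment within the encrypted body (after the header)."""
    return index * CIPHER_SEGMENT_SIZE
```

For example, reading 10 bytes starting at plaintext offset 65530 touches only segments 0 and 1, regardless of how large the file is.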

The header is prepended to the encrypted data, it also contains, the
word `crypt4gh`, the format version, the number of header packets, and
@@ -47,7 +44,7 @@ The advantages of the format are, among others:
- Re-arranging the file to chunk a portion requires only to decrypt
the header, re-encrypt with an edit list, and select the cipher
segments surrounding the portion. The file itself is not decrypted
and reencrypted.
and re-encrypted.

In order to encrypt files using this standard we recommend the following
tools:
2 changes: 1 addition & 1 deletion docs/guides/deploy-k8s.md
@@ -38,7 +38,7 @@ This chart deploys a pre-configured database ([PostgreSQL](https://www.postgresq

### sda-mq - RabbitMQ component for Sensitive Data Archive (SDA) installation

This chart deploys a pre-configured message broker ([RabbitMQ](https://www.rabbitmq.com/)) designed to work [European Genome-Phenome Archive](https://ega-archive.org/) federated messaging interface between Central EGA and Local/Federated EGAs.
This chart deploys a pre-configured message broker ([RabbitMQ](https://www.rabbitmq.com/)) designed to work [European Genome-Phenome Archive](https://ega-archive.org/) federated messaging interface between `CentralEGA` and Local/Federated EGAs.

### sda-svc - Components for Sensitive Data Archive (SDA) installation

2 changes: 1 addition & 1 deletion docs/guides/deployment.md
@@ -5,7 +5,7 @@
> If you have feedback to give on the content, please contact us on
> [github](https://github.com/neicnordic/neic-sda)!

Different nodes of the Federated EGA network, and projects using the standalone SDA
Different nodes of the Federated EGA network, and projects using the stand-alone SDA
have made different decisions in how to deploy the system.
Adaptations needs to be made depending on the system to deploy on,
as well as the requirements of your deployment.
2 changes: 1 addition & 1 deletion docs/guides/federated-or-standalone.md
@@ -1,4 +1,4 @@
# Federated or Standalone Archive
# Federated or Stand-alone Archive

> TODO:
> This guide is a stub and has yet to be finished.
2 changes: 1 addition & 1 deletion docs/guides/troubleshooting.md
@@ -25,7 +25,7 @@ Next step is to make sure that the remote connections (CEGA RabbitMQ) are workin

## End-to-end testing

NOTE: This guide assumes that there exists a test instance account with Central EGA. Make sure that the account is approved and added to the submitters group.
NOTE: This guide assumes that there exists a test instance account with `CentralEGA`. Make sure that the account is approved and added to the submitters group.

### Upload file(s)
