Update from neicnordic/sensitive-data-archive at 09:00 on 2023-12-01

neicnordic · Dec 1, 2023 · dafff4e
1 parent d8e4584
commit dafff4e
Showing 6 changed files with 42 additions and 37 deletions.
diff --git a/docs/services/finalize.md b/docs/services/finalize.md
@@ -1,6 +1,6 @@
 # finalize Service
 
-Handles the so-called _Accession ID (stable ID)_ to filename mappings from Central EGA.
+Handles the so-called _Accession ID (stable ID)_ to filename mappings from `CentralEGA`.
 At the same time the service fulfills the replication requirement of having distinct backup copies.
 For more information see [Federated EGA Node Operations v2](https://ega-archive.org/assets/files/EGA-Node-Operations-v2.pdf) document.
 
@@ -11,15 +11,18 @@ If a backup location is configured it will perform backup of a file.
 When running, `finalize` reads messages from the configured RabbitMQ queue (commonly: `accession`).
 For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):
 
-1. The message is validated as valid JSON that matches the `ingestion-accession` schema. If the message can’t be validated it is discarded with an error message in the logs.
+1. The message is validated as valid JSON that matches the `ingestion-accession` schema. 
+    - If the message can’t be validated it is discarded with an error message in the logs.
 2. If the service is configured to perform backups, i.e. the `ARCHIVE_` and `BACKUP_` storage backends are set, archived files will be copied to the backup location.
    1. The file size on disk is requested from the storage system.
    2. The database file size is compared against the disk file size.
    3. A file reader is created for the archive storage file, and a file writer is created for the backup storage file.
 3. The file data is copied from the archive file reader to the backup file writer.
 4. If the type of the `DecryptedChecksums` field in the message is `sha256`, the value is stored.
-5. A new RabbitMQ `complete` message is created and validated against the `ingestion-completion` schema. If the validation fails, an error message is written to the logs.
-6. The file accession ID in the message is marked as *ready* in the database. On error the service sleeps for up to 5 minutes to allow for database recovery, after 5 minutes the message is Nacked, re-queued and an error message is written to the logs.
+5. A new RabbitMQ `complete` message is created and validated against the `ingestion-completion` schema. 
+    - If the validation fails, an error message is written to the logs.
+6. The file accession ID in the message is marked as *ready* in the database. 
+    - On error the service sleeps for up to 5 minutes to allow for database recovery. After 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
 7. The complete message is sent to RabbitMQ. On error, a message is written to the logs.
 8. The original RabbitMQ message is Ack'ed.
 

diff --git a/docs/services/ingest.md b/docs/services/ingest.md
@@ -11,28 +11,28 @@ When running, `ingest` reads messages from the configured RabbitMQ queue (common
 For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):
 
 1. The message is validated as valid JSON that matches the `ingestion-trigger` schema.
-If the message can’t be validated it is discarded with an error message in the logs.
+    - If the message can’t be validated it is discarded with an error message in the logs.
 2. If the message is of type `cancel`, the file will be marked as `disabled` and the next message in the queue will be read.
 3. A file reader is created for the filepath in the message.
-If the file reader can’t be created an error is written to the logs, the message is Nacked and forwarded to the error queue.
+    - If the file reader can’t be created an error is written to the logs, the message is Nacked and forwarded to the error queue.
 4. The file size is read from the file reader.
-On error, the error is written to the logs, the message is Nacked and forwarded to the error queue.
+    - On error, the error is written to the logs, the message is Nacked and forwarded to the error queue.
 5. A uuid is generated, and a file writer is created in the archive using the uuid as filename.
-On error the error is written to the logs and the message is Nacked and then re-queued.
+    - On error the error is written to the logs and the message is Nacked and then re-queued.
 6. The filename is inserted into the database along with the user id of the uploading user. If the file already exists in the database, the status is updated.
-Errors are written to the error log.
-Errors writing the filename to the database do not halt ingestion progress.
+    - Errors are written to the error log.
+    - Errors writing the filename to the database do not halt ingestion progress.
 7. The header is read from the file, and decrypted to ensure that it’s encrypted with the correct key.
-If the decryption fails, an error is written to the error log, the message is Nacked, and the message is forwarded to the error queue.
+    - If the decryption fails, an error is written to the error log, the message is Nacked, and the message is forwarded to the error queue.
 8. The header is written to the database.
-Errors are written to the error log.
+    - Errors are written to the error log.
 9. The header is stripped from the file data, and the remaining file data is written to the archive.
-Errors are written to the error log.
+    - Errors are written to the error log.
 10. The size of the archived file is read.
-Errors are written to the error log.
+    - Errors are written to the error log.
 11. The database is updated with the file size, archive path, and archive checksum, and the file is set as *archived*.
-Errors are written to the error log.
-This error does not halt ingestion.
+    - Errors are written to the error log.
+    - This error does not halt ingestion.
 12. A message is sent back to the original RabbitMQ broker containing the upload user, upload file path, database file id, archive file path and checksum of the archived file.
 
 ## Communication

diff --git a/docs/services/intercept.md b/docs/services/intercept.md
@@ -1,6 +1,6 @@
 # intercept Service
 
-The `intercept` service relays messages between Central EGA and Federated EGA nodes.
+The `intercept` service relays messages between `CentralEGA` and Federated EGA nodes.
 
 ## Service Description
 
@@ -10,7 +10,8 @@ For each message, these steps are taken:
 1. The message type is read from the message `type` field.
    1. If the message `type` is not known, an error is logged and the message is Ack'ed.
 2. The correct queue for the message is decided based on message type.
-3. The message is sent to the queue. This has no error handling as the resend-mechanism hasn't been finished.
+3. The message is sent to the queue. 
+   - This has no error handling as the resend-mechanism hasn't been finished.
 4. The message is Ack'ed.
 
 ## Communication

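The routing steps of the `intercept` flow above (read the `type` field, pick a queue, log and Ack unknown types) can be sketched in Go as a lookup table. The message types and queue names in `routes` are hypothetical placeholders, since the text does not list them, and `routeFor` is an illustrative name rather than the service's real function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routes maps a message type to a target queue.
// Both the keys and the queue names are hypothetical examples.
var routes = map[string]string{
	"ingest":    "ingest",
	"cancel":    "ingest",
	"accession": "accession",
	"mapping":   "mappings",
}

// routeFor reads the "type" field of a JSON message and returns the queue
// the message should be forwarded to, or an error for unknown types.
func routeFor(body []byte) (string, error) {
	var msg struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(body, &msg); err != nil {
		return "", fmt.Errorf("unreadable message: %w", err)
	}
	queue, ok := routes[msg.Type]
	if !ok {
		return "", fmt.Errorf("unknown message type %q", msg.Type)
	}
	return queue, nil
}

func main() {
	for _, body := range []string{`{"type":"accession"}`, `{"type":"bogus"}`} {
		queue, err := routeFor([]byte(body))
		if err != nil {
			// Mirrors step 1.1: log the error, then Ack the message.
			fmt.Println("log and Ack:", err)
			continue
		}
		fmt.Println("forward to queue:", queue)
	}
}
```

Keeping the mapping in one table makes it easy to see which types are known, and any type missing from the table falls through to the log-and-Ack path.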
diff --git a/docs/services/mapper.md b/docs/services/mapper.md
@@ -11,11 +11,11 @@ When running, `mapper` reads messages from the configured RabbitMQ queue (common
 For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message):
 
 1. The message is validated as valid JSON that matches the `dataset-mapping` schema.  
-If the message can’t be validated it is discarded with an error message is logged.
+    - If the message can’t be validated it is discarded and an error message is logged.
 2. AccessionIDs from the message are mapped to a datasetID (also in the message) in the database.  
-On error the service sleeps for up to 5 minutes to allow for database recovery, after 5 minutes the message is Nacked, re-queued and an error message is written to the logs.
+    - On error the service sleeps for up to 5 minutes to allow for database recovery. After 5 minutes the message is Nacked and re-queued, and an error message is written to the logs.
 3. The uploaded files related to each AccessionID are removed from the inbox.
-If this fails an error will be written to the logs.
+    - If this fails an error will be written to the logs.
 4. The RabbitMQ message is Ack'ed.
 
 ## Communication

diff --git a/docs/services/sda.md b/docs/services/sda.md
@@ -17,5 +17,5 @@ The SDA submission pipeline has four main steps:
 
 There are also three additional support services:
 
-1. [Intercept](cmd/intercept/intercept.md) relays messages from Central EGA to the system.
+1. [Intercept](cmd/intercept/intercept.md) relays messages from `CentralEGA` to the system.
 2. [s3inbox](cmd/s3inbox/s3inbox.md) proxies uploads to an S3-compatible storage backend.
diff --git a/docs/services/verify.md b/docs/services/verify.md
@@ -11,27 +11,28 @@ For each message, these steps are taken (if not otherwise noted, errors halt pro
 Unless explicitly stated, error messages are *not* written to the RabbitMQ error queue, and messages are not NACKed or ACKed.):
 
 1. The message is validated as valid JSON that matches the `ingestion-verification` schema.
-If the message can’t be validated it is discarded with an error message in the logs.
+    - If the message can’t be validated it is discarded with an error message in the logs.
 2. The service attempts to fetch the header for the file id in the message from the database.
-If this fails a NACK will be sent for the RabbitMQ message, the error will be written to the logs, and sent to the RabbitMQ error queue.
+    - If this fails a NACK will be sent for the RabbitMQ message, the error will be written to the logs, and sent to the RabbitMQ error queue.
 3. The file size of the encrypted file is fetched from the archive storage system.
-If this fails an error will be written to the logs.
+    - If this fails an error will be written to the logs.
 4. The archive file is then opened for reading.
-If this fails an error will be written to the logs and to the RabbitMQ error queue.
+    - If this fails an error will be written to the logs and to the RabbitMQ error queue.
 5. A decryptor is opened with the archive file.
-If this fails an error will be written to the logs.
+    - If this fails an error will be written to the logs.
 6. The file size and the md5 and sha256 checksums will be read from the decryptor.
-If this fails an error will be written to the logs.
+    - If this fails an error will be written to the logs.
 7. If the `re_verify` boolean is not set in the RabbitMQ message, the message processing ends here, and continues with the next message.
-Otherwise the processing continues with verification:
-  1. A verification message is created, and validated against the `ingestion-accession-request` schema.
-  If this fails an error will be written to the logs.
-  2. The file is marked as *verified* in the database (*COMPLETED* if you are using database schema <= `3`).
-  If this fails an error will be written to the logs.
-  3. The verification message created in step 7.1 is sent to the `verified` queue.
-  If this fails an error will be written to the logs.
-  4. The original RabbitMQ message is ACKed.
-  If this fails an error is written to the logs, but processing continues to the next step.
+
+    - Otherwise the processing continues with verification:
+      1. A verification message is created, and validated against the `ingestion-accession-request` schema.
+          - If this fails an error will be written to the logs.
+      2. The file is marked as *verified* in the database (*COMPLETED* if you are using database schema <= `3`).
+          - If this fails an error will be written to the logs.
+      3. The verification message created in step 7.1 is sent to the `verified` queue.
+          - If this fails an error will be written to the logs.
+      4. The original RabbitMQ message is ACKed.
+          - If this fails an error is written to the logs, but processing continues to the next step.
 
 ## Communication