From 192aad0d6e502b2a7c8be8b62a47f8c2fef18663 Mon Sep 17 00:00:00 2001
From: cb-Hades <81743695+cb-Hades@users.noreply.github.com>
Date: Thu, 11 Jul 2024 09:37:47 +0200
Subject: [PATCH] Update docs #11, #5

---
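The header added to the hqtb-config.rst YAML defines two placeholder values: ``__USER__`` for parameters the user must always set and ``USER`` for parameters needed only in specific cases. To illustrate that convention, here is a minimal Python sketch that lists every key still set to ``__USER__`` in a loaded config. It assumes the YAML has already been parsed into nested dicts/lists (for example with PyYAML's ``yaml.safe_load``); the function name and the example keys below ``subject`` are hypothetical and not part of SPECIMEN.

```python
from typing import Any, List


def find_missing_user_params(config: Any, placeholder: str = "__USER__", path: str = "") -> List[str]:
    """Return the dotted key paths whose value is still `placeholder`.

    Hypothetical pre-flight check; assumes the config was loaded into
    nested dicts/lists, e.g. via yaml.safe_load.
    """
    missing: List[str] = []
    if isinstance(config, dict):
        # Recurse into mappings, extending the dotted path with each key.
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            missing.extend(find_missing_user_params(value, placeholder, child))
    elif isinstance(config, list):
        # Recurse into lists, recording the index in the path.
        for index, value in enumerate(config):
            missing.extend(find_missing_user_params(value, placeholder, f"{path}[{index}]"))
    elif config == placeholder:
        # Leaf value that was never filled in by the user.
        missing.append(path)
    return missing


# Illustrative config fragment; the keys below "subject" are placeholders.
example = {
    "subject": {"required_input": "__USER__", "optional_input": "USER"},
    "other_setting": "some value",
}
print(find_missing_user_params(example))
```

Values set to ``USER`` are deliberately not reported, since whether they are needed depends on the specific run.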
 docs/source/hqtb/hqtb-config.rst  | 14 ++++++++++++++
 docs/source/hqtb/run-pipeline.rst | 14 +++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)
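The note on the annotated_genomes folder in run-pipeline.rst says a genome is accepted either as a ``GBFF`` file or as an ``FAA`` + ``FNA`` pair. Below is a minimal Python sketch of that rule. It assumes the files of one genome share a file stem (e.g. ``genome1.faa`` and ``genome1.fna``) and carry the extensions ``.gbff``, ``.faa`` and ``.fna`` (case is ignored). The helper is hypothetical and is not SPECIMEN's own input check.

```python
from pathlib import Path
from typing import Dict, Optional, Set


def supported_annotation_format(folder: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: map each genome (file stem) in `folder` to the
    supported annotation format found for it, or None if unsupported."""
    # Collect the (lowercased) extensions present for each file stem.
    extensions: Dict[str, Set[str]] = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            extensions.setdefault(path.stem, set()).add(path.suffix.lower())

    result: Dict[str, Optional[str]] = {}
    for stem, exts in sorted(extensions.items()):
        if ".gbff" in exts:
            result[stem] = "GBFF"  # single GenBank flat file
        elif {".faa", ".fna"} <= exts:
            result[stem] = "FAA+FNA"  # protein + nucleotide FASTA pair
        else:
            result[stem] = None  # e.g. only an .faa file: not supported
    return result
```

Checking on file stems keeps the sketch simple. A real check would also have to handle compressed files (e.g. ``.faa.gz``), which this version does not.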

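The KEGG information file described in run-pipeline.rst is a CSV with the columns 'NCBI genome', 'organism', 'locus_tag' and 'KEGG.organism', where 'locus_tag' holds only the prefix before the '_' separator. Below is a minimal Python sketch of building it with the standard ``csv`` module. The helper names, the comma delimiter, splitting at the first '_', and the row values are assumptions or placeholders, so check the exact format SPECIMEN reads against its code.

```python
import csv
import io


def locus_tag_prefix(locus_tag: str, sep: str = "_") -> str:
    """Return the part of a locus tag before the first separator,
    e.g. 'ABC_00005' -> 'ABC' (assumed prefix/number layout)."""
    return locus_tag.split(sep, 1)[0]


def write_kegg_info(rows, handle) -> None:
    """Write the KEGG mapping CSV with the four documented columns.

    `rows` is an iterable of (ncbi_genome, organism, example_locus_tag,
    kegg_organism) tuples; the locus tag is reduced to its prefix.
    """
    writer = csv.writer(handle)
    writer.writerow(["NCBI genome", "organism", "locus_tag", "KEGG.organism"])
    for genome, organism, locus_tag, kegg_org in rows:
        writer.writerow([genome, organism, locus_tag_prefix(locus_tag), kegg_org])


# Placeholder values only; replace them with real genome and KEGG identifiers.
buffer = io.StringIO()
write_kegg_info([("GCF_000000000.1", "Example organism", "ABC_00005", "xyz")], buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with ``open(path, 'w', newline='')``, as the ``csv`` module documentation recommends. The choice of ``path`` is up to the user.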
diff --git a/docs/source/hqtb/hqtb-config.rst b/docs/source/hqtb/hqtb-config.rst
index 377f371..129e5fa 100644
--- a/docs/source/hqtb/hqtb-config.rst
+++ b/docs/source/hqtb/hqtb-config.rst
@@ -4,6 +4,20 @@
 Below, the configuration file with the underlying defaults, is shown.
 
 .. code-block:: yaml 
+
+    # Configuration file for the SPECIMEN HQTB pipeline
+    # Meaning of the default parameters:
+    #    __USER__ : parameter that must always be specified by the user
+    #    USER     : parameter that only needs to be specified in specific cases
+
+    # Meta info:
+    #    model:     USER
+    #    organism:  USER
+    #    date:      USER
+    #    author:    USER
+
+    # Input for the pipeline
+    # ----------------------
 
     # Information about the genome to be used to generate the new model
     subject:
diff --git a/docs/source/hqtb/run-pipeline.rst b/docs/source/hqtb/run-pipeline.rst
index e76d6c6..9e76b4f 100644
--- a/docs/source/hqtb/run-pipeline.rst
+++ b/docs/source/hqtb/run-pipeline.rst
@@ -71,8 +71,7 @@ If you are just starting a new project and do not have all the data ready to go,
 
 | The function above creates the following directory structure for your project.
 | The 'contains' column lists what is supposed to be inside the according folder. 
-  The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function, manual = by the user).
-  ``TODO``: Was bedeutet semi?
+  The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function; semi = several steps are necessary, some performed by the program and some by the user; manual = by the user).
   The tags required/optional report whether this input is necessary to run the pipeline or if it is an optional input.
 
 .. table::
@@ -103,8 +102,9 @@ If you are just starting a new project and do not have all the data ready to go,
 
 .. note::
 
-    Regarding the annotated_genomes folder, the program currently only supports the file types ``GBFF`` and ``FAA`` + ``FNA``.
-    ``TODO``: Für welche Dateien in contains gilt das?
+    Regarding the annotated_genomes folder, the program currently only supports
+    ``GBFF`` files (from the NCBI annotation pipeline) and ``FAA`` + ``FNA`` file pairs
+    (from the PROKKA annotation pipeline) as genome annotation formats.
 
 Further details for collecting the data:
 
@@ -118,8 +118,7 @@ Further details for collecting the data:
     - One way to build a DIAMOND reference database is to download a set of reference sequences from the NCBI database, e.g. in the **FAA** format.
     - Use the function :code:`specimen.util.util.create_DIAMOND_db_from_folder('/User/path/input/directory', '/User/Path/for/output/', name = 'database', extention = 'faa')` to create a DIAMOND database
     - To speed up the mapping, create an additional mapping file from the e.g. ``GBFF`` files from NCBI using :code:`specimen.util.util.create_NCBIinfo_mapping('/User/path/input/directory', '/User/Path/for/output/', extention = 'gbff')`
-    - To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (start) and 'KEGG.organism'
-      ``TODO``: Was ist hier mit start gemeint?
+    - To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (only the prefix before the separator '_', i.e. the part shared by all locus tags of the genome) and 'KEGG.organism'
 
         - The information of the first three columns can be taken from the previous two steps while
         - For the last column the user needs to check, if the genomes have been entered into KEGG and have an organism identifier.
@@ -127,7 +126,8 @@ Further details for collecting the data:
 
 - medium:   
 
-    The media, either for analysis or gap filling can be entered into the pipeline via a config file (each). ``TODO``: Muss wirklich für jedes Medium eine neue Datei erstellt werden?
+    The media used for analysis and for gap filling are entered into the pipeline via a config file.
+    The same media config file can be used for both steps, or a separate file can be provided for each step.
     The config files are from the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox and access its in-build medium database. 
     Additionally, the config files allow for manual adjustment / external input.