From 192aad0d6e502b2a7c8be8b62a47f8c2fef18663 Mon Sep 17 00:00:00 2001
From: cb-Hades <81743695+cb-Hades@users.noreply.github.com>
Date: Thu, 11 Jul 2024 09:37:47 +0200
Subject: [PATCH] Update docs #11, #5

---
 docs/source/hqtb/hqtb-config.rst  | 14 ++++++++++++++
 docs/source/hqtb/run-pipeline.rst | 14 +++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/docs/source/hqtb/hqtb-config.rst b/docs/source/hqtb/hqtb-config.rst
index 377f371..129e5fa 100644
--- a/docs/source/hqtb/hqtb-config.rst
+++ b/docs/source/hqtb/hqtb-config.rst
@@ -4,6 +4,20 @@ Below, the configuration file with the underlying defaults, is shown.
 
 .. code-block:: yaml
 
+    # Configuration file for the SPECIMEN HQTB pipeline
+
+    # Meaning of the default parameters:
+    # The value __USER__ indicates parameters required to be specified by the user
+    # The value USER indicates parameters required only in specific cases
+
+    # Meta info:
+    # model: USER
+    # organism: USER
+    # date: USER
+    # author: USER
+
+    # Input for the pipeline
+    # ----------------------
     # Information about the genome to be used to generate the new model
     subject:
diff --git a/docs/source/hqtb/run-pipeline.rst b/docs/source/hqtb/run-pipeline.rst
index e76d6c6..9e76b4f 100644
--- a/docs/source/hqtb/run-pipeline.rst
+++ b/docs/source/hqtb/run-pipeline.rst
@@ -71,8 +71,7 @@ If you are just starting a new project and do not have all the data ready to go,
 
 | The function above creates the following directory structure for your project.
 | The 'contains' column lists what is supposed to be inside the according folder.
-  The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function, manual = by the user).
-  ``TODO``: What does semi mean?
+  The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function; semi = multiple steps necessary, some done by the program and some by the user; manual = by the user).
   The tags required/optional report whether this input is necessary to run the pipeline or if it is an optional input.
 
 .. table::
@@ -103,8 +102,9 @@ If you are just starting a new project and do not have all the data ready to go,
 
 .. note::
 
-    Regarding the annotated_genomes folder, the program currently only supports the file types ``GBFF`` and ``FAA`` + ``FNA``.
-    ``TODO``: Which files listed under 'contains' does this apply to?
+    Regarding the annotated_genomes folder, the program currently only supports
+    the file types ``GBFF`` and ``FAA`` + ``FNA`` (from the NCBI and PROKKA annotation pipelines, respectively)
+    as genome annotation formats.
 
 Further details for collecting the data:
 
@@ -118,8 +118,7 @@ Further details for collecting the data:
     - One way to build a DIAMOND reference database is to download a set of reference sequences from the NCBI database, e.g. in the **FAA** format.
     - Use the function :code:`specimen.util.util.create_DIAMOND_db_from_folder('/User/path/input/directory', '/User/Path/for/output/', name = 'database', extention = 'faa')` to create a DIAMOND database
     - To speed up the mapping, create an additional mapping file from the e.g. ``GBFF`` files from NCBI using :code:`specimen.util.util.create_NCBIinfo_mapping('/User/path/input/directory', '/User/Path/for/output/', extention = 'gbff')`
-    - To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (start) and 'KEGG.organism'
-      ``TODO``: What is meant by start here?
+    - To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (only the prefix up to the separator '_', i.e. the part that is identical for all locus tags) and 'KEGG.organism'
       - The information of the first three columns can be taken from the previous two steps while
       - For the last column the user needs to check, if the genomes have been entered into KEGG and have an organism identifier.
 
@@ -127,7 +126,8 @@ Further details for collecting the data:
 
 - medium:
 
-    The media, either for analysis or gap filling can be entered into the pipeline via a config file (each). ``TODO``: Does a new file really have to be created for each medium?
+    The media, either for analysis or gap filling, can be entered into the pipeline via a config file.
+    The same media file can be used for both steps, or a separate file can be entered for each step.
     The config files are from the `refineGEMs `__ :footcite:p:`bauerle2023genome` toolbox and access its in-build medium database.
     Additionally, the config files allow for manual adjustment / external input.
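The optional KEGG information file described in the patch above can be sketched in Python as follows. This is a minimal, hypothetical helper and not a SPECIMEN function: it assumes the per-genome values (NCBI genome accession, organism name, one example locus tag, KEGG organism code) have already been collected by the user, and that a comma-separated file with exactly the four documented column names is what the pipeline expects. The locus-tag prefix is taken as everything before the first '_'; all genome, organism and KEGG values in the example are placeholders.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column names as listed in the documentation
KEGG_INFO_COLUMNS = ["NCBI genome", "organism", "locus_tag", "KEGG.organism"]


def locus_tag_prefix(locus_tag: str) -> str:
    """Return the part of a locus tag before the first '_' separator."""
    return locus_tag.split("_", 1)[0]


def write_kegg_info_csv(records: Iterable[Mapping[str, str]], handle: TextIO) -> None:
    """Write the KEGG information table (hypothetical helper, not part of SPECIMEN).

    Each record is assumed to provide 'genome', 'organism', 'example_locus_tag'
    and, only if the genome is available in KEGG, 'kegg_organism'.
    """
    writer = csv.DictWriter(handle, fieldnames=KEGG_INFO_COLUMNS)
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "NCBI genome": rec["genome"],
            "organism": rec["organism"],
            "locus_tag": locus_tag_prefix(rec["example_locus_tag"]),
            # Left empty when the genome has no KEGG organism identifier
            "KEGG.organism": rec.get("kegg_organism", ""),
        })


if __name__ == "__main__":
    # Placeholder values for illustration only
    example = [{
        "genome": "GCF_000000001.1",
        "organism": "Examplea hypothetica",
        "example_locus_tag": "EXH01_00005",
        "kegg_organism": "exh",
    }]
    buffer = io.StringIO()
    write_kegg_info_csv(example, buffer)
    print(buffer.getvalue())
```

When writing to disk instead of an in-memory buffer, open the target with `open(path, "w", newline="")`, as the `csv` module documentation recommends, and pass that handle to the helper.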