Merge pull request #29 from Linardos/main

Clarified docs and fixed typo
mlcommons · Feb 4, 2025 · 89c660a
2 parents 55c4009 + ee1cd25
commit 89c660a

Showing 2 changed files with 53 additions and 10 deletions.
diff --git a/docs/customize.md b/docs/customize.md
@@ -25,7 +25,7 @@ Model configuration is expected to be in the following format:
 ```yaml
 model_config:
     model_name:  # Name of the model to use
-    labeling_paradigm:  # Labeling paradigm for the model, either 'unlabeled', 'patient', or 'custom'
+    labeling_paradigm:  # Labeling paradigm for the model, either 'unlabeled', 'patient', or 'custom'. See the "Custom Labels", "Patient-Level Labels", and "Unlabeled Data" sections below for details on each.
     architecture:  # Architecture of the model, customizing given model. Specifics are defined in the config of the given model.
     losses: # Loss functions to use (see below).
         - name:  # Name of the loss function
@@ -53,6 +53,50 @@ model_config:
     # This model config can support additional parameters that are specific to the model, for example:
     - save_eval_images_every_n_epochs:  # Save evaluation images every n epochs, useful to assess training progress of generative models. Implemented in i.e. DCGAN.
 ```
+The following sections describe the folder structure expected for each `labeling_paradigm` value.
+### Custom Labels
+
+Custom labels are defined based on the folder structure specified by the user. For example, you can create directories such as `0`, `1`, `2`, etc., where each number represents a distinct class. These classes are arbitrary and can be assigned as per the user's requirements. Inside each class folder, you should organize **patient-specific subfolders** containing the corresponding data.
+
+**Example Structure:**
+```plaintext
+/data
+  ├── 0
+  │    ├── patient_001
+  │    └── patient_002
+  ├── 1
+  │    ├── patient_003
+  │    └── patient_004
+  └── 2
+       ├── patient_005
+       └── patient_006
+```
+### Patient-Level Labels
+
+For patient-level labels, the folder structure consists of a main directory where each subfolder represents a specific patient. In this setup, each patient is treated as a separate class, similar to how labeling is handled in GaNDLF.
+
+**Example Structure:**
+```plaintext
+/data
+  ├── patient_001
+  ├── patient_002
+  ├── patient_003
+  └── patient_004
+```
+
+### Unlabeled Data
+
+For unlabeled data, you are required to maintain a patient-wise folder structure, similar to the patient-level labeling setup. However, in this case, class labels are ignored. The data is simply loaded without any label association.
+
+**Example Structure:**
+```plaintext
+/data
+  ├── patient_001
+  ├── patient_002
+  ├── patient_003
+  └── patient_004
+```
+
 
 ## Optimizers
 GaNDLF-Synth interfaces GaNDLF core framework for optimizers. See the [optimizers directory](https://github.com/mlcommons/GaNDLF/blob/master/GANDLF/optimizers/__init__.py) for available optimizers. They support optimizer-specific configurable parameters, interfacing [Pytorch Optimizers](https://pytorch.org/docs/stable/optim.html).
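
Note on the three labeling paradigms documented in this hunk: the folder layouts can be read as a mapping from patient folders to labels. The sketch below illustrates that reading only. `infer_labels` is a hypothetical helper written for this note, not part of the GaNDLF-Synth API, and it assumes the layouts exactly as shown in the example structures (class folders containing patient folders for `custom`, a flat list of patient folders otherwise).

```python
from pathlib import Path
from typing import Dict, Optional

# Paradigm names taken from the `labeling_paradigm` comment in the config.
PARADIGMS = ("unlabeled", "patient", "custom")


def infer_labels(root: str, paradigm: str) -> Dict[str, Optional[str]]:
    """Map each patient folder name to a label (hypothetical helper).

    custom:    <root>/<class>/<patient> -> label is the class folder name
    patient:   <root>/<patient>         -> each patient is its own class
    unlabeled: <root>/<patient>         -> no label (None)
    """
    if paradigm not in PARADIGMS:
        raise ValueError(f"unknown labeling paradigm: {paradigm!r}")

    # Only directories count; stray files at the top level are skipped.
    subdirs = sorted(p for p in Path(root).iterdir() if p.is_dir())

    if paradigm == "custom":
        return {
            patient.name: class_dir.name
            for class_dir in subdirs
            for patient in sorted(class_dir.iterdir())
            if patient.is_dir()
        }
    if paradigm == "patient":
        return {patient.name: patient.name for patient in subdirs}
    # unlabeled: same folder layout as "patient", but labels are ignored
    return {patient.name: None for patient in subdirs}
```

The `patient` and `unlabeled` branches walk the same directory layout and differ only in whether a label is attached, which matches the description in the "Unlabeled Data" section.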

diff --git a/docs/usage.md b/docs/usage.md
@@ -1,14 +1,14 @@
 ## Introduction
 
-The usual DL workflow consists of the following steps:
+To train generative DL models, the usual workflow consists of the following steps:
 
 1. Prepare the data
 2. Split data into training, validation, and testing
 3. Customize the training parameters
 4. Train the model
 5. Perform inference
 
-GaNDLF-Synth supports all of these steps. Some of the steps are treated as optional due to the nature of the generation tasks. For example, sometimes you do not want to split the data into training, validation, and testing, but rather use all the data for training. GaNDLF-Synth provides the necessary tools to perform these tasks, using both custom features and the ones provided by GaNDLF. We describe all the functionalities in the following sections. For more details on the functionalities directly from GaNDLF, please refer to the [GaNDLF documentation](https://docs.mlcommons.org/GaNDLF).
+GaNDLF-Synth supports all of these steps, but some are optional depending on the nature of the generation task. For example, sometimes it's preferred to avoid splitting the data, and instead use the whole dataset for training. GaNDLF-Synth provides the necessary tools to perform these tasks, using both custom features and the ones provided by GaNDLF. We describe all the functionalities in the following sections. For more details on the functionalities specific to GaNDLF, refer to the [GaNDLF documentation](https://docs.mlcommons.org/GaNDLF).
 
 ## Installation
 
@@ -17,7 +17,7 @@ Please follow the [installation instructions](./setup.md#installation) to instal
 ## Preparing the Data
 ## Constructing the Data CSV
 
-This application can leverage multiple channels/modalities for training while using a multi-class segmentation file. The expected format is shown in example CSVs in [samples directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/samples) for both labeled and unlabeled data. The CSV file needs to be structured with the following header format (which shows a CSV with `N` subjects, each having `X` channels/modalities that need to be processed):
+GaNDLF-Synth can leverage multiple channels/modalities for training using a multi-class segmentation file. The expected format is shown in the example CSVs in the [samples directory](https://github.com/mlcommons/GaNDLF-Synth/blob/main/samples) for both labeled and unlabeled data. The CSV file needs to be structured with the following header format (which shows a CSV with `N` subjects, each having `X` channels/modalities that need to be processed):
 
 #### Unlabeled Data
 ```csv
@@ -43,11 +43,11 @@ $ROOT-PATH-TO-DATA-FOLDER/$CLASS-FOLDER-NAME-2/3/t2w.nii.gz,$ROOT-PATH-TO-DATA-F
 
 **Notes:**
 
-- For labeled data, the CSV has additonal columns for the labels assigned to given set of channels. It also has a column for the label mapping, showing the class name assigned to the label value.
+- For labeled data, the CSV needs to have additional columns to include the labels corresponding to a given set of channels. It also needs a column for the label mapping, showing the class name assigned to the label value.
 
 ### Using the `gandlf-synth construct-csv` command
 
-To make the process of creating the CSV easier, we have provided a `gandlf-synth construct-csv` command. The data has to be arranged in different formats, depeinding on labeling paradigm. Modality names are used as examples.
+To make the process of creating the CSV easier, we have provided a `gandlf-synth construct-csv` command. The data has to be arranged in different formats, depending on the labeling paradigm. Modality names are used as examples.
 
 #### Unlabeled Data
 
@@ -120,8 +120,7 @@ $DATA_DIRECTORY
 
 #### Per Patient Labeled Data
 
-The structure is similar to the unlabeled data, the labels are assigned per patient. 
-
+The folder structure is the same as that of unlabeled data. In the CSV, the labels are assigned per patient.
 
 The following command shows how the script works:
 
@@ -152,7 +151,7 @@ Adapting GaNDLF to your needs boils down to modifying a YAML-based configuration
 
 
 
-## Running GaNDLF (Training/Inference)
+## Running GaNDLF-Synth (Training/Inference)
 
 You can use the following code snippet to run GaNDLF:
 
@@ -164,7 +163,7 @@ You can use the following code snippet to run GaNDLF:
   -c ./experiment_0/model.yaml \ # model configuration - needs to be a valid YAML (check syntax using https://yamlchecker.com/)
   -dt ./experiment_0/train.csv \ # main data CSV used for training (or inference if performing image-to-image reconstruction)
   -m-dir ./experiment_0/model_dir/ \ # model directory where the output of the training will be stored, created if not present
-  --t \ # enable training (if not enabled, inference is performed)
+  -t \ # enable training (if not enabled, inference is performed)
   # -v-csv ./experiment_0/val.csv \ # [optional] validation data CSV (if the model performs validation step)
   # -t-csv ./experiment_0/test.csv \ # [optional] testing data CSV (if the model performs testing step)
   # -vr 0.1 \ # [optional] ratio of validation data to extract from the training data CSV. If -v-csv flag is set, this is ignored