Finished vingettes

ruthkr · Mar 24, 2024 · c681858 · c681858
1 parent 1f57219
commit c681858
Show file tree

Hide file tree

Showing 4 changed files with 97 additions and 66 deletions.
diff --git a/vignettes/data-requirement.Rmd b/vignettes/data-requirement.Rmd
@@ -24,21 +24,21 @@ knitr::include_graphics("figures/input_diagram.png")
 
 ## A single data frame input
 
-The single data frame input or the data frame in the list needs to contain gene expression time-course data with all replicates. The illustrated diagram below shows the required structure of the `input`.
+The single data frame input or the data frame in the list must contain gene expression time-course data with all replicates. The illustrative diagram below shows the required structure of the `input`.
 
 ```{r fig-input-table, echo=FALSE, fig.align='center', out.width='75%'}
 knitr::include_graphics("figures/input_table.png")
 ```
 
-This data frame must contain **reference** and **query** expression data which users wish to compare, and the following five columns:
+This data frame must contain **reference** and **query** expression data, and the following five columns:
 
 - `gene_id`: locus name or unique ID of each gene.
-- `accession`: accession or name of the reference and query data to compare.
+- `accession`: accession or name of the reference and query data.
 - `timepoint`: time points of the gene expression data.
-- `expression_value`: desired expression values or measure of the abundance of gene or transcripts which users wish to compare. This value can be RPM, RPKM, FPKM, TPM, TMM, DESeq, SCnorm, GeTMM, ComBat-Seq, and raw reads counts.
+- `expression_value`: desired expression values or measure of the abundance of gene or transcripts. This value can be RPM, RPKM, FPKM, TPM, TMM, or raw read counts.
 - `replicate`: biological replicate ID for an expression value at a particular time point.
 
-Below we can see a real example of how the `input` data should look like:
+The example below shows how the `input` data should look like:
 
 ```{r brapa-data-kable, echo=FALSE}
 # Load a data frame from the sample data
@@ -51,13 +51,13 @@ b_rapa_data[, .SD[1:2], by = accession][, .(gene_id, accession, timepoint, expre
 
 ## A list of data frames as an input
 
+If users do not have the data **reference** and **query** joined with the IDs mapped into one single data frame, there is an option of having the input data of a list of data frames. As shown in the illustrative diagram below, the list must contain both **reference** and **query** data frames with the columns as required in the single data frame input (see previous section). 
+
 ```{r fig-input-list-tables, echo=FALSE, fig.align='center', out.width='85%'}
 knitr::include_graphics("figures/input_list_tables.png")
 ```
 
-If users do not have the data **reference** and **query** joined with the IDs mapped into one single data frame, there is an option of having the input data of a list of data frames. As shown in the illustrative diagram above, the list must contain both **reference** and **query** data frames with the columns as required in the single data frame input (see previous section). 
-
-Below we can see an example of how the `input` list of data frames should look like:
+The example below shows how the `input` list of data frames should look like:
 
 ```{r brapa-data-list-df-read, include=FALSE}
 # Load a data frame from the sample data
@@ -108,14 +108,18 @@ list_df
 #> 6: AT2G45660      Col0        10         72.98476 ERR_ds_klepikova_SRR1661475
 ```
 
-Note here that the elements of the list needs to be named **reference** and **query**, the order of the element will not effect the registration process. 
+Note here that the elements of the list must be named **reference** and **query**, the order of the element will not effect the registration process. 
 
 ## A list of vectors as an input
 
+As an alternative of a list of data frame as input data, as shown above, users can also have a list of numerical vectors as their input. The illustrative diagram below shows how the input should look like:
+
 ```{r fig-input-list-vectors, echo=FALSE, fig.align='center', out.width='38%'}
 knitr::include_graphics("figures/input_list_vectors.png")
 ```
 
+The example below shows how the `input` list of vectors can be defined as an input: 
+
 ```{r brapa-data-list-num}
 # Define expression value vectors
 ref_expressions <- c(1.9, 3.1, 7.8, 31.6, 33.7, 31.5, 131.4, 107.5, 116.7, 112.5, 109.7, 57.4, 50.9)
@@ -128,3 +132,5 @@ list_vector <- list(
 
 list_vector
 ```
+
+Note that since there is no ID specified on the vectors, `greatR` will assign a unique ID for the reference and query pair data (see more in the [register data > Using other inputs ](https://ruthkr.github.io/greatR/articles/register-data.html#registration-results-1) article).
diff --git a/vignettes/process-results.Rmd b/vignettes/process-results.Rmd
@@ -39,8 +39,8 @@ registration_results <- system.file("extdata/brapa_arabidopsis_registration.rds"
 The function `summary()` returns a list with S3 class `summary.res_greatR` containing four different objects:
 
 - `summary` is a data frame containing the summary of the registration results (default S3 print).
-- `registered_genes` is a vector of gene accessions or IDs which were successfully registered.
-- `non_registered_genes` is a vector of non-registered gene accessions or IDs.
+- `registered_genes` is a vector of gene IDs which are successfully registered.
+- `non_registered_genes` is a vector of non-registered gene IDs.
 - `reg_params` is a data frame containing the distribution of registration parameters.
 
 ```{r get-summary-results, fig.align='center'}
@@ -51,7 +51,7 @@ reg_summary$summary |>
   knitr::kable()
 ```
 
-The list of gene accessions which were registered or non-registered can be viewed by calling:
+The list of gene IDs which are registered or non-registered can be viewed by calling:
 
 ```{r print-accession-of-registered-genes}
 reg_summary$registered_genes
@@ -65,16 +65,24 @@ reg_summary$non_registered_genes
 
 The function `plot()` allows users to plot the bivariate distribution of the registration parameters. Non-registered genes can be ignored by selecting `type = "registered"` instead of the default `type = "all"`. Similarly, the marginal distribution type can be changed from `type_dist = "histogram"` (default) to `type_dist = "density"`.
 
-```{r plot-summary-results, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
+```r
 plot(
   reg_summary,
   type = "registered"
 )
 ```
 
+```{r plot-summary-results, echo=FALSE, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
+plot(
+  reg_summary,
+  type = "registered",
+  scatterplot_size = c(4, 3.5)
+)
+```
+
 ## Plotting registration results
 
-The function `plot()` allows users to plot the registration results of the genes of interest.
+The function `plot()` allows users to plot the registration results of the genes of interest (by default only up to the first 25 genes are shown, for more control over this, use the `genes_list` argument).
 
 ```{r plot-results, fig.align='center', fig.height=8, fig.width=7, warning=FALSE}
 # Plot registration result
@@ -84,24 +92,24 @@ plot(
 )
 ```
 
-Notice that the plot includes a label indicating if the particular genes are registered or non-registered, as well as the registration parameters in case the registration was successful.
+Notice that the plot includes a label indicating if the particular genes are registered or non-registered, as well as the registration parameters in case the registration is successful.
 
-For more details on the other function parameters, go to `plot()`.
+For more details on the other function arguments, go to `plot()`.
 
 ## Analysing similarity of expression profiles over time before and after registering
 
 ### Calculate sample distance
 
-After registering the data, users can compare the overall similarity between datasets before and after registering using the function `calculate_distance()`.
+After registering the data, users can compare the overall similarity between datasets before and after registering using the function `calculate_distance()`. By default all genes are considered in this calculation, this can be changed by using the `genes_list` argument.
 
 ```{r get-sample-distance}
 sample_distance <- calculate_distance(registration_results)
 ```
 
 The function `calculate_distance()` returns a list with S3 class `dist_greatR` of two data frames:
 
-- `result` distance between scaled reference and query expressions using registered time points.
-- `original` distance between scaled reference and query expressions using original time points.
+- `result` is the distance between scaled reference and query expressions using time points after registration.
+- `original` is the distance between scaled reference and query expressions using original time points before registration.
 
 ### Plot heatmap of sample distances
 

diff --git a/vignettes/register-data-manually.Rmd b/vignettes/register-data-manually.Rmd
@@ -14,11 +14,11 @@ knitr::opts_chunk$set(
 )
 ```
 
-This article will show users how to register data using some pre-defined shift and stretch parameters. This demo will use one of the genes from the sample data provided by the package. Users can check whether their sample data can be registered using the manually specified shift and stretch values.
+This article will show users how to register data using some specified shift and stretch parameters. This demo will use one of the genes from the sample data provided by the package.
 
 ## Load sample data
 
-`greatR` package provides an example of data frame containing two different species *A. thaliana* and *B. rapa* with two and three different replicates, respectively. This data frame can be read as follows:
+`greatR` provides an example of data frame containing two different species *A. thaliana* and *B. rapa* with two and three different replicates, respectively. This data frame can be read as follows:
 
 ```{r load-greatR, message=FALSE}
 # Load the package
@@ -34,11 +34,13 @@ b_rapa_data <- system.file("extdata/brapa_arabidopsis_data.csv", package = "grea
 
 ## Registering without optimisation
 
+The illustrative table below shows the major differences between runing `register()` with and without optimisation.
+
 ```{r reg-data-manual, echo=FALSE, fig.align='center', out.width='65%'}
 knitr::include_graphics("figures/registration_params_table.png")
 ```
 
-Here, we will only use a single gene with `gene_id = "BRAA03G023790.3C"` from the sample data.
+Here, we will only use a single gene with `gene_id = "BRAA03G023790.3C"` from the sample data, but this feature can also be used when registering multiple genes.
 
 ```{r get-subset-data, message=FALSE, warning=FALSE}
 gene_BRAA03G023790.3C_data <- b_rapa_data[gene_id == "BRAA03G023790.3C"]
@@ -87,11 +89,11 @@ As we can see, using the given stretch and shift parameter above, the *B. rapa*
 
 ## Registering multiple gene with different pre-defined registration parameters
 
-Users can also register multiple different genes using their pre-defined parameters. Similar to registration process above, users need to set `use_optimisation = FALSE` to disable the automated optimisation process.
+Users can also specify a list of parameters rather than a single value. Similar to the registration process above, users need to set `use_optimisation = FALSE` to disable the automated optimisation process.
 
 ```{r register-multiple-gene-manual, message=FALSE, warning=FALSE, eval=FALSE}
 registration_results <- register(
-  b_rapa_data,
+  gene_BRAA03G023790.3C_data,
   reference = "Ro18",
   query = "Col0",
   scaling_method = "z-score",
@@ -100,13 +102,11 @@ registration_results <- register(
   use_optimisation = FALSE
 )
 #> ── Validating input data ───────────────────────────────────────────────────────
-#> ℹ Will process 10 genes.
+#> ℹ Will process 1 gene.
 #> ℹ Using estimated standard deviation, as no `exp_sd` was provided.
 #> ℹ Using `scaling_method` = "z-score".
 #>
 #> ── Starting manual registration ────────────────────────────────────────────────
 #> ℹ Using `overlapping_percent` = 50% as a registration criterion.
-#> ✔ Applying registration for genes (10/10) [12.4s]
+#> ✔ Applying registration for genes (1/1) [1.3s]
 ```
-
-Similar to the previous registration process, users can analyse further the registration result by following the [processing registration results](https://ruthkr.github.io/greatR/articles/process-results.html) vignette.