Skip to content

Commit

Permalink
Finished vingettes
Browse files Browse the repository at this point in the history
  • Loading branch information
ruthkr committed Mar 24, 2024
1 parent 1f57219 commit c681858
Show file tree
Hide file tree
Showing 4 changed files with 97 additions and 66 deletions.
24 changes: 15 additions & 9 deletions vignettes/data-requirement.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -24,21 +24,21 @@ knitr::include_graphics("figures/input_diagram.png")

## A single data frame input

The single data frame input or the data frame in the list needs to contain gene expression time-course data with all replicates. The illustrated diagram below shows the required structure of the `input`.
The single data frame input or the data frame in the list must contain gene expression time-course data with all replicates. The illustrative diagram below shows the required structure of the `input`.

```{r fig-input-table, echo=FALSE, fig.align='center', out.width='75%'}
knitr::include_graphics("figures/input_table.png")
```

This data frame must contain **reference** and **query** expression data which users wish to compare, and the following five columns:
This data frame must contain **reference** and **query** expression data, and the following five columns:

- `gene_id`: locus name or unique ID of each gene.
- `accession`: accession or name of the reference and query data to compare.
- `accession`: accession or name of the reference and query data.
- `timepoint`: time points of the gene expression data.
- `expression_value`: desired expression values or measure of the abundance of gene or transcripts which users wish to compare. This value can be RPM, RPKM, FPKM, TPM, TMM, DESeq, SCnorm, GeTMM, ComBat-Seq, and raw reads counts.
- `expression_value`: desired expression values or measure of the abundance of gene or transcripts. This value can be RPM, RPKM, FPKM, TPM, TMM, or raw read counts.
- `replicate`: biological replicate ID for an expression value at a particular time point.

Below we can see a real example of how the `input` data should look like:
The example below shows how the `input` data should look like:

```{r brapa-data-kable, echo=FALSE}
# Load a data frame from the sample data
Expand All @@ -51,13 +51,13 @@ b_rapa_data[, .SD[1:2], by = accession][, .(gene_id, accession, timepoint, expre

## A list of data frames as an input

If users do not have the data **reference** and **query** joined with the IDs mapped into one single data frame, there is an option of having the input data of a list of data frames. As shown in the illustrative diagram below, the list must contain both **reference** and **query** data frames with the columns as required in the single data frame input (see previous section).

```{r fig-input-list-tables, echo=FALSE, fig.align='center', out.width='85%'}
knitr::include_graphics("figures/input_list_tables.png")
```

If users do not have the data **reference** and **query** joined with the IDs mapped into one single data frame, there is an option of having the input data of a list of data frames. As shown in the illustrative diagram above, the list must contain both **reference** and **query** data frames with the columns as required in the single data frame input (see previous section).

Below we can see an example of how the `input` list of data frames should look like:
The example below shows how the `input` list of data frames should look like:

```{r brapa-data-list-df-read, include=FALSE}
# Load a data frame from the sample data
Expand Down Expand Up @@ -108,14 +108,18 @@ list_df
#> 6: AT2G45660 Col0 10 72.98476 ERR_ds_klepikova_SRR1661475
```

Note here that the elements of the list needs to be named **reference** and **query**, the order of the element will not effect the registration process.
Note here that the elements of the list must be named **reference** and **query**, the order of the element will not effect the registration process.

## A list of vectors as an input

As an alternative of a list of data frame as input data, as shown above, users can also have a list of numerical vectors as their input. The illustrative diagram below shows how the input should look like:

```{r fig-input-list-vectors, echo=FALSE, fig.align='center', out.width='38%'}
knitr::include_graphics("figures/input_list_vectors.png")
```

The example below shows how the `input` list of vectors can be defined as an input:

```{r brapa-data-list-num}
# Define expression value vectors
ref_expressions <- c(1.9, 3.1, 7.8, 31.6, 33.7, 31.5, 131.4, 107.5, 116.7, 112.5, 109.7, 57.4, 50.9)
Expand All @@ -128,3 +132,5 @@ list_vector <- list(
list_vector
```

Note that since there is no ID specified on the vectors, `greatR` will assign a unique ID for the reference and query pair data (see more in the [register data > Using other inputs ](https://ruthkr.github.io/greatR/articles/register-data.html#registration-results-1) article).
28 changes: 18 additions & 10 deletions vignettes/process-results.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -39,8 +39,8 @@ registration_results <- system.file("extdata/brapa_arabidopsis_registration.rds"
The function `summary()` returns a list with S3 class `summary.res_greatR` containing four different objects:

- `summary` is a data frame containing the summary of the registration results (default S3 print).
- `registered_genes` is a vector of gene accessions or IDs which were successfully registered.
- `non_registered_genes` is a vector of non-registered gene accessions or IDs.
- `registered_genes` is a vector of gene IDs which are successfully registered.
- `non_registered_genes` is a vector of non-registered gene IDs.
- `reg_params` is a data frame containing the distribution of registration parameters.

```{r get-summary-results, fig.align='center'}
Expand All @@ -51,7 +51,7 @@ reg_summary$summary |>
knitr::kable()
```

The list of gene accessions which were registered or non-registered can be viewed by calling:
The list of gene IDs which are registered or non-registered can be viewed by calling:

```{r print-accession-of-registered-genes}
reg_summary$registered_genes
Expand All @@ -65,16 +65,24 @@ reg_summary$non_registered_genes

The function `plot()` allows users to plot the bivariate distribution of the registration parameters. Non-registered genes can be ignored by selecting `type = "registered"` instead of the default `type = "all"`. Similarly, the marginal distribution type can be changed from `type_dist = "histogram"` (default) to `type_dist = "density"`.

```{r plot-summary-results, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
```r
plot(
reg_summary,
type = "registered"
)
```

```{r plot-summary-results, echo=FALSE, fig.align='center', fig.height=4, fig.width=4.5, warning=FALSE}
plot(
reg_summary,
type = "registered",
scatterplot_size = c(4, 3.5)
)
```

## Plotting registration results

The function `plot()` allows users to plot the registration results of the genes of interest.
The function `plot()` allows users to plot the registration results of the genes of interest (by default only up to the first 25 genes are shown, for more control over this, use the `genes_list` argument).

```{r plot-results, fig.align='center', fig.height=8, fig.width=7, warning=FALSE}
# Plot registration result
Expand All @@ -84,24 +92,24 @@ plot(
)
```

Notice that the plot includes a label indicating if the particular genes are registered or non-registered, as well as the registration parameters in case the registration was successful.
Notice that the plot includes a label indicating if the particular genes are registered or non-registered, as well as the registration parameters in case the registration is successful.

For more details on the other function parameters, go to `plot()`.
For more details on the other function arguments, go to `plot()`.

## Analysing similarity of expression profiles over time before and after registering

### Calculate sample distance

After registering the data, users can compare the overall similarity between datasets before and after registering using the function `calculate_distance()`.
After registering the data, users can compare the overall similarity between datasets before and after registering using the function `calculate_distance()`. By default all genes are considered in this calculation, this can be changed by using the `genes_list` argument.

```{r get-sample-distance}
sample_distance <- calculate_distance(registration_results)
```

The function `calculate_distance()` returns a list with S3 class `dist_greatR` of two data frames:

- `result` distance between scaled reference and query expressions using registered time points.
- `original` distance between scaled reference and query expressions using original time points.
- `result` is the distance between scaled reference and query expressions using time points after registration.
- `original` is the distance between scaled reference and query expressions using original time points before registration.

### Plot heatmap of sample distances

Expand Down
18 changes: 9 additions & 9 deletions vignettes/register-data-manually.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,11 @@ knitr::opts_chunk$set(
)
```

This article will show users how to register data using some pre-defined shift and stretch parameters. This demo will use one of the genes from the sample data provided by the package. Users can check whether their sample data can be registered using the manually specified shift and stretch values.
This article will show users how to register data using some specified shift and stretch parameters. This demo will use one of the genes from the sample data provided by the package.

## Load sample data

`greatR` package provides an example of data frame containing two different species *A. thaliana* and *B. rapa* with two and three different replicates, respectively. This data frame can be read as follows:
`greatR` provides an example of data frame containing two different species *A. thaliana* and *B. rapa* with two and three different replicates, respectively. This data frame can be read as follows:

```{r load-greatR, message=FALSE}
# Load the package
Expand All @@ -34,11 +34,13 @@ b_rapa_data <- system.file("extdata/brapa_arabidopsis_data.csv", package = "grea

## Registering without optimisation

The illustrative table below shows the major differences between runing `register()` with and without optimisation.

```{r reg-data-manual, echo=FALSE, fig.align='center', out.width='65%'}
knitr::include_graphics("figures/registration_params_table.png")
```

Here, we will only use a single gene with `gene_id = "BRAA03G023790.3C"` from the sample data.
Here, we will only use a single gene with `gene_id = "BRAA03G023790.3C"` from the sample data, but this feature can also be used when registering multiple genes.

```{r get-subset-data, message=FALSE, warning=FALSE}
gene_BRAA03G023790.3C_data <- b_rapa_data[gene_id == "BRAA03G023790.3C"]
Expand Down Expand Up @@ -87,11 +89,11 @@ As we can see, using the given stretch and shift parameter above, the *B. rapa*

## Registering multiple gene with different pre-defined registration parameters

Users can also register multiple different genes using their pre-defined parameters. Similar to registration process above, users need to set `use_optimisation = FALSE` to disable the automated optimisation process.
Users can also specify a list of parameters rather than a single value. Similar to the registration process above, users need to set `use_optimisation = FALSE` to disable the automated optimisation process.

```{r register-multiple-gene-manual, message=FALSE, warning=FALSE, eval=FALSE}
registration_results <- register(
b_rapa_data,
gene_BRAA03G023790.3C_data,
reference = "Ro18",
query = "Col0",
scaling_method = "z-score",
Expand All @@ -100,13 +102,11 @@ registration_results <- register(
use_optimisation = FALSE
)
#> ── Validating input data ───────────────────────────────────────────────────────
#> ℹ Will process 10 genes.
#> ℹ Will process 1 gene.
#> ℹ Using estimated standard deviation, as no `exp_sd` was provided.
#> ℹ Using `scaling_method` = "z-score".
#>
#> ── Starting manual registration ────────────────────────────────────────────────
#> ℹ Using `overlapping_percent` = 50% as a registration criterion.
#> ✔ Applying registration for genes (10/10) [12.4s]
#> ✔ Applying registration for genes (1/1) [1.3s]
```

Similar to the previous registration process, users can analyse further the registration result by following the [processing registration results](https://ruthkr.github.io/greatR/articles/process-results.html) vignette.
Loading

0 comments on commit c681858

Please sign in to comment.