Ensure use of sentence-based line breaks in vignettes

openplantpathology · May 4, 2024 · 698b2b7
1 parent aea21b3
commit 698b2b7

Showing 3 changed files with 42 additions and 41 deletions.
diff --git a/Hagis.Rproj b/Hagis.Rproj
@@ -15,3 +15,5 @@ LaTeX: pdfLaTeX
 BuildType: Package
 PackageUseDevtools: Yes
 PackageInstallArgs: --no-multiarch --with-keep.source
+
+MarkdownWrap: Sentence
diff --git a/vignettes/betadiversity.Rmd b/vignettes/betadiversity.Rmd
@@ -44,7 +44,7 @@ head(P_sojae_survey) # survey sample data
 head(sample_meta) # metatada about the sample collection locations
 ```
 
-This removes the "MPS17_" from the isolates, so that they will be read as numeric instead of character.
+This removes the "MPS17\_" from the isolates, so that they will be read as numeric instead of character.
 The next step removes the "Rps " from the gene names, so that they will be read as numeric instead of character.
 
 ```{r clean-data}
@@ -98,7 +98,7 @@ P_sojae_survey.matrix.jaccard <-
   vegdist(P_sojae_survey.matrix, "jaccard", na.rm = TRUE)
 ```
 
-After performing the principal coordinates analysis, we see that the scree plot says that about 70% of the variation in Jaccard distances are explained within the first two dimensions (_i.e._, axes).
+After performing the principal coordinates analysis, we see that the scree plot says that about 70% of the variation in Jaccard distances are explained within the first two dimensions (*i.e.*, axes).
 This is good.
 Usually a good rule of thumb is that if the second dimension is roughly half variation explained in the first dimension you don't need to look further at the third or n+1 dimensions.
 
@@ -177,7 +177,7 @@ These code are presented as an example for further downstream analysis that can
 
 In these examples we will artificially split the dataset into two, so that these analyses can be shown.
 When performing your own analyses you will likely have two geographic locations to compare already.
-Make sure you can differentiate these populations with the metadata file used previously (_i.e._, column in the dataset that specifies where the isolate came from; USA, Brazil, China, Australia, etc.).
+Make sure you can differentiate these populations with the metadata file used previously (*i.e.*, column in the dataset that specifies where the isolate came from; USA, Brazil, China, Australia, etc.).
 
 # Permutation Based Analysis of Variance (PERMANOVA) and Beta-dispersion Analyses
 
@@ -231,7 +231,7 @@ We can plot the dispersion for each group using the `plot()` function.
 As expected, since we have identified no significant differences, the two groups dispersion overlap a great deal and are not distinct from each other.
 Again this shows that pathotype dispersion between the groups is homogeneous and not different in this instance.
 
-If we were working with a data set that had groups with significantly different dispersion we would expect to see a significant ANOVA p-value (p < 0.05) as well as significance when using the Tukey HSD test.
+If we were working with a data set that had groups with significantly different dispersion we would expect to see a significant ANOVA p-value (p \< 0.05) as well as significance when using the Tukey HSD test.
 Lastly, the plotted dispersion will form distinct, separate, groups which can be observed.
 
 Differences in beta-dispersion may indicate separate pathotype groups which should be further investigated with Permutation Based Analysis of Variance (PERMANOVA) and Analysis of Similarity (ANOSIM) analysis.
@@ -248,16 +248,16 @@ pathotype.adonis <- adonis(P_sojae_survey.matrix.jaccard ~ groups)
 pathotype.adonis
 ```
 
-The PERMANOVA identified no significant differences between the groups centroids, or means (_p_ =  `r pathotype.adonis[[1]][[6]][[1]]`).
+The PERMANOVA identified no significant differences between the groups centroids, or means (*p* = `r pathotype.adonis[[1]][[6]][[1]]`).
 In addition to identifying significance between group centroids, the PERMANOVA also calculates how much of the variance can be explained by the specified groups (see the $R^2$ column in the PERMANOVA output).
-In this case, the $R^2$ is `r pathotype.adonis[[1]][[6]][[1]]`, so  `r round(pathotype.adonis[[1]][[6]][[1]], 3) * 100`% of the variance is explained by the groups used in analysis.
+In this case, the $R^2$ is `r pathotype.adonis[[1]][[6]][[1]]`, so `r round(pathotype.adonis[[1]][[6]][[1]], 3) * 100`% of the variance is explained by the groups used in analysis.
 Based on the PERMANOVA results we can conclude that these two groups are not different from each other and likely have similar pathotypes to each other.
 
 ## Analysis of Similarity (ANOSIM)
 
 ANOSIM statistic (R) ranges from between -1 and 1.
 Positive numbers suggest that there is more similarity within groups than there is between groups.
-Values close to zero indicate no difference between groups (_i.e._, similarities are the same between groups).
+Values close to zero indicate no difference between groups (*i.e.*, similarities are the same between groups).
 
 ```{r anosim}
 pathotype.anosim <- anosim(P_sojae_survey.matrix.jaccard, groups)
@@ -267,4 +267,4 @@ pathotype.anosim
 
 ANOSIM statistic (R) was `r pathotype.anosim$statistic`, so there are more similarities between groups than there are within groups.
 This is evidence that the groups are not different from one another.
-Likewise the significance is >0.05 so there is no significant difference between groups' similarities.
+Likewise the significance is \>0.05 so there is no significant difference between groups' similarities.
diff --git a/vignettes/hagis.Rmd b/vignettes/hagis.Rmd
@@ -21,7 +21,7 @@ data.table::setDTthreads(1L)
 
 ## Getting Started With {hagis}
 
-The following examples are based on a dataset from Michigan State University _Phytophthora sojae_ surveys for soybean phytophthora root rot pathotyping efforts.
+The following examples are based on a dataset from Michigan State University *Phytophthora sojae* surveys for soybean phytophthora root rot pathotyping efforts.
 
 First you'll want to load in your data set, for right now let's use a practice data set made for the {hagis} package, named `P_sojae_survey`.
 The data set is available in your R session automatically when you load the {hagis} package.
@@ -45,31 +45,31 @@ head(P_sojae_survey)
 ```
 
 This practice data set contains 21 isolates', `Isolate`, virulence data on a set of 14 differential soybean cultivars, `Line`.
-This package uses the _percentage_ of susceptible, inoculated, plants to determine effective resistance genes, pathotype diversity and frequency, as well as individual isolates pathotypes.
+This package uses the *percentage* of susceptible, inoculated, plants to determine effective resistance genes, pathotype diversity and frequency, as well as individual isolates pathotypes.
 
 To help ensure that the proper data are used in calculations, the user is asked to provide some information that instruct {hagis} about what data to use.
 
-***
+------------------------------------------------------------------------
 
 ## Function Arguments Used in {hagis}
 
 We have striven to make {hagis} as intuitive to use as possible.
 Part of that means that we have used the same arguments for the three main functions, `summarize_gene()`, `calculate_complexities()` and `calculate_diversities()`.
 Each of these functions take the same arguments:
 
-* `x` this is your data set name, _e.g._, `P_sojae_survey` from the example above, allows for the function to identify where it will be pulling these columns (and their associated row values) from to use (_i.e._ your data collection Excel spreadsheet)
+-   `x` this is your data set name, *e.g.*, `P_sojae_survey` from the example above, allows for the function to identify where it will be pulling these columns (and their associated row values) from to use (*i.e.* your data collection Excel spreadsheet)
 
-* `cutoff` this value sets the cutoff for susceptible reactions.
-For example, `cutoff = 60` means that all genes with 60% or more of the plants rated susceptible will be treated as susceptible.
-You can change this to whatever percentage you require for your study.
+-   `cutoff` this value sets the cutoff for susceptible reactions.
+    For example, `cutoff = 60` means that all genes with 60% or more of the plants rated susceptible will be treated as susceptible.
+    You can change this to whatever percentage you require for your study.
 
-* `control` specifies the value used in the `gene` column to denote a susceptible control used in the study
+-   `control` specifies the value used in the `gene` column to denote a susceptible control used in the study
 
-* `sample` specifies the column header for the column which identifies the isolates tested
+-   `sample` specifies the column header for the column which identifies the isolates tested
 
-* `gene` specifies the column header for the column which identifies the genes tested
+-   `gene` specifies the column header for the column which identifies the genes tested
 
-* `perc_susc` specifies the column header for the column which identifies the percent susceptible plants for each gene
+-   `perc_susc` specifies the column header for the column which identifies the percent susceptible plants for each gene
 
 Ordinarily you would use functions in {hagis} or other R packages like this:
 
@@ -100,11 +100,11 @@ hagis_args <- list(
 
 Now that we have a list of arguments, we can now save time entering the same data for each function and also avoid typos or entering different cutoff values, etc. between the functions.
 
-***
+------------------------------------------------------------------------
 
 ## Determination of Effective Resistance Genes
 
-Below is an example of tables and graphics that can be produced using the `summarize_gene()` function to identify effective resistance genes tested against the sampled _Phytophthora sojae_ population.
+Below is an example of tables and graphics that can be produced using the `summarize_gene()` function to identify effective resistance genes tested against the sampled *Phytophthora sojae* population.
 
 The `summarize_gene()` function allows you to produce a detailed table showing the number of virulent isolates (`N_virulent_isolates`), as well as offering a percentage of the isolates tested which are pathogenic on each gene (`percent_pathogenic`).
 
@@ -114,7 +114,7 @@ Rps.summary <- do.call(summarize_gene, hagis_args)
 Rps.summary
 ```
 
-Using the _pander_ library we can make the table much more attractive in RMarkdown.
+Using the *pander* library we can make the table much more attractive in RMarkdown.
 
 ```{r pander-print-Rps, echo=TRUE}
 library(pander)
@@ -136,23 +136,23 @@ autoplot(Rps.summary, type = "percentage")
 autoplot(Rps.summary, type = "count")
 ```
 
-***
+------------------------------------------------------------------------
 
 ## Pathotype Complexities
 
 Pathotype frequency, distribution as well as statistics such as mean pathotype complexity can be calculated using the `calculate_complexities()` function.
 This function will return a `list()` of two `data.table()` objects, `grouped_complexities` and `individual_complexities`.
 `grouped_complexities` returns a `list()` as a `data.table()` object showing the frequency and distribution of pathotype complexities for the sampled population.
 `individual_complexities()` returns a `list()` as a `data.table()` object showing each individual isolates pathotype complexity.
-An isolates pathotype complexity refers to the number of resistance genes that it is able to overcome and cause disease on, _i.e._, a pathotype complexity of "7" would mean that isolate can cause disease on 7 different resistance genes.
+An isolates pathotype complexity refers to the number of resistance genes that it is able to overcome and cause disease on, *i.e.*, a pathotype complexity of "7" would mean that isolate can cause disease on 7 different resistance genes.
 
 ```{r complexities, echo=TRUE, message=FALSE, warning=FALSE}
 complexities <- do.call(calculate_complexities, hagis_args)
 
 complexities
 ```
 
-Once again, using _pander_ we can make these tables much more attractive in RMarkdown.
+Once again, using *pander* we can make these tables much more attractive in RMarkdown.
 Since `complexities` is a `list()` object, we can refer to each object directly by name and print them as follows.
 
 ```{r pander-print-complexities}
@@ -179,7 +179,7 @@ autoplot(complexities, type = "percentage")
 autoplot(complexities, type = "count")
 ```
 
-***
+------------------------------------------------------------------------
 
 ## Diversity Indices, Frequency of Unique Pathotypes and Individual Isolate Pathotypes
 
@@ -189,21 +189,21 @@ Likewise, individual isolates' pathotypes, number of isolates used in the study,
 
 Five diversity indices are calculated when calling `calculate_diversities()`.
 
-* Simple diversity index, which will show the proportion of unique pathotypes to total samples.
-As the values gets closer to 1, there is greater diversity in pathoypes within the population.
-Simple diversity is calculated as: $$D = \frac{Np}{Ns}$$ where $Np$ is the number of pathotypes and $Ns$ is the number of samples.
+-   Simple diversity index, which will show the proportion of unique pathotypes to total samples.
+    As the values gets closer to 1, there is greater diversity in pathoypes within the population.
+    Simple diversity is calculated as: $$D = \frac{Np}{Ns}$$ where $Np$ is the number of pathotypes and $Ns$ is the number of samples.
 
-* Gleason diversity index, an alternate version of Simple diversity index, is less sensitive to sample size than the Simple index.
-$$D = \frac{ (Np - 1) }{ log(Ns)}$$ Where $Np$ is the number of pathotypes and $Ns$ is the number of samples.
+-   Gleason diversity index, an alternate version of Simple diversity index, is less sensitive to sample size than the Simple index.
+    $$D = \frac{ (Np - 1) }{ log(Ns)}$$ Where $Np$ is the number of pathotypes and $Ns$ is the number of samples.
 
-* Shannon diversity index is typically between 1.5 and 3.5, as richness and evenness of the population increase, so does the Shannon index value.
-$$D = -\sum_{i = 1}^{R} p_i \log p_i$$ Where $p_i$ is the proportional abundance of species $i$.
+-   Shannon diversity index is typically between 1.5 and 3.5, as richness and evenness of the population increase, so does the Shannon index value.
+    $$D = -\sum_{i = 1}^{R} p_i \log p_i$$ Where $p_i$ is the proportional abundance of species $i$.
 
-* Simpson diversity index values range from 0 to 1, 1 represents high diversity and 0 represents no diversity.
-Where diversity is calculated as: $$D = \sum_{i = 1}^{R} p_i^2$$
+-   Simpson diversity index values range from 0 to 1, 1 represents high diversity and 0 represents no diversity.
+    Where diversity is calculated as: $$D = \sum_{i = 1}^{R} p_i^2$$
 
-* Evenness ranges from 0 to 1, as the Evenness value approaches 1, there is a more even distribution of each pathotype's frequency within the population.
-Where Evenness is calculated as: $$D = \frac{H'}{log(Np)}$$ where $H'$ is the Shannon diversity index and $Np$ is the number of pathotypes.
+-   Evenness ranges from 0 to 1, as the Evenness value approaches 1, there is a more even distribution of each pathotype's frequency within the population.
+    Where Evenness is calculated as: $$D = \frac{H'}{log(Np)}$$ where $H'$ is the Shannon diversity index and $Np$ is the number of pathotypes.
 
 ```{r calculate-diversities, echo=TRUE}
 diversity <- do.call(calculate_diversities, hagis_args)
@@ -227,8 +227,7 @@ diversities_table(diversity)
 ```
 
 To generate a table of individual pathotypes, use `individual_pathotypes()`.
-Here again,
-{hagis} provides a `pander` object for ease of use.
+Here again, {hagis} provides a `pander` object for ease of use.
 
 ### Table of Individual Pathotypes
 
@@ -275,7 +274,7 @@ Rps.plot
 
 ### Make a Horizontal Plot
 
-If your _Rps_ gene names are too long, flipping the axis can make the graph more legible without rotating the x-axis labels.
+If your *Rps* gene names are too long, flipping the axis can make the graph more legible without rotating the x-axis labels.
 
 ```{r horizontal-plot}
 Rps.plot <- Rps.plot +
@@ -286,7 +285,7 @@ Rps.plot
 
 ### Use Colors in Autoplot Objects
 
-You can use named, _e.g._, "red", "yellow", "blue", colors in R or you can use custom hexadecimal color codes.
+You can use named, *e.g.*, "red", "yellow", "blue", colors in R or you can use custom hexadecimal color codes.
 Illustrated below is using Michigan State University (MSU) Green, hex code #18453b, using `theme_bw()` with a serif font.
 
 ```{r use-Colors}