README.Rmd

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "120%"
)
```

# topr

See full documentation at [https://wuxi-nextcode.github.io/topr/](https://wuxi-nextcode.github.io/topr/)

### Installing from github using devtools

```{r, eval=FALSE}
devtools::install_github("wuxi-nextcode/topr")
```

## Example
In this example we demonstrate the basic usage of the topr library.

### Load packages

First load the topr package, the tidyverse package is recommended in general, but not required for this example

```{r}
library(topr)
library(tidyverse)
library(ggrepel)
```


### Loading and exploring prebuilt datasets 

Load the gwas_CD dataset, which is a subset of association results (SNPs with P<1e-03) for Crohn´s disease from the UK biobank.

It is highly recommended to theck the number of datapoints in your dataset before you plot, since a very large dataset will take a long time to plot.

```{r}
paste("Number of SNPs in the dataset: [", length(CD_UKBB$POS),"]", sep = "")
```

### Manhattan plots

Get an overview of association results for crohn's disease (CD) in a Manhattan plot

```{r fig.width=10, fig.height=4.5}
manhattan(CD_UKBB)
```

### QQ plots

```{r fig.width=5, fig.height=4.5,eval=FALSE}
qqtopr(CD_UKBB,n_variants=length(CD_UKBB$POS))
```


### Label the top SNPs with the name of their nearest gene

Use the annotate argument in the manhattan function to label the top SNPs with p-values below the annotate threshold with their nearest gene

```{r fig.width=10, fig.height=4.5}
manhattan(CD_UKBB, annotate = 1e-09)
```

## Highlight genes of interest

```{r fig.width=10, fig.height=5}
manhattan(CD_UKBB, annotate = 1e-09, highlight_genes = c("NOD2","IL23R","JAK2"))
```

### View one chromsome only

Take a closer look at the results by chromosome. Here we plot the results on chromosome 7 only.

```{r fig.width=10, fig.height=4.5}
manhattan(CD_UKBB, annotate = 1e-09, chr = "7")
```

### Regionplot

Zoom in further on the chromosome plot with the regionplot function.

Zoom in on a gene of interest, e.g IKZF1:

```{r fig.width=10, fig.height=7}
regionplot(CD_UKBB, gene="IKZF1", annotate= 1e-09)
```

Zoom in on the top hit on a chromosome

```{r fig.width=10, fig.height=7}
CHR <- "chr1"
top_hit <- get_top_hit(CD_UKBB,chr=CHR)
regionplot(CD_UKBB, chr = CHR, xmin=top_hit$POS-250000 ,xmax= top_hit$POS+250000)
```

### Display multiple phenotypes/datasets on the same plot

Display the output from more than one GWAS on the same plot

### Manhattan multiple phenotypes

```{r fig.width=13, fig.height=6}
manhattan(list(CD_UKBB,CD_FINNGEN,UC_UKBB),legend_labels = c("CD UKBB","CD Finngen","UC UKBB"),title="IBD")
```


```{r fig.width=10, fig.height=6}
regionplot(list(CD_UKBB,CD_FINNGEN),gene="NOD2", annotate=c(1e-15, 1e-09), legend_labels = c("UKBB", "Finngen"), title="Crohn's disease (CD)")
```


```{r fig.width=13, fig.height=6}
manhattan(list(CD_UKBB,CD_FINNGEN,UC_UKBB),legend_labels = c("CD UKBB","CD Finngen","UC UKBB"),title="IBD")
```

#### The ntop argument

Use the ntop argument to set the number of datasets displayed at the top (default value is 3)

```{r fig.width=13, fig.height=8}
manhattan(list(CD_UKBB,CD_FINNGEN,UC_UKBB),ntop=2,legend_labels = c("CD UKBB","CD Finngen","UC UKBB"),title="IBD")
```

## Useful functions
```{r, eval=FALSE}
get_top_hit(CD_UKBB, chr="chr16")
dat1 <- get_best_snp_per_MB(CD_UKBB,thresh = 1e-07, region=1000000)
#get overlapping SNPS overlapping in two datasets
overlapping_snps <- dat1 %>% get_overlapping_snps_by_pos(CD_FINNGEN)

overlapping_snps_matched <- overlapping_snps %>% match_alleles()
overl_snps_matched_pos_allele_dat1 <- overlapping_snps_matched %>% flip_to_positive_allele_for_dat1()
snpset1 <- overl_snps_matched_pos_allele_dat1 %>% annotate_with_nearest_gene(protein_coding_only = T)

#or do all this in one go, by calling the create_snpset functon
snpset1 <- create_snpset(CD_FINNGEN, CD_UKBB, thresh = 1e-06)
snpset2 <- create_snpset(CD_UKBB, CD_FINNGEN, thresh= 1e-06)
```


```{r fig.width=14, fig.height=6, eval=FALSE}
e1 <- effect_plot(snpset1, pheno_x="CD Finngen", pheno_y="CD UKBB",color=get_topr_colors()[1], gene_label_thresh = 1)
e2 <- effect_plot(snpset2, pheno_x="CD UKBB", pheno_y="CD Finngen", color=get_topr_colors()[2], gene_label_thresh=1,annotate_with = "ID")
grid.arrange(e1,e2)
```


### Plot apperance: setting text sizes
```{r fig.width=12, fig.height=6}
manhattan(list(CD_UKBB,CD_FINNGEN), annotate=1e-09,axis_title_size = 20,axis_text_size = 16,label_size = 5, title_text_size = 16, legend_text_size = 20)
```

```{r fig.width=12, fig.height=7}
regionplot(list(CD_UKBB,CD_FINNGEN), gene="IKZF1", vline=50274703,title="CD UKBB", title_text_size = 16,axis_title_size = 20,axis_text_size = 20,legend_text_size = 20)
```

Setting alpha, size and shape