Covid #81

Open
wants to merge 5 commits into
base: dev
1 change: 1 addition & 0 deletions DESCRIPTION
@@ -63,6 +63,7 @@ Suggests:
roxygen2 (>= 3.0.0),
testthat (>= 2.1.0),
pkgdown (>= 0.1.0),
png,
assertthat
VignetteBuilder: knitr
Encoding: UTF-8
12 changes: 11 additions & 1 deletion _pkgdown.yml
@@ -16,9 +16,11 @@ articles:
- '`v1_introduction`'
- '`v2_data`'
- '`web_only/v21_singlecell`'
- '`web_only/v22_covid`'
- '`web_only/v3_basic_analysis`'
- '`web_only/load_mixcr`'
- '`web_only/load_10x`'
- '`web_only/load_sra`'
- '`web_only/v4_overlap`'
- '`web_only/v5_gene_usage`'
- '`web_only/v6_diversity`'
@@ -28,11 +30,17 @@ articles:
- '`web_only/v11_db`'
navbar:
structure:
left: [articles, reference, covid19]
left: [articles, reference, covid19, reproduced]
right: [im_link, twitter, github]
components:
home: ~
news: ~
reproduced:
text: "Reproduced Articles"
href: https://github.com/immunomind/reproduced
menu:
- text: 'COVID-19 Immune Repertoire Analysis'
href: articles/web_only/v22_covid.html
covid19:
text: "COVID-19"
href: https://github.com/immunomind/covid19
@@ -50,6 +58,8 @@ navbar:
href: articles/v2_data.html
- text: 'How-to: Loading MiXCR Data'
href: articles/web_only/load_mixcr.html
- text: 'How-to: Working with Data from SRA'
href: articles/web_only/load_sra.html
- text: 'How-to: Loading 10x Genomics Data'
href: articles/web_only/load_10x.html
- text: 'How-to: Single-cell and paired chain data'
Binary file added vignettes/images/2018x15_pub.png
Binary file added vignettes/images/37x15_pub.png
Binary file added vignettes/images/biologicalreplicates_pub.png
Binary file added vignettes/images/cdr3lengths.png
Binary file added vignettes/images/clonotype_abundances.png
Binary file added vignettes/images/clonotype_tracking.png
Binary file added vignettes/images/dataformatgraphic.png
Binary file added vignettes/images/distrib_public_clonotypes.png
Binary file added vignettes/images/experimentpipeline.png
Binary file added vignettes/images/kmers_distrib_3.png
Binary file added vignettes/images/kmers_vis.png
Binary file added vignettes/images/kmersdistrib_full.png
Binary file added vignettes/images/num_unique_clonotypes.png
Binary file added vignettes/images/pipe-2.png
Binary file added vignettes/images/pipe-3.png
Binary file added vignettes/images/pipe-4.png
Binary file added vignettes/images/pipeline-wide.png
Binary file added vignettes/images/public_clonotype_pca_female.png
Binary file added vignettes/images/rare_clonal_prop.png
Binary file added vignettes/images/relative_abundance.png
Binary file added vignettes/images/sra-1.png
Binary file added vignettes/images/sra-2.png
Binary file added vignettes/images/sra-3.png
Binary file added vignettes/images/topclonalprop.png
129 changes: 129 additions & 0 deletions vignettes/web_only/load_sra.Rmd
@@ -0,0 +1,129 @@
---
title: "Working with Data from SRA"
author: '<a href="https://immunomind.io">ImmunoMind</a>'
date: "[email protected]"
output:
  html_document:
    fig_height: 8
    fig_width: 10
    theme: spacelab
    toc: yes
  pdf_document:
    toc: yes
  word_document:
    toc: yes
---


<!--
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Working with Data from SRA}
%\VignettePackage{immunarch}
-->


```{r setup, include=FALSE, echo=FALSE}
# knitr::knit_hooks$set(optipng = knitr::hook_optipng)
# knitr::opts_chunk$set(optipng = '-o7')

knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.align = "center")
knitr::opts_chunk$set(fig.width = 12)
knitr::opts_chunk$set(fig.height = 5)

library(immunarch)

embed_png <- function(path, dpi = NULL) {
  meta <- attr(png::readPNG(path, native = TRUE, info = TRUE), "info")
  if (!is.null(dpi)) meta$dpi <- rep(dpi, 2)
  knitr::asis_output(paste0(
    "<img src='", path, "'",
    " width=", round(meta$dim[1] / (meta$dpi[1] / 96)),
    " height=", round(meta$dim[2] / (meta$dpi[2] / 96)),
    " />"
  ))
}
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```

This how-to describes how to download raw read data from the <a href="https://www.ncbi.nlm.nih.gov/sra">Sequence Read Archive</a> for immune repertoire analysis. The Sequence Read Archive (SRA, or Short Read Archive) is a bioinformatics database that provides a public repository for DNA sequencing data, especially the "short reads" generated by high-throughput sequencing.

As an example, we'll use data from <a href="https://doi.org/10.1101/2020.05.18.100545">this</a> longitudinal study of T-cell dynamics in COVID-19. We also cover the settings necessary when you are working on a cloud instance such as AWS. By the end, your raw reads will be ready to pre-process with MiXCR or a similar tool before analysis.

# Setting up tools for SRA

First, follow the instructions <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc">here</a> to install the tools for SRA.

This example uses Linux; instructions for other operating systems are available at the link above.

Download the `.tar.gz` archive and extract it, then run `vdb-config -i` from the toolkit's `bin` directory. This opens an interactive configuration menu like the one shown below.

```
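# Download and extract the SRA Toolkit (version 2.10.7, 64-bit Ubuntu build)
curl -O https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.7/sratoolkit.2.10.7-ubuntu64.tar.gz
tar -xzf sratoolkit.2.10.7-ubuntu64.tar.gz
# Enter the bin directory of the extracted toolkit and open the configuration menu
cd sratoolkit.2.10.7-ubuntu64/bin
./vdb-config -i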
```
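
Before configuring anything, it can help to confirm that the extracted toolkit contains the three programs used in this how-to. The snippet below is a minimal sketch: `check_sra_tools` is a helper name of our own (not part of the SRA Toolkit), and the path in the last line assumes the archive was extracted into the current directory.

```shell
# check_sra_tools DIR: report whether each program used in this how-to
# exists and is executable in DIR. (Hypothetical helper, not part of the toolkit.)
check_sra_tools() {
  for tool in vdb-config prefetch fastq-dump; do
    if [ -x "$1/$tool" ]; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

# Assumption: the toolkit was extracted into the current directory.
check_sra_tools "$PWD/sratoolkit.2.10.7-ubuntu64/bin"
```

If any program is reported as missing, the extraction step most likely ran in a different directory than the one being checked.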

Configure the cache directory where your data will be downloaded. I created a folder called `rawdata` under my working directory.

```{r, echo = FALSE}
embed_png("../images/sra-1.png")
```
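
If the cache folder does not exist yet, you can create it before pointing the menu at it. This is a minimal sketch that assumes `rawdata` lives under the current working directory and mirrors the `sra` and `fastq` subfolders used by the commands later in this how-to; the `RAWDATA` variable name is our own choice.

```shell
# Assumption: rawdata lives under the current working directory.
RAWDATA="$PWD/rawdata"

# sra/ is where the downloaded .sra files end up in this how-to,
# fastq/ is where the converted reads will be written.
mkdir -p "$RAWDATA/sra" "$RAWDATA/fastq"

# List the folders that now exist.
ls -d "$RAWDATA"/*
```

`mkdir -p` is safe to re-run: it leaves existing folders and their contents untouched.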


If you are using a cloud instance, select the corresponding provider and check the `report cloud instance identity` option.

```{r, echo = FALSE}
embed_png("../images/sra-2.png")
```

# Select Data for Batch Download from SRA

Usually you will want to download a batch of runs rather than one run at a time. The SRA Run Selector lets you choose which subset of a dataset to download.

If you are following along with our COVID-19 example, you can find the full dataset in NCBI's Run Selector <a href="https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=2&WebEnv=NCID_1_41090550_130.14.22.76_5555_1592686230_2418810784_0MetA0_S_HStore&o=acc_s%3Aa">here</a>.

Select the subset of the dataset that you want to analyze. I chose to start with four samples:

```{r, echo = FALSE}
embed_png("../images/sra-3.png")
```

Click on `Selected` → `Accession List` to download the text file for your batch download.
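
It is worth checking the downloaded list before starting a long download. The sketch below counts the lines that look like run accessions (`SRR`, `ERR`, or `DRR` followed by digits) and the non-empty lines that do not. `check_accessions` is a helper name of our own, and the accessions in the demo are placeholders, not real runs.

```shell
# check_accessions FILE: print how many lines look like run accessions
# and how many non-empty lines do not. (Hypothetical helper.)
check_accessions() {
  # grep -c exits with status 1 when the count is 0, hence the "|| true".
  n_runs=$(grep -c -E '^[SED]RR[0-9]+$' "$1" || true)
  n_bad=$(grep -v -E '^[SED]RR[0-9]+$' "$1" | grep -c -v '^[[:space:]]*$' || true)
  echo "$n_runs runs, $n_bad unexpected lines"
}

# Demo on a throwaway file with placeholder accessions (not real runs).
demo=$(mktemp)
printf 'SRR0000001\nSRR0000002\nnot-an-accession\n' > "$demo"
check_accessions "$demo"
# prints: 2 runs, 1 unexpected lines
```

On your own data, you would call `check_accessions` on the file downloaded from Run Selector and expect zero unexpected lines.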

# Download from SRA

Next, we'll be downloading the data from SRA using the toolkit that we installed earlier.

The two commands that you will be using from the SRA toolkit are:

- <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch">`prefetch`</a>: fetches the `.sra` files for individual runs. `prefetch` documentation can be found <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch">here</a>.

- <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump">`fastq-dump`</a>: converts the `.sra` files into fastq files. `fastq-dump` documentation can be found <a href="https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump">here</a>.

First, let's download the `.sra` files. Here, `cart.txt` is the accession list you just downloaded from Run Selector.

```
./prefetch --option-file cart.txt
```
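
After a long download, you may want to confirm that every run in the list actually arrived. The sketch below prints the accessions that have no `.sra` file yet. `missing_runs` is a helper name of our own, the accessions are placeholders, and the two file layouts it checks (`DIR/<run>.sra` and `DIR/<run>/<run>.sra`) are assumptions about where the files land on your system.

```shell
# missing_runs LIST DIR: print accessions from LIST that have no .sra file in DIR.
# Assumption: files are stored as DIR/<run>.sra or DIR/<run>/<run>.sra.
missing_runs() {
  while read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue          # skip empty lines
    if [ ! -f "$2/$acc.sra" ] && [ ! -f "$2/$acc/$acc.sra" ]; then
      echo "$acc"
    fi
  done < "$1"
}

# Demo with placeholder accessions in a throwaway directory:
# only the first run is marked as downloaded.
demo_dir=$(mktemp -d)
printf 'SRR0000001\nSRR0000002\n' > "$demo_dir/cart.txt"
touch "$demo_dir/SRR0000001.sra"
missing_runs "$demo_dir/cart.txt" "$demo_dir"
# prints: SRR0000002
```

With the paths used in this how-to, the call would look like `missing_runs cart.txt ~/grace/rawdata/sra`; an empty result means nothing is missing.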

Next, use `fastq-dump` to convert the `.sra` files into fastq files.

```
./fastq-dump ~/grace/rawdata/sra/SRR* --outdir ~/grace/rawdata/fastq
```

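To process everything in one batch, you can use a bash script similar to the one below. It runs the `fastq-dump` command on every run accession listed in your `cart.txt` file.

```
# Name of the accession list downloaded from Run Selector
CART='cart.txt'
echo "$CART"
# Read all run accessions from the list
ALL_LINES=$(cat "$CART")
# $ALL_LINES is left unquoted on purpose so it splits into one accession per iteration
for sra in $ALL_LINES;
do
    ./fastq-dump "$sra" --outdir /data/grace/rawdata/fastq
done
```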
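
If a batch is interrupted, re-running it from the top repeats work that is already done. One possible variation, sketched below, prints the `fastq-dump` commands only for runs that have no fastq output yet, so you can review the plan before executing it. `plan_fastq_dump` is a helper name of our own, the accessions are placeholders, and the output names it checks (`<run>.fastq`, plus `<run>_1.fastq` in case split output was requested) are assumptions.

```shell
# plan_fastq_dump LIST OUTDIR: print one fastq-dump command per run in LIST
# that has no fastq file in OUTDIR yet. Nothing is executed (dry run).
plan_fastq_dump() {
  while read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue          # skip empty lines
    # Assumption: converted reads are named <run>.fastq or <run>_1.fastq.
    if [ -f "$2/$acc.fastq" ] || [ -f "$2/${acc}_1.fastq" ]; then
      continue
    fi
    echo "./fastq-dump $acc --outdir $2"
  done < "$1"
}

# Demo with placeholder accessions; the first run counts as already converted.
out=$(mktemp -d)
printf 'SRR0000001\nSRR0000002\n' > "$out/cart.txt"
touch "$out/SRR0000001.fastq"
plan_fastq_dump "$out/cart.txt" "$out"
```

Once the printed commands look right, they can be executed by piping the output to a shell, for example `plan_fastq_dump cart.txt /data/grace/rawdata/fastq | sh`, run from the toolkit's `bin` directory.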

# Next steps
Congrats! Now your data is ready to be processed. Follow our MiXCR tutorial [here](https://immunarch.com/articles/web_only/load_mixcr.html) to prepare your data for analysis.