From 3f90707c11bea184e3f8bf22bdb66dd563287248 Mon Sep 17 00:00:00 2001
From: Samuel Westreich
Date: Tue, 15 Aug 2017 10:43:51 -0700
Subject: [PATCH] website updates

---
 docs/index.md | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 920007d..baddfd5 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,8 +1,9 @@
 Version 2 of the SAMSA pipeline - faster! Lighter! More options! Less waiting!
 
 ### New in version 2:
+* Cluster-oriented design, now built to run on a supercomputing cluster instead of using external annotation systems.
 * DIAMOND integration, allowing for SAMSA to be run without ever needing an MG-RAST account.
-* Option to annotate against custom databases.
+* Option to annotate against custom databases, created as DIAMOND references.
 * Better, more polished R scripts that can be executed from the command line.
 * PCA plots and other graphical outputs.
 * Filtering of ribosomes for even more speed.
@@ -10,19 +11,19 @@ Version 2 of the SAMSA pipeline - faster! Lighter! More options! Less waiting
 
 ## Quick start
 1. Clone or download the following programs:
-	1. SAMSA, version 2.0: https://github.com/transcript/samsa_v2
-	2. DIAMOND, version 0.8.3: https://github.com/bbuchfink/diamond
-	3. Trimmomatic, a flexible read cleaner: http://www.usadellab.org/cms/?page=trimmomatic
-	4. PEAR, if using paired-end data (recommended): https://sco.h-its.org/exelixis/web/software/pear/
-	5. SortMeRNA: http://bioinfo.lifl.fr/RNA/sortmerna/
-2. Make changes to the master_script.bash, which performs the first 3 of 4 steps in the SAMSA pipeline (preprocessing, annotation, aggregation)
+	1. SAMSA, version 2.0: [https://github.com/transcript/samsa_v2](https://github.com/transcript/samsa_v2)
+	2. DIAMOND, version 0.8.3: [https://github.com/bbuchfink/diamond](https://github.com/bbuchfink/diamond)
+	3. Trimmomatic, a flexible read cleaner: [http://www.usadellab.org/cms/?page=trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
+	4. PEAR, if using paired-end data (recommended): [https://sco.h-its.org/exelixis/web/software/pear/](https://sco.h-its.org/exelixis/web/software/pear/)
+	5. SortMeRNA: [http://bioinfo.lifl.fr/RNA/sortmerna/](http://bioinfo.lifl.fr/RNA/sortmerna/)
+2. Make changes to the master_script.bash, which performs all of the main steps in the SAMSA pipeline (preprocessing, annotation, aggregation, and DESeq-enabled statistical analysis)
 3. If not using master_script, use DIAMOND to annotate your reads against a database of your choosing (note that database must be local and DIAMOND-indexed). See "example\_DIAMOND\_annotation\_script.bash" for more details.
 4. If not using master_script, use "DIAMOND\_analysis\_counter.py" to create a ranked abundance summary of the DIAMOND results from each metatransciptome file.
 5. Import these abundance summaries into R and use "run\_DESeq\_stats.R" to determine the most significantly differing features between either individual metatranscriptomes, or control vs. experimental groups.
 
 
-## SAMSA: Simple Analysis of Metatranscriptomes by Sequence Annotation
-Metatranscriptome, RNA-seq data from multiple members of a microbial community, offers incredibly powerful insights into the workings of a complex ecosystem. RNA sequences are able to not only identify the individual members of a community down to the strain level, but can also provide information on the activity of these microbes at the time of sample collection - something that cannot be determined through other meta- (metagenome, 16S rRNA sequencing) method.
+## Background
+Metatranscriptomics - RNA-seq data from multiple members of a microbial community - offers incredibly powerful insights into the workings of a complex ecosystem. RNA sequences are able to not only identify the individual members of a community down to the strain level, but can also provide information on the activity of these microbes at the time of sample collection - something that cannot be determined through other meta- (metagenome, 16S rRNA sequencing) method.
 
 However, working with metatranscriptome data often proves challenging, given its high complexity and large size. SAMSA is one of the first bioinformatics pipelines designed with metatranscriptome data specifically in mind. It accepts raw sequence data in FASTQ form as its input, and performs four phases:
 