Commit

website updates
Samuel Westreich committed Aug 15, 2017
1 parent b42cdd9 commit 3f90707
Showing 1 changed file with 10 additions and 9 deletions.
19 changes: 10 additions & 9 deletions docs/index.md
@@ -1,28 +1,29 @@
Version 2 of the SAMSA pipeline - faster! Lighter! More options! Less waiting!

### New in version 2:
* Cluster-oriented design, now built to run on a supercomputing cluster instead of using external annotation systems.
* DIAMOND integration, allowing for SAMSA to be run without ever needing an MG-RAST account.
* Option to annotate against custom databases.
* Option to annotate against custom databases, created as DIAMOND references.
* Better, more polished R scripts that can be executed from the command line.
* PCA plots and other graphical outputs.
* Filtering of ribosomes for even more speed.
* And more!

## Quick start
1. Clone or download the following programs:
1. SAMSA, version 2.0: https://github.com/transcript/samsa_v2
2. DIAMOND, version 0.8.3: https://github.com/bbuchfink/diamond
3. Trimmomatic, a flexible read cleaner: http://www.usadellab.org/cms/?page=trimmomatic
4. PEAR, if using paired-end data (recommended): https://sco.h-its.org/exelixis/web/software/pear/
5. SortMeRNA: http://bioinfo.lifl.fr/RNA/sortmerna/
2. Make changes to the master_script.bash, which performs the first 3 of 4 steps in the SAMSA pipeline (preprocessing, annotation, aggregation)
1. SAMSA, version 2.0: [https://github.com/transcript/samsa_v2](https://github.com/transcript/samsa_v2)
2. DIAMOND, version 0.8.3: [https://github.com/bbuchfink/diamond](https://github.com/bbuchfink/diamond)
3. Trimmomatic, a flexible read cleaner: [http://www.usadellab.org/cms/?page=trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
4. PEAR, if using paired-end data (recommended): [https://sco.h-its.org/exelixis/web/software/pear/](https://sco.h-its.org/exelixis/web/software/pear/)
5. SortMeRNA: [http://bioinfo.lifl.fr/RNA/sortmerna/](http://bioinfo.lifl.fr/RNA/sortmerna/)
2. Make changes to the master_script.bash, which performs all of the main steps in the SAMSA pipeline (preprocessing, annotation, aggregation, and DESeq-enabled statistical analysis)
3. If not using master_script, use DIAMOND to annotate your reads against a database of your choosing (note that database must be local and DIAMOND-indexed). See "example\_DIAMOND\_annotation\_script.bash" for more details.
4. If not using master_script, use "DIAMOND\_analysis\_counter.py" to create a ranked abundance summary of the DIAMOND results from each metatranscriptome file.
5. Import these abundance summaries into R and use "run\_DESeq\_stats.R" to determine the most significantly differing features between either individual metatranscriptomes, or control vs. experimental groups.
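
To illustrate what a "ranked abundance summary" in step 4 might look like, here is a minimal Python sketch. It is not the code in "DIAMOND\_analysis\_counter.py"; it assumes the DIAMOND results are in the default BLAST tabular format, where the second tab-separated column is the subject (database hit) ID, and the function name `ranked_abundance` and the sample read and gene IDs are hypothetical.

```python
from collections import Counter


def ranked_abundance(lines):
    """Count hits per subject ID and rank them from most to least abundant.

    Assumes BLAST tabular input: tab-separated, subject ID in column 2.
    Returns a list of (subject_id, count, percent_of_total) tuples.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows that lack a subject column
        counts[fields[1]] += 1

    total = sum(counts.values())
    # most_common() sorts by count in descending order
    return [(subject, n, 100.0 * n / total) for subject, n in counts.most_common()]


# Hypothetical sample rows (query ID, subject ID, percent identity)
sample = [
    "\t".join(["read1", "geneA", "98.5"]),
    "\t".join(["read2", "geneB", "91.0"]),
    "\t".join(["read3", "geneA", "99.1"]),
]

summary = ranked_abundance(sample)
for subject, n, pct in summary:
    print(f"{pct:.2f}\t{n}\t{subject}")
```

For a real file, the same function could be called on an open file handle, for example `ranked_abundance(open(path))`, where `path` points to one DIAMOND result file per metatranscriptome.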


## SAMSA: Simple Analysis of Metatranscriptomes by Sequence Annotation
Metatranscriptome, RNA-seq data from multiple members of a microbial community, offers incredibly powerful insights into the workings of a complex ecosystem. RNA sequences are able to not only identify the individual members of a community down to the strain level, but can also provide information on the activity of these microbes at the time of sample collection - something that cannot be determined through other meta- (metagenome, 16S rRNA sequencing) methods.
## Background
Metatranscriptomics - RNA-seq data from multiple members of a microbial community - offers incredibly powerful insights into the workings of a complex ecosystem. RNA sequences can not only identify the individual members of a community down to the strain level, but also provide information on the activity of these microbes at the time of sample collection - something that cannot be determined through other meta- methods (metagenomics, 16S rRNA sequencing).

However, working with metatranscriptome data often proves challenging, given its high complexity and large size. SAMSA is one of the first bioinformatics pipelines designed with metatranscriptome data specifically in mind. It accepts raw sequence data in FASTQ form as its input, and performs four phases:
