Below are the installation instructions, including all required libraries.
-
install the first few prerequisites
sudo aptitude install build-essential python-dev curl
sudo aptitude install software-properties-common unzip
-
install BEDTools
run the following commands in a shell:
curl -O http://bedtools.googlecode.com/files/BEDTools.v2.17.0.tar.gz
tar xvzf BEDTools.v2.17.0.tar.gz
cd bedtools-2.17.0
make
sudo cp bin/* /usr/local/bin/
-
install MongoDB
follow the Ubuntu installation instructions on the MongoDB website
After installation, the mongod process should be running and the database should be located at /var/lib/mongodb.
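To confirm the MongoDB step worked, a small Python 3 sketch can test whether something is accepting connections on the mongod port and whether the database directory exists. It assumes mongod listens on its default port, 27017, on localhost; adjust `HOST` and `PORT` if your MongoDB configuration differs. An open port only shows that some process is listening, not that it is mongod.

```python
import os
import socket

# Assumption: mongod runs locally on MongoDB's default port.
HOST = "127.0.0.1"
PORT = 27017
DB_PATH = "/var/lib/mongodb"


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    print("mongod port reachable:", port_is_open(HOST, PORT))
    print("database directory exists:", os.path.isdir(DB_PATH))
```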
-
install virtualenv
create a directory for virtualenvs
mkdir ~/venvs
download and unpack python-virtualenv:
wget https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.10.1.tar.gz
tar xvzf virtualenv-1.10.1.tar.gz
cd virtualenv-1.10.1
create a virtual environment for ave and activate it:
python virtualenv.py --no-site-packages ~/venvs/ave_env
source ~/venvs/ave_env/bin/activate
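Because the later pip steps must run inside ave_env, it can help to check which interpreter is active. The Python 3 sketch below (the helper name `in_virtualenv` is hypothetical) relies on two known markers: legacy virtualenv releases such as 1.10.x set `sys.real_prefix`, while `venv` and newer virtualenv releases make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv (e.g. 1.10.x) records the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases keep the system prefix in base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter prefix:", sys.prefix)
    print("inside a virtualenv:", in_virtualenv())
```

Run it with the `python` from ~/venvs/ave_env/bin; it should print `True` after `source ~/venvs/ave_env/bin/activate`.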
-
install node.js
follow the instructions on the node.js website
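Before moving on, you can check that the command-line tools from the steps above are on your PATH. This Python 3 sketch uses `shutil.which`; the tool list is an assumption based on this guide (BEDTools 2.17.0 is expected to provide a `bedtools` executable, and npm ships with node.js). Some Ubuntu packages install node as `nodejs`, so edit `REQUIRED_TOOLS` if needed.

```python
import shutil

# Assumed executable names for BEDTools, MongoDB and node.js.
REQUIRED_TOOLS = ["bedtools", "mongod", "node", "npm"]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all required tools found")
```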
The following instructions are independent of the operating system. It is important to work inside the virtualenv (`source ~/venvs/ave_env/bin/activate`, as explained above).
-
Download the application.
-
Unpack ave and enter the ave directory.
-
install node packages
npm install
-
install python libraries
from within the ave directory, run (make sure that the ave virtualenv is activated):
pip install -U cython
pip install -r requirements.txt
-
Set up the db
To set up the database with your own data or with the Arabidopsis example data, you can use the provided script. You will need:
-
reference sequence in fasta format
make sure that the chromosome name (or some other meaningful identifier) is given as the fasta identifier (the string just after ">"), as in this example for the Chromosome 1 sequence:
>Chr1 CHROMOSOME dumped from ADB: Jun/20/09 14:53
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTA
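To see which identifiers your fasta files actually expose, a short Python 3 sketch can list them. It assumes, as most sequence tools do, that the identifier is the first whitespace-delimited token after ">"; whether ave reads it the same way is an assumption. `fasta_ids` and `fasta_ids_from_file` are hypothetical helper names.

```python
def fasta_ids(lines):
    """Yield the identifier of every FASTA header line.

    The identifier is taken to be the first whitespace-delimited token
    after ">", so ">Chr1 CHROMOSOME dumped from ADB: ..." yields "Chr1".
    """
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            yield tokens[0] if tokens else ""


def fasta_ids_from_file(path):
    """Return the identifiers of all records in a FASTA file."""
    with open(path) as handle:
        return list(fasta_ids(handle))


if __name__ == "__main__":
    example = [
        ">Chr1 CHROMOSOME dumped from ADB: Jun/20/09 14:53",
        "CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTA",
    ]
    print(list(fasta_ids(example)))  # ['Chr1']
```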
-
gene annotations in gff3 format
-
SNP annotations in gff3 format
-
chromInfo.txt file containing chromosome names and sizes, for example for Arabidopsis:
Chr1 30427671
Chr2 19698289
Chr3 23459830
Chr4 18585056
Chr5 26975502
ChrC 154478
ChrM 366924
Identifiers in the first column must match the identifiers in the fasta and gff files.
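Mismatched identifiers (for example "chrC" versus "ChrC") are easy to miss, so a quick cross-check before importing may save time. This Python 3 sketch assumes chromInfo.txt has one "name size" pair per line, separated by whitespace; `read_chrom_info` and `unmatched_ids` are hypothetical helpers. The second argument can hold identifiers taken from the fasta headers or from column 1 of a gff file.

```python
def read_chrom_info(lines):
    """Parse chromInfo lines of the form "<name> <size>" into {name: size}."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sizes[fields[0]] = int(fields[1])
    return sizes


def unmatched_ids(chrom_sizes, sequence_ids):
    """Return (names only in chromInfo, names only in the other source)."""
    chrom = set(chrom_sizes)
    other = set(sequence_ids)
    return sorted(chrom - other), sorted(other - chrom)


if __name__ == "__main__":
    sizes = read_chrom_info(["Chr1\t30427671", "Chr2\t19698289", "ChrC\t154478"])
    # A lower-case "chrC" in the fasta would not match "ChrC" in chromInfo.
    print(unmatched_ids(sizes, ["Chr1", "Chr2", "chrC"]))  # (['ChrC'], ['chrC'])
```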
-
The configuration is currently a JSON file. It must be valid JSON (check it with a JSON validator) and should look like the following:
{
    "genome": "TAIR10",
    "ref": [
        "/path/to/data/annots/TAIR10_chr1.fas",
        "/path/to/data/annots/TAIR10_chr2.fas",
        "/path/to/data/annots/TAIR10_chr3.fas",
        "/path/to/data/annots/TAIR10_chr4.fas",
        "/path/to/data/annots/TAIR10_chr5.fas",
        "/path/to/data/annots/TAIR10_chrC.fas",
        "/path/to/data/annots/TAIR10_chrM.fas"
    ],
    "annot": [
        "/path/to/data/annots/TAIR10_GFF3_genes.gff",
        "/path/to/data/annots/snps/CDS_snps.gff",
        "/path/to/data/annots/snps/three_prime_UTR_snps.gff",
        "/path/to/data/annots/snps/five_prime_UTR_snps.gff"
    ],
    "chromInfo": "/path/to/data/annots/chromInfo.txt"
}
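Besides a generic JSON validator, a short Python 3 sketch can check the config against the shape of the example and confirm that every listed file exists. Treating exactly the four keys of the example as required, with those value types, is an assumption; ave_tools.py may accept or require other keys. `check_config` is a hypothetical helper.

```python
import json
import os

# Keys and value types from the example conf.json (assumed to be required).
REQUIRED_KEYS = {"genome": str, "ref": list, "annot": list, "chromInfo": str}


def check_config(text):
    """Parse the text of a conf.json file and return a list of problems."""
    try:
        config = json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return ["invalid JSON: %s" % err]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append("missing key: %s" % key)
        elif not isinstance(config[key], expected):
            problems.append("%s should be a %s" % (key, expected.__name__))

    # Every file named in the config should exist before importing.
    paths = []
    for key in ("ref", "annot"):
        if isinstance(config.get(key), list):
            paths.extend(p for p in config[key] if isinstance(p, str))
    if isinstance(config.get("chromInfo"), str):
        paths.append(config["chromInfo"])
    problems.extend("file not found: %s" % p for p in paths if not os.path.isfile(p))
    return problems


if __name__ == "__main__":
    # Assumed location: conf.json in the current directory.
    if os.path.isfile("conf.json"):
        with open("conf.json") as handle:
            issues = check_config(handle.read())
        print("\n".join(issues) if issues else "conf.json looks fine")
    else:
        print("conf.json not found in the current directory")
```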
Please validate the gff files before importing them. This can be done on the GenomeTools website.
SNPs should be annotated as in this example (the "SNP_" prefix in column 3 is required for a feature to be recognised as a SNP). Columns 1-8:
Chr1 1001Genomes SNP_adal_3 138 138 3 . .
Column 9 (key-value pairs):
Change=T:C;Strain=adal_3;Project=GMINordborg2010;ID=9323.138
The first column must match the sequence ID used in the reference fasta file.
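The column layout above can be checked line by line. This Python 3 sketch splits one GFF3 line on tabs (GFF3 is tab-separated), requires column 3 to start with "SNP_", and parses the column-9 key-value pairs, splitting `Change` into reference and variant. `parse_snp_line` is a hypothetical helper, not part of ave_tools.py, and it ignores GFF3 percent-encoding.

```python
def parse_snp_line(line):
    """Parse one tab-separated GFF3 SNP line into a dict.

    Raises ValueError if the line does not have nine columns or
    column 3 does not start with "SNP_".
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    if not ftype.startswith("SNP_"):
        raise ValueError("column 3 must start with 'SNP_': %r" % ftype)

    attributes = {}
    for pair in attrs.split(";"):
        pair = pair.strip()  # tolerate stray spaces such as "; ID=..."
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    ref, _, alt = attributes.get("Change", "").partition(":")  # reference:variant
    return {"seqid": seqid, "type": ftype, "start": int(start), "end": int(end),
            "ref": ref, "alt": alt, "attributes": attributes}


if __name__ == "__main__":
    line = "\t".join(["Chr1", "1001Genomes", "SNP_adal_3", "138", "138", "3", ".", ".",
                      "Change=T:C;Strain=adal_3;Project=GMINordborg2010;ID=9323.138"])
    snp = parse_snp_line(line)
    print(snp["seqid"], snp["start"], snp["ref"], snp["alt"])  # Chr1 138 T C
```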
In the last column:
Change: follows the reference:variant order (T:C means reference T, variant C)
Strain: the name of the strain/accession/ecotype in which this SNP has been called
Project: the sequencing project
ID: any unique identifier for this SNP
You can annotate each SNP in the gff files with its location:
python ./ave_tools.py group_snps_by_loc --annot gene_annotation.gff \
    --snps snp_file1.gff snp_file2.gff
or
python ./ave_tools.py group_snps_by_loc --annot gene_annotation.gff \
    --snps *.gff
The script generates a new gff file for each SNP location, with the location annotated in the last column:
Project=GMINordborg2010;Strain=ale_stenar_44_4;variant_location=CDS; ID=992.6992;Change=T:C
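How ave_tools.py assigns locations internally is not described here. Purely to illustrate the idea, this Python 3 sketch groups SNPs by the annotation feature they fall inside, using the location types that appear in the example config (CDS and the two UTR files). The tuple inputs and `group_by_location` are hypothetical; coordinates are 1-based and inclusive, as in GFF3.

```python
from collections import defaultdict

# Location types taken from the SNP file names in the example config.
LOCATION_TYPES = ("CDS", "five_prime_UTR", "three_prime_UTR")


def group_by_location(features, snps):
    """Group SNP ids by the annotation feature type they fall inside.

    features: iterable of (seqid, start, end, feature_type) tuples
    snps:     iterable of (seqid, position, snp_id) tuples
    """
    intervals = defaultdict(list)
    for seqid, start, end, ftype in features:
        if ftype in LOCATION_TYPES:
            intervals[seqid].append((start, end, ftype))

    groups = defaultdict(set)  # a set, because overlapping isoforms repeat hits
    for seqid, pos, snp_id in snps:
        # Linear scan for clarity; an interval tree (or bedtools intersect)
        # scales far better for genome-wide SNP sets.
        for start, end, ftype in intervals.get(seqid, []):
            if start <= pos <= end:
                groups[ftype].add(snp_id)
    return {ftype: sorted(ids) for ftype, ids in groups.items()}


if __name__ == "__main__":
    features = [
        ("Chr1", 100, 200, "CDS"),
        ("Chr1", 100, 200, "CDS"),  # same CDS from a second isoform
        ("Chr1", 50, 99, "five_prime_UTR"),
        ("Chr1", 1, 500, "gene"),  # not a location type, ignored
    ]
    snps = [("Chr1", 138, "9323.138"), ("Chr1", 60, "1.60"), ("Chr2", 138, "2.138")]
    print(group_by_location(features, snps))
    # {'CDS': ['9323.138'], 'five_prime_UTR': ['1.60']}
```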
To import the data into the database, use the config file (conf.json):
python ./ave_tools.py import --config conf.json
-
run:
node app.js
Access the app from a web browser (preferably the latest Chrome). The IP address and port are printed in the app.js output.
The example SNP annotations have been obtained from the 1001 Genomes Project. Please read the Data Usage Policy on the project website.