Command line interface
Installing the package provides the command smallrnaseq in your path. This is a command line interface to the library that requires no Python coding at all. It provides a set of pre-defined functions with parameters specified in a text configuration file.
Usage solely involves setting up the config file and having your input files prepared. Running the command smallrnaseq -r will create a default config file for you. Edit this file as needed, then run:
smallrnaseq -c default.conf -r
The advantage of configuration files is that they avoid long commands that have to be re-typed and are prone to mistakes. The files can also be kept as a record of the settings used, or copied for another set of files. The options currently available in the file are as follows. The meaning of each option is explained below. If you are unsure about a setting or don't require it, leave it at the default.
[base]
filename =
path = testfiles
filetype = fastq
index_path = indexes
aligner = bowtie
bowtie_params = -v 1 --best
ref_genome =
features =
indexes = RFAM,mirbase-hsa
output = smrna_results
counting = default
add_labels = 0
mirbase = 0
species = bta
pad5 = 3
pad3 = 5
Settings explained:
name | example value | meaning |
---|---|---|
filename | test.fastq | input fastq file with reads |
path | testfiles | folder containing fastq files instead of a single file |
filetype | fastq | |
index_path | indexes | location of bowtie or subread indexes |
aligner | bowtie | which aligner to use, bowtie or subread |
bowtie_params | -v 1 --best | alignment parameters |
ref_genome | hg19 | reference genome index name |
features | Homo_sapiens.GRCh37.75.gtf | genome annotation file |
indexes | RFAM,mirbase-hsa | names of annotated library indexes to map to |
output | smrna_results | output folder for temp files |
counting | default | method of feature counting |
add_labels | 0 | whether to add labels to replace the file names in the results |
mirbase | 0 | map to mirbase only |
species | bta | mirbase species to use |
pad5 | 3 | 5' flanking bases to add when generating mature mirbase sequences |
pad3 | 5 | 3' flanking bases to add |
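To check an edited config file before a run, the settings can be read back with Python's standard configparser module. This is a minimal sketch, not the way smallrnaseq itself parses the file. It assumes the file follows ordinary INI syntax with a single [base] section, as the listing above suggests, and the helper name read_settings is hypothetical.

```python
import configparser

# Config text copied from the default options listed above.
DEFAULT_CONF = """\
[base]
filename =
path = testfiles
filetype = fastq
index_path = indexes
aligner = bowtie
bowtie_params = -v 1 --best
ref_genome =
features =
indexes = RFAM,mirbase-hsa
output = smrna_results
counting = default
add_labels = 0
mirbase = 0
species = bta
pad5 = 3
pad3 = 5
"""

def read_settings(text):
    """Parse config text and return the [base] section as a plain dict.

    Hypothetical helper for inspection only; all values come back as strings.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["base"])

settings = read_settings(DEFAULT_CONF)
# Options left blank in the file are read as empty strings.
unset = sorted(name for name, value in settings.items() if value == "")
print("aligner:", settings["aligner"])
print("options left blank:", unset)
```

Listing the blank options this way is a quick reminder of which settings still rely on their defaults.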
Say we have a set of fastq files in the folder 'testfiles' that we want to count miRNAs in. We would simply set the options mirbase = 1 and path = testfiles. If your file names are long and you want to replace them with short ids, set add_labels = 1. This also writes out a file called 'samplelabels.csv' in the output folder. Note that if only mapping to mirbase we don't have to set an index file, since it is generated automatically.
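If the same change has to be made to several config files, the edit can be scripted. The sketch below uses Python's standard configparser to set the three options from this example. It assumes the config is plain INI with a [base] section. The helper name update_config is hypothetical, and the example text is a shortened stand-in for a real config file rather than the full set of defaults.

```python
import configparser
import io

def update_config(text, **changes):
    """Return config text with the given [base] options replaced.

    Hypothetical helper; it is not part of smallrnaseq.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    for name, value in changes.items():
        # configparser stores every value as a string
        parser["base"][name] = str(value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()

# Shortened stand-in for the contents of a config file (assumption).
example = "[base]\nfilename =\npath =\nadd_labels = 0\nmirbase = 0\nspecies = bta\n"

# Settings for the miRNA counting case described above.
mirna_conf = update_config(example, path="testfiles", mirbase=1, add_labels=1)
print(mirna_conf)
```

Note that configparser does not preserve comments when it writes a file back, so any notes kept in a hand-edited config would be lost by this approach.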
The main outputs are csv files with the counts for each sample in a column, along with a normalised count column. These csv files can be opened in a spreadsheet.
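As an alternative to a spreadsheet, the csv files can be read with Python's standard csv module. This is a small sketch under assumptions: the helper names load_counts and column_names are hypothetical, and the exact file and column names depend on your run, so none are assumed here.

```python
import csv

def load_counts(csv_path):
    """Read a results csv into a list of dicts, one per row, keyed by column name.

    Hypothetical helper; values are returned as strings, as read from the file.
    """
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))

def column_names(rows):
    """Return the column names found in the first row, or an empty list."""
    return list(rows[0].keys()) if rows else []
```

For example, calling load_counts with the path to one of the csv files in the output folder and then column_names on the result shows which sample and normalised count columns are present.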