Counting microRNAs
There are two basic ways to count the RNAs in your fastq files: by aligning to the reference genome or to a library of known sequences. The latter is faster and simpler, but it is limited to known genes and handles ambiguous reads less well. More complex analyses and novel discovery are possible using the reference genome.
Currently bowtie and subread are supported for alignment. You can first set the alignment parameters BOWTIE_PARAMS and SUBREAD_PARAMS if you want to alter the default settings.
By default, bowtie maps reads to library sequences using -n 1 -l 20, which allows one mismatch within the first 20 bases and ignores mismatches beyond that, so that non-templated additions at the 3' end can be included. See the isomiR counting section.
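To make the effect of -n 1 -l 20 concrete, here is a minimal pure-Python sketch of the seed rule described above. It is only an illustration, not how bowtie is implemented: the function name `passes_seed_rule` and the example sequences are hypothetical, the read is assumed to align at offset 0 with no gaps, and bowtie's additional quality-based limit on mismatches over the whole read (the -e option) is ignored.

```python
def passes_seed_rule(read, target, seed_len=20, max_seed_mismatches=1):
    """Toy check: at most `max_seed_mismatches` mismatches in the first
    `seed_len` bases; mismatches after the seed are not counted."""
    if len(read) > len(target):
        return False
    seed_mismatches = sum(
        1 for r, t in zip(read[:seed_len], target[:seed_len]) if r != t
    )
    return seed_mismatches <= max_seed_mismatches


# Hypothetical library sequence: a 22 nt mature sequence plus 3' flanking bases.
target = "TGAGGTAGTAGGTTGTATAGTT" + "TTAGG"

# Read with a non-templated 'A' added at the 3' end (position 23, outside the seed).
read_with_addition = "TGAGGTAGTAGGTTGTATAGTTA"

# Read with two mismatches inside the first 20 bases.
read_two_seed_mismatches = "TCAGGTAGTACGTTGTATAGTT"

accepted = passes_seed_rule(read_with_addition, target)
rejected = passes_seed_rule(read_two_seed_mismatches, target)
```

In this sketch the read carrying a 3' addition is still accepted, because its only mismatch lies beyond the 20-base seed, while the read with two seed mismatches is not.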
miRNA-seq reads can be mapped to the known miRBase sequences as a quick way to count known mature miRNAs, and this is often adequate. The map_mirbase function does this in one step. It does the following:
- creates the mature sequences with flanking nucleotides
- creates a bowtie/subread index
- aligns and counts the reads
- counts isomirs
- returns a pandas dataframe
A file called mature_counts.csv will also be saved. IsomiR counts are saved as isomir_counts.csv.
Example:
import smallrnaseq as smrna
res = smrna.map_mirbase(files=['test_1.fastq','test_2.fastq'], overwrite=True, aligner='bowtie',
species='hsa', pad5=3, pad3=5)
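The pad5 and pad3 arguments relate to the first step listed above, creating mature sequences with flanking nucleotides. The sketch below shows one plausible reading of that step, assuming pad5 and pad3 are the number of extra precursor bases kept on the 5' and 3' sides. The helper `padded_mature`, its 0-based half-open coordinates, and the toy precursor are all hypothetical and are not taken from the package.

```python
def padded_mature(precursor, start, end, pad5=3, pad3=5):
    """Return precursor[start:end] extended by up to `pad5` bases upstream
    and `pad3` bases downstream, clipped to the precursor boundaries.
    Coordinates are 0-based and half-open (an assumption of this sketch)."""
    left = max(0, start - pad5)
    right = min(len(precursor), end + pad3)
    return precursor[left:right]


# Toy precursor, not a real miRNA hairpin.
precursor = "AAACCCGGGTTTAAACCCGGGTTT"

mature = precursor[6:12]                  # the unpadded mature sequence
padded = padded_mature(precursor, 6, 12)  # 3 extra bases 5', 5 extra bases 3'
clipped = padded_mature(precursor, 1, 4)  # 5' padding clipped at the start
```

Keeping a few flanking bases in the library gives reads that start or end slightly outside the annotated mature sequence something to align to.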
Counting by mapping to the genome requires a reference genome and a GTF file with miRNA features.
Example:
# gtffile is the path to a GTF file with miRNA features
featcounts = smrna.map_genome_features(['test_1.fastq'], 'bos_taurus', gtffile,
                                       outpath='ncrna_map', aligner='subread', merge=True)
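As a rough picture of what counting against GTF features involves, the following sketch parses standard nine-column GTF lines and counts alignments that overlap each miRNA feature. It is a toy under stated assumptions, not the package's or subread's method: the helper names, the example GTF lines, the alignment tuples, and the use of a `gene_biotype "miRNA"` attribute to select features are all hypothetical, and real GTF files differ in how they label miRNAs.

```python
from collections import Counter


def parse_gtf_line(line):
    """Parse one tab-separated GTF line into (chrom, start, end, attrs)."""
    fields = line.rstrip("\n").split("\t")
    chrom, _source, _feature, start, end, _score, _strand, _frame, attr_text = fields
    attrs = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return chrom, int(start), int(end), attrs


def count_overlaps(alignments, features):
    """Count alignments (chrom, start, end) overlapping each named feature.
    Coordinates are 1-based and inclusive, as in GTF."""
    counts = Counter()
    for chrom, start, end in alignments:
        for f_chrom, f_start, f_end, name in features:
            if chrom == f_chrom and start <= f_end and end >= f_start:
                counts[name] += 1
    return counts


# Hypothetical GTF records and alignments, for illustration only.
gtf_lines = [
    '1\tsrc\tgene\t100\t121\t.\t+\t.\tgene_id "mir-a"; gene_biotype "miRNA";',
    '1\tsrc\tgene\t500\t522\t.\t-\t.\tgene_id "mir-b"; gene_biotype "miRNA";',
    '1\tsrc\tgene\t900\t2000\t.\t+\t.\tgene_id "other"; gene_biotype "protein_coding";',
]
features = [
    (chrom, start, end, attrs["gene_id"])
    for chrom, start, end, attrs in map(parse_gtf_line, gtf_lines)
    if attrs.get("gene_biotype") == "miRNA"
]
alignments = [("1", 98, 119), ("1", 101, 122), ("1", 510, 531), ("2", 100, 121)]

feature_counts = count_overlaps(alignments, features)
```

A production counter would also need to handle strand, multi-mapping reads and reads overlapping several features, which is where dedicated tools come in.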
You can run miRNA counting from the command line without writing any Python. The key to this is the config file. For miRNAs you need to add the following settings:
Then call the command using:
smallrnaseq -c mymirs.conf -r