-
Notifications
You must be signed in to change notification settings - Fork 19
Differential Expression
Differential expression of genes can be done currently using the edgeR package. You need R installed for this. The smallrnaseq.de
module has methods to help you do this in as few steps as possible whilst being flexible enough to use on various data sets. The steps involved are to get the correct samples according to factor labels, kept in another file. An example is given below to illustrate using publicly available small rna-seq data.
The data used for this example is from a published study of small RNA in the serum of cattle with differing antibody responses to Mycoplasma bovis to determine if any small RNAs (with focus on tRF and miRNA) correlate with the antibody response. The data can be downloaded from the sequence read archive (https://www.ncbi.nlm.nih.gov/bioproject/319677).
The sample labels are available in the run table file at the same link. This SraRunTable.txt file should contain all the experimental meta data per sample necessary to replicate the analysis (though these files are sometimes incomplete). In this case we need to use the sample time points to compare matched to the sample file name. These are stored in the Run_s and age_s columns in the file (shown below). If you are analysing your own data you will need to have such a mapping to condition/filename for such an analysis.
Sample of the SraRunTable.txt file:
BioSample_s Experiment_s Library_Name_s Run_s age_s sex_s tissue_s
0 SAMN04908419 SRX1733730 20131617-1 SRR3457948 3 months male serum
1 SAMN04908420 SRX1733731 20131617-2 SRR3457949 6 months male serum
2 SAMN04908429 SRX1733733 20131779-2 SRR3457950 6 months male serum
3 SAMN04908430 SRX1733734 20131779-3 SRR3457951 15 months male serum
4 SAMN04908431 SRX1733735 20131794-1 SRR3457952 3 months male serum
Assuming we have all the raw files, they need to be adapter trimmed. Optionally you can remove other ncrnas before counting your target rnas class, though that may not be advisable.
The following code maps all the files to bovine mature miRNAs and counts the mapped genes, then saves the results to a csv file which has the counts in one column per sample. You can skip this if you already have the counts file.
import pandas as pd
import smallrnaseq as smrna
from smallrnaseq import base, utils, de
path = 'pathtodata'
base.BOWTIE_INDEXES = 'bowtie_index'
refs = ['mirbase-bta'] #name of bowtie index
files = glob.glob(path+'/*.fastq')
outpath = 'ncrna_map'
#map to selected annotation files
counts = smrna.map_rnas(files, refs, outpath, overwrite=True)
R = smrna.pivot_count_data(counts, idxcols=['name','db'])
R.to_csv('mirna_counts.csv',index=False)
The following does the DE analysis. Here we compare the 3 month to 15 months time points. The de.get_factor_samples
method does the work of combining the sample labels with the count data to get the right columns into edgeR. You provide conditions to be compared as tuples of (column name,
labels = pd.read_csv(os.path.join(path,'SraRunTable.txt'), sep='\t')
#print labels[['Run_s','age_s']]
counts = pd.read_csv(os.path.join(path,'mirna_counts.csv'))
#get the samples needed for the required conditions you want to compare
data = de.get_factor_samples(counts, labels, [('age_s','3 months'),('age_s','15 months')],
samplecol='Run_s', index='name')
res = de.run_edgeR(data=data, cutoff=1.5)
print res
names = res.name
#plot these genes with seaborn
xorder=['3 months','6 months','15 months']
m = de.melt_samples(counts, labels, names, samplecol='Run_s')
g = base.sns.factorplot('age_s','read count', data=m, col='name', kind="point", s=10, lw=1,
col_wrap=4,size=4,aspect=1.2,legend_out=True,sharey=False, order=xorder)
Casas, E. et al., 2016. Association of MicroRNAs with antibody response to mycoplasma bovis in beef cattle. PLoS ONE, 11(8), pp.1–11.