Differential Expression

Differential expression of genes can be done currently using the edgeR package. You need R installed for this. The smallrnaseq.de module has methods to help you do this in as few steps as possible whilst being flexible enough to use on various data sets. The steps involved are to get the correct samples according to factor labels, kept in another file. An example is given below to illustrate using publicly available small rna-seq data.

Data set

The data used for this example is from a published study of small RNA in the serum of cattle with differing antibody responses to Mycoplasma bovis to determine if any small RNAs (with focus on tRF and miRNA) correlate with the antibody response. The data can be downloaded from the sequence read archive (https://www.ncbi.nlm.nih.gov/bioproject/319677).

Sample labels

The sample labels are available in the run table file at the same link. This SraRunTable.txt file should contain all the experimental meta data per sample necessary to replicate the analysis (though these files are sometimes incomplete). In this case we need to use the sample time points to compare matched to the sample file name. These are stored in the Run_s and age_s columns in the file (shown below). If you are analysing your own data you will need to have such a mapping to condition/filename for such an analysis.

Sample of the SraRunTable.txt file:

    BioSample_s Experiment_s Library_Name_s       Run_s      age_s sex_s tissue_s
0  SAMN04908419   SRX1733730     20131617-1  SRR3457948   3 months  male    serum
1  SAMN04908420   SRX1733731     20131617-2  SRR3457949   6 months  male    serum
2  SAMN04908429   SRX1733733     20131779-2  SRR3457950   6 months  male    serum
3  SAMN04908430   SRX1733734     20131779-3  SRR3457951  15 months  male    serum
4  SAMN04908431   SRX1733735     20131794-1  SRR3457952   3 months  male    serum

Example code

Assuming we have all the raw files, they need to be adapter trimmed. Optionally you can remove other ncrnas before counting your target rnas class, though that may not be advisable.

The following code maps all the files to bovine mature miRNAs and counts the mapped genes, then saves the results to a csv file which has the counts in one column per sample. You can skip this if you already have the counts file.

import pandas as pd
import smallrnaseq as smrna
from smallrnaseq import base, utils, de

path = 'pathtodata'
base.BOWTIE_INDEXES = 'bowtie_index'
refs = ['mirbase-bta'] #name of bowtie index

files = glob.glob(path+'/*.fastq') 
outpath = 'ncrna_map'
#map to selected annotation files
counts = smrna.map_rnas(files, refs, outpath, overwrite=True)
R = smrna.pivot_count_data(counts, idxcols=['name','db'])
R.to_csv('mirna_counts.csv',index=False)

The following does the DE analysis. Here we compare the 3 month to 15 months time points. The de.get_factor_samples method does the work of combining the sample labels with the count data to get the right columns into edgeR. You provide conditions to be compared as tuples of (column name,

labels = pd.read_csv(os.path.join(path,'SraRunTable.txt'), sep='\t')
#print labels[['Run_s','age_s']]
counts = pd.read_csv(os.path.join(path,'mirna_counts.csv'))

#get the samples needed for the required conditions you want to compare
data = de.get_factor_samples(counts, labels, [('age_s','3 months'),('age_s','15 months')],
                                samplecol='Run_s', index='name')
res = de.run_edgeR(data=data, cutoff=1.5)
print res
names = res.name

#plot these genes with seaborn
xorder=['3 months','6 months','15 months']
m = de.melt_samples(counts, labels, names, samplecol='Run_s')
g = base.sns.factorplot('age_s','read count', data=m, col='name', kind="point", s=10, lw=1,
                         col_wrap=4,size=4,aspect=1.2,legend_out=True,sharey=False, order=xorder)

Links

References

Casas, E. et al., 2016. Association of MicroRNAs with antibody response to mycoplasma bovis in beef cattle. PLoS ONE, 11(8), pp.1–11.

Provide feedback

Saved searches