Scripts used to extract curated lists of RNA modification enzymes and to assess their tissue specificity across multiple species and tissues, as well as in cancer versus normal tissues, as described in the paper:
Begik O, Lucas MC, Liu H, Ramirez JM, Mattick JS and Novoa EM. Integrative analyses of the RNA modification machinery reveal tissue- and cancer-specific signatures. Genome Biology, May 2020. link: https://rdcu.be/b3Z5I. doi: https://doi.org/10.1186/s13059-020-02009-z
Data accompanying the paper can be found HERE.
PART1: Search and extract sets of RNA modification enzymes in selected species across the tree of life
Required: HMMER
find_homologs.sh <pfam_hmm> <fasta_reference_proteome>
# Example: find_homologs.sh A_deamin.hmm.txt Saccharomyces_cerevisiae.fasta
- Takes a Pfam profile HMM and a reference proteome in FASTA format as input
- Outputs the proteins that contain similar functional domains, e.g. A_deamin.hmm.txtSaccharomyces_cerevisiae.fasta
- Candidates are then selected by manual curation (based on the literature, etc.)
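The internals of find_homologs.sh are not shown here. Assuming it runs HMMER's `hmmsearch` with a per-target table (e.g. `hmmsearch --tblout hits.tbl A_deamin.hmm.txt Saccharomyces_cerevisiae.fasta`), a minimal Python sketch of pre-filtering that table by full-sequence E-value before manual curation could look like this. The 1e-5 cutoff and the toy rows are hypothetical.

```python
"""Sketch: filter an HMMER `hmmsearch --tblout` table by full-sequence E-value."""
from typing import Iterable, List, Tuple


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Tuple[str, float, float]]:
    """Return (target, full-sequence E-value, full-sequence bit score) for passing hits.

    In --tblout output, column 1 is the target name, column 5 the full-sequence
    E-value and column 6 the full-sequence bit score; '#' lines are comments.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:  # hypothetical significance cutoff
            hits.append((target, evalue, score))
    return sorted(hits, key=lambda hit: hit[1])  # most significant first


# Toy rows in --tblout layout (names and numbers are made up for illustration).
TOY_TBLOUT = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "toy_hit_1 - A_deamin - 3.2e-40 135.1 0.1 4.1e-40 134.7 0.1 1.1 1 0 0 1 1 1 1 toy deaminase",
    "toy_hit_2 - A_deamin - 0.5 8.0 0.2 0.9 7.1 0.2 1.3 1 0 0 1 1 1 0 toy weak hit",
]

for name, evalue, score in parse_tblout(TOY_TBLOUT):
    print(name, evalue, score)
```

A cutoff like this only shortens the candidate list; the final selection still comes from manual curation.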
Extract FASTA sequences for the proteins of interest from a list of orthologous proteins of a gene group
Required: Perl
extract_fasta.sh <uniprot_ID_list>
# Example: extract_fasta.sh A_deamin
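extract_fasta.sh itself relies on Perl. To illustrate what this step does, here is a minimal Python sketch (not the repository's implementation), assuming UniProt-style headers (`sp|ACCESSION|NAME description`) and an ID list already loaded as a set of accessions. The accessions and sequences below are made up.

```python
"""Sketch: keep FASTA records whose UniProt accession appears in an ID list."""
from typing import Dict, Iterable, Optional, Set


def accession(header: str) -> str:
    """Accession from a header (without '>'): 'sp|P12345|NAME desc' -> 'P12345'.

    Headers without the UniProt 'db|accession|name' layout fall back to their first token.
    """
    tokens = header.split()
    if not tokens:
        return ""
    parts = tokens[0].split("|")
    return parts[1] if len(parts) >= 3 else tokens[0]


def filter_fasta(lines: Iterable[str], wanted: Set[str]) -> Dict[str, str]:
    """Return {accession: sequence} for records whose accession is in `wanted`."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            acc = accession(line[1:])
            current = acc if acc in wanted else None
            if current is not None:
                records[current] = ""
        elif current is not None:
            records[current] += line  # concatenate wrapped sequence lines
    return records


# Toy proteome (made-up accessions and sequences).
TOY_PROTEOME = [
    ">sp|TOY001|ADAT2_TOY toy deaminase",
    "MKLV",
    "AAGT",
    ">sp|TOY002|OTHER_TOY unrelated protein",
    "MSSS",
]

selected = filter_fasta(TOY_PROTEOME, {"TOY001"})
for acc, seq in selected.items():
    print(f">{acc}\n{seq}")
```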
Align the extracted sequences (multiple sequence alignment)
Required: MAFFT
mafft.sh <FASTA>
# Example: mafft.sh uniprotIDlist.txt.named.fasta
Build a phylogenetic tree from the alignment
Required: IQ-TREE
iqtree.sh <MAFFT_OUTPUT>
# Example: iqtree.sh uniprotIDlist.txt.named.fasta.mafft
Starting from the initial list of main RNA writer proteins, we added non-catalytic subunits, readers, erasers and other tRNA writer proteins from de Crécy-Lagard V et al. (2019), giving a final set of 146 RMMs.
Extract TPM values for the RNA modification enzymes from the GTEx TPM dataset
Rscript gtex_manipulation.R <GTEX.Expression.File> <ENSEMBL_GeneSymbol_Class.File>
# Example: Rscript gtex_manipulation.R GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_median_tpm.gct human_id_symbol_class.tsv
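As a rough illustration of this step (not a port of gtex_manipulation.R), the Python sketch below parses a GCT v1.2 median-TPM table of the kind GTEx distributes and keeps only annotated genes. It assumes the annotation file has been loaded as {unversioned Ensembl ID: (gene symbol, class)}, a layout guessed from the name human_id_symbol_class.tsv. The IDs and values are placeholders.

```python
"""Sketch: pull RNA modification genes out of a GTEx GCT median-TPM table."""
from typing import Dict, Iterable, List, Tuple


def read_gct(lines: Iterable[str]) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Parse GCT v1.2: '#1.2', a dimensions line, a 'Name<TAB>Description<TAB>tissues...' header, then rows."""
    it = iter(lines)
    if not next(it).startswith("#1.2"):
        raise ValueError("not a GCT v1.2 file")
    next(it)  # dimensions line (rows, columns); not needed here
    tissues = next(it).rstrip("\n").split("\t")[2:]
    table: Dict[str, Dict[str, float]] = {}
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        gene_id = fields[0].split(".")[0]  # drop the Ensembl version suffix
        # setdefault keeps the first row if two IDs collapse to the same unversioned ID
        table.setdefault(gene_id, dict(zip(tissues, map(float, fields[2:]))))
    return tissues, table


def select_genes(
    table: Dict[str, Dict[str, float]],
    annotation: Dict[str, Tuple[str, str]],
) -> List[Tuple[str, str, Dict[str, float]]]:
    """Return (symbol, class, {tissue: TPM}) for annotated genes present in the table."""
    return [
        (symbol, gene_class, table[gid])
        for gid, (symbol, gene_class) in annotation.items()
        if gid in table
    ]


# Toy GCT content and annotation (placeholder IDs, symbols and values).
TOY_GCT = [
    "#1.2",
    "2\t2",
    "Name\tDescription\tLiver\tTestis",
    "ENSG_TOY1.3\tGENE_A\t10.5\t40.0",
    "ENSG_TOY2.1\tGENE_B\t0.0\t1.2",
]
TOY_ANNOTATION = {"ENSG_TOY1": ("GENE_A", "writer")}

tissues, table = read_gct(TOY_GCT)
for symbol, gene_class, tpm in select_genes(table, TOY_ANNOTATION):
    print(symbol, gene_class, tpm)
```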
Extract TPM values for the RNA modification enzymes from the ENCODE TPM dataset
Rscript encode_manipulation.R <ENCODE.Expression.File> <ENSEMBL_GeneSymbol_Class.File>
# Example: Rscript encode_manipulation.R mm65.long.gene.with.expr.cshl.tsv mouse_id_symbol_class.tsv
Scripts for tissue-specificity analysis and plots (GTEx)
Rscript gtex_tissuewide.R <TPM table>
# Example: Rscript gtex_tissuewide.R RMLP.GTEX.TissueAveraged.TPM.tsv
Scripts for tissue-specificity analysis and plots (ENCODE)
Rscript encode_tissuewide.R <TPM table>
# Example: Rscript encode_tissuewide.R RMLP.encode.TPM.brainav.tsv
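This README does not say which specificity score gtex_tissuewide.R and encode_tissuewide.R compute. Purely as an illustration of a per-gene tissue-specificity score, the sketch below implements the tau index (Yanai et al., 2005). The log2(TPM + 1) transform and returning NaN for genes with no expression are choices of this sketch, not of the original scripts.

```python
"""Sketch: tau tissue-specificity index (Yanai et al., 2005) for one gene."""
import math
from typing import Sequence


def tau(expression: Sequence[float], log_transform: bool = True) -> float:
    """tau = sum_i (1 - x_i / max(x)) / (n - 1); 0 = uniform, 1 = restricted to one tissue.

    With log_transform=True, values are converted to log2(TPM + 1) first.
    Returns NaN for a gene with no expression in any tissue.
    """
    if len(expression) < 2:
        raise ValueError("tau needs at least two tissues")
    values = [math.log2(x + 1) if log_transform else float(x) for x in expression]
    peak = max(values)
    if peak <= 0:
        return float("nan")
    return sum(1 - v / peak for v in values) / (len(values) - 1)


print(tau([5, 5, 5, 5]))                    # 0.0: uniform expression
print(tau([100, 0, 0, 0]))                  # 1.0: expressed in a single tissue
print(tau([4, 2, 0], log_transform=False))  # 0.75
```

Applied row by row to a genes-by-tissues TPM table, this yields one score per enzyme that can then be ranked or plotted.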
Scripts for Pearson correlation analysis between the GTEx and ENCODE datasets
Rscript gtex_vs_encode_similarity.R <GTEX data> <ENCODE data>
# Example: Rscript gtex_vs_encode_similarity.R RMLP.GTEX.TissueAveraged.TPM.tsv RMLP.encode.TPM.brainav.tsv
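A minimal sketch of the correlation step for a single matched tissue pair, assuming each dataset has been reduced to {gene symbol: TPM}. Because the ENCODE example input is mouse data, genes are matched here by case-insensitive symbol, a crude stand-in (assumed, not taken from the R script) for a proper orthology mapping. The symbols and values are toy data.

```python
"""Sketch: Pearson correlation of one matched GTEx/ENCODE tissue pair across shared genes."""
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r = cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def tissue_correlation(gtex: Dict[str, float], encode: Dict[str, float], log_transform: bool = True) -> float:
    """Correlate {gene symbol: TPM} maps over the genes present in both.

    Symbols are matched case-insensitively (human METTL3 ~ mouse Mettl3).
    """
    g = {k.upper(): v for k, v in gtex.items()}
    e = {k.upper(): v for k, v in encode.items()}
    shared = sorted(g.keys() & e.keys())
    xs = [g[s] for s in shared]
    ys = [e[s] for s in shared]
    if log_transform:
        xs = [math.log2(v + 1) for v in xs]
        ys = [math.log2(v + 1) for v in ys]
    return pearson(xs, ys)


TOY_GTEX = {"METTL3": 10.0, "METTL14": 20.0, "FTO": 30.0}
TOY_ENCODE = {"Mettl3": 1.0, "Mettl14": 2.0, "Fto": 3.0, "Alkbh5": 9.0}
print(tissue_correlation(TOY_GTEX, TOY_ENCODE, log_transform=False))  # linear toy data, r = 1
```

Genes found in only one dataset (Alkbh5 above) are simply ignored.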
Expression of RNA modification enzymes across species (amniote and primate 1:1 orthologues)
Rscript kaessmann.amniote.R <input.expression.data> <ENSEMBL_GeneSymbol_Class.File>
# Example: Rscript kaessmann.amniote.R NormalizedRPKM_ConstitutiveExons_Amniote1to1Orthologues.txt human_id_symbol_class.tsv
Rscript kaessmann.primate.R <input.expression.data> <ENSEMBL_GeneSymbol_Class.File>
# Example: Rscript kaessmann.primate.R NormalizedRPKM_ConstitutiveExons_Primate1to1Orthologues.txt human_id_symbol_class.tsv
Expression of RNA modification enzymes during spermatogenesis (single-cell RNA-seq)
Rscript spermatogenesis.R <spermatogenesis.expression.data> <ensembl_file>
# Example: Rscript spermatogenesis.R spermatogenesis_scRNA_averageexpression.tsv gene_hgnc_ensmus.tsv
Cancer versus normal tissue analyses (TCGA and GTEx)
Rscript cancer_script1_datamanipulation.R <TCGA.GTEX.file> <ENSEMBL.file>
# Example: Rscript cancer_script1_datamanipulation.R RMLP.TcgaTargetGtex_rsem_gene_tpm_withheader.tsv human_id_symbol_class.tsv
Rscript cancer_script2_boxplot.R <TCGA.GTEX.file>
# Example: Rscript cancer_script2_boxplot.R TCGA_GTEX_FINAL.log2.without3cancer.tsv
Rscript cancer_script3_mean_heatmap.R <TCGA.GTEX.file>
# Example: Rscript cancer_script3_mean_heatmap.R TCGA_GTEX_FINAL.log2.without3cancer.tsv
Rscript cancer_script4_log2FC.R <MedianLog Tumor and Normal TPM file> <Original TPM File>
# Example: Rscript cancer_script4_log2FC.R medianlog_tumor_normal.tpm.tsv RMLP.TcgaTargetGtex_rsem_gene_tpm_withheader.tsv
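For reference, a worked sketch of a tumour-versus-normal log2 fold change computed from median log2 expression, which is what the "MedianLog" input name suggests. The pseudocount of 1 is an assumption of this sketch; check the unit of your input, and if values are already on a log2 scale, take the medians directly without transforming again.

```python
"""Sketch: tumour-vs-normal log2 fold change from median log2 expression."""
import math
from statistics import median
from typing import Sequence


def median_log2(tpm: Sequence[float], pseudocount: float = 1.0) -> float:
    """Median of log2(TPM + pseudocount) across samples."""
    return median([math.log2(t + pseudocount) for t in tpm])


def log2_fold_change(tumor_tpm: Sequence[float], normal_tpm: Sequence[float], pseudocount: float = 1.0) -> float:
    """log2FC = median log2 expression in tumour minus median log2 expression in normal tissue."""
    return median_log2(tumor_tpm, pseudocount) - median_log2(normal_tpm, pseudocount)


# Toy TPM values chosen so that log2(TPM + 1) are whole numbers.
tumor = [3.0, 7.0, 15.0]  # log2(TPM + 1) = 2, 3, 4 -> median 3
normal = [0.0, 1.0, 3.0]  # log2(TPM + 1) = 0, 1, 2 -> median 1
print(log2_fold_change(tumor, normal))  # 2.0, i.e. about 4-fold higher in tumour
```

Working on the log scale means a log2FC of 1 corresponds to a two-fold change regardless of absolute expression level.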
Rscript cancer_script5_Dysregulation.R <MedianLog Tumor and Normal TPM file for all genes> <MedianLog Tumor and Normal TPM file for RMPs>
# Example: Rscript cancer_script5_Dysregulation.R all_genes_logmedian_scatter_format.tsv medianlog_tumor_normal.tpm.tsv
Rscript cancer_script6_Stage.R <Gene Expression File> <ENSEMBL_GeneSymbol_Class.File> <clinical information> <Phenotype data>
# Example: Rscript cancer_script6_stageexpression.R RMLP.TcgaTargetGtex_rsem_gene_tpm_withheader.tsv human_id_symbol_class.tsv clinical.tsv TcgaTargetGTEX_phenotype.txt
Rscript cancer_script7_SURVIVAL.R <Gene Expression File> <Survival data>
# Example: Rscript cancer_script7_SURVIVAL.R TCGA_GTEX_FINAL.log2.without3cancer.tsv TCGA_survival_data
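To illustrate the kind of survival comparison this step involves, here is a minimal Kaplan-Meier sketch in Python. It assumes the survival data provide, per patient, a follow-up time and an event flag (1 = death, 0 = censored), and it splits patients at the median expression of one gene. That grouping rule is hypothetical and may differ from cancer_script7_SURVIVAL.R. The two curves would normally be compared with a log-rank test, which is not shown.

```python
"""Sketch: Kaplan-Meier curves for patients split at median expression of one gene."""
from statistics import median
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events: 1 = death observed, 0 = censored. Patients censored at time t are
    still counted as at risk at t (the usual convention).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_median(expression, times, events):
    """Hypothetical grouping: 'high' = expression above the median, 'low' = the rest."""
    cutoff = median(expression)
    high = [(t, e) for x, t, e in zip(expression, times, events) if x > cutoff]
    low = [(t, e) for x, t, e in zip(expression, times, events) if x <= cutoff]
    return high, low


# Toy cohort (made-up values): expression, follow-up time, event flag.
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
follow_up = [5.0, 8.0, 2.0, 3.0, 1.0, 4.0]
event = [0, 1, 1, 1, 1, 0]
for label, group in zip(("high", "low"), split_by_median(expr, follow_up, event)):
    ts, es = zip(*group)
    print(label, kaplan_meier(ts, es))
```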