Using sequence-dependent DNA shape and conformational ensembles predicted by DynaSeq to analyse transcriptomics data
This page presents the scripts needed to apply DNA shape profile analysis to transcriptomics data, starting from the differential gene expression results generated by DESeq2. The differentially expressed genes (DEGs) are used to predict subsets of genes regulated by a common transcription factor (TF). The workflow follows these steps:
Prediction of genes with significantly different expression patterns between two samples
Finding genes that share a common regulatory TF in their upstream regions
DNA shape analysis of the selected genomic upstream sequences
Generation of shape models based on static shape and shape ensembles, respectively, to find potential new gene targets
Prospective downstream analyses such as gene ontology enrichment, specificity analyses, and experimental validation (not included here)
The following scripts are provided:
generate_TSS.R
dna_shape_analysis.R
visualization.R
shape_models.R
analyze_misclassifications.R
The generate_TSS.R file contains customizable functions that read DESeq2 result files, compute PWM enrichment in the promoters of the genes, split the genes into those regulated by a given TF and the rest, and use this information to create unique TSS coordinates for each gene in both gene sets. The script generates two BED files: remaining_degs_filtered.bed and tf_reg_degs_filtered.bed. The user then needs to run bedtools flank and bedtools getfasta on these files to extract the genomic sequences into FASTA files.
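To illustrate what "unique TSS coordinates for each gene" can mean in BED terms, here is a minimal Python sketch. It is not the code in generate_TSS.R. The record layout (gene_id, chrom, start, end, strand, with 1-based inclusive coordinates) and the function name tss_bed_rows are assumptions made for this example only.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (gene_id, chrom, start, end, strand),
# with 1-based inclusive gene coordinates.
Gene = Tuple[str, str, int, int, str]


def tss_bed_rows(genes: Iterable[Gene]) -> List[str]:
    """Return one BED6 line per gene, marking the single-base TSS."""
    seen = set()
    rows = []
    for gene_id, chrom, start, end, strand in genes:
        if gene_id in seen:
            # Keep only the first record for each gene.
            continue
        seen.add(gene_id)
        # The TSS is the 5' end: gene start on '+', gene end on '-'.
        tss = start if strand == "+" else end
        # BED is 0-based and half-open: 1-based position p becomes [p-1, p).
        rows.append(f"{chrom}\t{tss - 1}\t{tss}\t{gene_id}\t0\t{strand}")
    return rows


# Toy input with a duplicated gene record.
genes = [
    ("geneA", "chr1", 1000, 2000, "+"),
    ("geneB", "chr2", 500, 900, "-"),
    ("geneA", "chr1", 1000, 2000, "+"),
]
rows = tss_bed_rows(genes)
```

Keeping the strand in column 6 matters for the next step, because bedtools flank and bedtools getfasta can both take strand into account (their -s options) when defining and extracting the upstream region.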
These FASTA files are read by the dna_shape_analysis.R script, which uses the dictionary_ensemble_5bin_5mer and dynaseq_static_diction files to generate the shape profiles for each set of sequences.
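The dictionary-based lookup described above can be pictured as a sliding-window table lookup. The Python sketch below is illustrative only. The format of the dictionary files is not described here, so the sketch assumes a plain mapping from each 5-mer to a single value; an ensemble dictionary would presumably carry several values per 5-mer instead. The function name shape_profile and the toy values are hypothetical.

```python
from typing import Dict, List, Optional


def shape_profile(
    seq: str, table: Dict[str, float], k: int = 5
) -> List[Optional[float]]:
    """Look up every overlapping k-mer of seq in table.

    Positions whose k-mer is missing from the table (for example,
    k-mers containing N) are reported as None.
    """
    seq = seq.upper()
    return [table.get(seq[i:i + k]) for i in range(len(seq) - k + 1)]


# Toy table with made-up values, for illustration only.
toy_table = {"ACGTA": 1.0, "CGTAC": 2.0, "GTACG": 3.0}
profile = shape_profile("acgtacgn", toy_table)
# profile is [1.0, 2.0, 3.0, None]
```

A sequence of length n yields n - k + 1 values, so profiles from promoters of equal length line up position by position and can be compared between the two gene sets.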
The generated static shape and shape ensemble profiles can be plotted to PDF files using visualization.R.
The shape profiles can be modeled using shape_models.R to identify misclassified gene promoters. The shape profiles of the misclassified promoters can be visualized alongside the original sets of promoters using analyze_misclassifications.R.
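The idea of using misclassified promoters as candidate targets can be sketched as follows. The type of model used in shape_models.R is not stated here, so this Python example substitutes a simple nearest-centroid classifier as a stand-in. The feature vectors, labels, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence


def centroid(rows: List[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def misclassified(
    features: Dict[str, Sequence[float]], labels: Dict[str, str]
) -> List[str]:
    """Return genes whose nearest class centroid differs from their label."""
    classes = sorted(set(labels.values()))
    cents = {
        c: centroid([features[g] for g in features if labels[g] == c])
        for c in classes
    }
    wrong = []
    for gene, vec in features.items():
        predicted = min(classes, key=lambda c: sq_dist(vec, cents[c]))
        if predicted != labels[gene]:
            wrong.append(gene)
    return wrong


# Toy shape features: g5 is labeled "rest" but resembles the "tf" set.
features = {
    "g1": [1.0, 1.0], "g2": [1.2, 0.8],
    "g3": [5.0, 5.0], "g4": [5.2, 4.8], "g5": [1.1, 0.9],
}
labels = {"g1": "tf", "g2": "tf", "g3": "rest", "g4": "rest", "g5": "rest"}
wrong = misclassified(features, labels)
```

In this toy case only g5 is flagged: it belongs to the remaining gene set but its shape features sit with the TF-regulated set, which is the kind of promoter one would inspect as a potential new target.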