DynaSeq_transcriptomics

Use of sequence-dependent DNA shape and conformational ensembles predicted by DynaSeq to analyse transcriptomics data

This page presents the scripts needed to run DNA shape profile analysis on transcriptomics data, starting from the differential gene expression results generated by DESeq2. The differentially expressed genes (DEGs) are used to predict subsets of genes regulated by a common transcription factor (TF). The steps are as follows:

Identification of differentially-expressed genes

Prediction of genes with significantly different expression patterns between two samples

Identification of genes with shared upstream TFs

Finding genes that share a common regulatory TF in their upstream regions

Shape analysis of selected promoters

DNA shape analysis of the selected genomic upstream sequences

Identification of potential TF targets

Generation of shape models based on the static shape and the shape ensemble, respectively, to find potential new gene targets.

Refining results and further analysis

Possible downstream analyses such as gene ontology enrichment, specificity analyses, and experimental validation (not included here)

The whole pipeline can be implemented using:

generate_TSS.R
dna_shape_analysis.R
visualization.R
shape_models.R
analyze_misclassifications.R

The generate_TSS.R file contains customizable functions that read DESeq2 result files, compute PWM enrichment in gene promoters, split the genes into those regulated by a TF and the rest, and create unique TSS coordinates for each gene in both gene sets. The script generates two BED files: remaining_degs_filtered.bed and tf_reg_degs_filtered.bed. The user then needs to run bedtools flank and bedtools getfasta on these files to extract the genomic sequences into FASTA files.
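The bedtools step mentioned above is not scripted in this repository, so the following is only a minimal sketch of how it might look. The function name `extract_promoters`, the genome files `genome.fa` and `genome.sizes`, the 1000 bp upstream window, and the `_upstream` output names are all placeholders chosen for illustration. The sketch also assumes that the BED files carry a strand column, which the `-s` options need.

```shell
# Sketch: turn the TSS BED files into upstream (promoter) FASTA files.
# Assumptions, not taken from the repository: the function name, the
# genome file names, the window size, and the output naming scheme.
set -eu

# Usage: extract_promoters BED GENOME_FASTA GENOME_SIZES UPSTREAM_BP
# Writes <BED prefix>_upstream.bed and <BED prefix>_upstream.fa.
extract_promoters() {
  bed=$1; genome_fa=$2; genome_sizes=$3; upstream=$4
  prefix=${bed%.bed}

  # Take the region upstream of each TSS. With -s, -l is interpreted
  # relative to the strand, so it means "upstream" for both + and - genes.
  bedtools flank -i "$bed" -g "$genome_sizes" -l "$upstream" -r 0 -s \
    > "${prefix}_upstream.bed"

  # Extract the sequences. -s reverse-complements minus-strand features,
  # and -name puts the BED name column into each FASTA header.
  bedtools getfasta -fi "$genome_fa" -bed "${prefix}_upstream.bed" \
    -s -name -fo "${prefix}_upstream.fa"
}
```

Under these assumptions, the function would be called once per gene set, for example `extract_promoters tf_reg_degs_filtered.bed genome.fa genome.sizes 1000` and then again with `remaining_degs_filtered.bed`. The window size should match whatever promoter length the downstream shape analysis expects.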
These FASTA files are read by the dna_shape_analysis.R script, which uses the dictionary_ensemble_5bin_5mer and dynaseq_static_diction files to generate the shape profiles for each set of sequences.
The generated static shape and shape ensemble profiles can be plotted to PDF files using visualization.R.
The shape profiles can be modeled using shape_models.R to identify misclassified gene promoters. The shape profiles of the misclassified promoters can be visualized alongside the original sets of promoters using analyze_misclassifications.R.
