1. K562 analysis

1.1 Pre-processing

For each pre-processing step there are two scripts named after the tool used (e.g. fastqc): an R script (e.g. fastqc.r) that generates a script of bash commands, and a bash wrapper script (e.g. fastqcWrapper.sh) that submits the generated script to the HPC via qsub. Only the R and wrapper scripts are included here.
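The two-script pattern described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: in the repository the command script is generated by R, whereas here a bash function stands in for that step. The names generate_cmds, submit_script, fastqc_cmds.sh and the DRY_RUN switch are all hypothetical, and no scheduler options are shown because the README does not state them.

```shell
# Sketch only: generate_cmds, submit_script and DRY_RUN are hypothetical
# names and do not come from the repository.

# Step 1 (done by the R script in the repository): write one command
# per input file into a command script.
generate_cmds() {
  local out="$1"; shift
  : > "$out"                      # create or truncate the command script
  local f
  for f in "$@"; do
    echo "fastqc $f" >> "$out"    # one fastqc call per FASTQ file
  done
}

# Step 2 (the wrapper's job): submit the generated script with qsub.
submit_script() {
  local cmds="$1"
  if [ ! -s "$cmds" ]; then
    echo "missing or empty command script: $cmds" >&2
    return 1
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "qsub $cmds"             # dry run: only print the submission command
  else
    qsub "$cmds"                  # real submission to the HPC queue
  fi
}
```

With these definitions, `generate_cmds fastqc_cmds.sh sample.fastq.gz` followed by `DRY_RUN=0 submit_script fastqc_cmds.sh` would submit the job. The dry run is the default here so the sketch can be tried on a machine without a scheduler.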

For pre-processing steps where the read type (single-end or paired-end) was relevant (e.g. aligning to the genome), separate scripts were created, marked SE and PE in the file name respectively. The pre-processing scripts were used in the following order:

cat_files_SE/PE.r - to merge experiment replicates
fastqc.r / fastqcWrapper.sh - for quality control
trimmomaticSE/PE.r / trimmommaticWrapperSE/PE.sh - to trim adapter sequences
bowtie_SE/PE.r / bowtieWrapperSE/PE.sh - to align the data to the genome
sam2bamSE.r / sam2bamSEcmds.sh / sam2bamSEWrapper.sh - to convert the SAM files to BAM; this step was optional and was not performed for all data because of issues with peak calling from BAM files
callpeaks.r / callpeaksWrapper.sh - to call ChIP peaks
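The run order above can be written down as a small helper that lists the R scripts for one read type. This is a sketch under two assumptions: that "SE/PE" in a file name expands to two separate files (e.g. bowtie_SE.r and bowtie_PE.r), and that the function name preprocessing_order is purely illustrative. The optional SAM-to-BAM step is left out because only an SE script is listed for it.

```shell
# List the K562 pre-processing R scripts in run order for one read type.
# Assumption: "SE/PE" in the README expands to two files per step.
preprocessing_order() {
  local type="$1"
  case "$type" in
    SE|PE) ;;
    *) echo "read type must be SE or PE" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "cat_files_${type}.r" \
    "fastqc.r" \
    "trimmomatic${type}.r" \
    "bowtie_${type}.r" \
    "callpeaks.r"
}
```

Calling `preprocessing_order PE` prints the five script names, one per line, in the order they would be run for paired-end data.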

1.2 ChIPanalyser analysis

A combination of R and bash scripts was used for the ChIPanalyser analysis, as follows:

generateTable.r - to store all the necessary parameters for the analysis for each TF into one table
ChIPanalPerformAnalysis.R - to extract the parameters from the table and parse them, finally passing them to the performAnalysis.r function
performAnalysis.r - to perform the analysis; this can be called manually for individual TFs, or by the ChIpanalPerformAnalysis.sh script

2. mm10 analysis

2.1 Pre-processing

For pre-processing steps where the read type (single-end or paired-end) was relevant (e.g. aligning to the genome), separate scripts were created, marked SE and PE in the file name respectively. The pre-processing scripts were used in the following order:

1_preprocessing_rscripts.R - to generate all preprocessing scripts into .sh files
2_checkpreproc.R - to check that the preprocessing has worked
3_preproc_barplots.R - to generate the summary statistics plots for the preprocessing

2.2 ChIPanalyser analysis

A combination of R scripts was used:

4_ChIPanalyser_analysis.R - to perform the model training and validation with ChIPanalyser
5_optimal_data.R - to extract optimal parameters from

3. IMR90 and HepG2 analysis

3.1 Pre-processing

For pre-processing steps where the read type (single-end or paired-end) was relevant (e.g. aligning to the genome), separate scripts were created, marked SE and PE in the file name respectively. The pre-processing scripts were used in the following order:

1_preProcessing_general.R - to generate all preprocessing scripts into .sh files for IMR90 cells
2_barplots_general.R - to generate the summary statistics plots for the preprocessing for IMR90 cells
3_ATACseq_general.R - to preprocess ATAC-seq data in IMR90 cells
4_DNase_general.R - to preprocess DNaseI-seq data in IMR90 cells
5_MNase_general.R - to preprocess MNase-seq data in IMR90 cells
6_NOMe_general.R - to preprocess NOMe-seq data in IMR90 cells
7_calculateAccessibilityLevels.R - to calculate accessibility levels for different QDAs in IMR90 cells

3.2 ChIPanalyser analysis

A combination of R scripts was used:

8_getMotif_general.R - to extract motifs for the TFs
9_ChIPanalyser_general.R - to prepare the objects for the ChIPanalyser algorithm
10_executingChIPanalyser_general.R - to perform the model training and validation with ChIPanalyser
11_HEPG2_DNAaccessibility_preprocessing.R - to preprocess ATAC-seq, DNaseI-seq and MNase-seq data in HepG2 cells
12_validatingResultsWithCREB1.R - analysis for CREB1 in HepG2 cells
13_validatingResultsWithFOXA1.R - analysis for FoxA1 in HepG2 cells
14_validatingResultsWithGATA4.R - analysis for GATA4 in HepG2 cells

4. MCF10A analysis

4.1 Analysis

A combination of R scripts was used:

1_getFPKM_and_newLostMaintained.R - to rescale the parameters for the models trained in K562 cells based on RNA-seq data
2_newLostMaintained_regions.R - to obtain the regions that lose, gain and maintain DNA accessibility
3_generateAccessData.R - to generate accessibility datasets
4_ChIPanalyser.R - to run ChIPanalyser over the regions that lose, gain and maintain DNA accessibility
5_her2_plots.R - to generate the plots

5. Plots for the paper

Once all the above scripts are run, you can use plots_analysis_split.R to generate almost all plots from the manuscript.
