This repository contains the code used for the paper by Viart et al. (2025), entitled "Breast tumors from ATM pathogenic variant carriers display a specific genome-wide DNA methylation profile".
To use the code in this repository, first clone it with the command:
git clone https://github.com/nviart/Viart_ATM_breast_methylation.git
This code is written in R. The easiest way to reproduce the development environment is to use the dependency management package renv (https://rstudio.github.io/renv/index.html). First, install the renv package if it is not already installed. Then, in the config.R file, set the "renv.path" variable to the path of the renv environment. This allows each of the other scripts to load the environment. R must be launched from within the script folder.
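As a minimal sketch of the renv step above: the helper name `load_project_env` is hypothetical and is not part of this repository. It only uses `install.packages()` and `renv::load()`, and assumes `renv.path` points at an existing renv project folder.

```r
# Hypothetical helper (not part of the repository): activate the renv
# project located at `renv.path`, installing renv first if it is missing.
load_project_env <- function(renv.path) {
  # Fail early with a readable message if the folder does not exist.
  if (!dir.exists(renv.path)) {
    stop("renv project folder not found: ", renv.path)
  }
  # Install renv only when it is not already available.
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Load the project-specific library for the current R session.
  renv::load(project = renv.path)
}
```

It would be called as `load_project_env(renv.path)` after sourcing config.R. If the project folder contains an renv.lock file (an assumption, not stated above), `renv::restore()` can then install the recorded package versions.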
Some files need to be downloaded:
- The files from Pidsley et al. (2016), which can be found here: https://github.com/sirselim/illumina450k_filtering
- The manifest from Illumina named "infinium-methylationepic-v-1-0-b5-manifest-file.csv", which can be found here: https://support.illumina.com/downloads/infinium-methylationepic-v1-0-product-files.html
- You also need BED files describing the promoter, gene and CpG island regions that you want to study.
The config.R file contains some variables that will be used by the other scripts. Set them according to your own environment. You have to indicate:
- Origin: path where the outputs of the pipeline will be saved
- manifest.path: path to the manifest file downloaded in step 2
- onlyEPIC: whether only EPIC arrays will be used
- EPICfolder: the folder containing ALL your methylation array data
- DescriptorFileName1 and DescriptorFileName1: the names of the CSV descriptive files of the samples, which should contain at least ....
- probeFiltering: boolean, set to TRUE if you want to filter out the probes described in Pidsley et al. (2016)
- path.filters: path to the files downloaded from https://github.com/sirselim/illumina450k_filtering
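For illustration, a config.R using these variables might look like the sketch below. All values are placeholders to replace with your own paths. The descriptor file name is hypothetical, and treating onlyEPIC as a boolean is an assumption.

```r
# config.R: illustrative values only; every path below is a placeholder.
renv.path           <- "/path/to/renv/project"     # renv environment
Origin              <- "/path/to/pipeline/outputs" # where outputs are saved
manifest.path       <- "/path/to/manifest"         # Illumina EPIC manifest
onlyEPIC            <- TRUE                        # assumed to be a boolean
EPICfolder          <- "/path/to/methylation/arrays"
DescriptorFileName1 <- "samples_description.csv"   # hypothetical file name
probeFiltering      <- TRUE                        # apply Pidsley et al. (2016) filters
path.filters        <- "/path/to/illumina450k_filtering"
```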
Note: All the scripts are written to run on the Torque PBS scheduler. You have to adapt the header of each script to your environment.
The preprocessing pipeline used in the paper Viart et al. (2025) is described in the chart below:
The script 1.Functional_normalisation.R goes from the loading of the data to the matrix of M values.
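For readers unfamiliar with M values: they are the logit (base 2) of the beta values, M = log2(beta / (1 - beta)). The sketch below only illustrates that relationship in base R and is not the code of the script itself. The function name `beta_to_m`, the `offset` guard and the probe and sample names are hypothetical.

```r
# Convert beta values (methylated proportion, between 0 and 1) to M values.
# `offset` is a hypothetical guard that keeps beta away from exactly 0 or 1,
# where the logit would be infinite.
beta_to_m <- function(beta, offset = 1e-6) {
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

# Toy matrix: rows are probes, columns are samples (made-up values).
beta <- matrix(c(0.5, 0.8, 0.2, 0.5), nrow = 2,
               dimnames = list(c("cg_a", "cg_b"), c("s1", "s2")))
m <- beta_to_m(beta)
```

A beta value of 0.5 gives an M value of 0, and values above or below 0.5 give positive or negative M values respectively.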
Then, the script 2.Mapping.R maps the M values to promoters, genes and CpG islands using the BED files indicated in the config.R file.
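To make the idea of mapping concrete, here is a minimal base R sketch that summarises probe-level M values per BED region. It is not the implementation of 2.Mapping.R. Averaging the probes within a region is an assumption, and the function name `map_to_regions`, the column names and the toy data are all hypothetical.

```r
# Average M values over the probes that fall inside each BED region.
# m:       numeric matrix of M values (rows named by probe, columns = samples)
# probes:  data.frame with columns probe, chr, pos (1-based position)
# regions: data.frame with columns chr, start, end, name
#          (BED convention: 0-based start, end not included)
map_to_regions <- function(m, probes, regions) {
  out <- matrix(NA_real_, nrow = nrow(regions), ncol = ncol(m),
                dimnames = list(regions$name, colnames(m)))
  for (i in seq_len(nrow(regions))) {
    # A 1-based position p lies in a BED interval when start < p <= end.
    inside <- probes$chr == regions$chr[i] &
      probes$pos > regions$start[i] &
      probes$pos <= regions$end[i]
    ids <- intersect(probes$probe[inside], rownames(m))
    if (length(ids) > 0) {
      out[i, ] <- colMeans(m[ids, , drop = FALSE])
    }
  }
  out  # regions without any probe keep NA
}

# Toy data (made-up values).
probes <- data.frame(probe = c("cg_a", "cg_b", "cg_c"),
                     chr = c("chr1", "chr1", "chr2"),
                     pos = c(150, 180, 500),
                     stringsAsFactors = FALSE)
regions <- data.frame(chr = c("chr1", "chr2"),
                      start = c(100, 0),
                      end = c(200, 100),
                      name = c("promoter_X", "island_Y"),
                      stringsAsFactors = FALSE)
m <- matrix(c(1, 3, -2, 2, 4, 0), nrow = 3,
            dimnames = list(c("cg_a", "cg_b", "cg_c"), c("s1", "s2")))
region_m <- map_to_regions(m, probes, regions)
```

The loop compares every region with every probe, which is fine for a toy example. For array-scale data, an interval-overlap tool such as `findOverlaps` from the Bioconductor package GenomicRanges is the usual choice.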