Associations of nuclear mitochondrial DNA insertions and human lifespan in aging fibroblasts and in the human brain
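These pipelines call non-reference Numts (nuclear mitochondrial DNA insertions), as well as non-reference SVs (structural variants) and MEIs (mobile element insertions), using several packages. The command lines below use hfb12-1_S1.bam as the example input (the MELT commands use hFB7-8_S1.bam).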
Requires:
samtools
exonerate/2.4.0 (for Dinumt)
vcftools/0.1.15 (for Dinumt)
Dinumt/0.0.23
bcftools/1.5 (for delly)
Delly2/0.8.5
bowtie2/2.1.0 (for MELT)
MELT/2.1.4
For non-reference Numt calling:
${DINUMT}/dinumt.pl \
--mask_filename=${DINUMT}/dinumt/refNumts.bed \
--input_filename=${DATA}/bam/hfb12-1_S1.bam \
--reference=${REF}/hg19.fa \
--min_reads_cluster=1 \
--include_mask \
--mt_names=chrM \
--output_filename=hfb12-1_S1.bam.final.vcf \
--prefix=hfb12-1_S1.bam.final \
--len_cluster_include=564 \
--len_cluster_link=1128 \
--insert_size=306 \
--max_read_cov=100 \
--output_support \
--support_filename=hfb12-1_S1.bam.final_support.sam \
--ucsc
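The three length parameters above are related: 564 = 306 + 3 × 86 and 1128 = 2 × 564. This is consistent with setting --len_cluster_include to the mean insert size plus three standard deviations and --len_cluster_link to twice that value. Treat that convention as an assumption and check the Dinumt documentation; the standard deviation of 86 is inferred from the numbers, not stated in this README, and `cluster_lengths` is a hypothetical helper. A minimal sketch for deriving the values for another library:

```shell
# Hypothetical helper: derive Dinumt cluster lengths from insert-size statistics.
# Assumes include = mean + 3*sd and link = 2*include (an assumption, see above).
# Integer arithmetic only, so round the mean and sd first.
cluster_lengths() {
  local mean=$1 sd=$2
  local include=$(( mean + 3 * sd ))
  local link=$(( 2 * include ))
  echo "${include} ${link}"
}

# Values used in this README: mean insert size 306, inferred sd 86.
cluster_lengths 306 86   # prints "564 1128"
```

The two numbers go to --len_cluster_include and --len_cluster_link, and the mean goes to --insert_size.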
For merging non-reference Numt VCFs into one file:
grep ^# hfb12-1_S1.bam.final.vcf > header.txt
cat *vcf | grep -v ^# | vcf-sort -c | perl ${DINUMT}/clusterNumtsVcf.pl --samtools=samtools --reference=${REF} > data.txt
cat header.txt data.txt > merged.vcf
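Because `cat *vcf` matches every VCF in the working directory, re-running the merge in the same folder would also read merged.vcf, along with any Delly or MELT VCFs stored there. A minimal sketch that restricts the input, assuming the per-sample Dinumt outputs all end in `.final.vcf` as in --output_filename above; `numt_records` is a hypothetical helper:

```shell
# Hypothetical helper: print only the data lines of per-sample Dinumt VCFs.
# Assumes the files are named <sample>.final.vcf, so merged.vcf is never read.
numt_records() {
  local dir=${1:-.}
  local f
  for f in "${dir}"/*.final.vcf; do
    [ -e "${f}" ] || continue          # skip the unexpanded glob when nothing matches
    grep -v '^#' "${f}" || true        # drop header lines; tolerate files with no records
  done
  return 0
}

# Usage (same downstream steps as above):
# numt_records . | vcf-sort -c | perl ${DINUMT}/clusterNumtsVcf.pl --samtools=samtools --reference=${REF} > data.txt
```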
For non-reference SV calling:
${DELLY}/src/delly call -g ${REF}/hg19.fa -x ${DELLY}/excludeTemplates/human.hg19.excl.tsv ${DATA}/hfb12-1_S1.bam -o hfb12-1_S1.bam.delly.bcf
bcftools view hfb12-1_S1.bam.delly.bcf > hfb12-1_S1.bam.delly.vcf
For non-reference MEI calling (shown for hFB7-8_S1.bam; one run each for LINE1, ALU, and SVA):
java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/LINE1_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/ALU_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/SVA_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
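The three MELT commands differ only in the mobile element reference passed to -t. A minimal sketch that generates them in a loop; `melt_commands` is a hypothetical helper that only prints the commands (a dry run), with all paths and options copied from the commands above:

```shell
# Hypothetical helper: print one MELT Single command per mobile element type.
# Nothing is executed here; ${REF} is printed literally and expands when the
# printed command is run.
melt_commands() {
  local bam=$1 workdir=$2
  local melt=../MELTv2.1.4
  local mei
  for mei in LINE1 ALU SVA; do
    printf '%s\n' "java -jar ${melt}/MELT.jar Single -bamfile ${bam} -w ${workdir} -t ${melt}/me_refs/1KGP_Hg19/${mei}_MELT.zip -h \${REF}/hg19.fa -n ${melt}/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21"
  done
}

# Print the commands for review; pipe the output to `bash` to run them.
melt_commands '${DATA}/hFB7-8_S1.bam' ../hFB7-8_S1.bam.melt.0205
```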
Sample bam for non-reference genetic variant calling:
See the folder callset/sample.
This script describes how to use the Roadmap Epigenomics chromatin state dataset to find overlaps with Numt positions in the genome. The end goal is to predict which chromatin states Numts overlap in the human genome.
Step 1: Choose a BED file that matches the sample or tissue type.
Step 2: Import it as a GRanges object with 'rtracklayer'. Extract the ranges from your VCF file and count the overlaps with "countOverlaps".
The example dataset is the BED file 'E73', a 15-state epigenome model for the DLPFC (dorsolateral prefrontal cortex) region. https://docs.google.com/spreadsheets/d/1yikGx4MsO9Ei36b64yOy9Vb6oPC5IBGlFbYEt-N6gOM/edit#gid=15
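The overlap count in Step 2 can also be illustrated outside R. A minimal sketch using awk instead of rtracklayer and countOverlaps (a swapped-in technique, not what R_script_epigenome.R does): for each chromatin state, it counts how many VCF positions fall inside a BED interval of that state. It assumes tab-separated input, the state name in column 4 of the BED file, 0-based half-open BED coordinates, and 1-based VCF positions, and it tests only the insertion position rather than a full range; `count_state_overlaps` is a hypothetical helper.

```shell
# Hypothetical helper: count VCF positions per chromatin state.
# Usage: count_state_overlaps states.bed calls.vcf
count_state_overlaps() {
  local bed=$1 vcf=$2
  awk -F'\t' '
    # First file (BED): store the intervals of each chromosome.
    NR == FNR {
      n[$1]++
      start[$1, n[$1]] = $2 + 0
      end[$1, n[$1]]   = $3 + 0
      state[$1, n[$1]] = $4
      next
    }
    # Second file (VCF): skip header lines.
    /^#/ { next }
    # A 1-based position overlaps a 0-based half-open interval
    # when start < pos <= end.
    {
      pos = $2 + 0
      for (i = 1; i <= n[$1]; i++)
        if (start[$1, i] < pos && pos <= end[$1, i])
          hits[state[$1, i]]++
    }
    END { for (s in hits) print s "\t" hits[s] }
  ' "${bed}" "${vcf}" | sort
}
```

The linear scan over all intervals of a chromosome is fine for a few thousand Numt calls; for larger inputs an interval tree, as used by GenomicRanges, scales better.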
Requires:
library(GenomicRanges)
library(rtracklayer)
library(VariantAnnotation)
library(MTseeker)
library(MTseekerData)
library(metablastr)
library(biomaRt)
Run
/lib/R_script_epigenome.R
This bash script produces the VCFs for all tissue- and cell-line-specific Numts in the lifespan model.
Requires:
Inputs under the folder ../data/
Run
bash bash.specific.extraction.sh
For Analysis and callsets:
- Weichen Zhou, Kalpita R. Karan, Wenjin Gu, Hans-Ulrich Klein, Gabriel Sturm, Philip L. De Jager, David A. Bennett, Michio Hirano, Martin Picard, and Ryan E. Mills. Somatic nuclear mitochondrial DNA insertions are prevalent in the human brain and accumulate over time in fibroblasts. bioRxiv, 2023, 2023.02.03.527065.
https://doi.org/10.1101/2023.02.03.527065
For Dinumt:
- Gargi Dayama, Sarah B. Emery, Jeffrey M. Kidd, and Ryan E. Mills. The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Research, 2014, gku1038.
https://doi.org/10.1093/nar/gku1038