Associations of nuclear mitochondrial DNA insertions and human lifespan in aging fibroblasts and in the human brain

Variant calling pipelines

These pipelines call non-reference Numts with Dinumt, non-reference SVs with Delly2, and non-reference MEIs with MELT. The Numt and SV commands below use hfb12-1_S1.bam as the example input; the MEI commands use hFB7-8_S1.bam.

Requires:

samtools
exonerate/2.4.0 (for Dinumt) 
vcftools/0.1.15 (for Dinumt) 
Dinumt/0.0.23
bcftools/1.5 (for delly)
Delly2/0.8.5
bowtie2/2.1.0 (for MELT) 
MELT/2.1.4

For non-reference Numt calling:

${DINUMT}/dinumt.pl \
--mask_filename=${DINUMT}/dinumt/refNumts.bed \
--input_filename=${DATA}/bam/hfb12-1_S1.bam \
--reference=${REF}/hg19.fa \
--min_reads_cluster=1 \
--include_mask \
--mt_names=chrM \
--output_filename=hfb12-1_S1.bam.final.vcf \
--prefix=hfb12-1_S1.bam.final \
--len_cluster_include=564 \
--len_cluster_link=1128 \
--insert_size=306 \
--max_read_cov=100 \
--output_support \
--support_filename=hfb12-1_S1.bam.final_support.sam \
--ucsc
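
The cluster-length values in the command above depend on the library. As a sketch of how they might be derived: the numbers are consistent with the convention of setting --len_cluster_include to the mean insert size plus three standard deviations and --len_cluster_link to twice that value. This convention is an assumption here, not something this README states, and the standard deviation of 86 is inferred only because it reproduces 564 and 1128 from the insert size of 306.

```shell
# Assumed library statistics: 306 is the --insert_size used above;
# 86 is a hypothetical standard deviation that reproduces 564 and 1128.
mean_insert=306
sd_insert=86

# Assumed convention: include = mean + 3 * SD, link = 2 * include.
len_cluster_include=$(( mean_insert + 3 * sd_insert ))
len_cluster_link=$(( 2 * len_cluster_include ))

echo "--insert_size=${mean_insert}"
echo "--len_cluster_include=${len_cluster_include}"
echo "--len_cluster_link=${len_cluster_link}"
```

For another library, the mean and standard deviation can be read from the "insert size average" and "insert size standard deviation" lines in the SN section of `samtools stats` output and substituted for the two variables.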

For merging non-reference Numt VCFs into one file:

grep '^#' hfb12-1_S1.bam.final.vcf > header.txt
cat *.final.vcf | grep -v '^#' | vcf-sort -c | perl ${DINUMT}/clusterNumtsVcf.pl --samtools=samtools --reference=${REF}/hg19.fa > data.txt
cat header.txt data.txt > merged.vcf
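
After merging, it can be useful to confirm that the header survived and that the record count looks plausible. The helper below is a minimal sketch using only grep, cut, sort, uniq and awk; the function name `vcf_summary` is hypothetical and not part of Dinumt or this repository.

```shell
# Summarise a VCF: header line count, record count, and records per chromosome.
# vcf_summary is a hypothetical helper, not part of Dinumt.
vcf_summary() {
    vcf="$1"
    printf 'header_lines\t%s\n' "$(grep -c '^#' "$vcf")"
    printf 'records\t%s\n' "$(grep -vc '^#' "$vcf")"
    # Column 1 of each record is the chromosome name.
    grep -v '^#' "$vcf" | cut -f1 | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Run on the merged file only if it exists in the current directory.
if [ -f merged.vcf ]; then
    vcf_summary merged.vcf
fi
```

A header count of zero would suggest that header.txt was empty when the files were concatenated.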

For non-reference SV calling:

${DELLY}/src/delly call -g ${REF}/hg19.fa -x ${DELLY}/excludeTemplates/human.hg19.excl.tsv ${DATA}/hfb12-1_S1.bam -o hfb12-1_S1.bam.delly.bcf
bcftools view hfb12-1_S1.bam.delly.bcf > hfb12-1_S1.bam.delly.vcf

For non-reference MEI calling:

java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/LINE1_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/ALU_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
java -jar ../MELTv2.1.4/MELT.jar Single -bamfile ${DATA}/hFB7-8_S1.bam -w ../hFB7-8_S1.bam.melt.0205 -t ../MELTv2.1.4/me_refs/1KGP_Hg19/SVA_MELT.zip -h ${REF}/hg19.fa -n ../MELTv2.1.4/add_bed_files/1KGP_Hg19/hg19.genes.bed -c 21
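
The three MELT commands above differ only in the mobile-element reference zip. The sketch below generates them from a loop as a dry run. The variables `MELT_DIR` and `SAMPLE` and the function `melt_single_cmd` are hypothetical names introduced for illustration, and the fallback values for `DATA` and `REF` are placeholders.

```shell
# Hypothetical variables; defaults mirror the paths used in the commands above.
MELT_DIR=${MELT_DIR:-../MELTv2.1.4}
SAMPLE=${SAMPLE:-hFB7-8_S1}
DATA=${DATA:-/path/to/data}   # placeholder, set to your BAM directory
REF=${REF:-/path/to/ref}      # placeholder, set to your reference directory

# Print (do not run) the MELT Single command for one mobile-element type.
melt_single_cmd() {
    me_type="$1"   # LINE1, ALU or SVA
    echo "java -jar ${MELT_DIR}/MELT.jar Single" \
         "-bamfile ${DATA}/${SAMPLE}.bam" \
         "-w ../${SAMPLE}.bam.melt.0205" \
         "-t ${MELT_DIR}/me_refs/1KGP_Hg19/${me_type}_MELT.zip" \
         "-h ${REF}/hg19.fa" \
         "-n ${MELT_DIR}/add_bed_files/1KGP_Hg19/hg19.genes.bed" \
         "-c 21"
}

# Dry run: one command per mobile-element type.
for me in LINE1 ALU SVA; do
    melt_single_cmd "$me"
done
```

Printing the commands first makes it easy to inspect them before execution; piping the output to `bash` runs them, provided the paths contain no spaces.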

Sample BAM for non-reference genetic variant calling:

See the folder callset/sample

Epigenetic analysis

This script describes how to use the 'Roadmap epigenome states' dataset to find overlaps with Numt positions in the genome. The goal is to predict the chromatin states that Numts overlap in the human genome.

Step 1: Choose a BED file that matches the sample or tissue type.

Step 2: Import it as a GRanges object with 'rtracklayer'. Extract the Numt ranges from your VCF file and count the overlaps with "countOverlaps".

Example dataset: the BED file 'E73', a 15-state epigenome annotation for the DLPFC (dorsolateral prefrontal cortex) region. https://docs.google.com/spreadsheets/d/1yikGx4MsO9Ei36b64yOy9Vb6oPC5IBGlFbYEt-N6gOM/edit#gid=15
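
As a quick command-line cross-check of the overlap counting described in the two steps above, the sketch below tallies how many Numt positions fall in each chromatin state using awk. It is not the R workflow in this repository: it swaps in a naive awk scan for rtracklayer and countOverlaps, and it assumes a 4-column tab-separated BED (chrom, start, end, state) and a VCF whose POS column marks the Numt insertion site. The function name and the file names in the comment are hypothetical.

```shell
# Count Numt positions per chromatin state.
#   $1 = chromatin-state BED (chrom, start, end, state), 0-based half-open
#   $2 = Numt VCF (POS is 1-based)
# numt_state_counts is a hypothetical helper, not part of the repository.
numt_state_counts() {
    awk -F'\t' '
        # First file: load every BED interval into memory.
        NR == FNR {
            n++
            chrom[n] = $1; start[n] = $2 + 0; stop[n] = $3 + 0; state[n] = $4
            next
        }
        # Second file: skip VCF header lines.
        /^#/ { next }
        {
            pos = $2 + 0
            # 1-based POS lies in a 0-based half-open [start, stop) interval
            # when start < POS and POS <= stop.
            for (i = 1; i <= n; i++)
                if (chrom[i] == $1 && start[i] < pos && pos <= stop[i])
                    count[state[i]]++
        }
        END { for (s in count) print s "\t" count[s] }
    ' "$1" "$2" | LC_ALL=C sort
}

# Example call with hypothetical file names:
#   numt_state_counts E073_15_coreMarks_mnemonics.bed merged.vcf
```

The scan compares every Numt against every interval, which is acceptable for a few hundred Numts but slow for large call sets; the GRanges approach in Step 2 remains the reference method.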

Requires:

library(GenomicRanges)
library(rtracklayer)
library(VariantAnnotation)
library(MTseeker)
library(MTseekerData)
library(metablastr)
library(biomaRt)

Run

/lib/R_script_epigenome.R

Identification of tissue/cell-line specific Numts

This bash script produces the VCFs for all tissue/cell-line-specific Numts in the Life-span model.

Requires:

Inputs under the folder ../data/

Run

bash bash.specific.extraction.sh

Citation

For Analysis and callsets:

For Dinumt:

Contact:

[email protected] or https://github.com/WeichenZhou

[email protected] or https://github.com/kalpita23