Simulate both DNA-seq and RNA-seq data with specified transcript abundances to benchmark allele-specific expression algorithms.
- Reference file (FASTA)
- GFF (gene and transcript coordinates)
- dbSNP database
- Paired-end DNA FASTQ files
- Paired-end mRNA FASTQ files
- TSV file with SNPs
From the GFF file, we take the exon coordinates and intersect them with dbSNP to select the SNPs that fall within exons. We also add a number of random SNVs per exon and randomly assign a phase to each mutation. We then create two contigs (haplotypes) from a reference contig by introducing the mutations according to their phase values.
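
The selection and phasing steps can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the `Snp` type and the functions `snps_in_exons`, `assign_phases`, and `build_haplotypes` are hypothetical names, coordinates are assumed to be 0-based with half-open exon intervals, and a phase of 0 or 1 is assumed to mean "the alternate allele goes on that haplotype". The random-SNV step is left out for brevity.

```python
import random
from typing import NamedTuple


class Snp(NamedTuple):
    pos: int  # 0-based position on the contig (assumption)
    ref: str  # reference base
    alt: str  # alternate base


def snps_in_exons(snps, exons):
    """Keep SNPs whose position lies in any half-open exon interval [start, end)."""
    return [s for s in snps if any(start <= s.pos < end for start, end in exons)]


def assign_phases(snps, rng):
    """Randomly assign each SNP to haplotype 0 or haplotype 1."""
    return [(s, rng.randrange(2)) for s in snps]


def build_haplotypes(reference, phased_snps):
    """Return two copies of the reference, each with the alleles phased to it."""
    haplotypes = [list(reference), list(reference)]
    for snp, phase in phased_snps:
        if reference[snp.pos] != snp.ref:
            raise ValueError(f"reference base mismatch at position {snp.pos}")
        haplotypes[phase][snp.pos] = snp.alt
    return "".join(haplotypes[0]), "".join(haplotypes[1])


# Toy data: one contig, two exons, three known SNPs (one is intronic).
reference = "ACGTACGTACGT"
exons = [(2, 6), (8, 11)]
snps = [Snp(3, "T", "C"), Snp(7, "T", "G"), Snp(9, "C", "A")]

exonic = snps_in_exons(snps, exons)
phased = assign_phases(exonic, random.Random(0))  # seeded for reproducibility
hap_a, hap_b = build_haplotypes(reference, phased)
```

Because every SNP is placed on exactly one haplotype, all simulated variants in this sketch are heterozygous, which is the case of interest when measuring allele-specific expression.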