Genome diversity
We have included a suite of tools for pedigree analysis and for comparing variation between varieties. These tools are useful in resequencing projects that study genome diversity.
Tip
Download the test dataset here.
One basic analysis is to visualize the pedigree among varieties, which illustrates their breeding history. The pedigree information can be encoded in a standard .ped file.
#Family ID Individual ID Paternal ID Maternal ID Sex (1=male; 2=female; other=unknown) Phenotype
F001 Variety10 Variety11 Variety12 0 0
F001 Variety8 Variety9 Variety10 0 0
F001 Variety7 Variety9 Variety9 0 0
F001 Variety4 Variety7 Variety8 0 0
F001 Variety2 Variety6 Variety4 0 0
F001 Variety3 Variety4 Variety5 0 0
F001 Variety1 Variety2 Variety3 0 0
We can then easily visualize it.
python -m jcvi.compara.pedigree pedigree pedigree.ped --ploidy=8 --N 10000 \
--title "Pedigree of Variety1"
The root nodes (nodes with no parent information) are assumed to be outcrossing. The parentage of each variety is then estimated and shown as a pie chart colored by root node. The inbreeding coefficients (
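To make the idea of root nodes and parentage concrete, here is a minimal Python sketch that reads the example pedigree, finds the root nodes, and computes the expected fraction of Variety1 traced back to each root. It assumes each parent contributes exactly half of its offspring's genome and that root nodes are simply the varieties without a record of their own. It is an illustration only and not the method used by the jcvi pedigree tool, which also takes ploidy into account; the function names are hypothetical.

```python
from collections import defaultdict

# Example pedigree from above (family, individual, father, mother, sex, phenotype).
PED_TEXT = """\
F001 Variety10 Variety11 Variety12 0 0
F001 Variety8 Variety9 Variety10 0 0
F001 Variety7 Variety9 Variety9 0 0
F001 Variety4 Variety7 Variety8 0 0
F001 Variety2 Variety6 Variety4 0 0
F001 Variety3 Variety4 Variety5 0 0
F001 Variety1 Variety2 Variety3 0 0
"""


def parse_ped(text):
    """Return {individual: (father, mother)} from whitespace-separated lines."""
    parents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        _family, individual, father, mother = line.split()[:4]
        parents[individual] = (father, mother)
    return parents


def founder_fractions(individual, parents, cache=None):
    """Expected fraction of an individual's genome traced to each root node.

    Assumption: every parent passes on exactly half of its own ancestry.
    """
    if cache is None:
        cache = {}
    if individual in cache:
        return cache[individual]
    if individual not in parents:
        # Root node: no parent record, so all ancestry is its own.
        result = {individual: 1.0}
    else:
        totals = defaultdict(float)
        for parent in parents[individual]:
            for founder, frac in founder_fractions(parent, parents, cache).items():
                totals[founder] += frac / 2
        result = dict(totals)
    cache[individual] = result
    return result


parents = parse_ped(PED_TEXT)
# Root nodes appear as a parent but never as an individual.
roots = sorted({p for pair in parents.values() for p in pair} - set(parents))
fractions = founder_fractions("Variety1", parents)

print("Root nodes:", ", ".join(roots))
for founder, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{founder}\t{frac:.4f}")
```

Under these assumptions Variety9 is the largest contributor to Variety1, because it enters the pedigree through both Variety7 (a selfed line) and Variety8.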
In resequencing projects, it is often useful to visualize the copy-number variations (CNV) between varieties. Let's assume that the lines/varieties have been sequenced and mapped. We can use mosdepth to compute the mean read depth in windows tiled along each of the resequenced genomes, with a bin size of 1000000 bp (1 Mb).
mosdepth --by 1000000 VAR1_srtd.wgs VAR1_srtd.wgs.bam
mosdepth --by 1000000 VAR2_srtd.wgs VAR2_srtd.wgs.bam
mosdepth --by 1000000 VAR3_srtd.wgs VAR3_srtd.wgs.bam
These commands will generate 3 BED files suitable for CNV plotting: VAR1_srtd.wgs.regions.bed.gz, etc.
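Before plotting, it can help to inspect these files directly. The sketch below reads a mosdepth regions file, whose tab-separated columns are chromosome, start, end, and mean depth, and reports the median bin depth per chromosome. The helper names are hypothetical, and the demo file with its depth values is made up purely so the snippet runs on its own; point `median_depth_by_chrom` at a real file such as VAR1_srtd.wgs.regions.bed.gz instead.

```python
import gzip
import os
import statistics
import tempfile
from collections import defaultdict


def read_depth_bed(path):
    """Yield (chrom, start, end, mean_depth) from a gzipped regions BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            chrom, start, end, depth = line.rstrip("\n").split("\t")[:4]
            yield chrom, int(start), int(end), float(depth)


def median_depth_by_chrom(path):
    """Return {chrom: median of the per-bin mean depths}."""
    depths = defaultdict(list)
    for chrom, _start, _end, depth in read_depth_bed(path):
        depths[chrom].append(depth)
    return {chrom: statistics.median(values) for chrom, values in depths.items()}


# Demo only: write a tiny file with invented depth values.
demo_path = os.path.join(tempfile.mkdtemp(), "DEMO.regions.bed.gz")
with gzip.open(demo_path, "wt") as handle:
    handle.write("chr01A\t0\t1000000\t30.00\n")
    handle.write("chr01A\t1000000\t2000000\t34.00\n")
    handle.write("chr01B\t0\t1000000\t15.50\n")

medians = median_depth_by_chrom(demo_path)
for chrom, median in medians.items():
    print(f"{chrom}\t{median:.2f}")
```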
python -m jcvi.graphics.landscape depth \
VAR?_srtd.wgs.regions.bed.gz \
--chrinfo chrinfo.txt \
--titleinfo titleinfo.txt
We can further customize the color schemes of chromosomes in chrinfo.txt.
chr01A, #c51b7d, 1A
chr01B, #4d9221, 1B
chr02A, #c51b7d, 2A
chr02B, #4d9221, 2B
chr03A, #c51b7d, 3A
chr03B, #4d9221, 3B
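For genomes with many chromosomes, a file in this layout can be generated rather than typed by hand. The following is a small sketch that assumes the naming pattern seen above (two-digit chromosome number followed by an A or B subgenome suffix) and one color per subgenome; the function name and the pattern are assumptions to adapt to your own assembly.

```python
# One color per subgenome, taken from the example above.
SUBGENOME_COLORS = {"A": "#c51b7d", "B": "#4d9221"}


def chrinfo_lines(n_chromosomes, colors=SUBGENOME_COLORS):
    """Build 'name, color, label' lines for chromosomes 1..n in each subgenome."""
    lines = []
    for number in range(1, n_chromosomes + 1):
        for subgenome, color in colors.items():
            lines.append(f"chr{number:02d}{subgenome}, {color}, {number}{subgenome}")
    return lines


# Prints the same six lines as the example; redirect to chrinfo.txt as needed.
print("\n".join(chrinfo_lines(3)))
```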
The title of each resequencing panel can be customized in titleinfo.txt.
VAR1_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 1’
VAR2_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 2’
VAR3_srtd.wgs.regions.bed.gz, *S. species*, ‘Variety 3’
Finally, with all the plotting elements configured, we can visualize the CNV in 3 varieties.
It is interesting to see that there is reduced depth ("deletions") on 3A and 5B, which may be investigated further.
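One simple way to follow up on such regions is to list the bins whose depth falls well below the genome-wide median. The sketch below flags bins under half the median; the 0.5 ratio, the function name, and the demo records are all assumptions chosen for illustration and are not values from the plot above.

```python
import statistics


def low_depth_bins(records, ratio=0.5):
    """Return bins whose mean depth is below ratio * genome-wide median depth.

    `records` is a list of (chrom, start, end, mean_depth) tuples, e.g. parsed
    from a mosdepth regions BED file.
    """
    median = statistics.median(depth for *_coords, depth in records)
    threshold = ratio * median
    return [record for record in records if record[3] < threshold]


# Invented depth values for demonstration only.
demo_records = [
    ("chr03A", 0, 1000000, 30.0),
    ("chr03A", 1000000, 2000000, 12.0),
    ("chr03A", 2000000, 3000000, 31.0),
    ("chr05B", 0, 1000000, 29.0),
    ("chr05B", 1000000, 2000000, 10.0),
    ("chr05B", 2000000, 3000000, 30.0),
]

flagged = low_depth_bins(demo_records)
for chrom, start, end, depth in flagged:
    print(f"{chrom}:{start}-{end}\tdepth={depth}")
```

A fixed ratio is a blunt instrument: regions with low mappability can also show reduced depth, so flagged bins are candidates for inspection rather than confirmed deletions.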
© Haibao Tang, 2010-2024