IBM map was downloaded from the MaizeGDB.
#!/bin/bash
wget -O b73-IBM_887740.map https://www.maizegdb.org/map_text?id=887740
wget -O b73-IBM_887741.map https://www.maizegdb.org/map_text?id=887741
wget -O b73-IBM_887742.map https://www.maizegdb.org/map_text?id=887742
wget -O b73-IBM_887743.map https://www.maizegdb.org/map_text?id=887743
wget -O b73-IBM_887744.map https://www.maizegdb.org/map_text?id=887744
wget -O b73-IBM_887745.map https://www.maizegdb.org/map_text?id=887745
wget -O b73-IBM_887746.map https://www.maizegdb.org/map_text?id=887746
wget -O b73-IBM_887747.map https://www.maizegdb.org/map_text?id=887747
wget -O b73-IBM_887748.map https://www.maizegdb.org/map_text?id=887748
wget -O b73-IBM_887749.map https://www.maizegdb.org/map_text?id=887749
The maps were saved as a tab-delimted text file.
build index for your scaffolds:
00_build_index_hisat2.sh B73Ab10.scaffolds.fasta
for f in *.map; do
gg1_ProcessMaizeGDBmaps.sh $f B73Ab10;
done
You will see .fasta
and .tsv
for each chr separately
gg2_MergeMarkers.sh B73Ab10
No stdout, but you will see:
B73Ab10_merged.fasta
B73Ab10_merged.txt
as output
Map markers
gg3_MapMarkers.sh \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/B73Ab10/B73Ab10_index/B73Ab10 \
B73Ab10_merged.fasta \
B73Ab10_merged.txt B73Ab10
No stdout, but few files are generated by this script.
B73Ab10_mapped.sam
mapping_stats.txt
B73Ab10_mapped.bam
B73Ab10_sorted.bam
B73Ab10_sorted-q30.bam
B73Ab10_sorted_noMismatch.bam
B73Ab10_part1.txt
B73Ab10_part2.txt
B73Ab10_IBM-mapped.csv
The file B73Ab10_IBM-mapped.csv
is needed for the ALLMAPS step later, but we will renamed this file:
mv B73Ab10_IBM-mapped.csv ibm-markers.csv
First we need to prepare PG marker file (once for all NAM lines). File is located in CyVerse Data Commons that is accessible via iRods
icd /iplant/home/shared/panzea/genotypes/GBS/v27
iget Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz
or through direct link:
https://datacommons.cyverse.org/browse/iplant/home/shared/panzea/genotypes/GBS/v27/Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz
# wget/curl will not work
Once downloaded process the file downloaded as follows:
for f in {1..10}; do
awk -v x=$f '$3==x && $7==0 {print $5"\tpg_"NR"\t"$8}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.txt
# bed file to extract sequence
for f in {1..10}; do
awk -v x=$f '$3==x && $7==0 {print $5"\t"$6-50"\t"$6+50"\tpg_"NR"\t.\t+"}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.bed
# genome
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-22/fasta/zea_mays/dna/Zea_mays.AGPv3.22.dna.genome.fa.gz
gunzip Zea_mays.AGPv3.22.dna.genome.fa.gz
# marker sequence
ml bedtools2
awk '$2>0' pb_anchors.bed > temp
mv temp pb_anchors.bed
bedtools getfasta -fi Zea_mays.AGPv3.22.dna.genome.fa -fo pb_anchors.fasta -bed pb_anchors.bed -name
B73Ab10 example run:
gg3_MapMarkers.sh \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/B73Ab10/B73Ab10_index/B73Ab10 \
pb_anchors.fasta \
pb_anchors.txt \
B73Ab10
same set of files are created as before. But you will only need B73Ab10_IBM-mapped.csv
. Rename this file
B73Ab10_IBM-mapped.csv B73Ab10_pangenome.csv
gg4_clean-markers.sh B73Ab10_pangenome.csv
this will create pangeome.csv
file.
Only if you find heterozygous scaffolds in your genome (based on optical map), you can construct a file with just the names of scaffolds that are heterozygous and remove them form the marker file. This will prevent them from including those scaffolds in the AGP.
remove-hets.sh hets-to-remove.txt
Run ALLMAPS
singularity exec --bind $PWD /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/jcvi.simg \
/work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/98_runALLMAPS.sh \
pangenome.csv \
ibm-markers.csv \
B73Ab10.scaffolds.fasta
This process was repeated for (A) ONT only assembly, (B) PacBio only assembly, and (C) Hybrid assembly scaffolds separately.