Skip to content

Latest commit

 

History

History
169 lines (124 loc) · 4.46 KB

README.md

File metadata and controls

169 lines (124 loc) · 4.46 KB

AGP generation steps

Process IBM Markers and generate mapped marker file

IBM map was downloaded from the MaizeGDB.

#!/bin/bash
wget -O b73-IBM_887740.map https://www.maizegdb.org/map_text?id=887740
wget -O b73-IBM_887741.map https://www.maizegdb.org/map_text?id=887741
wget -O b73-IBM_887742.map https://www.maizegdb.org/map_text?id=887742
wget -O b73-IBM_887743.map https://www.maizegdb.org/map_text?id=887743
wget -O b73-IBM_887744.map https://www.maizegdb.org/map_text?id=887744
wget -O b73-IBM_887745.map https://www.maizegdb.org/map_text?id=887745
wget -O b73-IBM_887746.map https://www.maizegdb.org/map_text?id=887746
wget -O b73-IBM_887747.map https://www.maizegdb.org/map_text?id=887747
wget -O b73-IBM_887748.map https://www.maizegdb.org/map_text?id=887748
wget -O b73-IBM_887749.map https://www.maizegdb.org/map_text?id=887749

The maps were saved as a tab-delimted text file.

Step 0: Build index

build index for your scaffolds:

00_build_index_hisat2.sh B73Ab10.scaffolds.fasta

Step 1: Process markers

for f in *.map; do
  gg1_ProcessMaizeGDBmaps.sh $f  B73Ab10;
done

You will see .fasta and .tsv for each chr separately

Step 2: Pool markers

gg2_MergeMarkers.sh B73Ab10

No stdout, but you will see:

B73Ab10_merged.fasta
B73Ab10_merged.txt

as output

Step 3a: Map IBM markers

Map markers

gg3_MapMarkers.sh \
   /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/B73Ab10/B73Ab10_index/B73Ab10 \
   B73Ab10_merged.fasta \
   B73Ab10_merged.txt B73Ab10

No stdout, but few files are generated by this script.

B73Ab10_mapped.sam
mapping_stats.txt
B73Ab10_mapped.bam
B73Ab10_sorted.bam
B73Ab10_sorted-q30.bam
B73Ab10_sorted_noMismatch.bam
B73Ab10_part1.txt
B73Ab10_part2.txt
B73Ab10_IBM-mapped.csv

The file B73Ab10_IBM-mapped.csv is needed for the ALLMAPS step later, but we will renamed this file:

mv B73Ab10_IBM-mapped.csv ibm-markers.csv

Process PanGenome anchor markers and generate mapped marker file

First we need to prepare PG marker file (once for all NAM lines). File is located in CyVerse Data Commons that is accessible via iRods

icd /iplant/home/shared/panzea/genotypes/GBS/v27
iget Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz

or through direct link:

https://datacommons.cyverse.org/browse/iplant/home/shared/panzea/genotypes/GBS/v27/Lu_2015_NatCommun_panGenomeAnchors20150219.txt.gz
# wget/curl will not work

Once downloaded process the file downloaded as follows:

for f in {1..10}; do
 awk -v x=$f '$3==x && $7==0 {print $5"\tpg_"NR"\t"$8}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.txt
# bed file to extract sequence
for f in {1..10}; do
 awk -v x=$f '$3==x && $7==0 {print $5"\t"$6-50"\t"$6+50"\tpg_"NR"\t.\t+"}' Lu_2015_NatCommun_panGenomeAnchors20150219.txt;
done > pb_anchors.bed
# genome
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-22/fasta/zea_mays/dna/Zea_mays.AGPv3.22.dna.genome.fa.gz
gunzip Zea_mays.AGPv3.22.dna.genome.fa.gz
# marker sequence
ml bedtools2
awk '$2>0' pb_anchors.bed > temp
mv temp pb_anchors.bed
bedtools getfasta -fi Zea_mays.AGPv3.22.dna.genome.fa -fo pb_anchors.fasta -bed pb_anchors.bed  -name

B73Ab10 example run:

Step 3b: Map pangenome markers

gg3_MapMarkers.sh \
   /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/B73Ab10/B73Ab10_index/B73Ab10 \
    pb_anchors.fasta \
    pb_anchors.txt \
    B73Ab10

same set of files are created as before. But you will only need B73Ab10_IBM-mapped.csv. Rename this file

B73Ab10_IBM-mapped.csv B73Ab10_pangenome.csv

Step 4: Clean markers/ Remove hets

gg4_clean-markers.sh B73Ab10_pangenome.csv

this will create pangeome.csv file.

Only if you find heterozygous scaffolds in your genome (based on optical map), you can construct a file with just the names of scaffolds that are heterozygous and remove them form the marker file. This will prevent them from including those scaffolds in the AGP.

remove-hets.sh hets-to-remove.txt

Step 5: Run ALLMAPS to create AGP file and Pseudomolecules

Run ALLMAPS

singularity exec --bind $PWD /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/jcvi.simg \
     /work/LAS/mhufford-lab/arnstrm/Canu_1.8/genetic_maps/98_runALLMAPS.sh \
     pangenome.csv \
     ibm-markers.csv \
     B73Ab10.scaffolds.fasta

This process was repeated for (A) ONT only assembly, (B) PacBio only assembly, and (C) Hybrid assembly scaffolds separately.