ERROR syri:201 - Length of query sequence of Chr1 is less than the maximum coordinate of its aligned regions #251

Open
Sabrili opened this issue May 13, 2024 · 7 comments


@Sabrili

Sabrili commented May 13, 2024

Hi @mnshgl0110 and everyone :)

I'm sorry if this is a really obvious question, I'm new to programming and haven't used Syri before.

I am getting this error code when I try to run Syri:
"Running SyRI - ERROR - syri:201 - Length of query sequence of Chr1 is less than the maximum
coordinate of its aligned regions. Exiting."

I don't really understand what has triggered this error or how to fix it.

Here is the code I'm using which gives the error:

ln -sf Genome1.fa refgenome
ln -sf Genome2.fa qrygenome
nucmer --maxmatch -c 100 -b 500 -l 50 refgenome qrygenome
delta-filter -m -i 90 -l 100 out.delta > out.filtered.delta
show-coords -THrd out.filtered.delta > out.filtered.coords
syri -c out.filtered.coords -d out.filtered.delta -r refgenome -q qrygenome

Genome1 and Genome2 have the same number of chromosomes and the chromosome names are the same across the two genomes, however the chromosomes in Genome2 are much longer than Genome1 due to a high number of repeat elements.

As far as I can understand I think I have formatted the fasta files correctly?

Additionally I've noticed that the input TSV file shows alignments between the wrong chromosomes e.g. Chr1 of the reference and Chr4 of the query, could this be contributing to this error?
For example:
[image: screenshot of the input TSV file]

Any advice on understanding this error and how to fix it would be really appreciated :)

Thank you so much!

@mnshgl0110
Member

mnshgl0110 commented May 14, 2024

Hi @Sabrili. The commands look correct, so this error should not happen. Can you please try re-running the pipeline in a new, empty folder?
If that does not solve the problem, then please check that the chromosomes named Chr1 in the reference and the query are indeed homologous. To check that, you can try using fixchr to generate a dotplot of the alignments.
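As a rough complement to a dotplot, the homology check suggested above can also be approximated from the .coords file itself. This is only a sketch: the function name pair_summary is made up for illustration, and it assumes the 11-column, tab-separated layout written by show-coords -THrd (reference start/end in columns 1 and 2, reference name in column 10, query name in column 11).

```shell
# Sum aligned reference bases for every (reference, query) chromosome pair.
# Assumption: input is the tab-separated output of `show-coords -THrd`, where
# $1,$2 = reference start/end, $10 = reference name, $11 = query name.
pair_summary() {
    awk -F'\t' '{
        # Alignment length on the reference, tolerant of reversed coordinates.
        len = ($2 + 0 >= $1 + 0 ? $2 - $1 : $1 - $2) + 1
        total[$10 "\t" $11] += len
    } END {
        for (pair in total) print pair "\t" total[pair]
    }' "$1" | sort -t "$(printf '\t')" -k1,1 -k3,3nr
}

# Example call (file name taken from the commands in this thread):
# pair_summary out.filtered.coords
```

If most of the aligned bases for reference Chr1 land on a query chromosome with a different name, the chromosome names probably do not correspond between the two assemblies.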

@Sabrili
Author

Sabrili commented May 19, 2024

Hi @mnshgl0110
Thank you so much for your quick answer!
I've tried what you have suggested above, but had no success unfortunately :(
I wasn't able to figure out how to use fixchr, but I did use another tool to create a dotplot. The two genomes are homologous, but they probably don't show the level of synteny you'd usually expect, because one genome is much larger than the other. Could this be causing the problem?
[image: dotplot of the two genomes (thumbnail_map_final_renamed_chromAssemEDITED_Syri_to_Osativa_323_v7 0EDITED_Syri)]

When I separate the genomes into individual chromosomes, Syri works perfectly, but I was hoping to compare the whole genomes to see if there were any translocations across chromosomes.
[image: Chr1]

@mnshgl0110
Member

The error message implies that the alignment coordinates in out.filtered.coords are inconsistent with the query genome fasta file. For example, if Chr1 is 10000 bp long in the query genome fasta but the .coords file has an alignment extending to Chr1:10010, then syri would report this error.
Can you check the output of the following commands:

# Max alignment coordinate for query Chr1
awk '{if($11=="Chr1"){if($3>max){max=$3}; if($4>max){max=$4}}} END {print max}' out.filtered.coords

# Length of Chr1 in the genome fasta
samtools faidx qrygenome
cat qrygenome.fai

The max alignment coordinate should be less than or equal to the chromosome length in the fasta. If that is not the case, then there is some issue with how the alignments are generated.
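The same comparison can be run for every query chromosome at once. The following is a minimal sketch, not part of SyRI: check_query_coords is a hypothetical helper name, and it assumes the .coords file was written by show-coords -THrd (query start/end in columns 3 and 4, query name in column 11) and that the .fai index comes from samtools faidx (name in column 1, length in column 2).

```shell
# Report query chromosomes whose largest aligned coordinate exceeds the
# length recorded in the FASTA index.
# Usage: check_query_coords <coords file> <query .fai file>
# Exit status is 1 if any inconsistency is found, 0 otherwise.
check_query_coords() {
    awk -F'\t' '
        # First file (.fai): remember each chromosome length.
        NR == FNR { len[$1] = $2 + 0; next }
        # Second file (.coords): track the largest query coordinate per chromosome.
        {
            hi = ($3 + 0 > $4 + 0 ? $3 + 0 : $4 + 0)
            if (hi > max[$11]) max[$11] = hi
        }
        END {
            bad = 0
            for (chr in max) {
                if (!(chr in len)) {
                    print chr "\tmissing from .fai"; bad = 1
                } else if (max[chr] > len[chr]) {
                    print chr "\tmax=" max[chr] "\tlength=" len[chr]; bad = 1
                }
            }
            exit bad
        }' "$2" "$1"
}

# Example call (file names taken from the commands in this thread):
# check_query_coords out.filtered.coords qrygenome.fai
```

Any line printed points to a chromosome where the alignments and the fasta disagree, which is the condition that the error message describes.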

@Ruba-2324

Error message: Running SyRI - ERROR - Length of query sequence of Chr1 is less than the maximum coordinate of its aligned regions. Exiting.
I have attached my SAM file, which includes the chromosome lengths in the header:
@SQ SN:Chr1 LN:43614107
@SQ SN:Chr2 LN:37756626
@SQ SN:Chr3 LN:39466563
@SQ SN:Chr4 LN:36959593
@SQ SN:Chr5 LN:31463784
@SQ SN:Chr6 LN:31699651
@SQ SN:Chr7 LN:30471016
@SQ SN:Chr8 LN:31292496
@SQ SN:Chr9 LN:25369104
@SQ SN:Chr10 LN:25570769
@SQ SN:Chr11 LN:33856246
@SQ SN:Chr12 LN:27287347
@PG ID:minimap2 PN:minimap2 VN:2.24-r1122

These are the first four columns of the SAM input file that I am giving to SyRI, and I am getting the same error:
Chr1 0 Chr1 12483338
Chr1 2048 Chr1 1050797
Chr1 2048 Chr1 17582251
Chr1 2048 Chr1 29667519
Chr1 2048 Chr1 31825734
Chr1 2048 Chr1 41013379
Chr1 2048 Chr1 11021685
Chr1 2048 Chr1 6146468
Chr1 2048 Chr1 33587074
Chr1 2048 Chr1 34984509
Chr1 2048 Chr1 25184829
Chr1 2048 Chr1 39203887
Chr1 2048 Chr1 4608003
Chr1 2048 Chr1 36384390
Chr1 2048 Chr1 9561042
Chr1 2048 Chr1 27434022
Chr1 2048 Chr1 22725299
Chr1 2048 Chr1 21768438
Chr1 2048 Chr1 37490408
Chr1 2048 Chr1 26637579
Chr1 2048 Chr1 20240611
Chr1 2048 Chr1 21027974
Chr1 2048 Chr1 8223493
Chr1 2048 Chr1 28434494
Chr1 2048 Chr1 23724675
Chr1 2048 Chr1 7610541
Chr1 2048 Chr1 24578476
Chr1 2048 Chr1 40433425
Chr1 2048 Chr1 1
Chr1 2048 Chr1 29142435
Chr1 2048 Chr1 535508
Chr1 2048 Chr1 38759742
Chr1 2048 Chr1 42718430
Chr1 2048 Chr1 43096884
Chr1 2048 Chr1 38426877
Chr1 2048 Chr1 9052350
Chr1 2048 Chr1 19984171
Chr1 2048 Chr1 5874976
Chr1 2048 Chr1 10638988
Chr1 2048 Chr1 31609734
Chr1 2048 Chr1 43468799
Chr1 2048 Chr1 24434015
Chr1 2048 Chr1 10877449
Chr1 2048 Chr1 9382765
Chr1 2048 Chr1 8932732
Chr1 2048 Chr1 26507102
Chr1 2048 Chr1 5796670

Here are the first four columns of my sorted BAM file:
Chr1 2048 Chr1 1
Chr1 2048 Chr1 535508
Chr1 2048 Chr1 1050797
Chr1 2048 Chr1 4608003
Chr1 2048 Chr1 5796670
Chr1 2048 Chr1 5874976
Chr1 2048 Chr1 6146468
Chr1 2048 Chr1 7610541
Chr1 2048 Chr1 8223493
Chr1 2048 Chr1 8932732
Chr1 2048 Chr1 9052350
Chr1 2048 Chr1 9382765
Chr1 2048 Chr1 9561042
Chr1 2048 Chr1 10638988
Chr1 2048 Chr1 10877449
Chr1 2048 Chr1 11021685
Chr1 0 Chr1 12483338
Chr1 2048 Chr1 17582251
Chr1 2048 Chr1 19984171
Chr1 2048 Chr1 20240611
Chr1 2048 Chr1 21027974
Chr1 2048 Chr1 21768438
Chr1 2048 Chr1 22725299
Chr1 2048 Chr1 23724675
Chr1 2048 Chr1 24434015
Chr1 2048 Chr1 24578476
Chr1 2048 Chr1 25184829
Chr1 2048 Chr1 26507102
Chr1 2048 Chr1 26637579
Chr1 2048 Chr1 27434022
Chr1 2048 Chr1 28434494
Chr1 2048 Chr1 29142435
Chr1 2048 Chr1 29667519
Chr1 2048 Chr1 31609734
Chr1 2048 Chr1 31825734
Chr1 2048 Chr1 33587074
Chr1 2048 Chr1 34984509
Chr1 2048 Chr1 36384390
Chr1 2048 Chr1 37490408
Chr1 2048 Chr1 38426877
Chr1 2048 Chr1 38759742
Chr1 2048 Chr1 39203887
Chr1 2048 Chr1 40433425
Chr1 2048 Chr1 41013379
Chr1 2048 Chr1 42718430
Chr1 2048 Chr1 43096884
Chr1 2048 Chr1 43468799

I tried both the SAM and the BAM file. The chromosome length is greater than every alignment position, but the same error still appears. I also tried extracting a single chromosome, and the error persists. Thanks in advance!

@mnshgl0110
Member

The error message refers to Chr1 in the query genome. In the SAM header, the chromosome lengths correspond to the reference genome. Try converting your SAM to PAF and then check the alignment coordinates and chromosome lengths. If everything looks OK, then you can try syri with that file.
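One way to do the check described above is sketched here, under some assumptions. The PAF format stores the query name, length, start and end in columns 1 to 4 and the target (reference) name, length, start and end in columns 6 to 9. The helper name check_paf is made up for illustration. The conversion shown in the comment uses paftools.js, a script distributed with minimap2; whether it is installed and on the PATH is an assumption.

```shell
# Possible conversion step (assumes paftools.js from minimap2 is installed):
#   paftools.js sam2paf out.sam > out.paf

# Report PAF records whose end coordinate exceeds the sequence length, on
# either the query side (columns 1-4) or the target side (columns 6-9).
# Exit status is 1 if any such record is found, 0 otherwise.
check_paf() {
    awk -F'\t' '
        ($4 + 0 > $2 + 0) { print "query\t"  $1 "\tend=" $4 "\tlength=" $2; bad = 1 }
        ($9 + 0 > $7 + 0) { print "target\t" $6 "\tend=" $9 "\tlength=" $7; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

# Example call:
# check_paf out.paf
```

Unlike the SAM header, each PAF line carries the query length explicitly, so it can also be compared directly against the query genome's .fai file. SyRI's -F option selects the input format; whether the installed version accepts PAF can be confirmed with syri --help.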

@Ruba-2324

From your response, I understand that you are asking me to use a PAF file, but does SyRI accept the PAF format?

@Sabrili
Author

Sabrili commented Jul 28, 2024

Hi all,

Apologies for my late update, just wanted to say I managed to get the alignment working using minimap2 instead of nucmer. Syri worked perfectly. Thanks everyone!

minimap2 -ax asm5 -t 24 --eqx refgenome qrygenome > out.sam
module load samtools/1.13-gcc-10.3.0
samtools view -b out.sam > out.bam
syri -c out.bam -r refgenome -q qrygenome -F B --prefix out --nc 24 --log INFO
plotsr --sr syri.out --genomes genomes.txt -o out.png
