I encountered an issue while building a custom Kraken2 database due to a suppressed assembly in NCBI's database.
Specifically, the error occurs during the rsync file transfer step when attempting to download the file:
GCF_045946455.1_ASM4594645v1_genomic.fna.gz
Error Message:
rsync: link_stat "/all/GCF/045/946/455/GCF_045946455.1_ASM4594645v1/GCF_045946455.1_ASM4594645v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1865) [generator=3.2.7]
Upon investigation, I found that this assembly (Pseudomonas aeruginosa strain PAE100) is suppressed in NCBI due to quality concerns. Both the RefSeq (GCF_045946455.1) and GenBank (GCA_045946455.1) versions are marked as suppressed and unavailable for download.
Would it be helpful if Kraken2's database-building process could automatically detect and handle suppressed assemblies by skipping them or providing a clear warning with actionable steps? This would prevent such errors from halting the workflow unexpectedly.
Questions for the Experts:
How can I handle suppressed assemblies like this during the Kraken2 database-building process?
Is there a way to configure Kraken2 to automatically skip suppressed or unavailable assemblies without halting the workflow?
What is the best practice for identifying and managing suppressed entries when building custom Kraken2 databases?
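As a starting point for the last question, here is a minimal sketch of one way to flag suppressed entries before a build. It assumes a tab-separated NCBI assembly summary table whose header comment line names the columns, including `assembly_accession` and `version_status`. The function name `suppressed_accessions` and the second sample row are hypothetical, and none of this is part of Kraken2 itself.

```python
def suppressed_accessions(summary_text):
    """Return the set of accessions whose version_status is 'suppressed'.

    Assumes a tab-separated table whose column names sit on a '#' comment
    line that contains 'assembly_accession' (an assumption about the input).
    """
    header = None
    found = set()
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Pick up the column names from the header comment line.
            if "assembly_accession" in line:
                header = line.lstrip("#").strip().split("\t")
            continue
        if not line.strip() or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("version_status", "").strip().lower() == "suppressed":
            found.add(row["assembly_accession"])
    return found


# Tiny illustrative table: the first accession is the one from this issue,
# the second row is made up for the example.
SAMPLE = (
    "#assembly_accession\tversion_status\tasm_name\n"
    "GCF_045946455.1\tsuppressed\tASM4594645v1\n"
    "GCF_000000001.1\tlatest\tHYPOTHETICAL\n"
)

if __name__ == "__main__":
    print(sorted(suppressed_accessions(SAMPLE)))
```

The resulting set could then be used to drop the matching lines from a download manifest before the rsync step runs.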
I would appreciate any guidance or recommendations on how to address this issue effectively.
Thank you for your time and support!
Update on Progress:
The database build has completed successfully. The suppressed assembly GCF_045946455.1_ASM4594645v1 caused an rsync error and was skipped, but the rest of the sequences were processed without issues. A total of 22,107 sequences (5.96 Gbp) were included in the database, and the final construction took approximately 24 minutes and 45 seconds. Can I proceed and simply ignore the suppressed assembly?
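To keep a record of which assemblies were skipped, the rsync output can be scanned for `link_stat ... No such file or directory` lines like the one quoted in this issue. The following is a rough sketch under that assumption. The helper name `missing_accessions` is hypothetical, and the regular expressions are based only on the single error line shown here, so other rsync versions or messages may need adjustments.

```python
import re

# Matches the rsync "file not found" line format quoted in this issue.
LINK_STAT = re.compile(
    r'rsync: link_stat "([^"]+)" .*failed: No such file or directory'
)
# Assembly accession at the start of a file name, e.g. GCF_045946455.1
ACCESSION = re.compile(r"GC[AF]_\d{9}\.\d+")


def missing_accessions(log_text):
    """Return accessions of files rsync reported as missing, in log order."""
    missing = []
    for line in log_text.splitlines():
        match = LINK_STAT.search(line)
        if not match:
            continue
        # Keep only the file name, then read the accession off its start.
        filename = match.group(1).rsplit("/", 1)[-1]
        accession = ACCESSION.match(filename)
        if accession and accession.group(0) not in missing:
            missing.append(accession.group(0))
    return missing


# The two log lines reported in this issue.
LOG = (
    'rsync: link_stat "/all/GCF/045/946/455/GCF_045946455.1_ASM4594645v1/'
    'GCF_045946455.1_ASM4594645v1_genomic.fna.gz" (in genomes) failed: '
    "No such file or directory (2)\n"
    "rsync error: some files/attrs were not transferred (see previous errors) "
    "(code 23) at main.c(1865) [generator=3.2.7]\n"
)

if __name__ == "__main__":
    print(missing_accessions(LOG))
```

Each accession found this way can be checked against NCBI to confirm it is suppressed rather than temporarily unavailable.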