
Error with Suppressed Assemblies in Kraken2 Database Build (GCF_045946455.1) #902

Open
dejenie21 opened this issue Jan 9, 2025 · 1 comment

@dejenie21

I encountered an issue while building a custom Kraken2 database due to a suppressed assembly in NCBI's database.
Specifically, the error occurs during the rsync file transfer step when attempting to download the file:
GCF_045946455.1_ASM4594645v1_genomic.fna.gz
Error Message:
rsync: link_stat "/all/GCF/045/946/455/GCF_045946455.1_ASM4594645v1/GCF_045946455.1_ASM4594645v1_genomic.fna.gz" (in genomes) failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1865) [generator=3.2.7]
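
The path in the error follows NCBI's genomes/all layout, where the nine digits of the accession are split into three-digit directories, followed by a directory named `<accession>_<assembly name>`. A minimal sketch for scanning an rsync log for such "No such file or directory" failures and recovering the accession and assembly name (so each one can be checked against NCBI's assembly status) might look like this. The function `find_missing_assemblies` is hypothetical, and both regular expressions are assumptions derived from the single error line quoted above, not from any Kraken2 or rsync specification.

```python
import re
from typing import NamedTuple

# Assumed shape of rsync's error line, based on the message quoted above.
LINK_STAT_RE = re.compile(
    r'rsync: link_stat "(?P<path>[^"]+)".*failed: No such file or directory'
)

# Assumed layout: /all/<GCF|GCA>/<ddd>/<ddd>/<ddd>/<accession>_<asm_name>/<file>
PATH_RE = re.compile(
    r"/all/(?P<prefix>GC[AF])/(?P<d1>\d{3})/(?P<d2>\d{3})/(?P<d3>\d{3})/"
    r"(?P<accession>GC[AF]_\d{9}\.\d+)_(?P<asm_name>[^/]+)/(?P<filename>[^/]+)$"
)


class MissingAssembly(NamedTuple):
    accession: str
    asm_name: str
    path: str


def find_missing_assemblies(log_text: str) -> list[MissingAssembly]:
    """Return one entry per 'No such file or directory' link_stat error in an rsync log."""
    missing = []
    for line in log_text.splitlines():
        m = LINK_STAT_RE.search(line)
        if not m:
            continue  # not a missing-file error
        path = m.group("path")
        p = PATH_RE.search(path)
        if p:
            missing.append(MissingAssembly(p.group("accession"), p.group("asm_name"), path))
    return missing
```

Feeding it the two error lines above should yield a single entry for GCF_045946455.1 / ASM4594645v1, which is the accession to look up on NCBI.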

Upon investigation, I found that this assembly (Pseudomonas aeruginosa strain PAE100) is suppressed in NCBI due to quality concerns. Both the RefSeq (GCF_045946455.1) and GenBank (GCA_045946455.1) versions are marked as suppressed and unavailable for download.

Would it be helpful if Kraken2's database-building process could automatically detect and handle suppressed assemblies by skipping them or providing a clear warning with actionable steps? This would prevent such errors from halting the workflow unexpectedly.

Questions for the Experts:
How can I handle suppressed assemblies like this during the Kraken2 database-building process?
Is there a way to configure Kraken2 to automatically skip suppressed or unavailable assemblies without halting the workflow?
What is the best practice for identifying and managing suppressed entries when building custom Kraken2 databases?
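
One way to catch suppressed entries before any download starts is to pre-filter the assembly manifest. The sketch below is a standalone script, not a Kraken2 option. It assumes the manifest is an NCBI `assembly_summary.txt`-style table whose column header line starts with `#assembly_accession`, and that it has `version_status` and `ftp_path` columns; the rule "keep only `latest` rows with an `ftp_path` other than `na`" is an assumption, and a filter like this only helps if the summary file is fresher than the suppression. The function names and the sample rows (including the placeholder accession and URL) are hypothetical.

```python
import io
from typing import Optional, TextIO

Row = dict[str, str]


def parse_assembly_summary(handle: TextIO) -> list[Row]:
    """Parse a tab-separated table whose header line starts with '#assembly_accession'."""
    header: Optional[list[str]] = None
    rows: list[Row] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith("#assembly_accession"):
            # Column names come from this header line, so no fixed indices are assumed.
            header = line.lstrip("#").split("\t")
        elif not line or line.startswith("#"):
            continue  # skip other comment lines and blank lines
        elif header is None:
            raise ValueError("data row found before the '#assembly_accession' header")
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    return rows


def split_available(rows: list[Row]) -> tuple[list[Row], list[Row]]:
    """Assumed rule: keep 'latest' assemblies that have a usable ftp_path."""
    keep: list[Row] = []
    skipped: list[Row] = []
    for row in rows:
        status = row.get("version_status", "")
        ftp_path = row.get("ftp_path", "na")
        if status == "latest" and ftp_path not in ("", "na"):
            keep.append(row)
        else:
            skipped.append(row)
    return keep, skipped


if __name__ == "__main__":
    # Hypothetical three-column excerpt; real summary files have many more columns.
    sample = (
        "# other comment lines are ignored\n"
        "#assembly_accession\tversion_status\tftp_path\n"
        "GCF_000000001.1\tlatest\thttps://example.org/placeholder\n"
        "GCF_045946455.1\tsuppressed\tna\n"
    )
    keep, skipped = split_available(parse_assembly_summary(io.StringIO(sample)))
    for row in skipped:
        print(f"WARNING: skipping {row['assembly_accession']} ({row['version_status']})")
```

Reading columns by name from the header line, rather than by position, keeps the filter working if NCBI adds or reorders columns.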
I would appreciate any guidance or recommendations on how to address this issue effectively.

Thank you for your time and support!

@dejenie21 (Author)

Update on Progress:
The database build has completed successfully. Although the suppressed assembly GCF_045946455.1_ASM4594645v1 caused an rsync error and was skipped, the rest of the sequences were processed without issues: a total of 22,107 sequences (5.96 Gbp) were included in the database, and the final construction step took approximately 24 minutes 45 seconds. Can I proceed and simply ignore the suppressed assembly?
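
The log above ends with rsync exit code 23, which the rsync manual describes as a partial transfer due to error, meaning the remaining files were still transferred. Whether a partial transfer is acceptable is a policy decision. The sketch below, which is not part of Kraken2, treats exit codes 23 and 24 (partial transfer due to vanished source files) as partial rather than fatal, logs each skipped file, and continues only if the share of missing files stays below a threshold. The function names and the 0.1% default threshold are hypothetical placeholders.

```python
from enum import Enum


class TransferOutcome(Enum):
    OK = "ok"
    PARTIAL = "partial"
    FATAL = "fatal"


# rsync exit codes: 0 = success, 23 = partial transfer due to error,
# 24 = partial transfer due to vanished source files. Anything else is treated as fatal here.
PARTIAL_CODES = {23, 24}


def classify_rsync_exit(code: int) -> TransferOutcome:
    if code == 0:
        return TransferOutcome.OK
    if code in PARTIAL_CODES:
        return TransferOutcome.PARTIAL
    return TransferOutcome.FATAL


def should_continue(
    code: int,
    missing: list[str],
    expected_total: int,
    max_missing_fraction: float = 0.001,  # hypothetical tolerance
) -> bool:
    """Hypothetical policy: continue on a partial transfer only if few files are missing."""
    outcome = classify_rsync_exit(code)
    if outcome is TransferOutcome.OK:
        return True
    if outcome is TransferOutcome.FATAL or expected_total <= 0:
        return False
    for path in missing:
        print(f"WARNING: skipped unavailable file: {path}")
    return len(missing) / expected_total <= max_missing_fraction
```

With this policy, one suppressed genome out of thousands of requested files would be logged and skipped, while a widespread failure (for example, a network error affecting many files) would still stop the build.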
