Failing during checking assembly? #224

Thomieh73 · 2024-06-13T08:57:15Z

Hi,
I am trying to download viral genomes from GenBank. I have put the commands inside a Nextflow script, which runs on the login node of our HPC cluster.

The command inside the nextflow script that I use is this:

ncbi-genome-download --formats fasta --section genbank viral --parallel 4 --flat-output -r 5 -P -o genbank_genomes

The tool first checks the assemblies. That step finishes when I run it for the RefSeq genomes, but for GenBank there are far more genomes to check:

Checking assemblies:  23%|██▎       | 43268/187089 [1:34:09<4:41:54,  8.50entries/s](ncbidown33)

There are two things I notice.

1. The connection to NCBI is failing. This is a problem you already know about and have addressed by making the number of retries configurable; I currently have it set to 5.
2. The whole job is killed on my login node before the number of retries reaches 5. I notice that the process is restarted several times because it somehow stalls, and then I get exit code 143, which usually means the process was killed externally. The job stops at different places: sometimes early, sometimes after 99% of the assemblies have been checked.
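
For reference, exit codes above 128 conventionally mean "128 plus the signal number", so 143 corresponds to signal 15 (SIGTERM), which fits an external kill such as a login-node process limit. A minimal sketch of that arithmetic; the helper name `describe_exit_code` is made up for illustration and is not part of ncbi-genome-download:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit status into a readable description."""
    if code > 128:
        # Shells report "killed by signal N" as 128 + N.
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"terminated by {name}"
    return f"exited on its own with status {code}"


print(describe_exit_code(143))
```

SIGTERM is a polite termination request, so something on the node (a watchdog, a time limit, or Nextflow itself) is most likely asking the process to stop rather than the tool crashing on its own.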

So I wonder what to do here?
I have contacted the admin of the HPC I am using to see if they have an idea.

Would it help to have more parallel processes? Are those used in the checking step? That might eat up more CPUs on the login node of our cluster, though. I cannot use the compute nodes, since they have no access to the internet.

Or would there be another way of breaking up the checking of assemblies, so that I can make smaller batches that will finish?
I know the cache file that is created contains the FTP location of each genome. By grabbing that I could download all genomes myself, but then why would I use this tool?
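
One way the batching idea could look, as a rough sketch only: split the accession column of an NCBI `assembly_summary.txt` (tab-separated, header lines starting with `#`, accession in the first column) into small files. The function names `read_accessions` and `write_batches` and the `batch_NNNN.txt` naming are hypothetical. Whether feeding such files to the tool actually shortens the checking step is an assumption that has not been verified here.

```python
from pathlib import Path
from typing import Iterable, List


def read_accessions(summary_lines: Iterable[str]) -> List[str]:
    """Collect the first tab-separated column, skipping '#' header lines."""
    accessions = []
    for line in summary_lines:
        if not line.strip() or line.startswith("#"):
            continue
        accessions.append(line.split("\t", 1)[0].strip())
    return accessions


def write_batches(accessions: List[str], batch_size: int, out_dir: str) -> List[Path]:
    """Write accessions to numbered files, one accession per line."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(accessions), batch_size):
        chunk = accessions[start:start + batch_size]
        path = out / f"batch_{start // batch_size:04d}.txt"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

As far as I understand the README, `--assembly-accessions` also accepts a path to a file with one accession per line, so each batch file could be passed to a separate `ncbi-genome-download` run, and a batch that gets killed could be retried without repeating the others.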

Any suggestions you might have are welcome.


kblin · 2024-07-03T07:00:56Z

Phew, good question, I've never had things fail this way. The NCBI connection falling over happens more than I'd like, but the script just getting stuck I don't think I've seen before.

Would it make sense to try and run the download on a "not the HPC machine" and then copy the files over locally?

kblin added the question label Jul 3, 2024

Laura-Alex mentioned this issue Nov 19, 2024

ncbi-genome-download works with refseq for me, but not genbank #234

Open
