enaGroupGet doesn't throw an exception when it fails to download one accession #42
Comments
Sample and project accessions will work with enaGroupGet; run and experiment accessions will work with enaDataGet. As for the FASTQ file issue: are you using Python 2 or Python 3? If the former, please try switching to Python 3. There is a known issue with Python 2 that can intermittently cause this problem.
I will, however, look into the error catching and throwing. Something might be getting bypassed.
I have some more info: I got a notification from the cluster admin that I am out of disk space. So maybe the file is trying to be written but cannot because there is no disk space available.
Also, I am using Python 3. I am now fairly sure that it's related to the availability of space on the cluster. We have a SLURM cluster, and I think what is happening is that enaGroupGet thinks it's writing to file while the cluster silently denies any actual writing. The dual call to enaDataGet and enaGroupGet is because I don't know in advance whether an accession features a single run or whether it is an experiment accession that features multiple runs.
You don't need enaGroupGet for an experiment accession. If you tried it, it wouldn't work. An experiment is still classed as one object, so enaDataGet will get all of the runs associated with it. The group option is for samples and projects, which can have multiple experiments. Thanks for the update on the problem. I will see if there is a way to check for disk space, or for the type of error that comes back when the disk is full.
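A pre-flight check along these lines could be sketched in Python as follows. This is only a minimal sketch, not what enaGroupGet does: `has_free_space` is a hypothetical helper name, and `shutil.disk_usage` reports free space on the filesystem as a whole, so it may not reflect a per-user quota of the kind a cluster can enforce.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` reports at least
    `required_bytes` of free space.

    Hypothetical helper for illustration. It does not know about
    per-user quotas, so a True result is not a guarantee that a
    write will succeed.
    """
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes
```

Calling `has_free_space(".", expected_download_size)` before starting a download would at least catch the case where the whole filesystem is full.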
Yes, that's exactly why I am using both! Consider the following two accessions:
Both of those are sample accessions and therefore can only be used with enaGroupGet, regardless of how many runs or experiments they contain.
enaDataGet is for experiment (ERX, DRX, SRX) and run (ERR, DRR, SRR) accessions.
enaGroupGet is for sample (ERS, DRS, SRS, SAME, SAMN, SAMD) and project (ERP, DRP, SRP, PRJE, PRJD, PRJN) accessions.
enaDataGet knows what data type you want to download based on the accession. enaGroupGet needs you to tell it which type of data (read data, sequences, assemblies) should be downloaded for that sample/project.
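For anyone scripting around this, the prefix rules above can be turned into a small dispatch function, so a pipeline calls only the tool that applies. This is a sketch: `tool_for_accession` is a hypothetical name, and it covers only the read-related prefixes listed in this comment, not every accession type the tools may accept.

```python
# Prefixes taken from the comment above; other accession types are not covered.
DATA_GET_PREFIXES = ("ERX", "DRX", "SRX", "ERR", "DRR", "SRR")
GROUP_GET_PREFIXES = (
    "ERS", "DRS", "SRS", "SAME", "SAMN", "SAMD",
    "ERP", "DRP", "SRP", "PRJE", "PRJD", "PRJN",
)


def tool_for_accession(accession):
    """Return the name of the tool to call for `accession`.

    Hypothetical helper: raises ValueError for prefixes that are
    not in the two lists above instead of guessing.
    """
    acc = accession.strip().upper()
    if acc.startswith(DATA_GET_PREFIXES):
        return "enaDataGet"
    if acc.startswith(GROUP_GET_PREFIXES):
        return "enaGroupGet"
    raise ValueError("Unrecognised accession prefix: " + accession)
```

With this, a sample accession such as ERS010304 maps to enaGroupGet, and a run accession starting with SRR maps to enaDataGet.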
Thanks for the explanation! For some reason I thought I had gotten an error previously from enaGroupGet for the first sample in the provided list, but now that I tested it again, it works just fine.
I am still having issues with this. The problem is that, in my hands, enaGroupGet will sometimes not download both .fastq.gz files from a paired-end run. Between that and the fact that some accessions are represented by a single read file, e.g. ERS010304, it's impossible to accommodate both cases. Is there a dry-run option? With that I could do the error checking myself. Or does enaGroupGet do any checksum testing? I wouldn't mind fronting a pull request for that, but as someone unfamiliar with the code, this would of course take a hot minute.
Yes, it does checksum checking. I'll look into whether any more checking can be added that might help. If a file fails to download, it should inform you of that. I'll look into this further too. I'll also consider what use a dry run would be, though download problems would usually occur in the part that can't be dry-run. Unfortunately I haven't had time to look at this code for a few months. I should be able to pick it up again in Feb.
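As a user-side stopgap, a downloaded file can also be checked independently against a known MD5. The sketch below is not the tool's internal check: `md5_of_file` and `verify_md5` are hypothetical helper names, and it assumes you already have the expected MD5 string for each file from some trusted source.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so large FASTQ files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Return True if the file's MD5 matches `expected_md5`
    (compared case-insensitively, ignoring surrounding whitespace)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch, or a missing file raising `FileNotFoundError`, can then be turned into a non-zero exit code by the calling script.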
This seems like a related problem, so I'll add to this thread. Sometimes it only downloads one of the 2 files in a run but still exits with an exit code of 0. See the message below from an enaDataGet run:
It seems that this is because I think a potentially better solution is to have
I have been running into trouble in a Nextflow pipeline I wrote. After some investigation, I found out that for some of the samples only one of the fastq files was downloaded, but the process still finished without an error. The relevant part of my bash script is below:
I am using both because some accessions have only one sub-accession and some have plenty.
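Until the exit code reflects partial downloads, one workaround is to validate the output after the download step and fail the pipeline process explicitly. The sketch below flags files that are missing, empty, or truncated gzip streams. It assumes the caller already knows which file paths to expect; `gzip_is_intact` and `find_bad_downloads` are hypothetical helper names, not part of enaBrowserTools.

```python
import gzip
import os
import zlib


def gzip_is_intact(path, chunk_size=1024 * 1024):
    """Return True if `path` exists, is non-empty, and decompresses
    to the end without error. A truncated download fails this check."""
    try:
        if os.path.getsize(path) == 0:
            return False
        with gzip.open(path, "rb") as handle:
            # Read through the whole stream; the data itself is discarded.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Missing file, bad gzip header, truncated or corrupt stream.
        return False
    return True


def find_bad_downloads(expected_paths):
    """Return the subset of `expected_paths` that fail gzip_is_intact,
    in the order given."""
    return [path for path in expected_paths if not gzip_is_intact(path)]
```

In a pipeline step, a non-empty result from `find_bad_downloads` can be passed to `sys.exit` with a message so the process finishes with a non-zero status.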