When I run 7,000+ jobs that call downloadcmd in parallel (so the timing of each downloadcmd call is unpredictable), I think they are colliding on either ~/NDA/nda-tools/downloadcmd/packages/MYPACKAGEID/.download-progress/download-job-manifest.csv or ~/NDA/nda-tools/downloadcmd/packages/MYPACKAGEID/.download-progress/SOME-SPECIAL-HASH/download-progress-report.csv. This causes the majority of downloads to fail with the error below, and therefore my downstream data conversions fail.
Suggestions
Perhaps downloadcmd could provide an option to skip the download-job-manifest.csv, the download-progress-report.csv, or both? I have found myself needing to purge these files anyway in order to retry failed conversion jobs that include downloadcmd in the pipeline.
Alternatively, could downloadcmd have a way to keep only temporary download-job-manifest.csv and download-progress-report.csv files?
Keep a lock on the CSV files while they are being written to, and check that the lock is not active before trying to write. Then send a message back to the user if they hit the lock.
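To make the locking suggestion concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. It is not how NDATools currently works, and `append_row_locked` is a hypothetical helper name, not an existing NDATools function. It assumes a POSIX system where `fcntl.flock` is available; advisory locks of this kind may not behave reliably on every network filesystem.

```python
import csv
import fcntl
import os


def append_row_locked(csv_path, row, blocking=True):
    """Append one row to csv_path while holding an exclusive advisory lock.

    Hypothetical helper, not part of NDATools.
    Returns True if the row was written. Returns False if blocking is False
    and another process already holds the lock, so the caller can report
    that to the user instead of writing.
    """
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    with open(csv_path, "a", newline="") as handle:
        try:
            fcntl.flock(handle.fileno(), flags)
        except BlockingIOError:
            # Lock is held elsewhere and we were asked not to wait.
            return False
        try:
            csv.writer(handle).writerow(row)
            # Push the complete row to disk before releasing the lock,
            # so other jobs never see a half-written line.
            handle.flush()
            os.fsync(handle.fileno())
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
    return True
```

With `blocking=True` a job simply waits its turn; with `blocking=False` it gets `False` back and can print a message such as "manifest is locked by another job" before retrying or exiting. Readers of the CSV would need to take a shared lock (`fcntl.LOCK_SH`) for this to fully prevent them from seeing partial rows.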
Error
Running NDATools Version 0.2.26
Traceback (most recent call last):
File "/home/earlea/.local/bin/downloadcmd", line 8, in <module>
sys.exit(main())
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/clientscripts/downloadcmd.py", line 185, in main
s3Download = Download(config, args)
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Download.py", line 173, in __init__
self.download_progress_report_file_path = self.initialize_verification_files()
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Download.py", line 685, in initialize_verification_files
job_record = self.find_matching_download_job(download_job_manifest_path)
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Download.py", line 650, in find_matching_download_job
if is_job_match(job):
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Download.py", line 644, in is_job_match
return all(map(test_match, must_match))
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Download.py", line 632, in test_match
val1 = Utils.convert_to_abs_path(val1)
File "/home/earlea/.local/lib/python3.10/site-packages/NDATools/Utils.py", line 231, in convert_to_abs_path
return os.path.abspath(os.path.expanduser(os.path.expandvars(file_name)))
File "/usr/local/Anaconda/envs/py3.10/lib/python3.10/posixpath.py", line 287, in expandvars
path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
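One way this TypeError could arise from a collision, sketched below. This is a guess on my part, not a confirmed diagnosis: I am assuming the manifest is read with something like `csv.DictReader`, and the column names `package_id` and `download_directory` are made up for illustration. If one job reads the CSV while another is mid-write, a short row would leave the missing fields as `None`, and the `convert_to_abs_path` expression from the traceback fails on `None`.

```python
import csv
import io
import os


def convert_to_abs_path(file_name):
    # Same expression as the last NDATools frame in the traceback.
    return os.path.abspath(os.path.expanduser(os.path.expandvars(file_name)))


def read_first_row(text):
    # csv.DictReader fills fields missing from a short row with None
    # (its default restval).
    return next(csv.DictReader(io.StringIO(text)))


def safe_abs_path(file_name):
    # Hypothetical guard: treat a missing field as "no match" instead of crashing.
    if file_name is None:
        return None
    return convert_to_abs_path(file_name)


# Hypothetical column names; the second row is cut off before its last field.
truncated = "package_id,download_directory\n123\n"
row = read_first_row(truncated)

try:
    convert_to_abs_path(row["download_directory"])
    outcome = "ok"
except TypeError:
    outcome = "TypeError"
```

Here `row["download_directory"]` is `None`, so the unguarded call raises the same kind of TypeError, while `safe_abs_path` returns `None`. A guard like this would only hide the symptom; the partially written row would still need to be prevented or skipped.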