delay csvdb load tasks by a random offset to avoid database lock issues #396
base: master
Conversation
Thanks for the input, Kevin. It looks like a good workaround. Please bear in mind that I tried to fix SQLite operational errors with the following changes, but more thought might be required:
Those changes were added to the master in the
Best regards,
Hi Sebastian,
Let me know if there's any command I should run to update the CGAT installation, or something.
For the record, I was systematically getting the 'database locked' error message for
Thanks, Kevin. We try to solve this by asking
However, it does not seem to work properly. We are about to perform a major refactoring of the code, and we'll try to solve this problem then. The idea you propose in this PR is a potential solution. I will leave this open for our reference and I will decide what to do during the code refactoring. Many thanks!
Hi @sebastian-luna-valero I 100% agree that this PR is not the ideal implementation, and I would not take it personally if it is closed without being merged. Happy to see it used as a reference. Thanks for your work on the refactoring, I look forward to the result!
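The workaround proposed in this PR, plus the retry-on-lock fallback discussed above, can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function name and schema are hypothetical, and the real implementation lives in the CGAT pipeline utilities.

```python
import random
import sqlite3
import time


def load_table(db_path, table, rows, max_offset=5.0, retries=5):
    """Load rows into a SQLite table, delaying the task by a random
    offset so that parallel load tasks started at the same time do
    not all contend for the database write lock at once.

    Hypothetical sketch: the real csvdb load goes through csv2db.
    """
    # Random offset: spreads out simultaneous load tasks.
    time.sleep(random.uniform(0.0, max_offset))
    for attempt in range(retries):
        try:
            con = sqlite3.connect(db_path, timeout=30)
            with con:  # commits on success, rolls back on error
                con.execute(
                    "CREATE TABLE IF NOT EXISTS %s (k TEXT, v INTEGER)"
                    % table)
                con.executemany(
                    "INSERT INTO %s VALUES (?, ?)" % table, rows)
            con.close()
            return
        except sqlite3.OperationalError as exc:
            # Fallback: retry with backoff if the database is locked.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(random.uniform(0.5, 2.0 ** attempt))
```

The random sleep reduces the chance that many tasks hit the single-writer SQLite database simultaneously; the retry loop handles the residual collisions.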
…l is not specified (#399) * force --local when drmaa cannot be imported * fix syntax error
* Parameterise job_memory in pipeline_genesets.py
The pipeline was crashing on our cluster because jobs were exceeding the amount of memory that they had requested. To fix this, the amount of memory for all cluster jobs is now specified in the pipeline.ini - by default 4G for standard tasks and 20G for "highmemory" tasks (inspection with top showed one task to be using 15G!). The pipeline now completes successfully for Ensembl version 91 for both mouse and human annotations.
* tweak memory settings.
* pep8
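The defaults described in that commit might look something like the fragment below in pipeline.ini. The section and key names here are hypothetical; check pipeline_genesets.py for the exact parameter names it reads.

```ini
# Hypothetical pipeline.ini fragment - key names are illustrative only.
[cluster]
# default memory request for standard tasks
memory_default=4G

[genesets]
# memory request for "highmemory" tasks (one was observed using 15G)
highmemory_job_memory=20G
```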
… pipeline and not annotations (#405)
* bug fixes for pipeline report
* added an option to try to force rendering of the html when a code chunk fails if there is no nh output
* re-added the nh to site for r
* updated the report for jupyter so it no longer outputs the figures and now displays inline with the notebook
* updated the report so it doesn't output pdf reports
* updated the picard report so it now doesn't output a pdf report
I assume that this code was unintentionally removed by a previous reversion of a commit (83e6dea). It is absolutely critical for debugging failed tasks on our cluster!!
* Increase memory requirements for loading genesets. Loading hg38_ensembl91 (after running gtf2csv with the -f option) needs at least 13G.
* Fix memory requirements for GenomicContext jobs
* Conda 4.4 breaks everything again * bugfix * bugfix
* Copy environment to subprocess
If cgat is installed in a venv, the location of the "cgat" wrapper script is added to PATH when the venv is sourced. venvs are typically sourced in the user's .bashrc, but .bashrc is not evaluated by non-interactive shells, and the shell opened by subprocess with "shell=True" is non-interactive. Thus, unless the "cgat" command is on some path not set in .bashrc, local (i.e. non-cluster) execution of jobs that use the cgat wrapper (e.g. cgat csv2db) will fail for venvs. This fixes the issue by copying the environment path to the subprocess shell.
* add BASH_ENV=.bashrc to local jobs to be consistent with Cluster.py
* Also fix the subprocess environment issue in Control.py
* set BASH_ENV
This resolved my issues when processing a lot of FASTQ files in parallel.
(There may be better ways to address the issue, though)
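The environment-copying fix described in the commit above can be sketched as follows. This is an illustration of the technique, not the repository's actual code; the function name is hypothetical, and note that with shell=True the shell is /bin/sh, so BASH_ENV only takes effect when that shell is bash.

```python
import os
import subprocess


def run_local(statement):
    """Run a pipeline statement in a non-interactive shell while
    passing the caller's environment (including a PATH extended by a
    sourced venv) to the subprocess. Also set BASH_ENV so that bash
    reads .bashrc even when non-interactive.

    Hypothetical sketch of the fix applied in Execution/Control.py.
    """
    env = os.environ.copy()  # copy the environment to the subprocess
    env.setdefault("BASH_ENV", os.path.expanduser("~/.bashrc"))
    return subprocess.run(statement, shell=True, env=env,
                          capture_output=True, text=True, check=True)
```

Because `env` starts from `os.environ.copy()`, anything on the parent's PATH (such as the venv's bin directory holding the cgat wrapper) is visible inside the subprocess shell.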