delay csvdb load tasks by a random offset to avoid database lock issues #396

Open
wants to merge 24 commits into master

Conversation

@kevinrue (Contributor) commented Feb 6, 2018

This resolved my issues when processing a lot of FASTQ files in parallel.
(There may be better ways to address the issue, though)
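
For reference, the change in this PR boils down to sleeping for a random offset before each csvdb load, so that tasks running in parallel do not all hit the SQLite file at the same moment. A minimal sketch of the idea (function and task names are illustrative, not the actual pipeline code):

    import random
    import time

    def load_summary(infile, outfile):
        """Load a summary table into csvdb.

        Stagger concurrent loads by a random 1-30 s offset so that
        parallel tasks do not all open the SQLite database at once.
        """
        time.sleep(random.uniform(1, 30))
        # ... proceed with the usual csv2db load of infile into csvdb ...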

@sebastian-luna-valero (Member) commented

Thanks for the input, Kevin. It looks like a good workaround.

Please bear in mind that SQLite is not a full client-server relational database management system; its concurrency and performance depend on the locking behaviour of the underlying filesystem.

I tried to fix SQLite operational errors with the following changes but more thought might be required:
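
(The actual diff is not shown in this transcript. For illustration only: a common way to reduce sqlite3 "database is locked" OperationalErrors in Python is to combine a busy timeout with a retry loop, as sketched below. This is a generic mitigation, not the CGAT patch.)

    import sqlite3
    import time

    def execute_with_retry(dbname, statement, retries=5, timeout=30):
        """Run a statement, waiting out transient lock errors."""
        for attempt in range(retries):
            try:
                # the timeout makes sqlite itself wait (in seconds)
                # for a lock to clear before raising
                conn = sqlite3.connect(dbname, timeout=timeout)
                with conn:
                    conn.execute(statement)
                conn.close()
                return
            except sqlite3.OperationalError:
                if attempt == retries - 1:
                    raise
                time.sleep(2 ** attempt)  # simple exponential backoff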

Those changes were added to master in the v0.3.2 release. Could you please let me know which version you are running, using the following command:

pip list | grep -i cgat

Best regards,
Sebastian

@kevinrue (Contributor, Author) commented Feb 6, 2018

Hi Sebastian,
I think I should be up to date. I pulled from the master branch this morning.

$ pip list | grep -i cgat
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
CGAT (0.3.2, /gfs/devel/kralbrecht/cgat)
CGATPipelines (0.3.2, /gfs/devel/kralbrecht/CGATPipelines)
CGATReport (0.7.6.1)

Let me know if there's any command I should run to update the CGAT installation, or something.

@kevinrue (Contributor, Author) commented Feb 6, 2018

For the record, I was systematically getting the 'database locked' error message for pipeline_readqc.py with 384 FASTQ files. I don't know exactly where this falls in the range of input sizes that you usually process in a single run.
As I said, the random delay of 1-30 seconds worked in my case, but it might need to be increased or even parameterised if the error pops up again for larger numbers of input files.
Best
Kevin
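
If the workaround is kept, the delay ceiling could be read from the pipeline configuration rather than hard-coded. A sketch under that assumption (the "csvdb_load_delay_max" key is hypothetical, and PARAMS stands in for the dictionary the pipeline loads from pipeline.ini):

    import random
    import time

    # stand-in for the pipeline's PARAMS dictionary (from pipeline.ini)
    PARAMS = {"csvdb_load_delay_max": 60}

    # hypothetical key; fall back to the 30 s ceiling used in this PR
    max_delay = float(PARAMS.get("csvdb_load_delay_max", 30))
    time.sleep(random.uniform(1, max_delay))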

@sebastian-luna-valero (Member) commented

Thanks, Kevin.

We try to solve this by asking ruffus to limit the number of concurrent database jobs to 1:

  @jobs_limit(PARAMS.get("jobs_limit_db", 1), "db") 

However, it does not seem to work properly.
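
For context, jobs_limit is a standard ruffus decorator: every task decorated with the same limit name shares one concurrency budget, so a limit of 1 should serialise all database loads. A minimal sketch of the intended usage (task and file names are illustrative):

    from ruffus import jobs_limit, suffix, transform

    @jobs_limit(1, "db")  # at most one task tagged "db" runs at a time
    @transform("*.tsv", suffix(".tsv"), ".load")
    def load_table(infile, outfile):
        # load infile into the shared SQLite database, then touch outfile
        pass

In principle every load task tagged "db" then runs one at a time; the open question is why that serialisation is not being honoured in practice.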

We are about to perform a major refactoring on the code, and we'll try to solve this problem then. The idea you propose in this PR is a potential solution. I will leave this open for our reference and I will decide what to do during the code refactoring.

Many thanks!
Sebastian

@kevinrue (Contributor, Author) commented Feb 8, 2018

Hi @sebastian-luna-valero
Yes, I saw the jobs_limit instruction in the code, and it puzzled me too that it does not seem to work. However, I was short on time, so instead of investigating the issue I implemented the quick fix offered in this PR. Obviously, the ideal fix would be to make the jobs_limit instruction work as expected.

I 100% agree that this PR is not the ideal implementation, and I would not take it personally if it is closed without being merged. Happy to see it used as a reference.
As a user, I'd be happy to have this random delay increased even to a minute or two if needed (still a small fraction of the total pipeline run) in exchange for reducing the chance of a database lock virtually to nil.

Thanks for your work on the refactoring, I look forward to the result!

Adam Cribbs and others added 22 commits February 9, 2018 10:02
…l is not specified (#399)

*  force --local when drmaa cannot be imported

*  fix syntax error
* Parameterise job_memory in pipeline_genesets.py

The pipeline was crashing on our cluster because jobs were exceeding the amount of memory that they had requested.

To fix, the amount of memory for all cluster jobs is now specified in the pipeline.ini - by default 4G for standard tasks and 20G for "highmemory" tasks (inspection with top showed one task to be using 15G!).

Pipeline now completes successfully for ensembl version 91 for both mouse and human annotations.

* tweak memory settings.

*  pep8
* bug fixes for pipeline report

* have added an option to try to force rendering of the HTML when a code chunk fails if there is no nh output

* readded the nh to site for r

* I have updated the report for jupyter so it no longer outputs the figures and now displays them inline with the notebook

* updated report so it doesn't output PDF reports

* updated picard report so it no longer outputs a PDF report
I assume that this code was unintentionally removed by a previous reversion of a commit (83e6dea).

It is absolutely critical for debugging failed tasks on our cluster!!
* Increase memory requirements for loading genesets.

Loading hg38_ensembl91 (after running gtf2csv with the -f option) needs at least 13G.

* Fix memory requirements for GenomicContext jobs
*  Conda 4.4 breaks everything again

*  bugfix

*  bugfix
* stipped -> stripped

* stipped -> stripped (2)
*  increase priority of /ifs/home/sebastian/.cgat ini file

*  bugfix
* Copy environment to subprocess

If cgat is installed in a venv, the location of the "cgat" wrapper script
is added to PATH when the venv is sourced.

venvs are typically sourced in the user .bashrc.

.bashrc is not evaluated by non-interactive shells.

The shell opened by subprocess with "shell=True" is non-interactive.

Thus, unless the "cgat" command is on a path set somewhere other than .bashrc, local
(i.e. non-cluster) execution of jobs that use the cgat wrapper (e.g. cgat csv2db)
will fail for venvs.

This fixes the issue by copying the environment path to the subprocess
shell.

*  add BASH_ENV=.bashrc to local jobs to be consistent with Cluster.py

* Also fix subprocess environment issue in Control.py

* set BASH_ENV
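
For reference, the environment fix described in the last commits boils down to passing the parent environment (plus a BASH_ENV pointing at .bashrc) when spawning the shell; a minimal sketch of the approach:

    import os
    import subprocess

    def run_local(statement):
        """Run a pipeline statement in a non-interactive shell.

        shell=True spawns a non-interactive shell that does not read
        ~/.bashrc, so PATH entries added there (e.g. a venv's bin/)
        are lost unless the parent environment is passed explicitly.
        """
        env = os.environ.copy()
        env["BASH_ENV"] = os.path.expanduser("~/.bashrc")
        return subprocess.call(statement, shell=True, env=env)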