delay csvdb load tasks by a random offset to avoid database lock issues #396

Open
wants to merge 24 commits into master

Conversation

@kevinrue (Contributor) commented Feb 6, 2018

This resolved my issues when processing a lot of FASTQ files in parallel.
(There may be better ways to address the issue, though)
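
For reference, the change in this PR boils down to sleeping for a random offset before each csvdb load, so that tasks running in parallel do not all hit the SQLite file at the same moment. A minimal sketch of the idea (function and task names are illustrative, not the actual pipeline code):

    import random
    import time

    def load_summary(infile, outfile):
        """Load a summary table into csvdb.

        Stagger concurrent loads by a random 1-30 s offset so that
        parallel tasks do not all open the SQLite database at once.
        """
        time.sleep(random.uniform(1, 30))
        # ... proceed with the usual csv2db load of infile into csvdb ...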

@sebastian-luna-valero (Member) commented

Thanks for the input, Kevin. It looks like a good workaround.

Please bear in mind that SQLite is not a full client-server relational database management system; its concurrency and performance depend on the locking behaviour of the underlying filesystem.

I tried to fix SQLite operational errors with the following changes but more thought might be required:
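
(The actual diff is not shown in this transcript. For illustration only: a common way to reduce sqlite3 "database is locked" OperationalErrors in Python is to combine a busy timeout with a retry loop, as sketched below. This is a generic mitigation, not the CGAT patch.)

    import sqlite3
    import time

    def execute_with_retry(dbname, statement, retries=5, timeout=30):
        """Run a statement, waiting out transient lock errors."""
        for attempt in range(retries):
            try:
                # the timeout makes sqlite itself wait (in seconds)
                # for a lock to clear before raising
                conn = sqlite3.connect(dbname, timeout=timeout)
                with conn:
                    conn.execute(statement)
                conn.close()
                return
            except sqlite3.OperationalError:
                if attempt == retries - 1:
                    raise
                time.sleep(2 ** attempt)  # simple exponential backoff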

Those changes were added to master in the v0.3.2 release. Could you please let me know which version you are running, using the following command:

pip list | grep -i cgat

Best regards,
Sebastian

@kevinrue (Contributor, Author) commented Feb 6, 2018

Hi Sebastian,
I think I should be up to date. I pulled from the master branch this morning.

$ pip list | grep -i cgat
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
CGAT (0.3.2, /gfs/devel/kralbrecht/cgat)
CGATPipelines (0.3.2, /gfs/devel/kralbrecht/CGATPipelines)
CGATReport (0.7.6.1)

Let me know if there's any command I should run to update the CGAT installation, or something.

@kevinrue (Contributor, Author) commented Feb 6, 2018

For the record, I was systematically getting the 'database locked' error message for pipeline_readqc.py with 384 FASTQ files. I don't know exactly where this falls in the range of input sizes that you usually process in a single run.
As I said, the random delay of 1-30 seconds worked in my case, but it might need to be increased or even parameterised if the error pops up again for larger numbers of input files.
Best
Kevin
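
If the workaround is kept, the delay ceiling could be read from the pipeline configuration rather than hard-coded. A sketch under that assumption (the "csvdb_load_delay_max" key is hypothetical, and PARAMS stands in for the dictionary the pipeline loads from pipeline.ini):

    import random
    import time

    # stand-in for the pipeline's PARAMS dictionary (from pipeline.ini)
    PARAMS = {"csvdb_load_delay_max": 60}

    # hypothetical key; fall back to the 30 s ceiling used in this PR
    max_delay = float(PARAMS.get("csvdb_load_delay_max", 30))
    time.sleep(random.uniform(1, max_delay))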

@sebastian-luna-valero (Member) commented

Thanks, Kevin.

We try to solve this by asking ruffus to limit the number of concurrent database jobs to 1:

  @jobs_limit(PARAMS.get("jobs_limit_db", 1), "db") 

However, it does not seem to work properly.
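
For context, jobs_limit is a standard ruffus decorator: every task decorated with the same limit name shares one concurrency budget, so a limit of 1 should serialise all database loads. A minimal sketch of the intended usage (task and file names are illustrative):

    from ruffus import jobs_limit, suffix, transform

    @jobs_limit(1, "db")  # at most one task tagged "db" runs at a time
    @transform("*.tsv", suffix(".tsv"), ".load")
    def load_table(infile, outfile):
        # load infile into the shared SQLite database, then touch outfile
        pass

In principle every load task tagged "db" then runs one at a time; the open question is why that serialisation is not being honoured in practice.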

We are about to perform a major refactoring on the code, and we'll try to solve this problem then. The idea you propose in this PR is a potential solution. I will leave this open for our reference and I will decide what to do during the code refactoring.

Many thanks!
Sebastian

@kevinrue (Contributor, Author) commented Feb 8, 2018

Hi @sebastian-luna-valero
Yes, I saw the jobs_limit instruction in the code, and it puzzled me too that it does not seem to work. However, I was short on time, so instead of investigating the issue I implemented the quick fix offered in this PR. Obviously, the ideal fix would be to make the jobs_limit instruction work as expected.

I 100% agree that this PR is not the ideal implementation, and I would not take it personally if it is closed without being merged. Happy to see it used as a reference.
As a user, I'd be happy to have this random delay increased even to a minute or two if needed (still a small fraction of the total pipeline run) in exchange for reducing the chance of a database lock virtually to nil.

Thanks for your work on the refactoring, I look forward to the result!

Adam Cribbs and others added 22 commits February 9, 2018 10:02
…l is not specified (#399)

*  force --local when drmaa cannot be imported

*  fix syntax error
* Parameterise job_memory in pipeline_genesets.py

The pipeline was crashing on our cluster because jobs were exceeding the amount of memory that they had requested.

To fix, the amount of memory for all cluster jobs is now specified in the pipeline.ini - by default 4G for standard tasks and 20G for "highmemory" tasks (inspection with top showed one task to be using 15G!).

Pipeline now completes successfully for ensembl version 91 for both mouse and human annotations.

* tweak memory settings.

*  pep8
* bug fixes for pipeline report

* have added an option to try to force rendering of the HTML when a code chunk fails if there is no nh output

* readded the nh to site for r

* I have updated the report for jupyter so it no longer outputs the figures and now displays them inline with the notebook

* updated report so it doesn't output PDF reports

* updated picard report so it no longer outputs a PDF report
I assume that this code was unintentionally removed by a previous reversion of a commit (83e6dea).

It is absolutely critical for debugging failed tasks on our cluster!!
* Increase memory requirements for loading genesets.

Loading hg38_ensembl91 (after running gtf2csv with the -f option) needs at least 13G.

* Fix memory requirements for GenomicContext jobs
*  Conda 4.4 breaks everything again

*  bugfix

*  bugfix
* stipped -> stripped

* stipped -> stripped (2)
*  increase priority of /ifs/home/sebastian/.cgat ini file

*  bugfix
* Copy environment to subprocess

If cgat is installed in a venv, the location of the "cgat" wrapper script
is added to PATH when the venv is sourced.

venvs are typically sourced in the user .bashrc.

.bashrc is not evaluated by non-interactive shells.

The shell opened by subprocess with "shell=True" is non-interactive.

Thus, unless the "cgat" command is on a path set somewhere other than .bashrc, local
(i.e. non-cluster) execution of jobs that use the cgat wrapper (e.g. cgat csv2db)
will fail for venvs.

This fixes the issue by copying the environment path to the subprocess
shell.

*  add BASH_ENV=.bashrc to local jobs to be consistent with Cluster.py

* Also fix subprocess environment issue in Control.py

* set BASH_ENV
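
For reference, the environment fix described in the last commits boils down to passing the parent environment (plus a BASH_ENV pointing at .bashrc) when spawning the shell; a minimal sketch of the approach:

    import os
    import subprocess

    def run_local(statement):
        """Run a pipeline statement in a non-interactive shell.

        shell=True spawns a non-interactive shell that does not read
        ~/.bashrc, so PATH entries added there (e.g. a venv's bin/)
        are lost unless the parent environment is passed explicitly.
        """
        env = os.environ.copy()
        env["BASH_ENV"] = os.path.expanduser("~/.bashrc")
        return subprocess.call(statement, shell=True, env=env)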