Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: replace all bbtool wrappers with one bbtool wrapper for all subcommands #1322

Merged
merged 119 commits into from
Nov 14, 2023

Conversation

SilasK
Copy link
Contributor

@SilasK SilasK commented May 2, 2023

Description

I add a generic bbtool wrapper.
It is based on #2

BBtools hase many bash shcripts. This wrapper should allow to run them all.

QC

  • I confirm that:

For all wrappers added by this PR,

  • there is a test case which covers any introduced changes,
  • input: and output: file paths in the resulting rule can be changed arbitrarily,
  • either the wrapper can only use a single core, or the example rule contains a threads: x statement with x being a reasonable default,
  • rule names in the test case are in snake_case and somehow tell what the rule is about or match the tools purpose or name (e.g., map_reads for a step that maps reads),
  • all environment.yaml specifications follow the respective best practices,
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:),
  • all fields of the example rules in the Snakefiles and their entries are explained via comments (input:/output:/params: etc.),
  • stderr and/or stdout are logged correctly (log:), depending on the wrapped tool,
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to (see here; this also means that using any Python tempfile default behavior works),
  • the meta.yaml contains a link to the documentation of the respective tool or command,
  • Snakefiles pass the linting (snakemake --lint),
  • Snakefiles are formatted with snakefmt,
  • Python wrapper scripts are formatted with black.
  • Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).

bio/bbtools/generic/wrapper.py Outdated Show resolved Hide resolved
bio/bbtools/generic/wrapper.py Outdated Show resolved Hide resolved
bio/bbtools/generic/wrapper.py Outdated Show resolved Hide resolved
@SilasK SilasK changed the title feat: added bbtool gneric: One wrapper to un them all! feat: added bbtool gneric: One wrapper to run them all! May 2, 2023
@SilasK
Copy link
Contributor Author

SilasK commented Nov 13, 2023

I now extract the mem using a regex.

I also made a major update to the Snakefile test. It now represents a good usage of the wrapper and the bbtools functions.
e.g. previously all parameters were input as a comma-separated string.
With the same rules, it works for paired or single-end reads.

What do you think of my updated workflow?
@fgvieira @dlaehnemann

@fgvieira
Copy link
Collaborator

I think it looks good but, since it will introduce breaking changes, not sure if @johanneskoester want to take a look first.

Copy link
Contributor

@dlaehnemann dlaehnemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good to me. But I haven't looked at all the wrapper.py code in detail -- @johanneskoester has requested changes anyway, and will thus have to have another look. I'll ping him...

bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Outdated Show resolved Hide resolved
bio/bbtools/meta.yaml Show resolved Hide resolved
bio/bbtools/test/Snakefile Show resolved Hide resolved
bio/bbtools/test/Snakefile Outdated Show resolved Hide resolved
bio/bbtools/wrapper.py Outdated Show resolved Hide resolved
@SilasK
Copy link
Contributor Author

SilasK commented Nov 13, 2023

Now we are almost done!

I had two problems with the linter:

  1. It didn't want me to have functions in the main Snakemake, therefore the import. Even if I think it's less ideal.

  2. It interpreted a ktrim="r" as a path, and didn't accept it. I found a solution, which is less optimal.

@SilasK
Copy link
Contributor Author

SilasK commented Nov 13, 2023

In the wrapper I put the links to other PRs in snakemake that will incorporate these functions. But I guess we should not wait until then to release the next version of this wrapper.

Do you have any final feedback @fgvieira and @dlaehnemann
Thank you in advance.

@dlaehnemann
Copy link
Contributor

Now we are almost done!
Yes, definitely!

I had two problems with the linter:
1. It didn't want me to have functions in the main Snakemake, therefore the import. Even if I think it's less ideal.
Then I guess we'll leave it as is. Maybe add a comment as suggested, if you think this is useful.

2. It interpreted a `ktrim="r"` as a path, and didn't accept it. I found a solution, which is less optimal.

Is that under params? Because I would find that weird, and possibly a linter bug. But if it enforces this for input and output, this makes sense. Then, I would suggest simply specifying this as a param. And if that's what you did, did you try something like either of the following, to make it work with the current linter:

params:
    ktrim="'r'",
    ktrim="'r' ",
    ktrim="r ",

As for the links to PRs, I think it is perfectly fine to simply leave those as TODOs. Maybe make it a bit clearer still, that once these are merged, the respective functions can simply go away and use snakemake internal functionality.

@SilasK
Copy link
Contributor Author

SilasK commented Nov 14, 2023

It looks like a linter bug:
Code
Error

@dlaehnemann
Copy link
Contributor

Ah, thanks for the pointers to the linter. Regarding the (input) function, you could indeed make this an inline lambda expression like this:

rule loglog:
    input:
        expand(
            "reads/raw/{{sample}}_{fraction}.fastq.gz",
            fraction=lambda wc: ["R1", "R2"] if config.get("reads_are_paired", True) else ["se"]
        ),

It has the advantage of being right there in the input of the rule. It has the disadvantage of being slightly less readable than the actual function with newlines. And you'll have to repeat that code wherever you provide such a read fraction specification. It seems like there's no perfect solution with the current linter.

And from what I understand about the ktrim="r", this recognizes that r is a prefix of the input/output path, which starts with reads/.... That's a very short prefix, but it is correct. The linter could probably be amended to ignore such single letter cases or make it more of a suggestion, then a must (not sure if that exists, though). And I think you should get around this by including some random whitespace at the end, so:
ktrim="r "
Then, it shouldn't be a prefix any more...

@SilasK
Copy link
Contributor Author

SilasK commented Nov 14, 2023

The whitespace fixed the linter.

I find the lambda version less clear.

@dlaehnemann
Copy link
Contributor

Agreed, I'll merge! 🎉

@dlaehnemann dlaehnemann merged commit 6eb3c22 into snakemake:master Nov 14, 2023
6 checks passed
@github-actions github-actions bot mentioned this pull request Nov 14, 2023
@SilasK
Copy link
Contributor Author

SilasK commented Nov 14, 2023

Thank you very much for your review and your suggestions!

@dlaehnemann
Copy link
Contributor

Thanks for keeping at it! Getting things right for reproducible and reusable data analyses can be exhausting, especially when you're battling with multiple levels of the workflow / analysis stack and several workflows at the same time.

So take a moment and celebrate! 🚀

@fgvieira
Copy link
Collaborator

Nice work!!

johanneskoester pushed a commit that referenced this pull request Nov 23, 2023
🤖 I have created a release \*beep\* \*boop\*
---
##
[3.0.0](https://www.github.com/snakemake/snakemake-wrappers/compare/v2.13.0...v3.0.0)
(2023-11-23)


### ⚠ BREAKING CHANGES

* replace all bbtool wrappers with one bbtool wrapper for all
subcommands (#1322)

### Features

* allow for specifying multiple chromosomes in ensembl reference dna
sequence download wrapper
([#2376](https://www.github.com/snakemake/snakemake-wrappers/issues/2376))
([c5590f0](https://www.github.com/snakemake/snakemake-wrappers/commit/c5590f03c9990b851181171e70217bb911ddac27))
* replace all bbtool wrappers with one bbtool wrapper for all
subcommands
([#1322](https://www.github.com/snakemake/snakemake-wrappers/issues/1322))
([6eb3c22](https://www.github.com/snakemake/snakemake-wrappers/commit/6eb3c227686c545b937d7cc5ef39b0b435f3e2b0))


### Bug Fixes

* Update MultiQC to 1.18
([#2377](https://www.github.com/snakemake/snakemake-wrappers/issues/2377))
([03bf2d9](https://www.github.com/snakemake/snakemake-wrappers/commit/03bf2d9e2d13968cb71a5e67d80a57b1502dc060))


### Performance Improvements

* autobump bio/bbtools
([#2325](https://www.github.com/snakemake/snakemake-wrappers/issues/2325))
([bf09e80](https://www.github.com/snakemake/snakemake-wrappers/commit/bf09e801dba4e3cf2186ec80631e22df6adbf993))
* autobump bio/bedtools/bamtobed
([#2334](https://www.github.com/snakemake/snakemake-wrappers/issues/2334))
([91b8054](https://www.github.com/snakemake/snakemake-wrappers/commit/91b8054fa408e2a96fed5ab24eea3f4f9f9a7f5c))
* autobump bio/bedtools/complement
([#2341](https://www.github.com/snakemake/snakemake-wrappers/issues/2341))
([d0b9659](https://www.github.com/snakemake/snakemake-wrappers/commit/d0b96598864e0f5827eafb6a3f47aab8c2524995))
* autobump bio/bedtools/coveragebed
([#2342](https://www.github.com/snakemake/snakemake-wrappers/issues/2342))
([8c25388](https://www.github.com/snakemake/snakemake-wrappers/commit/8c25388d851eaaae786f2ecdb3a684cf66037c1b))
* autobump bio/bedtools/genomecov
([#2336](https://www.github.com/snakemake/snakemake-wrappers/issues/2336))
([2ba070e](https://www.github.com/snakemake/snakemake-wrappers/commit/2ba070edd0781b97227a2103bcd28670d0382141))
* autobump bio/bedtools/intersect
([#2338](https://www.github.com/snakemake/snakemake-wrappers/issues/2338))
([b299438](https://www.github.com/snakemake/snakemake-wrappers/commit/b2994382ff94d11491ea991635fd25f74e9a56a4))
* autobump bio/bedtools/merge
([#2330](https://www.github.com/snakemake/snakemake-wrappers/issues/2330))
([8c6ae83](https://www.github.com/snakemake/snakemake-wrappers/commit/8c6ae83bdd70b09dd3e434d13cc8b035ebe34089))
* autobump bio/bedtools/slop
([#2324](https://www.github.com/snakemake/snakemake-wrappers/issues/2324))
([dfef51e](https://www.github.com/snakemake/snakemake-wrappers/commit/dfef51e5d5358a1fdfc87e7c560b31aab6e3a1aa))
* autobump bio/bedtools/sort
([#2328](https://www.github.com/snakemake/snakemake-wrappers/issues/2328))
([fde082e](https://www.github.com/snakemake/snakemake-wrappers/commit/fde082e8c75bf674324499b52bd48340c0572f64))
* autobump bio/bedtools/split
([#2326](https://www.github.com/snakemake/snakemake-wrappers/issues/2326))
([7663786](https://www.github.com/snakemake/snakemake-wrappers/commit/7663786a39a45ad0e6090324f400e94bb3e0bc68))
* autobump bio/blast/blastn
([#2335](https://www.github.com/snakemake/snakemake-wrappers/issues/2335))
([518a30a](https://www.github.com/snakemake/snakemake-wrappers/commit/518a30ad707534046a2a1cfe861e861ece749683))
* autobump bio/blast/makeblastdb
([#2329](https://www.github.com/snakemake/snakemake-wrappers/issues/2329))
([f24d810](https://www.github.com/snakemake/snakemake-wrappers/commit/f24d8106f4c2b6d1230535486967e82ae3f95dd5))
* autobump bio/bustools/count
([#2339](https://www.github.com/snakemake/snakemake-wrappers/issues/2339))
([f8598f8](https://www.github.com/snakemake/snakemake-wrappers/commit/f8598f83cf8e4f7b5bd75f18bc036b4f9610de07))
* autobump bio/bwa-mem2/mem
([#2323](https://www.github.com/snakemake/snakemake-wrappers/issues/2323))
([c751efd](https://www.github.com/snakemake/snakemake-wrappers/commit/c751efda5b93b155c1b42726dbe0f7fd4e10873c))
* autobump bio/bwa-meme/mem
([#2331](https://www.github.com/snakemake/snakemake-wrappers/issues/2331))
([ae28cb4](https://www.github.com/snakemake/snakemake-wrappers/commit/ae28cb4db8473e147f033f447f186fead3a3f092))
* autobump bio/bwa-memx/mem
([#2337](https://www.github.com/snakemake/snakemake-wrappers/issues/2337))
([1fe5ba7](https://www.github.com/snakemake/snakemake-wrappers/commit/1fe5ba7190b2f690a683a4c7f3397f039b8d71a2))
* autobump bio/bwa/mem
([#2333](https://www.github.com/snakemake/snakemake-wrappers/issues/2333))
([e24ce7c](https://www.github.com/snakemake/snakemake-wrappers/commit/e24ce7c1affe9f779dd3bff9c72bfb46608fdbb9))
* autobump bio/bwa/sampe
([#2327](https://www.github.com/snakemake/snakemake-wrappers/issues/2327))
([70a1ac3](https://www.github.com/snakemake/snakemake-wrappers/commit/70a1ac3efaeff35b3059e28d109ac8112b374669))
* autobump bio/bwa/samse
([#2332](https://www.github.com/snakemake/snakemake-wrappers/issues/2332))
([61c0e6d](https://www.github.com/snakemake/snakemake-wrappers/commit/61c0e6d4fd93e13f956ef40720ab072519060d90))
* autobump bio/bwa/samxe
([#2340](https://www.github.com/snakemake/snakemake-wrappers/issues/2340))
([d0865bc](https://www.github.com/snakemake/snakemake-wrappers/commit/d0865bc5a2f1e0a3358be2a678fd1545d95a75d8))
* autobump bio/cooltools/dots
([#2344](https://www.github.com/snakemake/snakemake-wrappers/issues/2344))
([c55e268](https://www.github.com/snakemake/snakemake-wrappers/commit/c55e26814506ea5d6d33afd2689dfe005f2cf968))
* autobump bio/cooltools/eigs_cis
([#2348](https://www.github.com/snakemake/snakemake-wrappers/issues/2348))
([1b7d26e](https://www.github.com/snakemake/snakemake-wrappers/commit/1b7d26e2775cdd6c785cf34a886e15bd82289aca))
* autobump bio/cooltools/eigs_trans
([#2346](https://www.github.com/snakemake/snakemake-wrappers/issues/2346))
([f00f77b](https://www.github.com/snakemake/snakemake-wrappers/commit/f00f77b0a0d057f5576f1a375bc96b79a9e9f0c9))
* autobump bio/cooltools/expected_cis
([#2347](https://www.github.com/snakemake/snakemake-wrappers/issues/2347))
([f4c8924](https://www.github.com/snakemake/snakemake-wrappers/commit/f4c8924b6b4f544a989e665d5e272d0f917ef700))
* autobump bio/cooltools/expected_trans
([#2350](https://www.github.com/snakemake/snakemake-wrappers/issues/2350))
([7e3f763](https://www.github.com/snakemake/snakemake-wrappers/commit/7e3f7635a0a7e4832dbf4d45fcabb9497efb0e89))
* autobump bio/cooltools/insulation
([#2343](https://www.github.com/snakemake/snakemake-wrappers/issues/2343))
([77e4ef5](https://www.github.com/snakemake/snakemake-wrappers/commit/77e4ef5d5a149bded23ebb28be3430e3c013eb76))
* autobump bio/cooltools/pileup
([#2345](https://www.github.com/snakemake/snakemake-wrappers/issues/2345))
([6c6b0b1](https://www.github.com/snakemake/snakemake-wrappers/commit/6c6b0b1c6712d37909927f444debf13becb7841b))
* autobump bio/cooltools/saddle
([#2349](https://www.github.com/snakemake/snakemake-wrappers/issues/2349))
([9f0f802](https://www.github.com/snakemake/snakemake-wrappers/commit/9f0f802f339667707bd8ed6013ca6e759ca9ce05))
* autobump bio/gseapy/gsea
([#2351](https://www.github.com/snakemake/snakemake-wrappers/issues/2351))
([b3b3c94](https://www.github.com/snakemake/snakemake-wrappers/commit/b3b3c94d857cfcdfafdc9ef819ba8f2567f26afa))
* autobump bio/hmmer/hmmbuild
([#2355](https://www.github.com/snakemake/snakemake-wrappers/issues/2355))
([2744f59](https://www.github.com/snakemake/snakemake-wrappers/commit/2744f59f5a9c13d5e80152f737d315535f4049ff))
* autobump bio/hmmer/hmmpress
([#2354](https://www.github.com/snakemake/snakemake-wrappers/issues/2354))
([3f379bc](https://www.github.com/snakemake/snakemake-wrappers/commit/3f379bce4f20970983b319cd327bf5a31623ba8d))
* autobump bio/hmmer/hmmscan
([#2353](https://www.github.com/snakemake/snakemake-wrappers/issues/2353))
([50c6425](https://www.github.com/snakemake/snakemake-wrappers/commit/50c642568ae259784c0fb50cf3440e3b84e0e6b5))
* autobump bio/hmmer/hmmsearch
([#2352](https://www.github.com/snakemake/snakemake-wrappers/issues/2352))
([ed2bbcc](https://www.github.com/snakemake/snakemake-wrappers/commit/ed2bbccff232d5a16677ea60e6e70e241ed4bb61))
* autobump bio/last/lastal
([#2357](https://www.github.com/snakemake/snakemake-wrappers/issues/2357))
([2589f18](https://www.github.com/snakemake/snakemake-wrappers/commit/2589f18971c2b377e0c8003c7f2ef7d412a277ae))
* autobump bio/last/lastdb
([#2356](https://www.github.com/snakemake/snakemake-wrappers/issues/2356))
([4cee71f](https://www.github.com/snakemake/snakemake-wrappers/commit/4cee71f08cf1cf55c31ddb654f1349830412e708))
* autobump bio/picard/addorreplacereadgroups
([#2370](https://www.github.com/snakemake/snakemake-wrappers/issues/2370))
([cf1cef0](https://www.github.com/snakemake/snakemake-wrappers/commit/cf1cef071e5b932c797078661bacf1eaf8ab12b2))
* autobump bio/picard/bedtointervallist
([#2360](https://www.github.com/snakemake/snakemake-wrappers/issues/2360))
([b71d414](https://www.github.com/snakemake/snakemake-wrappers/commit/b71d414de9d5adea175fd773caead4c4e72a2dde))
* autobump bio/picard/collectalignmentsummarymetrics
([#2359](https://www.github.com/snakemake/snakemake-wrappers/issues/2359))
([8873d75](https://www.github.com/snakemake/snakemake-wrappers/commit/8873d7552e8ef4acd58aa3d380e847768342aa59))
* autobump bio/picard/collectgcbiasmetrics
([#2366](https://www.github.com/snakemake/snakemake-wrappers/issues/2366))
([f6d5248](https://www.github.com/snakemake/snakemake-wrappers/commit/f6d52486f7a511d705a383325ba07edd8821f99a))
* autobump bio/picard/collecthsmetrics
([#2369](https://www.github.com/snakemake/snakemake-wrappers/issues/2369))
([f730d54](https://www.github.com/snakemake/snakemake-wrappers/commit/f730d54f21a56723237c33645f103645c9ea977e))
* autobump bio/picard/collectinsertsizemetrics
([#2373](https://www.github.com/snakemake/snakemake-wrappers/issues/2373))
([a952ad4](https://www.github.com/snakemake/snakemake-wrappers/commit/a952ad4503ea74ecef60e0646c0fcbdc8aa8158f))
* autobump bio/picard/collectmultiplemetrics
([#2364](https://www.github.com/snakemake/snakemake-wrappers/issues/2364))
([b21473a](https://www.github.com/snakemake/snakemake-wrappers/commit/b21473aff3482061b1e41b2b508f64e8072a77d7))
* autobump bio/picard/collectrnaseqmetrics
([#2361](https://www.github.com/snakemake/snakemake-wrappers/issues/2361))
([2b36e3c](https://www.github.com/snakemake/snakemake-wrappers/commit/2b36e3c0dd38b1abd75e6a531c12913b4365b1b0))
* autobump bio/picard/collecttargetedpcrmetrics
([#2358](https://www.github.com/snakemake/snakemake-wrappers/issues/2358))
([babe13e](https://www.github.com/snakemake/snakemake-wrappers/commit/babe13e9120377c7e6c3ea7f96c10fdb4e616571))
* autobump bio/picard/createsequencedictionary
([#2372](https://www.github.com/snakemake/snakemake-wrappers/issues/2372))
([70630ac](https://www.github.com/snakemake/snakemake-wrappers/commit/70630ace2fba1b8a7e07e8dadeadb210dba5fbe6))
* autobump bio/picard/markduplicates
([#2365](https://www.github.com/snakemake/snakemake-wrappers/issues/2365))
([66bea69](https://www.github.com/snakemake/snakemake-wrappers/commit/66bea6963e0cfe26908cd093acf011504869c4a3))
* autobump bio/picard/mergesamfiles
([#2362](https://www.github.com/snakemake/snakemake-wrappers/issues/2362))
([b1e64e2](https://www.github.com/snakemake/snakemake-wrappers/commit/b1e64e2158f13381bd3c052e9641a1f6c65e9602))
* autobump bio/picard/mergevcfs
([#2363](https://www.github.com/snakemake/snakemake-wrappers/issues/2363))
([2ca8f9e](https://www.github.com/snakemake/snakemake-wrappers/commit/2ca8f9eaa2255a7a9393e54ea88bae1c2d60ea06))
* autobump bio/picard/revertsam
([#2367](https://www.github.com/snakemake/snakemake-wrappers/issues/2367))
([8a54527](https://www.github.com/snakemake/snakemake-wrappers/commit/8a54527a7b493d434c3d94e3376fab18673691b7))
* autobump bio/picard/samtofastq
([#2371](https://www.github.com/snakemake/snakemake-wrappers/issues/2371))
([62ae9e5](https://www.github.com/snakemake/snakemake-wrappers/commit/62ae9e50d617a4529b767fce6d9f9b070d0dda45))
* autobump bio/picard/sortsam
([#2368](https://www.github.com/snakemake/snakemake-wrappers/issues/2368))
([1a359a4](https://www.github.com/snakemake/snakemake-wrappers/commit/1a359a4bc88d38880bbf552698f95c6f1d605021))
* autobump bio/sortmerna
([#2374](https://www.github.com/snakemake/snakemake-wrappers/issues/2374))
([e0920c9](https://www.github.com/snakemake/snakemake-wrappers/commit/e0920c9440f494167354f20583618d603b2c173e))
* autobump bio/vsearch
([#2375](https://www.github.com/snakemake/snakemake-wrappers/issues/2375))
([c59fbc8](https://www.github.com/snakemake/snakemake-wrappers/commit/c59fbc8f52e16204ffb3b7c8b1c17479c4ebe391))
---


This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants