Fix/parameters #277

Merged 5 commits on Jan 24, 2023
8 changes: 7 additions & 1 deletion default/fullPipeline_illumina_nanpore.yml
@@ -27,11 +27,17 @@ steps:
# --length_required reads shorter than length_required will be discarded, default is 15. (int [=15])
# PE data, the front/tail trimming settings are given with -f, --trim_front1 and -t, --trim_tail1
additionalParams: " --detect_adapter_for_pe -q 20 --cut_front --trim_front1 3 --cut_tail --trim_tail1 3 --cut_mean_quality 10 --length_required 50 "
timeLimit: "AUTO"
nonpareil:
additionalParams: " -v 10 -r 1234 "
jellyfish:
additionalParams:
count: " -m 21 -s 100M "
# --counter-len is the counter length in bits.
# -s is the size of the hash
# -m k-mer length
# -m, --counter-len and -s determine the peak RAM usage, which can be estimated with jellyfish mem.
# --disk writes intermediate results to disk
count: " -m 21 --counter-len 9 -s 30G --disk "
histo: " "

qcONT:
2 changes: 1 addition & 1 deletion docs/modules/assembly.md
@@ -2,7 +2,7 @@

## Input

=== "Command for short read data"
=== "Command for short read data with optional single end reads"

```
-entry wShortReadAssembly -params-file example_params/assembly.yml
4 changes: 3 additions & 1 deletion example_params/assembly.yml
@@ -8,7 +8,9 @@ scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
assembly:
input: test_data/assembly/samples.tsv
input:
paired: test_data/assembly/samples.tsv
single: test_data/assembly/samplesUnpaired.tsv
megahit:
additionalParams: " --min-contig-len 200 "
fastg: true
3 changes: 2 additions & 1 deletion example_params/assemblyMetaspades.yml
@@ -8,7 +8,8 @@ scratch: "/vol/scratch"
publishDirMode: "symlink"
steps:
assembly:
input: test_data/assembly/samples.tsv
input:
paired: test_data/assembly/samples.tsv
metaspades:
additionalParams: " "
fastg: true
2 changes: 1 addition & 1 deletion modules/annotation/module.nf
@@ -236,7 +236,7 @@ process pResistanceGeneIdentifier {
S5CMD_PARAMS=params.steps?.annotation?.rgi?.database?.download?.s5cmd?.params ?: ""
'''
mkdir -p !{params.polished.databases}
ADDITIONAL_RGI_PARAMS=!{params.steps?.annotation?.rgi?.additionalParams}
ADDITIONAL_RGI_PARAMS="!{params.steps?.annotation?.rgi?.additionalParams}"

# Check developer documentation
CARD_JSON=""
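Review note on the quoting change above: a minimal sketch of why the unquoted assignment breaks once `additionalParams` contains anything other than whitespace. The flags `--flag-a` and `--flag-b` are hypothetical placeholders, not real `rgi` options, and `RENDERED` merely stands in for the value Nextflow substitutes into the script.

```shell
# Placeholder for the rendered value of additionalParams (hypothetical flags).
RENDERED=' --flag-a --flag-b 3 '

# Unquoted: the rendered line reads `ADDITIONAL_RGI_PARAMS= --flag-a --flag-b 3`,
# so the shell tries to run `--flag-a` as a command with an empty variable
# in its environment, and the step fails.
unquoted_status=0
sh -c "ADDITIONAL_RGI_PARAMS=${RENDERED}" 2>/dev/null || unquoted_status=$?

# Quoted: the whole string, surrounding spaces included, becomes the value.
quoted_value=$(sh -c "ADDITIONAL_RGI_PARAMS=\"${RENDERED}\"; printf '%s' \"\$ADDITIONAL_RGI_PARAMS\"")

echo "unquoted exit status: ${unquoted_status}"
echo "quoted value: [${quoted_value}]"
```

With the default value of a single space the unquoted form happens to work, which is probably why the problem only shows up once real parameters are configured.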
31 changes: 26 additions & 5 deletions modules/assembly/shortReadAssembler.nf
@@ -11,6 +11,7 @@ def getOutput(SAMPLE, RUNID, TOOL, filename){
'/' + TOOL + '/' + filename
}

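// 'yyyy' is the calendar year; 'YYYY' is the week-based year and can be off by one around New Year.
def timestamp = new java.util.Date().format('yyyyMMdd-HHmmss-SSS')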

/*
* This process uses kmer frequencies and the nonpareil diversity index to predict peak memory consumption on an assembler.
@@ -170,17 +171,36 @@ workflow wShortReadAssemblyList {


/*
* Takes a tab separated file of files containing reads as input and produces assembly results.
* Input file with columns separated by tabs:
* Takes two tab-separated files of files containing paired and optional single reads
* as input and produces assembly results.
* Input files must have two columns separated by tabs:
* SAMPLE and READS
*
* Output is of the format [SAMPLE, CONTIGS]
*
*/
workflow wShortReadAssemblyFile {
main:
Channel.from(file(params.steps.assembly.input)) | splitCsv(sep: '\t', header: true) \
| map { it -> [ it.SAMPLE, it.READS, file("NOT_SET")]} | set { reads }
SAMPLE_IDX = 0
SAMPLE_PAIRED_IDX = 1
UNPAIRED_IDX = 2

readsPaired = Channel.empty()
if(params.steps.assembly.input.containsKey("paired")) {
Channel.from(file(params.steps.assembly.input.paired)) | splitCsv(sep: '\t', header: true) \
| map { it -> [ it.SAMPLE, it.READS]} | set { readsPaired }
}

readsSingle = Channel.empty()
if(params.steps.assembly.input.containsKey("single")) {
Channel.from(file(params.steps.assembly.input.single)) | splitCsv(sep: '\t', header: true) \
| map { it -> [ it.SAMPLE, it.READS]} | set { readsSingle }
}

readsPaired | join(readsSingle, by: SAMPLE_IDX, remainder: true) \
| map { sample -> sample[UNPAIRED_IDX] == null ? \
[sample[SAMPLE_IDX], sample[SAMPLE_PAIRED_IDX], file("NOT_SET")] : sample } \
| set { reads }

_wAssembly(reads, Channel.empty(), Channel.empty())
emit:
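Review note on the join above: a rough shell analogue of how the paired and single sample sheets are combined, using `join(1)` in place of the Nextflow `join` operator. The file names and read paths are hypothetical, and the header rows are left out because `join` expects sorted, header-free input. Unlike `remainder: true`, which also passes through samples that appear only in the single sheet, this sketch keeps unmatched rows from the paired sheet only.

```shell
TAB=$(printf '\t')
workdir=$(mktemp -d)

# Hypothetical sample sheets (SAMPLE, READS), sorted on the sample name.
printf 'test1\tpaired_1.fq.gz\ntest2\tpaired_2.fq.gz\n' > "$workdir/paired.tsv"
printf 'test1\tunpaired_1.fq.gz\n' > "$workdir/single.tsv"

# -a 1 keeps paired rows that have no single-end partner;
# -e NOT_SET fills the missing single-read column for those rows,
# mirroring the file("NOT_SET") placeholder in the workflow.
joined=$(join -t "$TAB" -a 1 -e NOT_SET -o 0,1.2,2.2 \
  "$workdir/paired.tsv" "$workdir/single.tsv")

printf '%s\n' "$joined"
rm -rf "$workdir"
```

Each output row has the shape SAMPLE, paired reads, single reads, which corresponds to the three-element tuples handed to `_wAssembly`.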
@@ -287,9 +307,10 @@ workflow _wCalculateMegahitResources {
| join(kmerFrequencies) | pPredictFlavor

PREDICTED_RAM_IDX = 1

pPredictFlavor.out.memory \
| collectFile(newLine: true, seed: "SAMPLE\tPREDICTED_RAM", storeDir: params.logDir){ item ->
[ "predictedMegahitRAM.tsv", item[SAMPLE_IDX] + '\t' + item[PREDICTED_RAM_IDX] ]
[ "predictedMegahitRAM." + timestamp + ".tsv", item[SAMPLE_IDX] + '\t' + item[PREDICTED_RAM_IDX] ]
}

resourceType.doNotPredict | map{ it -> it + "NoPrediction" } \
2 changes: 2 additions & 0 deletions test_data/assembly/samplesUnpaired.tsv
@@ -0,0 +1,2 @@
SAMPLE READS
test1 https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/meta_test/small/unpaired.fq.gz