Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Np add single end r1 r2 mapping dna mode and merge sort split reads by name #1092

Conversation

nikellepetrillo
Copy link
Contributor

Description

Give your PR a concise yet descriptive title.
Please explain the changes you made here.
Explain the motivation for making this change. What existing problem does the pull request solve?
Mention any issues fixed, addressed, or otherwise related to this pull request, including issue numbers or hard links for issues in other repos.
You can delete these instructions once you have written your PR description.


Checklist

If you can answer "yes" to the following items, please add a checkmark next to the appropriate checklist item(s) and notify our WARP documentation team by tagging either @ekiernan or @kayleemathews in a comment on this PR.

  • Did you add inputs, outputs, or tasks to a workflow?
  • Did you modify, delete or move: file paths, file names, input names, output names, or task names?
  • If you made a changelog update, did you update the pipeline version number?

@github-actions
Copy link

Remember to squash merge!

1 similar comment
@github-actions
Copy link

Remember to squash merge!

…dna_mode_and_merge_sort_split_reads_by_name' into np_add_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name
Copy link
Contributor

@ekiernan ekiernan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a formatting question, but it LGTM!


for file in "${R1_files[@]}"; do
sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq")
hisat-3n /cromwell_root/reference/hg38 -q -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq --directional-mapping-reverse --base-change C,T --no-repeat-index --no-spliced-alignment --no-temp-splicesite -t --new-summary --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R1.txt --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R1.bam"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we split this line up like we do for the parameters for other tools, so that each parameter is on it's own line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would probably make it more readable (and consistent with the rest of warp)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup totally missed this. will reformat!

@github-actions
Copy link

Remember to squash merge!

@@ -452,8 +454,11 @@ task Split_unmapped_reads {

CODE

# wait 15 seconds for the files to be written
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this causing issues? weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the tar command on line 461 kept complaining about the files being changed as it was tarring things up... Im not sure why the tar command would even start up if the previous python command block wasn't finished writing out the files? but the sleep 15 seemed to help

Copy link
Member

@jessicaway jessicaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks @nikellepetrillo!


for file in "${R1_files[@]}"; do
sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R1.fastq")
hisat-3n /cromwell_root/reference/hg38 -q -U ${sample_id}.hisat3n_dna.split_reads.R1.fastq --directional-mapping-reverse --base-change C,T --no-repeat-index --no-spliced-alignment --no-temp-splicesite -t --new-summary --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R1.txt --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R1.bam"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would probably make it more readable (and consistent with the rest of warp)


for file in "${R2_files[@]}"; do
sample_id=$(basename "$file" ".hisat3n_dna.split_reads.R2.fastq")
hisat-3n /cromwell_root/reference/hg38 -q -U ${sample_id}.hisat3n_dna.split_reads.R2.fastq --directional-mapping --base-change C,T --no-repeat-index --no-spliced-alignment --no-temp-splicesite -t --new-summary --summary-file ${sample_id}.hisat3n_dna_split_reads_summary.R2.txt --threads 11 | samtools view -b -q 10 -o "${sample_id}.hisat3n_dna.split_reads.R2.bam"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (split the command over multiple lines for readability and consistency)

…dna_mode_and_merge_sort_split_reads_by_name' into np_add_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name
Copy link
Contributor

@kevinpalis kevinpalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@github-actions
Copy link

Remember to squash merge!

@github-actions
Copy link

Remember to squash merge!

@nikellepetrillo nikellepetrillo merged commit 53ba51f into develop Oct 11, 2023
@nikellepetrillo nikellepetrillo deleted the np_add_single_end_r1_r2_mapping_dna_mode_and_merge_sort_split_reads_by_name branch October 11, 2023 17:56
khajoue2 added a commit that referenced this pull request Nov 2, 2023
* Update Loom_schema.md (#1080)

* Lk dynamically select barcode orientation (#1086)

* Added dynamic barcode selection to the ATAC FastqProcess task

* Np pd 2327 separate unmapped reads (#1084)

* add separate unmapped reads task

* clean up

* add splut unmapped reads (#1087)

* KM Update metrics descriptions (#1088)

* Update metrics descriptions

* Update Loom_schema.md

* Update Loom_schema.md

* Add TagSort and definitions

* Km fix broken links in docs (#1090)

* Update Terra link address

* Fix links to snaptools Docker

* Fixed link to SS2 overview

* Update link to snM3C test_inputs directory

* Fixed link to Multiome test data

* Fix link to loom_to_h5ad.py script

* Fix link to loom creation script

* Fixed links

* Lk pd 2363 join barcodes (#1089)

Added GEX and ATAC whitelist barcodes to both ATAC and GEX h5ad files so that barcodes that can be joined.

* Update UnmappedBamToAlignedBam.wdl (#1070)

Changed MarkDuplicates task to use Float size(Array[File], [String])

* Np add single end r1 r2 mapping dna mode and merge sort split reads by name (#1092)

* echo whitelistnew task

* rearranging

* cleaning up

* cleaning up

* cleaning up

* formatting

---------

Co-authored-by: aawdeh <[email protected]>

* Lk pd 2374 update whitelist (#1093)

Updated path of multiome whitelists

* Km fix broken links in WARP docs (#1097)

* Update Multiome README

* Update broken links in docs

* Update get-started.md

* Update broken bookmarks in docs

* Remove the dropna and MATCH columns from JoinBarcodes (#1098)

* Remove the dropna and MATCH columns from JoinBarcodes

* PD-2379-add_remove_overlap_read_parts_task (#1094)

* Added merge_original_and_split_bam_sort_all_reads_name_and_position

* Cleaning up a few ls and pwds, adding echos

* add task dedup_unique_bam_and_index_unique_bam (#1103)

* Km atac h5ad overview (#1102)

* Create count-matrix-overview.md

* Update count-matrix-overview.md

* Add barcode column names

Add the column names containing the GEX and ATAC barcodes in the output h5ad files to the Multiome README

* Update count-matrix-overview.md

* Update README.md

* Km doc updates (#1100)

* Update links in Optimus Overview doc

* Update version numbers in READMEs

* Update count-matrix-overview.md

* Update count-matrix-overview.md

* Add link to SnapATAC2 import_data function

* Update count-matrix-overview.md

* Update website/docs/Pipelines/ATAC/count-matrix-overview.md

Co-authored-by: ekiernan <[email protected]>

* Update website/docs/Pipelines/ATAC/count-matrix-overview.md

Co-authored-by: ekiernan <[email protected]>

* Update website/docs/Pipelines/ATAC/count-matrix-overview.md

Co-authored-by: ekiernan <[email protected]>

---------

Co-authored-by: ekiernan <[email protected]>

* Add unique_reads_allc

* Update CondensedSnm3C.wdl

* Km add atac rrid and update multiome readme (#1109)

* Update Multiome README

* Update pipeline version output

* Update README.md

* Update inputs to CramToUnmappedBam test (#1104)

* update paths to input files

* Add unique_reads_cgn_extraction task (#1107)

---------

Co-authored-by: ekiernan <[email protected]>
Co-authored-by: Nikelle Petrillo <[email protected]>
Co-authored-by: Kaylee Mathews <[email protected]>
Co-authored-by: aawdeh <[email protected]>
Co-authored-by: ekiernan <[email protected]>
Co-authored-by: Jessica Way <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants