Lk fix parse barcodes and join barcodes (#1410)
fix bgzip on parsebarcodes and join barcodes
ekiernan authored Nov 5, 2024
1 parent f81cee2 commit 24be0e4
Showing 8 changed files with 20 additions and 7 deletions.
1 change: 1 addition & 0 deletions pipelines/skylab/atac/atac.changelog.md
@@ -5,6 +5,7 @@
* Updated the ATAC library CSV to be consistent in file naming convention and to have similar case for metric names to the Optimus workflow library CSV
* Added a new metric to the ATAC library CSV to calculate percent_target, which is the number of estimated cells by SnapATAC2 divided by expected_cells input
* Updated the ATAC workflow so that the output fragment file is bgzipped by default
+* Updated memory settings for PairedTag; does not impact the ATAC workflow


# 2.3.2
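The changelog describes the new `percent_target` metric as a plain ratio of SnapATAC2's estimated cell count to the `expected_cells` input. Here is a minimal sketch of that arithmetic. The function name is hypothetical and is not the pipeline's code. Because the changelog says "divided by", the result stays a fraction and is not scaled to 0-100.

```python
def percent_target(estimated_cells: int, expected_cells: int) -> float:
    """Hypothetical helper: cells estimated by SnapATAC2 / expected_cells input.

    Returned as a fraction (no x100 scaling), following the changelog wording.
    """
    if expected_cells <= 0:
        raise ValueError("expected_cells must be a positive integer")
    return estimated_cells / expected_cells


print(percent_target(3000, 4000))  # 0.75
```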
1 change: 1 addition & 0 deletions pipelines/skylab/multiome/Multiome.changelog.md
@@ -5,6 +5,7 @@
* Updated the ATAC library CSV and the Gene Expression library CSV to be consistent in file naming convention and to have similar case for metric names
* Added a new metric to the ATAC library CSV to calculate percent_target, which is the number of estimated cells by SnapATAC2 divided by expected_cells input
* Updated the ATAC workflow so that the output fragment file is bgzipped by default
+* Updated memory settings for PairedTag; does not impact the Multiome workflow

# 5.7.1
2024-10-18 (Date of Last Commit)
1 change: 1 addition & 0 deletions pipelines/skylab/optimus/Optimus.changelog.md
@@ -5,6 +5,7 @@
* Updated gex_expected_cells to a required output
* Reformatted the library CSV output filename to remove an extra gex
* Updated the ATAC fragment file output so that it is bgzipped; this does not impact the Optimus workflow
+* Updated memory settings for PairedTag; does not impact the Optimus workflow

# 7.7.0
2024-09-24 (Date of Last Commit)
1 change: 1 addition & 0 deletions pipelines/skylab/paired_tag/PairedTag.changelog.md
@@ -5,6 +5,7 @@
* Updated the ATAC library CSV and the Gene Expression library CSV to be consistent in file naming convention and to have similar case for metric names
* Added a new metric to the ATAC library CSV to calculate percent_target, which is the number of estimated cells by SnapATAC2 divided by expected_cells input
* Updated the ATAC fragment file output so that it is bgzipped
+* Updated memory settings for PairedTag Utils

# 1.7.1
2024-10-18 (Date of Last Commit)
1 change: 1 addition & 0 deletions pipelines/skylab/slideseq/SlideSeq.changelog.md
@@ -3,6 +3,7 @@

* Updated the h5adUtils WDL to rename the gene expression library CSV filename; this does not impact slideseq
* Updated the ATAC fragment file output so that it is bgzipped; this does not impact the slideseq workflow
+* Updated memory settings for PairedTag; does not impact the Slideseq workflow

# 3.4.2
2024-09-24 (Date of Last Commit)
@@ -3,6 +3,7 @@

* Updated the h5adUtils WDL to rename the gene expression library CSV filename; this does not impact slideseq
* Updated the ATAC fragment file output so that it is bgzipped; this does not impact the Multi-snSS2 workflow
+* Updated memory settings for PairedTag; does not impact the snSS2 workflow

# 2.0.1
2024-09-24 (Date of Last Commit)
13 changes: 9 additions & 4 deletions tasks/skylab/H5adUtils.wdl
@@ -235,8 +235,8 @@ task JoinMultiomeBarcodes {

Int nthreads = 1
String cpuPlatform = "Intel Cascade Lake"
-Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(gex_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000
-Int disk = ceil((size(atac_h5ad, "GiB") + size(gex_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10
+Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(gex_h5ad, "MiB") + size(atac_fragment, "MiB")) * 6) + 10000
+Int disk = ceil((size(atac_h5ad, "GiB") + size(gex_h5ad, "GiB") + size(atac_fragment, "GiB")) * 8) + 10
String docker_path
}
String gex_base_name = basename(gex_h5ad, ".h5ad")
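This hunk raises the `JoinMultiomeBarcodes` resource multipliers from 3 to 6 for memory and from 5 to 8 for disk. The sketch below reproduces both formulas from the summed input sizes so the old and new requests can be compared. It assumes WDL's `size(..., "MiB")` and `size(..., "GiB")` use 2^20 and 2^30 bytes. The helper name is hypothetical.

```python
import math

BYTES_PER_MIB = 2**20
BYTES_PER_GIB = 2**30


def join_barcodes_resources(input_sizes_bytes, mem_factor=6, disk_factor=8):
    """Hypothetical mirror of the machine_mem_mb / disk defaults.

    input_sizes_bytes: sizes of atac_h5ad, gex_h5ad and atac_fragment in bytes.
    Defaults are the new multipliers; pass (3, 5) for the previous ones.
    """
    total = sum(input_sizes_bytes)
    machine_mem_mb = math.ceil(total / BYTES_PER_MIB * mem_factor) + 10000
    disk = math.ceil(total / BYTES_PER_GIB * disk_factor) + 10
    return machine_mem_mb, disk


three_gib_inputs = [BYTES_PER_GIB] * 3
print(join_barcodes_resources(three_gib_inputs))        # (28432, 34)
print(join_barcodes_resources(three_gib_inputs, 3, 5))  # (19216, 25)
```

With three 1 GiB inputs, the change raises the request from 19216 to 28432 for memory and from 25 to 34 for disk.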
@@ -255,8 +255,10 @@ task JoinMultiomeBarcodes {
set -e pipefail

# decompress the bgzipped fragment file
+echo "Moving fragment file for bgzipping"
+mv ~{atac_fragment} ~{atac_fragment_base}.sorted.tsv.gz
echo "Decompressing fragment file"
-bgzip -d ~{atac_fragment} > "~{atac_fragment_base}.sorted.tsv"
+bgzip -d "~{atac_fragment_base}.sorted.tsv.gz"
echo "Done decompressing"
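The older command `bgzip -d ~{atac_fragment} > "~{atac_fragment_base}.sorted.tsv"` combines in-place decompression with a shell redirect. Without `-c`, `bgzip -d FILE.gz` writes `FILE` next to the input and sends nothing to standard output, so the redirect target should end up empty. Moving the file into the working directory under a `.gz` name first makes the in-place output land as `~{atac_fragment_base}.sorted.tsv` in that directory. The Python sketch below mimics these in-place semantics. It assumes the input is BGZF, a multi-member gzip format that the standard `gzip` module can read. `bgzip_decompress_in_place` is a hypothetical helper, not part of the pipeline.

```python
import gzip
import os
import shutil
import tempfile


def bgzip_decompress_in_place(path: str) -> str:
    """Hypothetical helper mimicking `bgzip -d PATH`.

    Writes PATH minus its ".gz" suffix, then deletes PATH (bgzip's behavior
    without -k). BGZF is multi-member gzip, so gzip.open can read it.
    """
    if not path.endswith(".gz"):
        raise ValueError(f"{path}: expected a .gz suffix")
    out_path = path[: -len(".gz")]
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return out_path


# Usage: write a plain gzip file (a stand-in for BGZF), then decompress it in place.
with tempfile.TemporaryDirectory() as workdir:
    gz_path = os.path.join(workdir, "sample.sorted.tsv.gz")
    with gzip.open(gz_path, "wt") as fh:
        fh.write("chr1\t10\t50\tAAAC\t2\n")
    tsv_path = bgzip_decompress_in_place(gz_path)
    print(os.path.basename(tsv_path))  # sample.sorted.tsv
    print(os.path.exists(gz_path))     # False
```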


@@ -276,12 +278,14 @@ task JoinMultiomeBarcodes {
print("Reading ATAC h5ad:")
print("~{atac_h5ad}")
print("Read ATAC fragment file:")
print("~{atac_fragment}")
print(atac_fragment)
print("Reading Optimus h5ad:")
print("~{gex_h5ad}")
atac_data = ad.read_h5ad("~{atac_h5ad}")
gex_data = ad.read_h5ad("~{gex_h5ad}")
atac_tsv = pd.read_csv(atac_fragment, sep="\t", names=['chr','start', 'stop', 'barcode','n_reads'])
print("Printing ATAC fragment tsv")
print(atac_tsv)
whitelist_gex = pd.read_csv("~{gex_whitelist}", header=None, names=["gex_barcodes"])
whitelist_atac = pd.read_csv("~{atac_whitelist}", header=None, names=["atac_barcodes"])
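The Python heredoc reads the decompressed fragment file as a headerless, tab-separated table with the columns `chr`, `start`, `stop`, `barcode`, and `n_reads`. Below is a stdlib-only sketch of the same parse, plus a per-barcode fragment count. The helper names are hypothetical. Skipping blank lines is an assumption on our part, not behavior taken from the task.

```python
import csv
import io
from collections import Counter

# Column names used by the task's pd.read_csv call.
FRAGMENT_COLUMNS = ("chr", "start", "stop", "barcode", "n_reads")


def read_fragments(handle):
    """Parse a headerless 5-column fragment TSV into a list of dicts."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:  # tolerate blank lines
            continue
        record = dict(zip(FRAGMENT_COLUMNS, fields))
        for key in ("start", "stop", "n_reads"):
            record[key] = int(record[key])
        rows.append(record)
    return rows


def fragments_per_barcode(rows):
    """Count fragment records per cell barcode."""
    return Counter(row["barcode"] for row in rows)


example = io.StringIO(
    "chr1\t10\t50\tAAAC\t2\n"
    "chr1\t60\t90\tAAAC\t1\n"
    "chr2\t5\t40\tGGGT\t1\n"
)
print(fragments_per_barcode(read_fragments(example)))  # Counter({'AAAC': 2, 'GGGT': 1})
```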
@@ -317,6 +321,7 @@ task JoinMultiomeBarcodes {
atac_data.write_h5ad("~{atac_base_name}.h5ad")
df_fragment.to_csv("~{atac_fragment_base}.tsv", sep='\t', index=False, header = False)
CODE
# sorting the file
echo "Sorting file"
sort -k1,1V -k2,2n "~{atac_fragment_base}.tsv" > "~{atac_fragment_base}.sorted.tsv"
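The last step sorts the rewritten fragment file with `sort -k1,1V -k2,2n`. That is GNU version sort on the chromosome column, so `chr2` precedes `chr10`, followed by numeric order on the start coordinate. Below is a rough Python equivalent. It assumes that splitting names into digit and non-digit runs is close enough to GNU version sort for typical chromosome names. It is not a full reimplementation, and the helper names are hypothetical.

```python
import re


def version_key(text):
    """Split text into alternating non-digit/digit runs; digit runs compare as ints."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capturing group places digit runs at odd indices.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def sort_fragments(rows):
    """Order (chr, start, ...) rows roughly like `sort -k1,1V -k2,2n`."""
    return sorted(rows, key=lambda row: (version_key(row[0]), int(row[1])))


rows = [("chr10", "5"), ("chr2", "100"), ("chrX", "1"), ("chr2", "20"), ("chr1", "7")]
for chrom, start in sort_fragments(rows):
    print(chrom, start)
```

GNU `sort` breaks ties with a last-resort whole-line comparison. Python's `sorted` is stable instead, so tied rows keep their input order.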
8 changes: 5 additions & 3 deletions tasks/skylab/PairedTagUtils.wdl
@@ -205,13 +205,13 @@ task ParseBarcodes {
Int nthreads = 1
String cpuPlatform = "Intel Cascade Lake"
String docker_path
+Int disk = ceil((size(atac_h5ad, "GiB") + size(atac_fragment, "GiB")) * 8) + 10
+Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(atac_fragment, "MiB")) * 6) + 10000
}

String atac_base_name = basename(atac_h5ad, ".h5ad")
String atac_fragment_base = basename(atac_fragment, ".sorted.tsv.gz")

-Int machine_mem_mb = ceil((size(atac_h5ad, "MiB") + size(atac_fragment, "MiB")) * 3) + 10000
-Int disk = ceil((size(atac_h5ad, "GiB") + size(atac_fragment, "GiB")) * 5) + 10

parameter_meta {
atac_h5ad: "The resulting h5ad from the ATAC workflow."
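In `ParseBarcodes`, `machine_mem_mb` and `disk` move from the task body into the `input` block, and their multipliers rise to 6 and 8. In WDL 1.0, an input declared with a default expression is optional, so a caller can now override the computed value. The Python sketch below models that default-or-override behavior. The function and argument names are hypothetical.

```python
import math
from typing import Optional, Tuple

BYTES_PER_MIB = 2**20
BYTES_PER_GIB = 2**30


def parse_barcodes_resources(
    atac_h5ad_bytes: int,
    atac_fragment_bytes: int,
    machine_mem_mb: Optional[int] = None,
    disk: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (machine_mem_mb, disk), computing each only when not supplied.

    Defaults mirror the new formulas: ceil(MiB * 6) + 10000 and ceil(GiB * 8) + 10.
    """
    total = atac_h5ad_bytes + atac_fragment_bytes
    if machine_mem_mb is None:
        machine_mem_mb = math.ceil(total / BYTES_PER_MIB * 6) + 10000
    if disk is None:
        disk = math.ceil(total / BYTES_PER_GIB * 8) + 10
    return machine_mem_mb, disk


print(parse_barcodes_resources(BYTES_PER_GIB, BYTES_PER_GIB))                        # (22288, 26)
print(parse_barcodes_resources(BYTES_PER_GIB, BYTES_PER_GIB, machine_mem_mb=64000))  # (64000, 26)
```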
@@ -222,8 +222,10 @@ task ParseBarcodes {
set -e pipefail

# decompress the bgzipped atac file
+echo "Moving fragment tsv for decompression"
+mv ~{atac_fragment} ~{atac_fragment_base}.sorted.tsv.gz
echo "Decompressing fragment file"
-bgzip -d ~{atac_fragment} > "~{atac_fragment_base}.sorted.tsv"
+bgzip -d "~{atac_fragment_base}.sorted.tsv.gz"
echo "Done decompressing"

python3 <<CODE