Counts.txt file not created when using .bam files as input #150

Closed
mikrzol opened this issue Apr 23, 2024 · 8 comments

Comments

mikrzol commented Apr 23, 2024

I am trying to use ShortStack to analyze a large sRNA-seq experiment (132 samples). The alignment step takes a very long time (almost a full week) on the supercomputer I am working on, despite specifying 12 threads and 20 GB of RAM per thread.

The organism is barley (Hordeum vulgare). I use the toplevel genome without any masking and added mitochondrion, nonchromosomal and chloroplast sequences, which makes the reference rather large, but I got poor results when using a masked genome.

Since I can only submit a job for a maximum of one week on the server, I only managed to get the .bam files. I thought I could use them for further analysis with the --bamfile option rather than --readfile. That works fine, but I don't get the Counts.txt file I need for DE Seq analysis.

Is this an expected limitation when using .bam files as input?

@MikeAxtell
Owner

MikeAxtell commented Apr 23, 2024 via email

MikeAxtell self-assigned this May 8, 2024
MikeAxtell added the bug label May 8, 2024

@MikeAxtell
Owner

I took another look at this.

When more than one BAM is given to --bamfile, no Counts.txt is created. This doesn't seem like the behavior that I intended, but there may have been a technical reason ... I will look into it and report back on this thread.

The workaround is simple: first merge your multiple BAM files, making sure to mark read groups by file name using the -r switch on samtools merge:

samtools merge --threads [n_threads] -r -o merged.bam in1.bam in2.bam ...

You can then pass merged.bam as a single argument to ShortStack's --bamfile option, and a Counts.txt file will be created.
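The workaround above can be sketched as a short shell script. This is only an illustration: the file names (sample1.bam, sample2.bam, genome.fa), the thread count, and the output directory are placeholders, and the helper function build_workaround is hypothetical. It prints the two commands instead of running them so they can be reviewed before submitting a job.

```shell
#!/usr/bin/env bash
# Sketch of the merge-then-run workaround. All file names, the thread count,
# and the output directory are placeholders (assumptions), not values from
# this thread.
set -euo pipefail

# build_workaround THREADS GENOME BAM [BAM ...]
# Prints the merge command and the follow-up ShortStack command.
build_workaround() {
  local threads="$1" genome="$2"
  shift 2
  # -r attaches a read group to each alignment, inferred from its file name
  echo "samtools merge --threads ${threads} -r -o merged.bam $*"
  # the single merged BAM is then given to --bamfile
  echo "ShortStack --genomefile ${genome} --bamfile merged.bam --threads ${threads} --outdir ShortStack_merged"
}

build_workaround 12 genome.fa sample1.bam sample2.bam
```

Once the printed commands look right, they can be run directly or pasted into a job script.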

MikeAxtell added enhancement and removed bug labels May 9, 2024

@MikeAxtell
Owner

As of commit 5335d2e this enhancement has been added. Now, when a user passes more than one BAM file to --bamfile, a Counts.txt file will be created.

This will be included in the next release, 4.0.4, which should be out soon.

@mikrzol
Author

mikrzol commented May 10, 2024

Hi Prof. Axtell,

thanks for the update. Glad I could point out a bug and it'll get fixed soon!

Thanks also for the suggestion to split the alignment and merge the alignments later. I submitted the jobs to run in parallel on the server I'm using for the analysis and ran into some issues, probably because the wrong versions of some tools were being used (e.g., the default server samtools rather than the one installed in the conda environment with ShortStack).
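One way to catch this kind of mix-up is to check where each tool resolves from before starting a job. The snippet below is a minimal sketch: the helper in_env is hypothetical, and it assumes the environment was activated with conda activate, which sets CONDA_PREFIX.

```shell
#!/usr/bin/env bash
# Sketch: confirm that tools resolve to the active conda environment and not
# to a system-wide copy. Assumes `conda activate` has set CONDA_PREFIX.

# in_env TOOL PREFIX: succeed only if TOOL resolves to PREFIX/bin/TOOL.
in_env() {
  local tool="$1" prefix="$2" resolved
  resolved="$(command -v "$tool" 2>/dev/null)" || return 1
  [ "$resolved" = "${prefix}/bin/${tool}" ]
}

for tool in samtools ShortStack; do
  if in_env "$tool" "${CONDA_PREFIX:-}"; then
    echo "$tool: using the conda environment copy"
  else
    echo "$tool: NOT from the active conda environment ($(command -v "$tool" || echo 'not found'))"
  fi
done
```

Running this at the top of a batch script makes the resolved paths visible in the job log.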

I'll keep you updated.

@MikeAxtell
Owner

Resolved in release 4.0.4

@mikrzol
Author

mikrzol commented May 20, 2024

Hi Prof. Axtell,

thanks for resolving the issue. I tried using ShortStack (4.0.3) to first align the files in batches, then merge them into one file with samtools merge -r, and finally use the resulting file as the input, but I still didn't get the Counts.txt file (despite using only one merged.bam file).

I hope updating ShortStack will solve the issue. Would you kindly add the newest version to bioconda for ease of access?

@MikeAxtell
Owner

Yes, version 4.0.4 should solve this.
New releases are automatically bumped by bioconda. However, the bioconda system for updating recipes is currently broken; see bioconda/bioconda-recipes#41025.
ShortStack v4.0.4 will show up on bioconda as soon as the bioconda team solves their current technical issues. That is out of my hands.

In the meantime you can just download the ShortStack v4.0.4 script directly from this repo and execute it in a suitable environment.

MikeAxtell reopened this May 20, 2024

@dirkjanvw

Hi! Just a quick note that ShortStack v4.0.4 is now available via bioconda :) The issues with the bioconda system seem solved.
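After updating, it may be worth confirming that the environment really picked up 4.0.4 or later. The check below is a rough sketch: version_ge is a hypothetical helper built on sort -V, and the parsing assumes that ShortStack --version prints a dotted version number somewhere in its output.

```shell
#!/usr/bin/env bash
# Sketch: check that the installed ShortStack is at least 4.0.4.
# Assumption: `ShortStack --version` prints a dotted version number somewhere
# in its output; the grep pattern below is a guess at that format.

# version_ge A B: succeed if version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v ShortStack >/dev/null 2>&1; then
  installed="$(ShortStack --version 2>&1 | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)"
  if version_ge "$installed" "4.0.4"; then
    echo "ShortStack $installed: includes the multi-BAM Counts.txt change"
  else
    echo "ShortStack ${installed:-unknown}: older than 4.0.4, consider updating"
  fi
else
  echo "ShortStack not found on PATH"
fi
```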
