Releases: datasnakes/htseq-count-cluster
HTSeqCountCluster v1.4 Release
Changes
- Added class and function string documentation.
HTSeqCountCluster v1.3 Release
Changes
- Added required packages to setup.py
HTSeqCountCluster v1.2 Release
Notable changes
- Removed arguments from main function of htseq-count-cluster - 6814e6b
HTSeqCountCluster v1.1 Release
htseq-count-cluster
A CLI wrapper for running HTSeq's htseq-count on a cluster.
View documentation.
Install
pip install HTSeqCountCluster
Features
- For use with large datasets (we've previously used a dataset of 120 different human samples)
- For use with SGE/SGI cluster systems
- Submits multiple jobs
- Command line interface/script
- Merges counts files into one counts table/csv file
- Uses the accepted_hits.bam file output of tophat
Examples
Run htseq-count-cluster
After generating BAM output files from tophat, you can use our htseq-count-cluster script instead of running HTSeq's htseq-count directly. This script is intended for use with clusters that use PBS (qsub) for job submission.
htseq-count-cluster -p path/to/samples/ -f samples.csv -g genes.gtf -o path/to/cluster-output/
This script uses logzero, so color-coded logging information is printed to your shell.
A common Linux practice is to start the program inside screen, which creates a new shell session. If the program writes output to stdout, you can then detach from that session without ending the program and keep working in another shell.
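As a rough illustration of what "one job per sample" could look like, the sketch below builds one htseq-count command for each sample folder's accepted_hits.bam file. It is not the project's actual implementation: the function name `build_htseq_commands`, the `<sample>.counts` output naming, and the folder layout (`inpath/<sample>/accepted_hits.bam`) are assumptions made for the example. Because htseq-count writes its counts to stdout, each command is paired with the file its output would be redirected to.

```python
from pathlib import Path


def build_htseq_commands(inpath, samples, gtf, outpath):
    """Return {sample: (command, counts_file)} for each sample folder.

    Hypothetical helper: assumes every sample has its own folder under
    `inpath` containing tophat's accepted_hits.bam.
    """
    commands = {}
    for sample in samples:
        bam = Path(inpath) / sample / "accepted_hits.bam"
        # Assumed naming scheme for the per-sample output file.
        counts_file = Path(outpath) / f"{sample}.counts"
        # "-f bam" tells htseq-count that the alignment input is a BAM file.
        command = ["htseq-count", "-f", "bam", str(bam), str(gtf)]
        commands[sample] = (command, counts_file)
    return commands


jobs = build_htseq_commands(
    "path/to/samples", ["sample1", "sample2"], "genes.gtf", "path/to/cluster-output"
)
for sample, (command, counts_file) in jobs.items():
    # On a cluster, each line would go into a job script passed to qsub.
    print(" ".join(command), ">", counts_file)
```

Each printed line is what a per-sample job script would run; how the jobs are actually written and submitted is left to the wrapper.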
Help message output
usage: htseq_count_cluster.py [-h] -p INPATH -f INFILE -g GTF -o OUTPATH
                              [-e EMAIL]

This is a command line wrapper around htseq-count.

optional arguments:
  -h, --help            show this help message and exit
  -p INPATH, --inpath INPATH
                        Path of your samples/sample folders.
  -f INFILE, --infile INFILE
                        Name or path to your input csv file.
  -g GTF, --gtf GTF     Name or path to your gtf/gff file.
  -o OUTPATH, --outpath OUTPATH
                        Directory of your output counts file. The counts file
                        will be named.
  -e EMAIL, --email EMAIL
                        Email address to send script completion to.

*Ensure that htseq-count is in your path.
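The help message above is in the format produced by Python's argparse module. The following is a minimal sketch of a parser that would accept the same options. It is reconstructed from the help text only, so the function name `build_parser` and the exact keyword arguments are assumptions rather than the project's source code.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="htseq_count_cluster.py",
        description="This is a command line wrapper around htseq-count.",
        epilog="*Ensure that htseq-count is in your path.",
    )
    # The usage line shows -p, -f, -g and -o without brackets, so they are required.
    parser.add_argument("-p", "--inpath", required=True,
                        help="Path of your samples/sample folders.")
    parser.add_argument("-f", "--infile", required=True,
                        help="Name or path to your input csv file.")
    parser.add_argument("-g", "--gtf", required=True,
                        help="Name or path to your gtf/gff file.")
    parser.add_argument("-o", "--outpath", required=True,
                        help="Directory of your output counts file.")
    # -e appears in brackets in the usage line, so it is optional.
    parser.add_argument("-e", "--email",
                        help="Email address to send script completion to.")
    return parser


# Parse the same arguments used in the example command.
args = build_parser().parse_args(
    ["-p", "path/to/samples/", "-f", "samples.csv",
     "-g", "genes.gtf", "-o", "path/to/cluster-output/"]
)
print(args.inpath, args.infile, args.gtf, args.outpath, args.email)
```

Leaving out any of -p, -f, -g or -o makes argparse print the usage line and exit with an error, which matches the required options shown in the usage string.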
Merge output counts files
To prepare your data for DESeq2, limma, or edgeR, it's best to have one merged counts file instead of the multiple files produced by the htseq-count-cluster script. We offer this as a standalone script because it may be useful to keep those files separate.
merge-counts -d path/to/cluster-output/
Help message for mergecounts.py
usage: mergecounts.py [-h] -d DIRECTORY

Merge multiple counts tables into 1 counts .csv file.

Your output file will be named: merged_counts_table.csv

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Path to folder of counts files.
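To make the merge step concrete, here is a minimal sketch of combining per-sample counts files into a single CSV table using only the standard library. It is not the code behind merge-counts: the function name `merge_counts`, the `*.counts` file pattern, and the `gene` header are assumptions for illustration. It relies on htseq-count's usual output of one tab-separated `feature<TAB>count` pair per line, and writes the documented output name merged_counts_table.csv.

```python
import csv
from pathlib import Path


def merge_counts(directory, pattern="*.counts", output_name="merged_counts_table.csv"):
    """Merge two-column, tab-separated counts files into one CSV table.

    Hypothetical helper: `pattern` is an assumed naming scheme for the
    per-sample files; each file's stem becomes its column name.
    """
    directory = Path(directory)
    samples = []
    table = {}  # feature id -> {sample name: count}
    for path in sorted(directory.glob(pattern)):
        sample = path.stem
        samples.append(sample)
        with path.open(newline="") as handle:
            for row in csv.reader(handle, delimiter="\t"):
                if len(row) != 2:
                    continue  # skip blank or malformed lines
                feature, count = row
                table.setdefault(feature, {})[sample] = count

    out_path = directory / output_name
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene"] + samples)
        for feature in sorted(table):
            # A feature missing from a sample is left empty rather than guessed.
            writer.writerow([feature] + [table[feature].get(s, "") for s in samples])
    return out_path
```

Calling `merge_counts("path/to/cluster-output/")` would write the merged table into that folder, with one row per feature and one column per sample. Note that htseq-count also emits summary rows such as `__no_feature`; this sketch keeps them, and you may want to drop them before differential expression analysis.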
ToDo
- Monitor jobs.
- Enhance wrapper input for other use cases.
- Add example data.
Maintainers
Shaurita Hutchins | @sdhutchins | ✉
Help
Please feel free to open an issue if you have a question/feedback/problem
or submit a pull request to add a feature/refactor/etc. to this project.