Releases: datasnakes/htseq-count-cluster

HTSeqCountCluster v1.4 Release

25 Feb 22:43
5262774

Changes

  • Added class and function string documentation.

HTSeqCountCluster v1.3 Release

16 May 23:47

Changes

  • Added required packages to setup.py

HTSeqCountCluster v1.2 Release

15 May 22:47

Notable changes

  • Removed arguments from main function of htseq-count-cluster - 6814e6b

HTSeqCountCluster v1.1 Release

15 May 22:11

htseq-count-cluster

A CLI wrapper for running HTSeq's htseq-count on a cluster.

View documentation.

Install

pip install HTSeqCountCluster

Features

  • For use with large datasets (we've previously used a dataset of 120 different human samples)
  • For use with SGE/SGI cluster systems
  • Submits multiple jobs
  • Command line interface/script
  • Merges counts files into one counts table/csv file
  • Uses accepted_hits.bam file output of tophat

Examples

Run htseq-count-cluster

After generating BAM output files with tophat, you can use our htseq-count-cluster
script instead of calling HTSeq's htseq-count directly. This script is intended for
clusters that use PBS (qsub) for job submission.

htseq-count-cluster -p path/to/samples/ -f samples.csv -g genes.gtf -o path/to/cluster-output/

This script uses logzero, so color-coded logging information is printed to your shell.

A common Linux practice is to start the program inside a screen session. If the
program writes to stdout, you can detach from that session and work in another
shell without ending the program.
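
To make concrete what the wrapper automates, here is a minimal Python sketch of building one htseq-count command per sample. It is not the package's actual implementation. The function name `build_htseq_commands`, the one-folder-per-sample layout, and the `<sample>-counts.txt` output naming are assumptions made for illustration; only the `-f bam` option and the `accepted_hits.bam` file name come from htseq-count and tophat themselves.

```python
from pathlib import Path


def build_htseq_commands(inpath, samples, gtf, outpath):
    """Return one (command, counts_file) pair per sample.

    Assumes (hypothetically) that each sample has its own folder under
    `inpath` containing tophat's accepted_hits.bam.
    """
    jobs = []
    for sample in samples:
        bam = Path(inpath) / sample / "accepted_hits.bam"
        # Hypothetical output naming; the real script may name files differently.
        counts_file = Path(outpath) / f"{sample}-counts.txt"
        # -f bam tells htseq-count that the alignment input is BAM, not SAM.
        cmd = ["htseq-count", "-f", "bam", str(bam), str(gtf)]
        jobs.append((cmd, counts_file))
    return jobs


for cmd, counts_file in build_htseq_commands(
    "path/to/samples", ["sample1", "sample2"], "genes.gtf", "path/to/cluster-output"
):
    # htseq-count writes its counts to stdout, so a job script would redirect it.
    print(" ".join(cmd), ">", counts_file)
```

Each printed line could be placed in its own job script and submitted with qsub, which is the repetitive step the wrapper is meant to save you.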

Help message output

usage: htseq_count_cluster.py [-h] -p INPATH -f INFILE -g GTF -o OUTPATH
                              [-e EMAIL]

This is a command line wrapper around htseq-count.

optional arguments:
  -h, --help            show this help message and exit
  -p INPATH, --inpath INPATH
                        Path of your samples/sample folders.
  -f INFILE, --infile INFILE
                        Name or path to your input csv file.
  -g GTF, --gtf GTF     Name or path to your gtf/gff file.
  -o OUTPATH, --outpath OUTPATH
                        Directory of your output counts file. The counts file
                        will be named.
  -e EMAIL, --email EMAIL
                        Email address to send script completion to.

*Ensure that htseq-count is in your PATH.
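
One way to check the PATH requirement before submitting any jobs is the standard library's `shutil.which`. The helper name `find_executable` below is hypothetical and is not part of this package.

```python
import shutil


def find_executable(name="htseq-count"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    print("htseq-count was not found on PATH; install HTSeq or adjust PATH.")
else:
    print(f"Using htseq-count at {location}")
```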


Merge output counts files

To prepare your data for DESeq2, limma, or edgeR, it's best to have one merged
counts file instead of the multiple files produced by the htseq-count-cluster script.
We offer merging as a standalone script because it may also be useful to keep those
files separate.

merge-counts -d path/to/cluster-output/

Help message for mergecounts.py

usage: mergecounts.py [-h] -d DIRECTORY

Merge multiple counts tables into 1 counts .csv file.

Your output file will be named:  merged_counts_table.csv

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY, --directory DIRECTORY
                        Path to folder of counts files.
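
For readers curious what such a merge involves, the following is a rough sketch using only the Python standard library. It is not the code in mergecounts.py. It assumes htseq-count's default two-column, tab-separated output, assumes the hypothetical file pattern `*-counts.txt`, and fills in 0 for a gene that is missing from a sample. Summary rows that htseq-count prefixes with `__` (such as `__no_feature`) are skipped.

```python
import csv
from pathlib import Path


def read_counts(path):
    """Read one htseq-count output file into {feature_id: count}."""
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and summary rows such as __no_feature.
            if len(row) < 2 or row[0].startswith("__"):
                continue
            counts[row[0]] = int(row[1])
    return counts


def merge_counts(directory, pattern="*-counts.txt"):
    """Write merged_counts_table.csv with one column per counts file."""
    directory = Path(directory)
    # Column names are the file names without their extension.
    tables = {f.stem: read_counts(f) for f in sorted(directory.glob(pattern))}
    genes = sorted(set().union(*tables.values()))
    out_path = directory / "merged_counts_table.csv"
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene", *tables])
        for gene in genes:
            # Assumption: a gene absent from a sample is written as 0.
            writer.writerow([gene, *(t.get(gene, 0) for t in tables.values())])
    return out_path
```

Calling `merge_counts("path/to/cluster-output/")` would write the merged table next to the per-sample files and return its path.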

ToDo

  • Monitor jobs.
  • Enhance wrapper input for other use cases.
  • Add example data.

Maintainers

Shaurita Hutchins | @sdhutchins

Rob Gilmore | @grabear

Help

Please feel free to open an issue if you have a question/feedback/problem
or submit a pull request to add a feature/refactor/etc. to this project.