Releases: eead-csic-compbio/get_homologues
binaries
Compressed TAR file with binaries to be downloaded after cloning the source repository. This should be done with install.pl
get_homologues-est
Updates of release 20180828 (v3.1.3):
06062018: improved ANI computation by skipping self-taxon BLAST hits
06062018: fixed POCP computation; now it is [100(C1 + C2)/(T1 + T2)]
06062018: fixed POCS computation; now it is [100(C1 + C2)/(T1 + T2)] and T1/T2 are #nr seqs
23072018: updated Grid Engine instructions in manuals
28082018: bug fixed: get_homologues.pl -X now removes previous DIAMOND results of one vs others when new genomes are added
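The POCP/POCS formula quoted in the 06062018 entries can be sketched as follows. This is a minimal illustration, not code from get_homologues: it assumes C1 and C2 are the numbers of conserved proteins (or sequences) in each of the two taxa and T1 and T2 are their total (non-redundant) sequence counts. The function name `pocp` and the example values are hypothetical.

```python
def pocp(c1: int, c2: int, t1: int, t2: int) -> float:
    """Percentage of conserved proteins/sequences: 100 * (C1 + C2) / (T1 + T2).

    c1, c2: conserved sequences in taxon 1 and taxon 2 (assumed meaning)
    t1, t2: total sequences in taxon 1 and taxon 2
    """
    total = t1 + t2
    if total <= 0:
        raise ValueError("T1 + T2 must be positive")
    return 100.0 * (c1 + c2) / total


if __name__ == "__main__":
    # Illustrative numbers only, not real genome counts
    print(f"{pocp(3000, 3100, 4000, 4200):.2f}")
```

Both counts are summed before dividing, so the value is weighted by taxon size rather than being the mean of two per-taxon percentages.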
get_homologues-est
These are the relevant updates in release 20180524 (v3.1.2):
- sequences longer than $MAXSEQLENGTH are skipped (by default $MAXSEQLENGTH=25kb)
- compare_clusters.pl -m now also produces a pangenome_genes file listing the sequence names in each cluster
- compare_clusters.pl -m now sorts taxon names when printing pangenome matrices
- added -P option to compute percentage of conserved proteins (POCP) in get_homologues.pl
- added -P option to compute percentage of conserved sequences (POCS) in get_homologues-est.pl
- simplified headers in output FASTA files of get_homologues-est down to: >first_non-blank_word [source_taxon.fna]
- modified phyTools::check_variants_FASTA_alignment to also compute private variants if lists A & B are passed
- annotate_cluster.pl now can take -A/-B lists of taxa to compute private variants in the aligned sequences of the cluster
- updated descriptions of annotate_cluster.pl in manuals
get_homologues-est
These are the relevant updates in release 20180313 (v3.1.1):
modified sort_blast_results so that it now can compress individual BLAST result files with global binary $SORTBIN
get_homologues.pl and get_homologues-est.pl now compress individual BLAST files by default ($COMPRESSBLAST=1)
added global $MAXSEQLENGTH to get_homologues-est.pl to warn of long sequences, which often cause downstream problems
get_homologues-est
Release 20180103 (v3.1.0) ships with the following changes:
added hierarchical clustering of the ANdist matrix in plot_matrix_heatmap.sh for convenient cluster delimitation at distance cutoffs of 6, 5 and 4, which correspond to ANI values of 94%, 95% and 96%, respectively
compare_clusters.pl -m now also produces a pangenome CSV file for Scoary GWAS analyses with Fisher's Exact test
updated manuals with option to compute cluster intersection matrices with parse_pangenome_matrix.pl -x
explained transposed CSV pangenome matrix for software Scoary in manual
added option 'force' to install.pl so that it can install with no supervision
added option 'no_databases' to install.pl for building docker images
hcluster_matrix.sh now removes the invariant (core-genome) and singleton (cloud-genome) columns before computing distances
updated example figure created with plot_matrix_heatmap.sh in the manual
renamed hcluster_matrix.sh to hcluster_pangenome_matrix.sh
added links to GET_PHYLOMARKERS
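The column filtering described for hcluster_matrix.sh in this release can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes a presence/absence matrix held as a dict of taxon name to a list of per-cluster counts, treats a column as "core" when every taxon has the cluster and as "singleton" when exactly one does, and also drops empty columns. The function name `drop_uninformative_columns` is hypothetical.

```python
def drop_uninformative_columns(matrix):
    """Keep only columns present in more than one taxon but not in all of them.

    matrix: dict mapping taxon name -> list of counts, one entry per cluster.
    Returns a new dict with the same taxa and only the informative columns.
    """
    taxa = list(matrix)
    n_taxa = len(taxa)
    n_cols = len(matrix[taxa[0]]) if taxa else 0

    keep = []
    for j in range(n_cols):
        # Occupancy = number of taxa in which cluster j is present
        occupancy = sum(1 for t in taxa if matrix[t][j] > 0)
        if 1 < occupancy < n_taxa:
            keep.append(j)

    return {t: [matrix[t][j] for j in keep] for t in taxa}


if __name__ == "__main__":
    # Toy matrix: column 0 is core, columns 1 and 2 are singletons
    toy = {"A": [1, 1, 0, 1], "B": [1, 0, 0, 1], "C": [1, 0, 1, 0]}
    print(drop_uninformative_columns(toy))
```

Core columns add no signal to distances between taxa and singletons only lengthen terminal branches, which is a plausible reason to remove both before clustering.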
get_homologues-est
Release 20171023 (v3.0.9) ships with the following changes:
removed special characters >,<,& from cluster names in get_homologues.pl and get_homologues-est.pl
updated table of occupancy classes in the manual
added options -a and -X and improved documentation of hcluster_matrix.sh & plot_matrix_heatmap.sh
despite the increase in size, updated BLAST+ to ncbi-blast-2.6.0+ as it handles low-complexity alignments better than 2.2.27+
get_homologues-est
Release 20170918 (v3.0.8) ships with the following bug fixes and changes:
added options -x , -c <0|1> and -f to hcluster_matrix.sh
added oneliner to transpose matrix to compare_clusters.pl
fixed parsing of filenames with -I in cases where input files are like numbers.faa
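Transposing a pangenome matrix, as mentioned in this release, swaps taxa and clusters between rows and columns. The one-liner shipped with compare_clusters.pl is not reproduced here; the following is a minimal Python sketch of the same operation, assuming a tab-separated matrix with a header row. The function names are hypothetical.

```python
def transpose(rows):
    """Transpose a rectangular table given as a list of equal-length lists."""
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all rows must have the same number of fields")
    return [list(col) for col in zip(*rows)]


def transpose_tsv(text):
    """Transpose tab-separated text and return tab-separated text."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    return "\n".join("\t".join(r) for r in transpose(rows)) + "\n"


if __name__ == "__main__":
    # Toy matrix: taxa in rows, clusters in columns
    tsv = "taxon\tc1\tc2\nA\t1\t0\nB\t1\t1\n"
    print(transpose_tsv(tsv), end="")
```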
get_homologues-est
Release 20170828 (v3.0.7) ships with the following bug fixes and changes:
compare_clusters.pl now prints lists of genes in intersections of two sets when comparing 3 cluster sets
compare_clusters.pl -m now produces a FASTA version of the binary pangenome matrix so that fully-labelled trees can be inferred with software such as IQ-TREE
added question to FAQ section in manuals explaining a way to compute ML pangenome trees with bootstrap and aLRT support (Thanks Uriel Alonso and Ruben Sancho)
updated manuals and plot_matrix_heatmap.sh with options -r (remove column names and cell contents) and -k (set name for color key X-axis)
added options -d (max no. decimals) and -x (filter matrix with regex) to plot_matrix_heatmap.sh
added parse_pangenome_matrix.pl -x to compute cluster intersection between taxa in a pangenome matrix
fixed bug in compare_clusters.pl when .cluster_list file is not parsed, due to previous changes in find_taxa_FASTA_array_headers
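The FASTA version of the binary pangenome matrix mentioned in this release can be pictured as one record per taxon whose "sequence" is a string of 0/1 characters, one per cluster. The sketch below assumes that layout; the exact format written by compare_clusters.pl -m is not shown in these notes, and the function name `matrix_to_fasta` is hypothetical.

```python
def matrix_to_fasta(matrix):
    """Encode a presence/absence matrix as FASTA text.

    matrix: dict mapping taxon name -> list of counts, one entry per cluster.
    Any count above zero is written as '1', otherwise '0'.
    """
    lines = []
    for taxon, row in matrix.items():
        lines.append(">" + taxon)
        lines.append("".join("1" if value > 0 else "0" for value in row))
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Toy matrix with three taxa and four clusters
    toy = {"A": [1, 1, 0, 2], "B": [1, 0, 0, 1], "C": [1, 0, 1, 0]}
    print(matrix_to_fasta(toy), end="")
```

Because every record has the same length, the output behaves like an alignment of binary characters, which is what lets tree-inference software keep the taxon labels.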
get_homologues-est
Release 20170807 (v3.0.6) ships with the following bug fixes and changes:
parse_pangenome_matrix.pl -S now takes an integer to indicate the minimum occupancy requested of clusters
improved description of annotate_cluster.pl -h
increased length of sequence names in annotate_cluster.pl
added parsimony-informative sites in headers of blunt-end clusters produced by annotate_cluster.pl -b
annotate_cluster.pl now shows unaligned cluster sequences and prints number of taxa in the alignment
annotate_cluster.pl now removes temporary files
added sub collapse_taxon_alignments to lib/phyTools.pm
added annotate_cluster.pl -c 40 to collapse alignments of sequences from same taxon
added Pfam domains to collapsed sequences
initialize compartments in parse_pangenome_matrix.pl to zero if empty before plotting (thanks Felipe Lira!)
transcripts2cds.pl & transcripts2cdsCPP.pl now print name of offending files with '+' chars
temp blastdb file closed properly in annotate_cluster.pl
corrected intergenic clusters produced with get_homologues.pl -g when using prokka-annotated GenBank files
updated get_homologues.pl -g and checked this section in the manual
extract_*_genbank subs in lib/phyTools.pm now parse LOCUS when accession is not available in GenBank files, such as those made with PROKKA