VSEARCH 1.0.4: allpairs_global and more
torognes committed Dec 8, 2014
1 parent c21ae53 commit 2fe6c62
Showing 19 changed files with 731 additions and 74 deletions.
16 changes: 11 additions & 5 deletions README.md
@@ -10,7 +10,7 @@ The aim of this project is to create an alternative to the [USEARCH](http://www.
* be as accurate or more accurate than usearch
* be as fast or faster than usearch

-We have implemented a tool called VSEARCH which supports searching, clustering, chimera detection, dereplication, sorting and masking (commands `--usearch_global`, `--cluster_smallmem`, `--cluster_fast`, `--uchime_ref`, `--uchime_denovo`, `--derep_fulllength`, `--sortbysize`, `--sortbylength` and `--maskfasta`, as well as almost all their options).
+We have implemented a tool called VSEARCH which supports searching, clustering, chimera detection, dereplication, sorting and masking (commands `--usearch_global`, `--cluster_smallmem`, `--cluster_fast`, `--uchime_ref`, `--uchime_denovo`, `--derep_fulllength`, `--sortbysize`, `--sortbylength`, `--maskfasta` and `--allpairs_global`, as well as almost all their options).

VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.
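
Reviewer note: to illustrate what "full dynamic programming Needleman-Wunsch" refers to in the README paragraph above, here is a minimal C++ sketch that computes an optimal global alignment score. It is not taken from `align.cc` or `align_simd.cc`. The `Scoring` struct, its values, the linear gap penalty, and the name `nw_score` are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scoring scheme for illustration only; these are not
// claimed to be VSEARCH's defaults.
struct Scoring {
  int match = 2;
  int mismatch = -4;
  int gap = -3;  // linear penalty per gapped position (an assumption)
};

// Optimal global alignment score by full dynamic programming.
// Only two rows of the score matrix are kept in memory.
int nw_score(const std::string& a, const std::string& b,
             const Scoring& s = Scoring()) {
  std::vector<int> prev(b.size() + 1), curr(b.size() + 1);
  // First row: aligning an empty prefix of a against prefixes of b.
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = static_cast<int>(j) * s.gap;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    curr[0] = static_cast<int>(i) * s.gap;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
      int up = prev[j] + s.gap;        // gap in b
      int left = curr[j - 1] + s.gap;  // gap in a
      curr[j] = std::max({diag, up, left});
    }
    std::swap(prev, curr);
  }
  return prev[b.size()];
}
```

Every cell of the matrix is evaluated, so the result is optimal under the chosen scoring scheme, at a cost proportional to the product of the two sequence lengths. This sketch is serial and makes no attempt at the SIMD parallelism the README describes.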

@@ -26,22 +26,22 @@ VSEARCH does not support amino acid sequences or local alignments. These feature

In the example below, VSEARCH will identify sequences in the file database.fsa that are at least 90% identical on the plus strand to the query sequences in the file queries.fsa and write the results to the file alnout.txt.

-`./vsearch-1.0.3-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`
+`./vsearch-1.0.4-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`

## Download and install

The latest releases of VSEARCH are available [here](https://github.com/torognes/vsearch/releases).

-Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.3-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.3-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).
+Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.4-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.4-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).

Download the appropriate executable and make a symbolic link in a folder included in your `$PATH` from `vsearch` to the appropriate binary. You may use the following commands (assuming `~/bin` is in your `$PATH`):

```sh
cd ~
mkdir -p bin
cd bin
-wget https://github.com/torognes/vsearch/releases/download/v1.0.3/vsearch-1.0.3-linux-x86_64
-ln -s vsearch-1.0.3-linux-x86_64 vsearch
+wget https://github.com/torognes/vsearch/releases/download/v1.0.4/vsearch-1.0.4-linux-x86_64
+ln -s vsearch-1.0.4-linux-x86_64 vsearch
```

Substitute `linux` with `osx` in those lines if you're on a Mac.
@@ -180,6 +180,11 @@ Masking options:
* `--output_no_hits`
* `--qmask dust|none|soft` (Default dust)

+Pairwise alignment options (most searching options also apply):
+
+* `--allpairs_global <filename>`
+* `--acceptall`
+
Searching options:

* `--alnout <filename>`
@@ -268,6 +273,7 @@ File | Description
---|---
**align.cc** | New Needleman-Wunsch global alignment, serial. Only for testing.
**align_simd.cc** | SIMD parallel global alignment of 1 query with 8 database sequences
+**allpairs.cc** | All-vs-all optimal global pairwise alignment (no heuristics)
**arch.cc** | Architecture specific code (Mac/Linux)
**bitmap.cc** | Implementation of bitmaps
**chimera.cc** | Chimera detection
40 changes: 38 additions & 2 deletions doc/vsearch.1
@@ -1,8 +1,8 @@
.\" ============================================================================
-.TH vsearch 1 "December 6, 2014" "version 1.0.3" "USER COMMANDS"
+.TH vsearch 1 "December 8, 2014" "version 1.0.4" "USER COMMANDS"
.\" ============================================================================
.SH NAME
-vsearch \(em chimera detection, clustering, dereplication, masking, searching, shuffling and sorting of amplicons from metagenomic projects.
+vsearch \(em chimera detection, clustering, dereplication, masking, pairwise alignment, searching, shuffling and sorting of amplicons from metagenomic projects.
.\" ============================================================================
.SH SYNOPSIS
.\" left justified, ragged right
@@ -37,6 +37,12 @@ Masking:
[\fIoptions\fR]
.PP
.RE
+Pairwise alignment:
+.RS
+\fBvsearch\fR --allpairs_global \fIfastafile\fR
+(--alnout | --blast6out | --uc | --userout) \fIoutputfile\fR (--acceptall | --id \fIreal\fR) [\fIoptions\fR]
+.PP
+.RE
Searching:
.RS
\fBvsearch\fR --usearch_global \fIfastafile\fR --db \fIfastafile\fR
@@ -528,6 +534,25 @@ default is to launch one thread per available logical core.
.RE
.PP
.\" ----------------------------------------------------------------------------
+Pairwise alignment options:
+.RS
+.TP 9
+.BI --allpairs_global \0filename
+Perform optimal global pairwise alignments of all vs all sequences
+in the specified FASTA file. The results of the n * (n-1) / 2 alignments
+are written to the result files specified with --alnout, --uc, --matched,
+--notmatched, --fastapairs, --blast6out or --userout. Specify either the
+--acceptall option to output all results or specify an identity level with
+--id. Most other accept/reject options (see Searching options below) may
+also be used. Only the plus strand of the sequences are aligned. This
+command is multi-threaded.
+.TP
+.BI --acceptall
+Write the results of all alignments to output files. This option overrides
+all other accept / reject options (e.g. --id).
+.RE
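
Reviewer note: the man page text above states that `--allpairs_global` performs n * (n-1) / 2 alignments. The C++ sketch below shows where that count comes from, by enumerating each unordered pair of sequence indices exactly once. The function name `all_pairs` is hypothetical and the code is not taken from `allpairs.cc`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Enumerate every unordered pair (i, j) with i < j exactly once.
// For n sequences this yields n * (n - 1) / 2 pairs.
std::vector<std::pair<std::size_t, std::size_t>> all_pairs(std::size_t n) {
  std::vector<std::pair<std::size_t, std::size_t>> pairs;
  if (n >= 2) pairs.reserve(n * (n - 1) / 2);
  for (std::size_t i = 0; i + 1 < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      pairs.emplace_back(i, j);  // sequence i is compared with sequence j
    }
  }
  return pairs;
}
```

The count grows quadratically: a FASTA file with 10,000 sequences implies 49,995,000 alignments, which is worth keeping in mind when combining this command with `--acceptall`.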
+.PP
+.\" ----------------------------------------------------------------------------
Searching options:
.RS
.TP 9
@@ -1257,6 +1282,14 @@ sequences with an abundance equal to or greater than 2:
\fIqueries_sorted.fas\fR --relabel sampleA_ --sizeout --minsize 2
.RE
.PP
+Align all sequences in a database with each other and output all pairwise
+alignments:
+.PP
+.RS
+\fBvsearch\fR --allpairs_global \fIdatabase.fas\fR
+--alnout \fIresults.aln\fR --acceptall
+.RE
+.PP
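
Reviewer note: the example above passes `--acceptall`, which the new man page text describes as overriding all other accept/reject options such as `--id`. A minimal C++ sketch of that precedence follows. The `AcceptOptions` struct, its field names, and `accept_pair` are hypothetical, and identity is assumed to be a fraction between 0 and 1, as in the README's `--id 0.9` example.

```cpp
// Hypothetical option set for illustration; field names are not VSEARCH's.
struct AcceptOptions {
  bool acceptall = false;  // stands in for --acceptall
  double min_id = 0.0;     // stands in for --id, as a fraction in [0, 1]
};

// Decide whether one pairwise alignment result is written to the output.
bool accept_pair(double identity, const AcceptOptions& opt) {
  if (opt.acceptall) return true;  // overrides every other criterion
  return identity >= opt.min_id;   // otherwise apply the identity threshold
}
```

With `min_id` set to 0.9, a pair at 0.8 identity is rejected unless `acceptall` is also set, in which case the threshold is ignored.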
Search queries in a reference database, with a 80%-similarity
threshold, take terminal gaps into account when calculating pairwise
similarities:
@@ -1357,6 +1390,9 @@ Bug fixes (ssse3/sse41 requirement, memory leak)
.TP
.BR v1.0.3\~ "released December 6th, 2014"
Bug fix (now writes help to stdout instead of stderr)
+.TP
+.BR v1.0.4\~ "released December 8th, 2014"
+Added --allpairs_global option. Reduced memory requirements slightly. Removed memory leaks.
.LP
.\" ============================================================================
.\" TODO:
Binary file modified doc/vsearch_manual.pdf
Binary file not shown.
6 changes: 4 additions & 2 deletions src/Makefile
@@ -37,13 +37,15 @@ OBJS=cityhash/city.o \
align.o align_simd.o arch.o bitmap.o chimera.o cluster.o cpu_sse2.o \
cpu_ssse3.o db.o dbindex.o derep.o maps.o mask.o minheap.o msa.o \
query.o results.o search.o searchcore.o showalign.o shuffle.o \
-sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o
+sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o \
+allpairs.o

DEPS=Makefile cityhash/city.h cityhash/config.h \
align.h align_simd.h arch.h bitmap.h chimera.h cluster.h cpu.h db.h \
dbindex.h derep.h maps.h mask.h minheap.h msa.h query.h results.h \
search.h searchcore.h showalign.h shuffle.h sortbylength.h \
-sortbysize.h unique.h userfields.h util.h vsearch.h
+sortbysize.h unique.h userfields.h util.h vsearch.h \
+allpairs.h

.SUFFIXES:.o .cc

6 changes: 4 additions & 2 deletions src/Makefile.BZLIB
@@ -37,13 +37,15 @@ OBJS=cityhash/city.o \
align.o align_simd.o arch.o bitmap.o chimera.o cluster.o cpu_sse2.o \
cpu_ssse3.o db.o dbindex.o derep.o maps.o mask.o minheap.o msa.o \
query.o results.o search.o searchcore.o showalign.o shuffle.o \
-sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o
+sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o \
+allpairs.o

DEPS=Makefile cityhash/city.h cityhash/config.h \
align.h align_simd.h arch.h bitmap.h chimera.h cluster.h cpu.h db.h \
dbindex.h derep.h maps.h mask.h minheap.h msa.h query.h results.h \
search.h searchcore.h showalign.h shuffle.h sortbylength.h \
-sortbysize.h unique.h userfields.h util.h vsearch.h
+sortbysize.h unique.h userfields.h util.h vsearch.h \
+allpairs.h

.SUFFIXES:.o .cc

6 changes: 4 additions & 2 deletions src/Makefile.ZLIB
@@ -37,13 +37,15 @@ OBJS=cityhash/city.o \
align.o align_simd.o arch.o bitmap.o chimera.o cluster.o cpu_sse2.o \
cpu_ssse3.o db.o dbindex.o derep.o maps.o mask.o minheap.o msa.o \
query.o results.o search.o searchcore.o showalign.o shuffle.o \
-sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o
+sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o \
+allpairs.o

DEPS=Makefile cityhash/city.h cityhash/config.h \
align.h align_simd.h arch.h bitmap.h chimera.h cluster.h cpu.h db.h \
dbindex.h derep.h maps.h mask.h minheap.h msa.h query.h results.h \
search.h searchcore.h showalign.h shuffle.h sortbylength.h \
-sortbysize.h unique.h userfields.h util.h vsearch.h
+sortbysize.h unique.h userfields.h util.h vsearch.h \
+allpairs.h

.SUFFIXES:.o .cc

6 changes: 4 additions & 2 deletions src/Makefile.static
@@ -37,13 +37,15 @@ OBJS=cityhash/city.o \
align.o align_simd.o arch.o bitmap.o chimera.o cluster.o cpu_sse2.o \
cpu_ssse3.o db.o dbindex.o derep.o maps.o mask.o minheap.o msa.o \
query.o results.o search.o searchcore.o showalign.o shuffle.o \
-sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o
+sortbylength.o sortbysize.o unique.o userfields.o util.o vsearch.o \
+allpairs.o

DEPS=Makefile cityhash/city.h cityhash/config.h \
align.h align_simd.h arch.h bitmap.h chimera.h cluster.h cpu.h db.h \
dbindex.h derep.h maps.h mask.h minheap.h msa.h query.h results.h \
search.h searchcore.h showalign.h shuffle.h sortbylength.h \
-sortbysize.h unique.h userfields.h util.h vsearch.h
+sortbysize.h unique.h userfields.h util.h vsearch.h \
+allpairs.h

.SUFFIXES:.o .cc
