new tree partitioning algorithm, Nexus format input support, Thread mode argument for switching between deterministic and non-deterministic, SPR steps parallel implementation and some minor changes.
cesarpomar committed Dec 10, 2021
1 parent bc5487c commit f7d117a
Showing 9 changed files with 432 additions and 176 deletions.
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -10,7 +10,7 @@ option(USE_AVX "enable/disable AVX" OFF)
option(USE_AVX2 "enable/disable AVX2" OFF)
option(USE_AVX512 "enable/disable AVX512" OFF)


add_definitions(-DCLI11_BOOST_OPTIONAL=0)
#add_definitions(-DNDEBUG)

ADD_SUBDIRECTORY("libs/CLI11")
@@ -62,7 +62,7 @@ if (MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:SSE2 /D__SSE2__=1")
endif ()
else ()
set(CMAKE_CXX_FLAGS "-Wall -O3")
set(CMAKE_CXX_FLAGS "-Wall -O0")
if (USE_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f")
elseif (USE_AVX2)
26 changes: 15 additions & 11 deletions README.md
@@ -83,25 +83,29 @@ On the other hand, VeryFastTree has its own extra arguments which have been grou
It allows to specify the number of threads (*n*) used in the parallel execution. If this option is not set, the corresponding value will be obtained from the environment variable *OMP\_NUM\_THREADS*. This is the same approach followed by FastTree-2. If *n=1*, VeryFastTree behaves in the same way as FastTree-2 compiled without the *-DOPENMP* flag.

- **-threads-level [level]**
It allows to change the degree of parallelization. If level is *0*, VeryFastTree uses the same parallelization strategy followed by FastTree-2. If level is *1* (by default), VeryFastTree accelerates the rounds of ML NNIs using its tree partitioning method.
It allows to change the degree of parallelization. If level is *0*, VeryFastTree uses the same parallelization strategy followed by FastTree-2. If level is *1* (by default), VeryFastTree accelerates the rounds of ML NNIs using its tree partitioning method. If level is *2*, VeryFastTree accelerates the rounds of ML NNIs and SPR steps using its tree partitioning method (it can only be used with datasets much larger than 2^sprlength).

- **-threads-mode [mode]**
It allows to change the mode of parallelization. If mode is *0*, VeryFastTree uses all parallel parts of FastTree-2, including the non-deterministic ones. If mode is *1* (by default), VeryFastTree only uses deterministic parallelization parts.

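The practical difference between the two modes is how floating-point work is combined across threads. The snippet below is a minimal, stand-alone sketch (not VeryFastTree code) of why an unordered parallel reduction can change its rounded result between runs, while combining fixed per-block partial sums in a fixed order always gives the same answer:

```cpp
#include <algorithm>
#include <vector>

// Unordered reduction: OpenMP may combine the per-thread partial sums in any
// order, so the rounded total can differ from run to run.
double nondeterministicSum(const std::vector<double> &v) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long) v.size(); i++) { total += v[i]; }
    return total;
}

// Deterministic variant: each block is summed independently and the partial
// sums are combined in a fixed order, so the result is reproducible.
double deterministicSum(const std::vector<double> &v, int nBlocks) {
    std::vector<double> partial(nBlocks, 0.0);
    long block = ((long) v.size() + nBlocks - 1) / nBlocks;
    #pragma omp parallel for schedule(static)
    for (int b = 0; b < nBlocks; b++) {
        long begin = (long) b * block;
        long end = std::min<long>(begin + block, (long) v.size());
        for (long i = begin; i < end; i++) { partial[b] += v[i]; }
    }
    double total = 0.0;
    for (double p : partial) { total += p; }  // fixed combination order
    return total;
}
```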
- **-thread-subtrees [num\_subtrees]**
It sets a maximum number of subtrees assigned to a thread. This option could increase the accuracy for small datasets containing large sequences at the expense of reducing the workload balance among threads.
It sets a maximum number of subtrees assigned to the threads. This option could increase the accuracy for small datasets containing large sequences at the expense of reducing the workload balance among threads.

- **-double-precision**
Use double precision arithmetic. Therefore, it is equivalent to compile FastTree-2 with *-DUSE\_DOUBLE*.

- **-ext [type]**
It enables the vector extensions:
- **none**: Operations are performed with the native programming language operators. In addition, loops are unrolled with the aim of providing hints to the compiler for applying some optimization (including vectorization).
- **SSE3**: (default) Arithmetic operations are performed using SSE3 vector intrinsics. Each instruction operates on 128 bit registers, which could contain four 32-bit floats or two 64-bit doubles.
  - **AVX**: Arithmetic operations are performed using AVX vector intrinsics. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX2**: Similar to AVX, but some arithmetic operations are performed using additional AVX2 vector intrinsics not included in the AVX instruction set. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX512**: Arithmetic operations are performed using AVX512 vector intrinsics. Each instruction operates on 512 bit registers, which could contain sixteen 32-bit floats or eight 64-bit doubles.
  - **AUTO**: (default) selects AVX2 when *-double-precision* is used and SSE3 otherwise. If one extension is not available, the previous level is used.
- **NONE**: Operations are performed with the native programming language operators. In addition, loops are unrolled with the aim of providing hints to the compiler for applying some optimization (including vectorization).
- **SSE3**: Arithmetic operations are performed using SSE3 vector intrinsics. Each instruction operates on 128 bit registers, which could contain four 32-bit floats or two 64-bit doubles.
  - **AVX**: Arithmetic operations are performed using AVX vector intrinsics. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX2**: Similar to AVX, but some arithmetic operations are performed using additional AVX2 vector intrinsics not included in the AVX instruction set. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX512**: Arithmetic operations are performed using AVX512 vector intrinsics. Each instruction operates on 512 bit registers, which could contain sixteen 32-bit floats or eight 64-bit doubles.

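As a concrete, purely illustrative example (not VeryFastTree code) of what those register widths mean, a single 128-bit SSE instruction below adds four floats at once, while a single 256-bit AVX instruction adds four doubles:

```cpp
#include <immintrin.h>

// 128-bit SSE register: four 32-bit floats per instruction.
void addFourFloats(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

#ifdef __AVX__
// 256-bit AVX register: four 64-bit doubles (or eight floats) per instruction.
void addFourDoubles(const double *a, const double *b, double *out) {
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_add_pd(va, vb));
}
#endif
```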
- **-fastexp [implementation]**
This option is used to select an alternative implementation for the exponential function (e<sup>x</sup>), which has a significant impact on performance:
- **0**: (default) Use the *exp* function included in the built-in math library with double precision.
  - **1**: Use the *exp* function included in the built-in math library with single precision (not recommended together with *-double-precision* option).
  - **2**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using double precision arithmetic.
  - **3**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using single precision arithmetic (not recommended together with *-double-precision* option).
- **0**: (default) Use the *exp* function included in the built-in math library with double precision.
  - **1**: Use the *exp* function included in the built-in math library with single precision (not recommended together with *-double-precision* option).
  - **2**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using double precision arithmetic.
  - **3**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using single precision arithmetic (not recommended together with *-double-precision* option).
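VeryFastTree's own fast implementations are not shown in this diff; as a rough illustration of the general bit-manipulation approach such functions often use (an assumption, not the project's actual method), the classic Schraudolph (1999) trick approximates e<sup>x</sup> by writing a scaled value directly into the exponent field of an IEEE-754 double:

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: approximate exp(x) by writing (a*x + b) into the high 32 bits
// of a double (Schraudolph, 1999). Relative error is a few percent and the
// input must stay roughly within |x| < 700; real implementations are far more
// accurate than this.
inline double fastExpSketch(double x) {
    constexpr double a = 1048576.0 / 0.6931471805599453;  // 2^20 / ln(2)
    constexpr int32_t b = 1072693248 - 60801;  // 1023*2^20 minus a correction term
    int64_t bits = (int64_t) ((int32_t) (a * x) + b) << 32;  // low 32 bits stay zero
    double y;
    std::memcpy(&y, &bits, sizeof(y));  // reinterpret the bit pattern as a double
    return y;
}
```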
32 changes: 19 additions & 13 deletions main.cpp
@@ -424,27 +424,33 @@ void cli(CLI::App &app, std::string &name, std::string &version, std::string &fl
type_name("n")->check(Min(1))->envname("OMP_NUM_THREADS")->group(optimizations);

app.add_option("-thread-subtrees", options.threadSubtrees,
"set a maximum number of subtrees assigned to a thread. This option could increase "
"sets a maximum number of subtrees assigned to the threads. This option could increase "
"the accuracy for small datasets containing large sequences at the expense of reducing "
"the workload balance among threads")->type_name("n")
->check(Min(1))->group(optimizations);

app.add_option("-threads-level", options.threadsLevel,
"to change the degree of parallelization. If level is 0, VeryFastTree uses the same "
"parallelization strategy followed by FastTree-2. If level is 1 (by default), VeryFastTree "
"accelerates the rounds of ML NNIs using its tree partitioning method")->
type_name("lvl")->check(CLI::Range(0, 1))->group(optimizations);

app.add_flag("-double-precision", options.doublePrecision, "use double precision arithmetic")->
"accelerates the rounds of ML NNIs using its tree partitioning method. If level is 2, VeryFastTree "
" accelerates the rounds of ML NNIs and SPR steps using its tree partitioning method"
" (it can only be used with datasets much larger than 2^sprlength).")->
type_name("lvl")->check(CLI::Range(0, 2))->group(optimizations);

app.add_option("-threads-mode", options.deterministic,
"to change the mode of parallelization. If mode is 0, VeryFastTree uses all parallel parts "
"of FastTree-2, including the non-deterministic ones. If mode is 1 (by default), VeryFastTree "
"only uses deterministic parallelization parts.")->
type_name("mode")->check(CLI::Range(0, 1))->group(optimizations);

app.add_flag("-double-precision", options.doublePrecision, "uses double precision arithmetic")->
group(optimizations);

app.add_set_ignore_case("-ext", options.extension, {"NONE", "SSE", "SSE3", "AVX", "AVX2", "AVX512"},
app.add_set_ignore_case("-ext", options.extension, {"AUTO", "NONE", "SSE", "SSE3", "AVX", "AVX2", "AVX512"},
"to speed up computations enabling the vector extensions. "
"Available: NONE, SSE3 (default), AVX, AVX2 or AVX512")->type_name("name")->group(optimizations)->
#if (defined __SSE2__) || (defined __AVX__)
default_val("SSE3");
#else
default_val("NONE");
#endif
"Available: AUTO (default), NONE, SSE, SSE3, AVX, AVX2 or AVX512")->type_name("name")
->group(optimizations)->default_val("AUTO");


app.add_option("-fastexp", options.fastexp,
"to select an alternative implementation for the exponential function exp(x), which has "
@@ -455,7 +461,7 @@ void cli(CLI::App &app, std::string &name, std::string &version, std::string &fl
"of exp(x) using simple precision (not recommended with -double-precision option)")->
type_name("lvl")->check(CLI::Range(0, 3))->group(optimizations);

for (auto &c:options.extension) { c = (char) std::toupper(c); }
for (auto &c: options.extension) { c = (char) std::toupper(c); }


auto deprecated = "Deprecated";
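The new AUTO default is only registered here; how it is mapped to a concrete extension is not part of this diff. A hypothetical sketch, following the rule stated in the README (AVX2 with -double-precision, SSE3 otherwise, falling back one level when an extension is unavailable), could look like the following; the function name and the compile-time checks are assumptions, not VeryFastTree code:

```cpp
#include <string>

// Hypothetical helper: resolve "AUTO" to a concrete extension name.
std::string resolveAutoExtension(bool doublePrecision) {
    std::string ext = doublePrecision ? "AVX2" : "SSE3";
#if !defined(__AVX2__)
    if (ext == "AVX2") { ext = "AVX"; }  // fall back one level if AVX2 is unavailable
#endif
#if !defined(__AVX__)
    if (ext == "AVX") { ext = "SSE3"; }
#endif
#if !defined(__SSE2__)
    if (ext == "SSE3") { ext = "NONE"; }
#endif
    return ext;
}
```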
154 changes: 144 additions & 10 deletions src/Alignment.cpp
@@ -15,7 +15,141 @@ void Alignment::readAlignment() {
std::string buf;

readline(fp, buf);
if (buf[0] == '>') {
if (buf.find("#NEXUS") == 0) {
/* It is a Nexus file */
bool characters = false;
do {
/* loop blocks lines */
if (buf[0] == 'b' || buf[0] == 'B') {
std::for_each(buf.begin(), buf.end(), [](char &c) { c = std::tolower(c); });
if (buf.find("characters") != buf.npos) {
characters = true;
break;
}
}
} while (readline(fp, buf));
if (!characters) {
throw std::invalid_argument("No characters block found");
}

int64_t ntax = -1, nchar = -1;
bool interleave = false;
bool matrix = false;
char fgap = '-';
char fmatchchar = '.';

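/* readValue: locate "name = value" in the lowercased line and return the index of the value, or -1 if missing */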
auto readValue = [&buf](const std::string &name) {
int64_t pos = buf.find(name);
if (pos < 0) { return (int64_t) -1; } /* keyword not present in this line */
pos += name.size();
while (isspace(buf[pos])) { pos++; }
if(buf[pos++] != '='){return (int64_t) -1;}
while (isspace(buf[pos])) { pos++; }
return pos;
};

while (readline(fp, buf)) {
/* loop header lines */
std::for_each(buf.begin(), buf.end(), [](char &c) { c = std::tolower(c); });

if (buf.find("dimensions") != buf.npos) {
int64_t pos;
pos = readValue("nchar");
if (pos > 0) {
nchar = std::atoll(&buf[pos]);
}
pos = readValue("ntax");
if (pos > 0) {
ntax = std::atoll(&buf[pos]);
}
} else if (buf.find("format") != buf.npos) {
int64_t pos;
pos = readValue("interleave");
if (pos > 0) {
interleave = buf[pos] == 'y';
}
pos = readValue("gap");
if (pos > 0) {
fgap = buf[pos];
}
pos = readValue("matchchar");
if (pos > 0) {
fmatchchar = buf[pos];
}
} else if (buf.find("matrix") != buf.npos) {
matrix = true;
break;
} else {
log << "Warning! Command ignored: " << buf << std::endl;
}
}
if (!matrix) {
throw std::invalid_argument("No matrix command found in characters block");
}

if(ntax > 0){
seqs.reserve(ntax);
names.reserve(ntax);
}
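/* seqi is the row index inside the current block of the matrix; interleaved files reset it at every blank line */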
int64_t seqi = 0;
while (readline(fp, buf)) {
/* loop matrix lines */
int64_t pos = 0;
while (isspace(buf[pos])) { pos++; }
if (buf[pos] == ';') {
break;
}
if(pos == (int64_t)buf.size()){ /* blank or whitespace-only line */
if(interleave){
seqi = 0;
}
continue;
}
int64_t init = pos;
if(buf[pos] == '\'' || buf[pos] == '"'){
pos++;
while(pos < (int64_t)buf.size() && buf[pos] != buf[init]){pos++;}
init++;
}else{
while (pos < (int64_t)buf.size() && !isspace(buf[pos])) { pos++; }
}
if(pos == (int64_t)buf.size()){
throw std::invalid_argument("Wrong sequence name format: " + buf);
}
if(seqi == (int64_t)seqs.size()){
names.emplace_back(buf, init, pos - init);
}
pos++;
if(seqi == (int64_t)seqs.size()){
seqs.emplace_back();
if(nchar > 0){
seqs.back().reserve(nchar);
}
}

for(;pos < (int64_t)buf.size();pos++){
if(isspace(buf[pos])){
continue;
}
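/* translate the gap and match characters declared in the format command; any other symbol is kept as-is */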
if(buf[pos] == fgap){
seqs[seqi].push_back('-');
}else if(buf[pos] == fmatchchar && seqi > 0){
seqs[seqi].push_back(seqs[seqi - 1 ][seqs[seqi].size()]);
}else{
seqs[seqi].push_back(buf[pos]);
}
}

seqi++;
}
if(!seqs.empty()){
nPos = seqs[0].size();
}

if (ntax > 0 && (int64_t)seqs.size() != ntax) {
throw std::invalid_argument(strformat("Wrong number of sequences: expected %ld", ntax));
}

} else if (buf[0] == '>') {
/* FASTA, truncate names at any of these */
auto nameStop = (options.bQuote ? "'\t" : "(),: \t"); /* bQuote supports the -quote option */
auto seqSkip = " \t"; /* skip these characters in the sequence */
@@ -45,7 +179,7 @@ void Alignment::readAlignment() {
}
auto &seq = seqs[names.size() - 1];
seq.append(buf.begin(), buf.begin() + nKeep);
if ((int64_t)seq.size() > nPos) {
if ((int64_t) seq.size() > nPos) {
nPos = seq.size();
}
}
@@ -108,7 +242,7 @@ void Alignment::readAlignment() {
}
for (; j < buf.size(); j++) {
if (buf[j] != ' ') {
if ((int64_t)seqs[iSeq].size() >= nPos) {
if ((int64_t) seqs[iSeq].size() >= nPos) {
throw std::invalid_argument(strformat(
"Too many characters (expected %ld) for sequence named %s\nSo far have:\n%s",
nPos, names[iSeq].c_str(), seqs[iSeq].c_str()));
@@ -124,7 +258,7 @@
iSeq, names[iSeq].c_str(), seqs[iSeq].c_str()) << std::endl;
}
iSeq++;
if (iSeq == nSeq && (int64_t)seqs[0].size() == nPos) {
if (iSeq == nSeq && (int64_t) seqs[0].size() == nPos) {
break; /* finished alignment */
}
}/* end else non-empty phylip line */
Expand All @@ -134,8 +268,8 @@ void Alignment::readAlignment() {
}
}
/* Check lengths of sequences */
for (int64_t i = 0; i < (int64_t)seqs.size(); i++) {
if ((int64_t)seqs[i].size() != nPos) {
for (int64_t i = 0; i < (int64_t) seqs.size(); i++) {
if ((int64_t) seqs[i].size() != nPos) {
throw std::invalid_argument(strformat(
"Wrong number of characters for %s: expected %ld but have %ld instead.\n"
"This sequence may be truncated, or another sequence may be too long.",
Expand All @@ -146,8 +280,8 @@ void Alignment::readAlignment() {
/* If nucleotide sequences, replace U with T and N with X */
bool findDot = false;
#pragma omp parallel for schedule(static), reduction(||:findDot)
for (int64_t i = 0; i < (int64_t)seqs.size(); i++) {
for (int64_t j = 0; j < (int64_t)seqs[i].size(); j++) {
for (int64_t i = 0; i < (int64_t) seqs.size(); i++) {
for (int64_t j = 0; j < (int64_t) seqs[i].size(); j++) {
if (seqs[i][j] == '.') {
seqs[i][j] = '-';
findDot = true;
@@ -188,7 +322,7 @@ Uniquify::Uniquify(const Alignment &aln) {
alnNext.resize(aln.seqs.size(), -1); /* i in aln -> next, or -1 */
alnToUniq.resize(aln.seqs.size(), -1); /* i in aln -> iUnique; many -> -1 */

for (int64_t i = 0; i < (int64_t)aln.seqs.size(); i++) {
for (int64_t i = 0; i < (int64_t) aln.seqs.size(); i++) {
assert(hashseqs.find(aln.seqs[i]) != nullptr);
int64_t first = hashseqs[aln.seqs[i]];
if (first == i) {
Expand All @@ -209,5 +343,5 @@ Uniquify::Uniquify(const Alignment &aln) {
alnToUniq[i] = alnToUniq[last];
}
}
assert((int64_t)nUniqueSeq == (int64_t)uniqueSeq.size());
assert((int64_t) nUniqueSeq == (int64_t) uniqueSeq.size());
}
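To make the new input path concrete, the following is a small, hand-written NEXUS characters block of the kind the parser above is meant to accept (a constructed example, not a file from the repository), embedded in a C++ raw string literal for illustration:

```cpp
// Constructed example input for the NEXUS reader above (illustrative only).
const char *exampleNexus = R"(#NEXUS
begin characters;
  dimensions ntax=2 nchar=8;
  format gap=- matchchar=. interleave=no;
  matrix
    seqA ACGTACGT
    seqB AC..ACGA
  ;
end;
)";
// With gap='-' and matchchar='.', seqB expands to "ACGTACGA": the two dots copy
// the states of the previous row at the same columns.
```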
2 changes: 1 addition & 1 deletion src/Constants.h
@@ -54,7 +54,7 @@ namespace veryfasttree {
const std::string codesStringNT = "ACGT";

const std::string name = "VeryFastTree";
const std::string version = "3.0.1";
const std::string version = "3.1.0";
const std::string compileFlags =
"(OpenMP"
#ifdef __AVX__