new tree partitioning algorithm, Nexus format input support, Thread mode argument for switching between deterministic and non-deterministic, SPR steps parallel implementation and some minor changes.
cesarpomar committed Dec 10, 2021
1 parent bc5487c commit f7d117a
Showing 9 changed files with 432 additions and 176 deletions.
4 changes: 2 additions & 2 deletions CMakeLists.txt
@@ -10,7 +10,7 @@ option(USE_AVX "enable/disable AVX" OFF)
option(USE_AVX2 "enable/disable AVX2" OFF)
option(USE_AVX512 "enable/disable AVX512" OFF)


add_definitions(-DCLI11_BOOST_OPTIONAL=0)
#add_definitions(-DNDEBUG)

ADD_SUBDIRECTORY("libs/CLI11")
@@ -62,7 +62,7 @@ if (MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /arch:SSE2 /D__SSE2__=1")
endif ()
else ()
set(CMAKE_CXX_FLAGS "-Wall -O3")
set(CMAKE_CXX_FLAGS "-Wall -O0")
if (USE_AVX512)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f")
elseif (USE_AVX2)
26 changes: 15 additions & 11 deletions README.md
@@ -83,25 +83,29 @@ On the other hand, VeryFastTree has its own extra arguments which have been grou
It allows to specify the number of threads (*n*) used in the parallel execution. If this option is not set, the corresponding value will be obtained from the environment variable *OMP\_NUM\_THREADS*. This is the same approach followed by FastTree-2. If *n=1*, VeryFastTree behaves in the same way as FastTree-2 compiled without the *-DOPENMP* flag.

- **-threads-level [level]**
It allows to change the degree of parallelization. If level is *0*, VeryFastTree uses the same parallelization strategy followed by FastTree-2. If level is *1* (by default), VeryFastTree accelerates the rounds of ML NNIs using its tree partitioning method.
It allows to change the degree of parallelization. If level is *0*, VeryFastTree uses the same parallelization strategy followed by FastTree-2. If level is *1* (by default), VeryFastTree accelerates the rounds of ML NNIs using its tree partitioning method. If level is *2*, VeryFastTree accelerates the rounds of ML NNIs and SPR steps using its tree partitioning method (it can only be used with datasets much larger than 2^sprlength).

- **-threads-mode [mode]**
It allows to change the mode of parallelization. If mode is *0*, VeryFastTree uses all parallel parts of FastTree-2, including the non-deterministic ones. If mode is *1* (by default), VeryFastTree only uses deterministic parallelization parts.

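The practical difference between the two modes is how floating-point work is combined across threads. The snippet below is a minimal, stand-alone sketch (not VeryFastTree code) of why an unordered parallel reduction can change its rounded result between runs, while combining fixed per-block partial sums in a fixed order always gives the same answer:

```cpp
#include <algorithm>
#include <vector>

// Unordered reduction: OpenMP may combine the per-thread partial sums in any
// order, so the rounded total can differ from run to run.
double nondeterministicSum(const std::vector<double> &v) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long) v.size(); i++) { total += v[i]; }
    return total;
}

// Deterministic variant: each block is summed independently and the partial
// sums are combined in a fixed order, so the result is reproducible.
double deterministicSum(const std::vector<double> &v, int nBlocks) {
    std::vector<double> partial(nBlocks, 0.0);
    long block = ((long) v.size() + nBlocks - 1) / nBlocks;
    #pragma omp parallel for schedule(static)
    for (int b = 0; b < nBlocks; b++) {
        long begin = (long) b * block;
        long end = std::min<long>(begin + block, (long) v.size());
        for (long i = begin; i < end; i++) { partial[b] += v[i]; }
    }
    double total = 0.0;
    for (double p : partial) { total += p; }  // fixed combination order
    return total;
}
```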
- **-thread-subtrees [num\_subtrees]**
It sets a maximum number of subtrees assigned to a thread. This option could increase the accuracy for small datasets containing large sequences at the expense of reducing the workload balance among threads.
It sets a maximum number of subtrees assigned to the threads. This option could increase the accuracy for small datasets containing large sequences at the expense of reducing the workload balance among threads.

- **-double-precision**
Use double precision arithmetic. Therefore, it is equivalent to compile FastTree-2 with *-DUSE\_DOUBLE*.

- **-ext [type]**
It enables the vector extensions:
- **none**: Operations are performed with the native programming language operators. In addition, loops are unrolled with the aim of providing hints to the compiler for applying some optimization (including vectorization).
- **SSE3**: (default) Arithmetic operations are performed using SSE3 vector intrinsics. Each instruction operates on 128 bit registers, which could contain four 32-bit floats or two 64-bit doubles.
  - **AVX**: Arithmetic operations are performed using AVX vector intrinsics. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX2**: Similar to AVX, but some arithmetic operations are performed using additional AVX2 vector intrinsics not included in the AVX instruction set. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX512**: Arithmetic operations are performed using AVX512 vector intrinsics. Each instruction operates on 512 bit registers, which could contain sixteen 32-bit floats or eight 64-bit doubles.
  - **AUTO**: (default) selects AVX2 when *-double-precision* is used and SSE3 otherwise. If one extension is not available, the previous level is used.
- **NONE**: Operations are performed with the native programming language operators. In addition, loops are unrolled with the aim of providing hints to the compiler for applying some optimization (including vectorization).
- **SSE3**: Arithmetic operations are performed using SSE3 vector intrinsics. Each instruction operates on 128 bit registers, which could contain four 32-bit floats or two 64-bit doubles.
  - **AVX**: Arithmetic operations are performed using AVX vector intrinsics. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX2**: Similar to AVX, but some arithmetic operations are performed using additional AVX2 vector intrinsics not included in the AVX instruction set. Each instruction operates on 256 bit registers, which could contain eight 32-bit floats or four 64-bit doubles.
  - **AVX512**: Arithmetic operations are performed using AVX512 vector intrinsics. Each instruction operates on 512 bit registers, which could contain sixteen 32-bit floats or eight 64-bit doubles.

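As a concrete, purely illustrative example (not VeryFastTree code) of what those register widths mean, a single 128-bit SSE instruction below adds four floats at once, while a single 256-bit AVX instruction adds four doubles:

```cpp
#include <immintrin.h>

// 128-bit SSE register: four 32-bit floats per instruction.
void addFourFloats(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}

#ifdef __AVX__
// 256-bit AVX register: four 64-bit doubles (or eight floats) per instruction.
void addFourDoubles(const double *a, const double *b, double *out) {
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(out, _mm256_add_pd(va, vb));
}
#endif
```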
- **-fastexp [implementation]**
This option is used to select an alternative implementation for the exponential function (e<sup>x</sup>), which has a significant impact on performance:
- **0**: (default) Use the *exp* function included in the built-in math library with double precision.
  - **1**: Use the *exp* function included in the built-in math library with single precision (not recommended together with *-double-precision* option).
  - **2**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using double precision arithmetic.
  - **3**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using single precision arithmetic (not recommended together with *-double-precision* option).
- **0**: (default) Use the *exp* function included in the built-in math library with double precision.
  - **1**: Use the *exp* function included in the built-in math library with single precision (not recommended together with *-double-precision* option).
  - **2**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using double precision arithmetic.
  - **3**: Use a very efficient and fast implementation to compute an accurate approximation of e<sup>x</sup> using single precision arithmetic (not recommended together with *-double-precision* option).
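VeryFastTree's own fast implementations are not shown in this diff; as a rough illustration of the general bit-manipulation approach such functions often use (an assumption, not the project's actual method), the classic Schraudolph (1999) trick approximates e<sup>x</sup> by writing a scaled value directly into the exponent field of an IEEE-754 double:

```cpp
#include <cstdint>
#include <cstring>

// Sketch only: approximate exp(x) by writing (a*x + b) into the high 32 bits
// of a double (Schraudolph, 1999). Relative error is a few percent and the
// input must stay roughly within |x| < 700; real implementations are far more
// accurate than this.
inline double fastExpSketch(double x) {
    constexpr double a = 1048576.0 / 0.6931471805599453;  // 2^20 / ln(2)
    constexpr int32_t b = 1072693248 - 60801;  // 1023*2^20 minus a correction term
    int64_t bits = (int64_t) ((int32_t) (a * x) + b) << 32;  // low 32 bits stay zero
    double y;
    std::memcpy(&y, &bits, sizeof(y));  // reinterpret the bit pattern as a double
    return y;
}
```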
32 changes: 19 additions & 13 deletions main.cpp
@@ -424,27 +424,33 @@ void cli(CLI::App &app, std::string &name, std::string &version, std::string &fl
type_name("n")->check(Min(1))->envname("OMP_NUM_THREADS")->group(optimizations);

app.add_option("-thread-subtrees", options.threadSubtrees,
"set a maximum number of subtrees assigned to a thread. This option could increase "
"sets a maximum number of subtrees assigned to the threads. This option could increase "
"the accuracy for small datasets containing large sequences at the expense of reducing "
"the workload balance among threads")->type_name("n")
->check(Min(1))->group(optimizations);

app.add_option("-threads-level", options.threadsLevel,
"to change the degree of parallelization. If level is 0, VeryFastTree uses the same "
"parallelization strategy followed by FastTree-2. If level is 1 (by default), VeryFastTree "
"accelerates the rounds of ML NNIs using its tree partitioning method")->
type_name("lvl")->check(CLI::Range(0, 1))->group(optimizations);

app.add_flag("-double-precision", options.doublePrecision, "use double precision arithmetic")->
"accelerates the rounds of ML NNIs using its tree partitioning method. If level is 2, VeryFastTree "
" accelerates the rounds of ML NNIs and SPR steps using its tree partitioning method"
" (it can only be used with datasets much larger than 2^sprlength).")->
type_name("lvl")->check(CLI::Range(0, 2))->group(optimizations);

app.add_option("-threads-mode", options.deterministic,
"to change the mode of parallelization. If mode is 0, VeryFastTree uses all parallel parts "
"of FastTree-2, including the non-deterministic ones. If mode is 1 (by default), VeryFastTree "
"only uses deterministic parallelization parts.")->
type_name("mode")->check(CLI::Range(0, 1))->group(optimizations);

app.add_flag("-double-precision", options.doublePrecision, "uses double precision arithmetic")->
group(optimizations);

app.add_set_ignore_case("-ext", options.extension, {"NONE", "SSE", "SSE3", "AVX", "AVX2", "AVX512"},
app.add_set_ignore_case("-ext", options.extension, {"AUTO", "NONE", "SSE", "SSE3", "AVX", "AVX2", "AVX512"},
"to speed up computations enabling the vector extensions. "
"Available: NONE, SSE3 (default), AVX, AVX2 or AVX512")->type_name("name")->group(optimizations)->
#if (defined __SSE2__) || (defined __AVX__)
default_val("SSE3");
#else
default_val("NONE");
#endif
"Available: AUTO (default), NONE, SSE, SSE3, AVX, AVX2 or AVX512")->type_name("name")
->group(optimizations)->default_val("AUTO");


app.add_option("-fastexp", options.fastexp,
"to select an alternative implementation for the exponential function exp(x), which has "
@@ -455,7 +461,7 @@ void cli(CLI::App &app, std::string &name, std::string &version, std::string &fl
"of exp(x) using simple precision (not recommended with -double-precision option)")->
type_name("lvl")->check(CLI::Range(0, 3))->group(optimizations);

for (auto &c:options.extension) { c = (char) std::toupper(c); }
for (auto &c: options.extension) { c = (char) std::toupper(c); }


auto deprecated = "Deprecated";
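The new AUTO default is only registered here; how it is mapped to a concrete extension is not part of this diff. A hypothetical sketch, following the rule stated in the README (AVX2 with -double-precision, SSE3 otherwise, falling back one level when an extension is unavailable), could look like the following; the function name and the compile-time checks are assumptions, not VeryFastTree code:

```cpp
#include <string>

// Hypothetical helper: resolve "AUTO" to a concrete extension name.
std::string resolveAutoExtension(bool doublePrecision) {
    std::string ext = doublePrecision ? "AVX2" : "SSE3";
#if !defined(__AVX2__)
    if (ext == "AVX2") { ext = "AVX"; }  // fall back one level if AVX2 is unavailable
#endif
#if !defined(__AVX__)
    if (ext == "AVX") { ext = "SSE3"; }
#endif
#if !defined(__SSE2__)
    if (ext == "SSE3") { ext = "NONE"; }
#endif
    return ext;
}
```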
154 changes: 144 additions & 10 deletions src/Alignment.cpp
@@ -15,7 +15,141 @@ void Alignment::readAlignment() {
std::string buf;

readline(fp, buf);
if (buf[0] == '>') {
if (buf.find("#NEXUS") == 0) {
/* It is a Nexus file */
bool characters = false;
do {
/* loop blocks lines */
if (buf[0] == 'b' || buf[0] == 'B') {
std::for_each(buf.begin(), buf.end(), [](char &c) { c = std::tolower(c); });
if (buf.find("characters") != buf.npos) {
characters = true;
break;
}
}
} while (readline(fp, buf));
if (!characters) {
throw std::invalid_argument("No characters block found");
}

int64_t ntax = -1, nchar = -1;
bool interleave = false;
bool matrix = false;
char fgap = '-';
char fmatchchar = '.';

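/* readValue: locate "name = value" in the lowercased line and return the index of the value, or -1 if missing */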
auto readValue = [&buf](const std::string &name) {
int64_t pos = buf.find(name);
if (pos < 0) { return (int64_t) -1; } /* keyword not present in this line */
pos += name.size();
while (isspace(buf[pos])) { pos++; }
if(buf[pos++] != '='){return (int64_t) -1;}
while (isspace(buf[pos])) { pos++; }
return pos;
};

while (readline(fp, buf)) {
/* loop header lines */
std::for_each(buf.begin(), buf.end(), [](char &c) { c = std::tolower(c); });

if (buf.find("dimensions") != buf.npos) {
int64_t pos;
pos = readValue("nchar");
if (pos > 0) {
nchar = std::atoll(&buf[pos]);
}
pos = readValue("ntax");
if (pos > 0) {
ntax = std::atoll(&buf[pos]);
}
} else if (buf.find("format") != buf.npos) {
int64_t pos;
pos = readValue("interleave");
if (pos > 0) {
interleave = buf[pos] == 'y';
}
pos = readValue("gap");
if (pos > 0) {
fgap = buf[pos];
}
pos = readValue("matchchar");
if (pos > 0) {
fmatchchar = buf[pos];
}
} else if (buf.find("matrix") != buf.npos) {
matrix = true;
break;
} else {
log << "Warning! Command ignored: " << buf << std::endl;
}
}
if (!matrix) {
throw std::invalid_argument("No matrix command found in characters block");
}

if(ntax > 0){
seqs.reserve(ntax);
names.reserve(ntax);
}
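/* seqi is the row index inside the current block of the matrix; interleaved files reset it at every blank line */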
int64_t seqi = 0;
while (readline(fp, buf)) {
/* loop matrix lines */
int64_t pos = 0;
while (isspace(buf[pos])) { pos++; }
if (buf[pos] == ';') {
break;
}
if(pos == (int64_t)buf.size()){ /* blank or whitespace-only line */
if(interleave){
seqi = 0;
}
continue;
}
int64_t init = pos;
if(buf[pos] == '\'' || buf[pos] == '"'){
pos++;
while(pos < (int64_t)buf.size() && buf[pos] != buf[init]){pos++;}
init++;
}else{
while (pos < (int64_t)buf.size() && !isspace(buf[pos])) { pos++; }
}
if(pos == (int64_t)buf.size()){
throw std::invalid_argument("Wrong sequence name format: " + buf);
}
if(seqi == (int64_t)seqs.size()){
names.emplace_back(buf, init, pos - init);
}
pos++;
if(seqi == (int64_t)seqs.size()){
seqs.emplace_back();
if(nchar > 0){
seqs.back().reserve(nchar);
}
}

for(;pos < (int64_t)buf.size();pos++){
if(isspace(buf[pos])){
continue;
}
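/* translate the gap and match characters declared in the format command; any other symbol is kept as-is */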
if(buf[pos] == fgap){
seqs[seqi].push_back('-');
}else if(buf[pos] == fmatchchar && seqi > 0){
seqs[seqi].push_back(seqs[seqi - 1 ][seqs[seqi].size()]);
}else{
seqs[seqi].push_back(buf[pos]);
}
}

seqi++;
}
if(!seqs.empty()){
nPos = seqs[0].size();
}

if (ntax > 0 && (int64_t)seqs.size() != ntax) {
throw std::invalid_argument(strformat("Wrong number of sequences: expected %ld", ntax));
}

} else if (buf[0] == '>') {
/* FASTA, truncate names at any of these */
auto nameStop = (options.bQuote ? "'\t" : "(),: \t"); /* bQuote supports the -quote option */
auto seqSkip = " \t"; /* skip these characters in the sequence */
@@ -45,7 +179,7 @@ void Alignment::readAlignment() {
}
auto &seq = seqs[names.size() - 1];
seq.append(buf.begin(), buf.begin() + nKeep);
if ((int64_t)seq.size() > nPos) {
if ((int64_t) seq.size() > nPos) {
nPos = seq.size();
}
}
@@ -108,7 +242,7 @@ void Alignment::readAlignment() {
}
for (; j < buf.size(); j++) {
if (buf[j] != ' ') {
if ((int64_t)seqs[iSeq].size() >= nPos) {
if ((int64_t) seqs[iSeq].size() >= nPos) {
throw std::invalid_argument(strformat(
"Too many characters (expected %ld) for sequence named %s\nSo far have:\n%s",
nPos, names[iSeq].c_str(), seqs[iSeq].c_str()));
@@ -124,7 +258,7 @@
iSeq, names[iSeq].c_str(), seqs[iSeq].c_str()) << std::endl;
}
iSeq++;
if (iSeq == nSeq && (int64_t)seqs[0].size() == nPos) {
if (iSeq == nSeq && (int64_t) seqs[0].size() == nPos) {
break; /* finished alignment */
}
}/* end else non-empty phylip line */
Expand All @@ -134,8 +268,8 @@ void Alignment::readAlignment() {
}
}
/* Check lengths of sequences */
for (int64_t i = 0; i < (int64_t)seqs.size(); i++) {
if ((int64_t)seqs[i].size() != nPos) {
for (int64_t i = 0; i < (int64_t) seqs.size(); i++) {
if ((int64_t) seqs[i].size() != nPos) {
throw std::invalid_argument(strformat(
"Wrong number of characters for %s: expected %ld but have %ld instead.\n"
"This sequence may be truncated, or another sequence may be too long.",
Expand All @@ -146,8 +280,8 @@ void Alignment::readAlignment() {
/* If nucleotide sequences, replace U with T and N with X */
bool findDot = false;
#pragma omp parallel for schedule(static), reduction(||:findDot)
for (int64_t i = 0; i < (int64_t)seqs.size(); i++) {
for (int64_t j = 0; j < (int64_t)seqs[i].size(); j++) {
for (int64_t i = 0; i < (int64_t) seqs.size(); i++) {
for (int64_t j = 0; j < (int64_t) seqs[i].size(); j++) {
if (seqs[i][j] == '.') {
seqs[i][j] = '-';
findDot = true;
@@ -188,7 +322,7 @@ Uniquify::Uniquify(const Alignment &aln) {
alnNext.resize(aln.seqs.size(), -1); /* i in aln -> next, or -1 */
alnToUniq.resize(aln.seqs.size(), -1); /* i in aln -> iUnique; many -> -1 */

for (int64_t i = 0; i < (int64_t)aln.seqs.size(); i++) {
for (int64_t i = 0; i < (int64_t) aln.seqs.size(); i++) {
assert(hashseqs.find(aln.seqs[i]) != nullptr);
int64_t first = hashseqs[aln.seqs[i]];
if (first == i) {
Expand All @@ -209,5 +343,5 @@ Uniquify::Uniquify(const Alignment &aln) {
alnToUniq[i] = alnToUniq[last];
}
}
assert((int64_t)nUniqueSeq == (int64_t)uniqueSeq.size());
assert((int64_t) nUniqueSeq == (int64_t) uniqueSeq.size());
}
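To make the new input path concrete, the following is a small, hand-written NEXUS characters block of the kind the parser above is meant to accept (a constructed example, not a file from the repository), embedded in a C++ raw string literal for illustration:

```cpp
// Constructed example input for the NEXUS reader above (illustrative only).
const char *exampleNexus = R"(#NEXUS
begin characters;
  dimensions ntax=2 nchar=8;
  format gap=- matchchar=. interleave=no;
  matrix
    seqA ACGTACGT
    seqB AC..ACGA
  ;
end;
)";
// With gap='-' and matchchar='.', seqB expands to "ACGTACGA": the two dots copy
// the states of the previous row at the same columns.
```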
2 changes: 1 addition & 1 deletion src/Constants.h
@@ -54,7 +54,7 @@ namespace veryfasttree {
const std::string codesStringNT = "ACGT";

const std::string name = "VeryFastTree";
const std::string version = "3.0.1";
const std::string version = "3.1.0";
const std::string compileFlags =
"(OpenMP"
#ifdef __AVX__