cpockrandt edited this page Nov 18, 2019 · 15 revisions

This wiki is under construction and will be updated. Some of the answers might not be complete yet.

Which algorithm to choose for indexing?

Both algorithms build exactly the same index. GenMap offers two algorithms for suffix array construction (needed to build the FM index), which differ in resource requirements and running time.

In terms of running time, Skew7 (-A skew) performs much better on repetitive data, but most of the algorithm is not parallelized. Radixsort (-A radix), in contrast, is fully parallelized but is not recommended for repetitive data.

In terms of space consumption, Skew7 uses large amounts of secondary memory in your TMP directory, whereas Radixsort uses large amounts of main memory. As long as you have enough secondary memory (approx. 20 times the size of the input FASTA file), we recommend using Skew7.
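The rule of thumb above can be checked before building an index. The following is a minimal sketch and not part of GenMap: the helper names `requiredTmpBytes` and `hasEnoughTmpSpace` are hypothetical, the factor 20 is only the approximate figure quoted above, and the sketch assumes that the directory reported by `std::filesystem::temp_directory_path()` is the TMP directory that will actually be used. It requires C++17.

```cpp
#include <cstdint>
#include <filesystem>

// Approximate secondary memory needed by Skew7: about 20 times the size of
// the input FASTA file (rule of thumb from the text, not an exact bound).
std::uintmax_t requiredTmpBytes(std::uintmax_t fastaBytes)
{
    return fastaBytes * 20;
}

// Hypothetical helper: returns true if the temporary directory currently has
// enough free space for the given FASTA file according to the rule of thumb.
// Throws std::filesystem::filesystem_error if the file cannot be inspected.
bool hasEnoughTmpSpace(std::filesystem::path const & fasta)
{
    namespace fs = std::filesystem;
    std::uintmax_t const needed = requiredTmpBytes(fs::file_size(fasta));
    fs::space_info const tmp = fs::space(fs::temp_directory_path());
    return tmp.available >= needed;
}
```

If `hasEnoughTmpSpace` returns false for your input, Radixsort (-A radix) may be the better choice, provided enough main memory is available.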

Which alphabets are allowed?

GenMap can only handle nucleotide sequences (A, C, G, T/U, N). If you load files that contain other letters (such as IUPAC codes for ambiguous bases), GenMap will print a warning and convert them to N.
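To illustrate what this conversion means for your data, here is a minimal sketch. It is not GenMap's actual implementation: the function name `convertToN` is hypothetical, and accepting lowercase letters is an assumption made for this example.

```cpp
#include <cstddef>
#include <string>

// Replaces every character outside A, C, G, T, U, N with 'N' and returns the
// number of replacements. Accepting lowercase letters is an assumption of
// this sketch, not documented GenMap behavior.
std::size_t convertToN(std::string & seq)
{
    std::string const allowed = "ACGTUNacgtun";
    std::size_t replaced = 0;
    for (char & c : seq)
    {
        if (allowed.find(c) == std::string::npos)
        {
            c = 'N';
            ++replaced;
        }
    }
    return replaced;
}
```

For example, applying `convertToN` to the sequence ACGTRYN replaces the two IUPAC codes R and Y, which yields ACGTNNN.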

How to load raw files (.map, .freq8, .freq16) in C++?

If you want to load the raw output with the frequency or mappability vector into your program, you can use the following snippet:

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

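template <typename value_t>
void load(std::vector<value_t> & vec, std::string const & path)
{
    // Open in binary mode; the whole file is read as a flat array of value_t.
    std::ifstream file(path, std::ios::binary);
    if (!file)
    {
        std::cerr << "Could not open " << path << '\n';
        return;
    }

    // Determine the file size to know how many values to read.
    file.seekg(0, std::ios_base::end);
    std::streamoff const fileSize = file.tellg();
    if (fileSize < 0)
    {
        std::cerr << "Could not determine the size of " << path << '\n';
        return;
    }
    file.seekg(0, std::ios_base::beg);

    // Read whole values only; vec.data() is valid even for an empty vector.
    vec.resize(static_cast<std::size_t>(fileSize) / sizeof(value_t));
    file.read(reinterpret_cast<char *>(vec.data()),
              static_cast<std::streamsize>(vec.size() * sizeof(value_t)));
}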

int main(int argc, char ** argv)
{
    // load mappability vector
    std::vector<float> mappability;
    load(mappability, "c.map");
    // print mappability vector
    std::copy(mappability.begin(), mappability.end(), std::ostream_iterator<float>(std::cout, " "));
    std::cout << '\n';

    // load frequency vector (for freq16 please use uint16_t)
    std::vector<uint8_t> frequency;
    load(frequency, "c.freq8");
    // print frequency vector
    std::copy(frequency.begin(), frequency.end(), std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';

    return 0;
}
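As noted in the comment above, .freq16 files are loaded the same way with uint16_t. The following is a minimal round-trip sketch for testing such a loader without running GenMap first. The helper names `writeRaw` and `loadFreq16` are hypothetical, and the sketch assumes the file is a flat array of 16-bit values in the byte order of the machine that reads it.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes the values as raw bytes in native byte order (assumed file layout).
void writeRaw(std::vector<std::uint16_t> const & values, std::string const & path)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<char const *>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(std::uint16_t)));
}

// Reads the whole file as 16-bit values; returns an empty vector on failure.
std::vector<std::uint16_t> loadFreq16(std::string const & path)
{
    std::vector<std::uint16_t> vec;
    // std::ios::ate positions the stream at the end so tellg() gives the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file)
        return vec;
    std::streamoff const size = file.tellg();
    if (size <= 0)
        return vec;
    file.seekg(0, std::ios::beg);
    vec.resize(static_cast<std::size_t>(size) / sizeof(std::uint16_t));
    file.read(reinterpret_cast<char *>(vec.data()),
              static_cast<std::streamsize>(vec.size() * sizeof(std::uint16_t)));
    return vec;
}
```

Writing a few known values with `writeRaw` and reading them back with `loadFreq16` should return the same vector. A mismatch when reading a real output file would suggest that the assumed element type or byte order is wrong.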