Skip to content

biocpp/biocpp-io

Folders and files

NameName
Last commit message
Last commit date

Latest commit

c7f1b67 · May 12, 2023
May 5, 2023
Apr 3, 2023
May 12, 2023
May 12, 2023
May 12, 2023
Oct 21, 2022
Oct 24, 2022
Aug 29, 2022
Apr 3, 2023
Nov 4, 2021
May 5, 2023

Repository files navigation

The BioC++ Input/Output Library

This is the I/O library of the BioC++ project. It provides the io module which offers easy-to-use interfaces for the following formats:

  • txt formats: plain-text files (line-wise reading) and delimited files (e.g. CSV and TSV).
  • seq formats: FastA and FastQ.
  • var formats: VCF and BCF.

The primary goal of this library is to offer higher level abstractions than the C libraries typically used in this domain (e.g. htslib) while at the same time offering an excellent performance. It hopes to offer a modern, well-integrated design that covers most typical I/O use-cases Bioinformaticians encounter.

The library provides stand-alone implementations of the formats (and does not require htslib). Support for reading and writing SAM/BAM/CRAM will be available in separate library until full implementations are available here.

Please see the online documentation for more details.

Attention: this library is currently a work-in-progress, and interfaces are not yet stable.

Example

Simple reading of a FastA-file that is transparently decompressed:

bio::io::seq::reader reader{"example.fasta.gz"};

for (auto & rec : reader)
{
  fmt::print("ID:  {}\n", rec.id);
  fmt::print("Seq: {}\n", rec.seq);
}

Reading a variant file and writing a new one that only contains variants that "PASS":

bio::io::var::reader reader{"example.vcf.gz"};
bio::io::var::writer writer{"example.bcf"};

for (auto & rec : reader)
  if (rec.filter.empty() || (rec.filter.size() == 1 && rec.filter[0] == "PASS"))
    writer.push_back(rec);

The format is transparently converted from compressed VCF to BCF if files have the respective extensions / magic headers.

Easy to use

  • Header-only → just drop it in your source code or include it as a git submodule!
  • The BioC++ core library is the only hard requirement.
  • No build-system and no configure steps required.
  • Optional CMake support available.
  • Integrates well with the C++20 standard library.

Dependencies

requirement version comment
compiler GCC ≥ 11 no other compiler is currently supported!
required libs BioC++ core = 0.7
optional libs zlib ≥ 1.2 required for *.gz and *.bcf support
bzip2 ≥ 1.0 required for *.bz2 file support

Quick-Setup

  • The recommended way to install BioC++ libraries is to follow the instructions in the meta-repository: https://github.com/biocpp/biocpp
  • CMake is recommended, because it finds dependencies automatically and sets required flags/paths.
  • Quick instructions without CMake:
    1. Clone biocpp-core and biocpp-io into e.g. ~/devel.
    2. Assuming you have ZLib and Bzip2 in system paths, do:
g++ -O3 -DNDEBUG -Wall -Wextra -std=c++20           \
    -I ~/devel/biocpp-core/include                  \
    -I ~/devel/biocpp-io/include                    \
    -DBIOCPP_IO_HAS_ZLIB=1 -DBIOCPP_IO_HAS_BZIP2=1  \
    your_file.cpp
  • If you want to use {fmt}, add -I ~/devel/fmt/include -D FMT_HEADER_ONLY=1.