stats_cpp is a bare-bones, header-only C++ statistics library. It is designed to be fast and easy to use, and it can be added to existing projects with little overhead.
Adding stats_cpp to your project is simple: copy Statistics.hpp into your project and include it as a header file. The stats namespace, which contains all the functions and structures, is then available.
Note that the data points must be stored in a std::vector<double> before they can be passed to the functions provided.
Most results are stored in special structures. Currently, some properties are only available through these structures and cannot be evaluated with function calls. The most important of these structures is oneVarStats. More details on how to use this and other structures are given below.
Some properties can be calculated in more than one way. It is recommended to use the function getOneVarStats() whenever the property can be calculated that way, so that any other property needed later is already calculated and stored in memory.
Currently, the following properties can be calculated using the library:
std::vector<double> data { ... };
// using normal addition
double sum_1 = stats::simpleSum(&data);
// using Improved Kahan–Babuška algorithm
double sum_2 = stats::complexSum(&data);
// getOneVarStats uses the Improved Kahan–Babuška algorithm
stats::oneVarStats ovs = stats::getOneVarStats(data);
double sum_3 = ovs.sum;
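The improved Kahan–Babuška algorithm mentioned in the comments above is commonly attributed to Neumaier. The following is a minimal standalone sketch of that technique, not the library's own code; the name neumaierSum is hypothetical and is not part of stats_cpp.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of stats_cpp): compensated summation using
// Neumaier's variant of the Kahan-Babuska algorithm.
double neumaierSum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;  // accumulates the low-order bits lost by each addition
    for (double x : values) {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            compensation += (sum - t) + x;  // low-order bits of x were lost
        } else {
            compensation += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + compensation;  // apply the correction once at the end
}
```

For an input such as {1.0, 1e100, 1.0, -1e100}, plain left-to-right addition loses both small terms, while the compensated version keeps them.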
std::vector<double> data { ... };
double a_mean_1 = stats::arithmeticMean(data);
stats::oneVarStats ovs = stats::getOneVarStats(data);
double a_mean_2 = ovs.mean;
std::vector<double> data { ... };
double h_mean = stats::harmonicMean(data);
std::vector<double> data { ... };
double g_mean = stats::geometricMean(data);
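For reference, the harmonic and geometric means follow their textbook definitions. The sketch below is an illustration only; the function names are hypothetical, and it assumes a non-empty vector of strictly positive values. How the library treats zeros, negatives, or empty input is not stated here.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helpers (not part of stats_cpp).
// Both assume a non-empty vector of strictly positive values.

// Harmonic mean: n divided by the sum of reciprocals.
double harmonicMeanSketch(const std::vector<double>& values) {
    double reciprocalSum = 0.0;
    for (double x : values) reciprocalSum += 1.0 / x;
    return static_cast<double>(values.size()) / reciprocalSum;
}

// Geometric mean: computed through logarithms to avoid overflow
// when multiplying many values together.
double geometricMeanSketch(const std::vector<double>& values) {
    double logSum = 0.0;
    for (double x : values) logSum += std::log(x);
    return std::exp(logSum / static_cast<double>(values.size()));
}
```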
std::vector<double> data { ... };
stats::oneVarStats ovs = stats::getOneVarStats(data);
double median = ovs.median;
double mode = ovs.mode;
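As a reminder of what the median field represents, here is a minimal sketch of the usual sort-based definition. The function name is hypothetical, the input is assumed to be non-empty, and the library may compute the value differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of stats_cpp); assumes non-empty input.
// The vector is taken by value so the caller's data is not reordered.
double medianSketch(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    const std::size_t n = values.size();
    if (n % 2 == 1) {
        return values[n / 2];  // odd count: the middle element
    }
    return 0.5 * (values[n / 2 - 1] + values[n / 2]);  // even count: average of the two middle elements
}
```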
std::vector<double> data { ... };
stats::oneVarStats ovs = stats::getOneVarStats(data);
double variance = ovs.variance;
double standard_deviation = ovs.std;
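This README does not say whether variance and std are the sample (divide by n - 1) or population (divide by n) versions. The sketch below shows both with a straightforward two-pass computation; the function name and the sample flag are hypothetical and only illustrate the difference.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of stats_cpp).
// Assumes at least one value (at least two when sample == true).
double varianceSketch(const std::vector<double>& values, bool sample) {
    const double n = static_cast<double>(values.size());

    // First pass: arithmetic mean.
    double mean = 0.0;
    for (double x : values) mean += x;
    mean /= n;

    // Second pass: sum of squared deviations from the mean.
    double squaredDeviations = 0.0;
    for (double x : values) squaredDeviations += (x - mean) * (x - mean);

    return squaredDeviations / (sample ? n - 1.0 : n);
}
```

The standard deviation is the square root of either result, for example std::sqrt(varianceSketch(data, true)).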
std::vector<double> data { ... };
double size_1 = data.size();
stats::oneVarStats ovs = stats::getOneVarStats(data);
double size_2 = ovs.size;
std::vector<double> data { ... };
stats::oneVarStats ovs = stats::getOneVarStats(data);
double min = ovs.min;
double max = ovs.max;
std::vector<double> data { ... };
stats::oneVarStats ovs = stats::getOneVarStats(data);
double q1 = ovs.q1;
double q3 = ovs.q3;
double iqr = ovs.iqr;
std::vector<double> data { ... };
double value = X_VALUE;
double z1 = stats::calcZScore(value, data);
stats::oneVarStats ovs = stats::getOneVarStats(data);
double z2 = stats::calcZScore(value, ovs);
double mean = MEAN_VALUE;
double std = STANDARD_DEVIATION_VALUE;
double z3 = stats::calcZScore(value, mean, std);
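All three calls above correspond to the standard z-score definition, (value - mean) / standard deviation; the first two presumably take the mean and standard deviation from the data or from the structure. A minimal sketch of that formula, with a hypothetical function name:

```cpp
// Hypothetical helper (not part of stats_cpp): textbook z-score.
// Assumes standardDeviation is non-zero.
double zScoreSketch(double value, double mean, double standardDeviation) {
    return (value - mean) / standardDeviation;
}
```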
double zScore = SOME_SCORE;
double cdf = stats::normalCDF(zScore);
double newZScore = stats::invNormalCDF(cdf);
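To illustrate the round trip between a z-score and its cumulative probability, the sketch below computes the standard normal CDF with std::erfc and inverts it by bisection. This is only one possible approach with hypothetical function names; the method used inside the library is not described here.

```cpp
#include <cmath>

// Hypothetical helpers (not part of stats_cpp).

// Standard normal CDF expressed through the complementary error function.
double normalCdfSketch(double z) {
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}

// Inverse CDF by bisection on [-10, 10]; assumes 0 < p < 1.
// Slow but simple; dedicated rational approximations are faster.
double invNormalCdfSketch(double p) {
    double lo = -10.0;
    double hi = 10.0;
    for (int i = 0; i < 100; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (normalCdfSketch(mid) < p) {
            lo = mid;  // target lies to the right of mid
        } else {
            hi = mid;  // target lies at or to the left of mid
        }
    }
    return 0.5 * (lo + hi);
}
```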
std::vector<double> data { ... };
double value = SOME_VALUE;
double p1 = stats::calcPValue(value, data);
stats::oneVarStats ovs = stats::getOneVarStats(data);
double p2 = stats::calcPValue(value, ovs);
double mean = MEAN_VALUE;
double std = STANDARD_DEVIATION_VALUE;
double p3 = stats::calcPValue(value, mean, std);
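This README does not state which tail calcPValue reports. Assuming a normal model, the sketch below shows the two common conventions for a given z-score, so the returned value can be compared against either; both function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical helpers (not part of stats_cpp); both assume a standard
// normal distribution for the z-score.

// One-sided p-value: probability of a z-score at least this large.
double upperTailPValueSketch(double z) {
    return 0.5 * std::erfc(z / std::sqrt(2.0));
}

// Two-sided p-value: probability of a z-score at least this far from zero.
double twoSidedPValueSketch(double z) {
    return std::erfc(std::fabs(z) / std::sqrt(2.0));
}
```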
double confidence = 0.05;
double size = 100;
double parameter = P;
stats::Interval ci = stats::calcInterval(confidence, size, parameter);
const std::vector<std::pair<double, double>> pairs = { {x1, y1}, {x2, y2}, ... };
stats::LinearRegression lr = stats::calcLinearRegression(pairs);
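The fields of stats::LinearRegression are not listed in this README. As a point of comparison, the sketch below fits an ordinary least-squares line to the same kind of input; the LineFit struct and the function name are hypothetical, and the code assumes at least two points with distinct x values.

```cpp
#include <utility>
#include <vector>

// Hypothetical result type (not the library's stats::LinearRegression).
struct LineFit {
    double slope;
    double intercept;
};

// Ordinary least-squares fit of y = slope * x + intercept.
LineFit fitLineSketch(const std::vector<std::pair<double, double>>& pairs) {
    const double n = static_cast<double>(pairs.size());

    // Means of x and y.
    double meanX = 0.0;
    double meanY = 0.0;
    for (const auto& p : pairs) {
        meanX += p.first;
        meanY += p.second;
    }
    meanX /= n;
    meanY /= n;

    // slope = sum of co-deviations of x and y / sum of squared deviations of x.
    double sxy = 0.0;
    double sxx = 0.0;
    for (const auto& p : pairs) {
        sxy += (p.first - meanX) * (p.second - meanY);
        sxx += (p.first - meanX) * (p.first - meanX);
    }
    const double slope = sxy / sxx;
    return {slope, meanY - slope * meanX};
}
```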
This repo contains a very simple profiler in Profiler.hpp that can be used for benchmarking. An example is given in examples/Benchmark.cpp. Running that program produced the following benchmark:
Number of Data Points | Time (seconds) |
---|---|
1,000 | 0.0002 |
10,000 | 0.0114 |
100,000 | 1.1793 |
500,000 | 34.0256 |
Note that the benchmark was compiled with clang++ -O3.
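To reproduce timings of this kind without the bundled profiler, a small wall-clock helper built on std::chrono is usually enough. The sketch below is not the interface of Profiler.hpp, which is not described here; the name secondsTaken is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper (not the API of Profiler.hpp): runs a callable once
// and returns the elapsed wall-clock time in seconds.
template <typename Work>
double secondsTaken(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A call such as secondsTaken([&] { stats::getOneVarStats(data); }) would time a single run; averaging several runs gives steadier numbers.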