-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Programming c++ implementation #9
base: main
Are you sure you want to change the base?
Changes from 24 commits
bb10165
d13dfc7
5bbbaf2
930430a
379897a
e35cff1
e8ccb7c
fccca35
0c64950
fe6bac1
245a92f
0af1b08
66e2104
a4cff6f
9ccf1fe
8da0fe0
943a6c0
98ee45c
744b928
b8dd0f2
c7e6920
7771157
f7287f8
47aedc0
4fd3a58
4dd27f8
f51ff1c
c0afe6f
77e3e00
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
name: Install Non-Python Dependencies | ||
description: Installs system packages needed to build dupin. | ||
runs: | ||
using: "composite" | ||
steps: | ||
- name: install-dependencies-linux | ||
if: runner.os == 'Linux' | ||
shell: bash | ||
run: | | ||
sudo apt-get update | ||
sudo apt-get install libtbb12 libtbb-dev libeigen3-dev ninja-build | ||
- name: install-dependencies-macos | ||
if: runner.os == 'macOS' | ||
shell: bash | ||
run: | | ||
brew update | ||
brew install tbb eigen ninja | ||
- name: install-dependencies-windows | ||
if: runner.os == 'Windows' | ||
shell: bash | ||
run: choco install tbb eigen3 ninja |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
cmake_minimum_required(VERSION 3.8) | ||
project(dupin VERSION 0.0.1 LANGUAGES CXX) | ||
|
||
set(DEFAULT_BUILD_TYPE "Release") | ||
if(NOT CMAKE_BUILD_TYPE) | ||
set(CMAKE_BUILD_TYPE ${DEFAULT_BUILD_TYPE}) | ||
endif() | ||
|
||
find_package(Eigen3 REQUIRED) | ||
find_package(TBB REQUIRED) | ||
# Use modern method for Python binding | ||
set(PYBIND11_NEWPYTHON ON) | ||
find_package(pybind11 CONFIG REQUIRED) | ||
|
||
add_subdirectory(src) | ||
add_subdirectory(dupin) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Defaults to site-packages so only need package name | ||
install(DIRECTORY ./ DESTINATION dupin FILES_MATCHING PATTERN "*.py") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
"""Implements dynamic programming class for optimal segementation algorithm.""" | ||
import _DynP | ||
import numpy as np | ||
|
||
|
||
class DynP: | ||
"""Detects the change points in a time series. | ||
|
||
Attributes | ||
---------- | ||
data: np.ndarray | ||
Matrix storing the time series data. | ||
num_bkps: int | ||
Number of change points to detect. | ||
jump: int | ||
Interval for checking potential change points. Changing will | ||
not provide optimal detection, but will reduce runtime. | ||
min_size: int | ||
Minimum size of a segment. Changing will not provide optimal | ||
detection, but will reduce runtime. | ||
""" | ||
|
||
def __init__( | ||
self, data: np.ndarray, num_bkps: int, jump: int, min_size: int | ||
): | ||
"""Initialize the DynamicProgramming instance with given parameters.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you document a class's |
||
self.dynp = _DynP.DynamicProgramming(data, num_bkps, jump, min_size) | ||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
def set_num_threads(self, num_threads: int): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Long term dev wise, if additional CPP methods will be added we will most likely want num_threads to be controlled on the level of the whole module? Would preprocessor be a good place to have this? @b-butler thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding this to as a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so I would just move this function to util.py right? would just have to import _DynP in util.py? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
"""Set the number of threads for parallelization. | ||
|
||
Parameters | ||
---------- | ||
num_threads: int | ||
The number of threads to use during computation. Default | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a parameter default. If you want to add a comment about the default parallelization it should go above this section. Also, the message should state that we use all available cores unless set otherwise. |
||
is determined automatically. | ||
""" | ||
self.dynp.set_threads(num_threads) | ||
|
||
def fit(self, num_bkps: int) -> list: | ||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Calculate the cost matrix and return the breakpoints. | ||
|
||
Parameters | ||
---------- | ||
num_bkps: int | ||
number of change points to detect. | ||
|
||
Returns | ||
------- | ||
list: A list of integers representing the breakpoints. | ||
""" | ||
return self.dynp.fit() | ||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
list(APPEND dupin_cxx "dupininterface.cpp" "dupin.h" "dupin.cpp") | ||
|
||
pybind11_add_module(_dupin ${dupin_cxx}) | ||
|
||
set_target_properties(_dupin PROPERTIES | ||
CXX_STANDARD 17 | ||
CMAKE_CXX_STANDARD_REQUIRED True | ||
) | ||
|
||
target_include_directories(_dupin PRIVATE | ||
${EIGEN3_INCLUDE_DIR} | ||
${TBB_INCLUDE_DIRS} | ||
) | ||
|
||
target_link_libraries(_dupin PRIVATE TBB::tbb) | ||
target_compile_definitions(_dupin PRIVATE VERSION_INFO=${PROJECT_VERSION}) | ||
target_compile_options(_dupin PRIVATE -O2 -march=native) | ||
# Installs C++ extension into the root of the Python package | ||
install(TARGETS _dupin LIBRARY DESTINATION dupin) |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,165 @@ | ||||||
#include <iostream> | ||||||
#include <iomanip> | ||||||
#include <limits> | ||||||
#include <unordered_map> | ||||||
#include <vector> | ||||||
#include <Eigen/Dense> | ||||||
#include <tbb/blocked_range2d.h> | ||||||
#include <tbb/global_control.h> | ||||||
#include <tbb/parallel_for.h> | ||||||
#include "dupin.h" | ||||||
|
||||||
using namespace std; | ||||||
using namespace Eigen; | ||||||
Comment on lines
+12
to
+13
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't use the namespace, explicitly scope them. It makes the code more readable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
If you use |
||||||
|
||||||
DynamicProgramming::DynamicProgramming() | ||||||
: num_bkps(1), num_parameters(0), num_timesteps(0), jump(1), min_size(3) {} | ||||||
|
||||||
DynamicProgramming::DynamicProgramming(const Eigen::MatrixXd &data, int num_bkps_, | ||||||
int jump_, int min_size_) | ||||||
: data(data), num_bkps(num_bkps_), | ||||||
jump(jump_), min_size(min_size_) { | ||||||
num_timesteps = data.rows(); | ||||||
num_parameters = data.cols(); | ||||||
} | ||||||
|
||||||
void DynamicProgramming::scale_data() { | ||||||
Eigen::VectorXd min_val = data.colwise().minCoeff(); | ||||||
Eigen::VectorXd max_val = data.colwise().maxCoeff(); | ||||||
Eigen::VectorXd range = max_val - min_val; | ||||||
|
||||||
for (int j = 0; j < num_parameters; ++j) { | ||||||
if (range(j) == 0.0) { | ||||||
data.col(j).setZero(); | ||||||
} else { | ||||||
data.col(j) = (data.col(j).array() - min_val(j)) / range(j); | ||||||
} | ||||||
} | ||||||
} | ||||||
void DynamicProgramming::regression_setup(linear_fit_struct &lfit) { | ||||||
lfit.x = Eigen::VectorXd::LinSpaced(num_timesteps, 0, num_timesteps - 1) / | ||||||
(num_timesteps - 1); | ||||||
lfit.y = data; | ||||||
} | ||||||
|
||||||
Eigen::VectorXd DynamicProgramming::regression_line(int start, int end, int dim, | ||||||
linear_fit_struct &lfit) { | ||||||
int n = end - start; | ||||||
Eigen::VectorXd x = lfit.x.segment(start, n); | ||||||
Eigen::VectorXd y = lfit.y.col(dim).segment(start, n); | ||||||
|
||||||
double x_mean = x.mean(); | ||||||
double y_mean = y.mean(); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could probably save some time by saving and computing the cumulative sum and then computing the mean through ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What were your thoughts @npkamath? |
||||||
|
||||||
Eigen::VectorXd x_centered = x.array() - x_mean; | ||||||
Eigen::VectorXd y_centered = y.array() - y_mean; | ||||||
|
||||||
double slope = x_centered.dot(y_centered) / x_centered.squaredNorm(); | ||||||
double intercept = y_mean - slope * x_mean; | ||||||
|
||||||
return x.unaryExpr( | ||||||
[slope, intercept](double xi) { return slope * xi + intercept; }); | ||||||
} | ||||||
|
||||||
double DynamicProgramming::l2_cost(Eigen::MatrixXd &predicted_y, int start, int end) { | ||||||
Eigen::MatrixXd diff = predicted_y.block(start, 0, end - start, num_parameters) - | ||||||
data.block(start, 0, end - start, num_parameters); | ||||||
return std::sqrt(diff.array().square().sum()); | ||||||
} | ||||||
|
||||||
Eigen::MatrixXd DynamicProgramming::predicted(int start, int end, | ||||||
linear_fit_struct &lfit) { | ||||||
Eigen::MatrixXd predicted_y(num_timesteps, num_parameters); | ||||||
for (int i = 0; i < num_parameters; ++i) { | ||||||
predicted_y.block(start, i, end - start, 1) = | ||||||
regression_line(start, end, i, lfit); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should compute the regression for all dimensions simultaneously; it should be faster. Also, is there a way to avoid the copying of data and have the prediction method take a reference to the data to store in? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Had an attempt at it but it is not returning the right type to pass into the next function; I will try a different approach, maybe not with Eigen |
||||||
} | ||||||
return predicted_y; | ||||||
} | ||||||
|
||||||
double DynamicProgramming::cost_function(int start, int end) { | ||||||
linear_fit_struct lfit; | ||||||
regression_setup(lfit); | ||||||
Comment on lines
+86
to
+87
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make the constructor for |
||||||
Eigen::MatrixXd predicted_y = predicted(start, end, lfit); | ||||||
return l2_cost(predicted_y, start, end); | ||||||
} | ||||||
|
||||||
void DynamicProgramming::initialize_cost_matrix() { | ||||||
scale_data(); | ||||||
cost_matrix.initialize(num_timesteps); | ||||||
tbb::parallel_for(tbb::blocked_range<int>(0, num_timesteps), | ||||||
[&](const tbb::blocked_range<int> &r) { | ||||||
for (int i = r.begin(); i < r.end(); ++i) { | ||||||
for (int j = i + min_size; j < num_timesteps; ++j) { | ||||||
cost_matrix(i, j) = cost_function(i, j); | ||||||
} | ||||||
} | ||||||
}); | ||||||
cost_computed = true; | ||||||
} | ||||||
|
||||||
std::pair<double, std::vector<int>> DynamicProgramming::seg(int start, int end, | ||||||
int num_bkps) { | ||||||
MemoKey key = {start, end, num_bkps}; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
auto it = memo.find(key); | ||||||
if (it != memo.end()) { | ||||||
return it->second; | ||||||
} | ||||||
if (num_bkps == 0) { | ||||||
return {cost_matrix(start, end), {end}}; | ||||||
} | ||||||
|
||||||
std::pair<double, std::vector<int>> best = {std::numeric_limits<double>::infinity(), {}}; | ||||||
|
||||||
for (int bkp = start + min_size; bkp < end; bkp++) { | ||||||
if ((bkp - start) >= min_size && (end - bkp) >= min_size) { | ||||||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
auto left = seg(start, bkp, num_bkps - 1); | ||||||
auto right = seg(bkp, end, 0); | ||||||
double cost = left.first + right.first; | ||||||
if (cost < best.first) { | ||||||
best.first = cost; | ||||||
best.second = left.second; | ||||||
best.second.push_back(bkp); | ||||||
best.second.insert(best.second.end(), right.second.begin(), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't right always be one element long? If so, we could just push back. |
||||||
right.second.end()); | ||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
memo[key] = best; | ||||||
return best; | ||||||
} | ||||||
|
||||||
std::vector<int> DynamicProgramming::compute_breakpoints() { | ||||||
auto result = seg(0, num_timesteps - 1, num_bkps); | ||||||
std::vector<int> breakpoints = result.second; | ||||||
std::sort(breakpoints.begin(), breakpoints.end()); | ||||||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
breakpoints.erase(std::unique(breakpoints.begin(), breakpoints.end()), | ||||||
breakpoints.end()); | ||||||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
return breakpoints; | ||||||
} | ||||||
|
||||||
std::vector<int> DynamicProgramming::fit(int num_bkps_in){ | ||||||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
num_bkps = num_bkps_in; | ||||||
npkamath marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
if (!cost_computed){ | ||||||
initialize_cost_matrix(); | ||||||
} | ||||||
return compute_breakpoints(); | ||||||
} | ||||||
|
||||||
void set_parallelization(int num_threads) { | ||||||
static tbb::global_control gc(tbb::global_control::max_allowed_parallelism, | ||||||
num_threads); | ||||||
} | ||||||
|
||||||
DynamicProgramming::UpperTriangularMatrix & | ||||||
DynamicProgramming::getCostMatrix() { | ||||||
return cost_matrix; | ||||||
} | ||||||
|
||||||
void DynamicProgramming::setCostMatrix( | ||||||
const DynamicProgramming::UpperTriangularMatrix &value) { | ||||||
cost_matrix = value; | ||||||
} | ||||||
Comment on lines
+165
to
+168
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should anyone be setting the full cost matrix? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope! honestly the setter functions aren't needed anymore. I can probably delete them and just set the getters as actual functions rather than properties. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Resolved but not deleted. |
||||||
|
||||||
int main() { return 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this class is supposed to be used by users, Perhaps some additional documentation would be useful here alongside a usage example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something that goes into the comments of this class or in the documentation/tutorial for dupin? I can add this in comments if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation here. A tutorial using
DynP
may be useful in the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh, so check for example SweepDetector class in: https://github.com/glotzerlab/dupin/blob/main/dupin/detect/detect.py
You might want to write what this class is used for and what implementation does it use in 2-3 short sentences.
Additionally, if this class should be used differently to the most of the other classes you might want to add a very short example of how to use it. Not sure if thats the case? For example: