-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Programming c++ implementation #9
base: main
Are you sure you want to change the base?
Changes from all commits
bb10165
d13dfc7
5bbbaf2
930430a
379897a
e35cff1
e8ccb7c
fccca35
0c64950
fe6bac1
245a92f
0af1b08
66e2104
a4cff6f
9ccf1fe
8da0fe0
943a6c0
98ee45c
744b928
b8dd0f2
c7e6920
7771157
f7287f8
47aedc0
4fd3a58
4dd27f8
f51ff1c
c0afe6f
77e3e00
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
name: Install Non-Python Dependencies | ||
description: Installs system packages needed to build dupin. | ||
runs: | ||
using: "composite" | ||
steps: | ||
- name: install-dependencies-linux | ||
if: runner.os == 'Linux' | ||
shell: bash | ||
run: | | ||
sudo apt-get update | ||
sudo apt-get install libtbb12 libtbb-dev libeigen3-dev ninja-build | ||
- name: install-dependencies-macos | ||
if: runner.os == 'macOS' | ||
shell: bash | ||
run: | | ||
brew update | ||
brew install tbb eigen ninja | ||
- name: install-dependencies-windows | ||
if: runner.os == 'Windows' | ||
shell: bash | ||
run: choco install tbb eigen3 ninja |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
cmake_minimum_required(VERSION 3.8) | ||
project(dupin VERSION 0.0.1 LANGUAGES CXX) | ||
|
||
set(DEFAULT_BUILD_TYPE "Release") | ||
if(NOT CMAKE_BUILD_TYPE) | ||
set(CMAKE_BUILD_TYPE ${DEFAULT_BUILD_TYPE}) | ||
endif() | ||
|
||
find_package(Eigen3 REQUIRED) | ||
find_package(TBB REQUIRED) | ||
# Use modern method for Python binding | ||
set(PYBIND11_NEWPYTHON ON) | ||
find_package(pybind11 CONFIG REQUIRED) | ||
|
||
add_subdirectory(src) | ||
add_subdirectory(dupin) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Defaults to site-packages so only need package name | ||
install(DIRECTORY ./ DESTINATION dupin FILES_MATCHING PATTERN "*.py") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
"""Implements dynamic programming class for optimal segementation algorithm.""" | ||
import _dupin | ||
import numpy as np | ||
|
||
|
||
class DynP: | ||
"""Detects the change points in a time series. | ||
|
||
Attributes | ||
---------- | ||
data: np.ndarray | ||
Matrix storing the time series data. | ||
num_bkps: int | ||
Number of change points to detect. | ||
jump: int | ||
Interval for checking potential change points. Changing will | ||
not provide optimal detection, but will reduce runtime. | ||
min_size: int | ||
Minimum size of a segment. Changing will not provide optimal | ||
detection, but will reduce runtime. | ||
|
||
|
||
Methods | ||
------- | ||
__init__(self, data: np.ndarray, num_bkps: int, jump: int, min_size: int) | ||
Initializes the DynamicProgramming instance with the time series data | ||
and parameters. | ||
set_num_threads(self, num_threads: int) | ||
Sets the number of threads to be used for parallel computation. | ||
fit(self, num_bkps: int) -> list | ||
Calculates the cost matrix and identifies the optimal breakpoints in | ||
the time series data. | ||
Comment on lines
+23
to
+32
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These get documented by the method's docstring itself. |
||
|
||
Example Usage | ||
------------- | ||
>>> import numpy as np | ||
>>> from dynp import DynP | ||
>>> data = np.random.rand(100, 1) # Simulated time series data | ||
>>> num_bkps = 3 # Number of breakpoints to detect | ||
>>> jump = 1 # Interval for checking potential breakpoints | ||
>>> min_size = 3 # Minimum size of a segment | ||
>>> model = Dynp(data, num_bkps, jump, min_size) | ||
>>> breakpoints = model.fit(num_bkps) | ||
>>> print(breakpoints) | ||
""" | ||
|
||
def __init__( | ||
self, data: np.ndarray, num_bkps: int, jump: int, min_size: int | ||
): | ||
"""Initialize the DynamicProgramming instance with given parameters.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you document a class's |
||
self._dupin = _dupin.DynamicProgramming(data, num_bkps, jump, min_size) | ||
|
||
def set_num_threads(self, num_threads: int): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Long term dev wise, if additional CPP methods will be added we will most likely want num_threads to be controlled on the level of the whole module? Would preprocessor be a good place to have this? @b-butler thoughts? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding this to as a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so I would just move this function to util.py right? would just have to import _DynP in util.py? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
"""Set the number of threads for parallelization. | ||
|
||
Parameters | ||
---------- | ||
num_threads: int | ||
The number of threads to use during computation. Default | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a parameter default. If you want to add a comment about the default parallelization it should go above this section. Also, the message should state that we use all available cores unless set otherwise. |
||
is determined automatically. | ||
""" | ||
self._dupin.set_threads(num_threads) | ||
|
||
def fit(self, num_breakpoints: int) -> list[int]: | ||
"""Calculate the cost matrix and return the breakpoints. | ||
|
||
Parameters | ||
---------- | ||
num_bkps: int | ||
number of change points to detect. | ||
|
||
Returns | ||
------- | ||
list: A list of integers representing the breakpoints. | ||
""" | ||
return self._dupin.fit(num_breakpoints) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
list(APPEND dupin_cxx "module.cpp" "dupin.h" "dupin.cpp") | ||
|
||
pybind11_add_module(_dupin ${dupin_cxx}) | ||
|
||
set_target_properties(_dupin PROPERTIES | ||
CXX_STANDARD 17 | ||
CMAKE_CXX_STANDARD_REQUIRED True | ||
) | ||
|
||
target_include_directories(_dupin PRIVATE | ||
${EIGEN3_INCLUDE_DIR} | ||
${TBB_INCLUDE_DIRS} | ||
) | ||
|
||
target_link_libraries(_dupin PRIVATE TBB::tbb) | ||
target_compile_definitions(_dupin PRIVATE VERSION_INFO=${PROJECT_VERSION}) | ||
target_compile_options(_dupin PRIVATE -O2 -march=native) | ||
# Installs C++ extension into the root of the Python package | ||
install(TARGETS _dupin LIBRARY DESTINATION dupin) |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,170 @@ | ||||||
#include <iostream> | ||||||
#include <iomanip> | ||||||
#include <limits> | ||||||
#include <unordered_map> | ||||||
#include <vector> | ||||||
#include <Eigen/Dense> | ||||||
#include <tbb/blocked_range2d.h> | ||||||
#include <tbb/global_control.h> | ||||||
#include <tbb/parallel_for.h> | ||||||
#include "dupin.h" | ||||||
|
||||||
using namespace std; | ||||||
using namespace Eigen; | ||||||
Comment on lines
+12
to
+13
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't use the namespace, explicitly scope them. It makes the code more readable. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
If you use |
||||||
|
||||||
DynamicProgramming::DynamicProgramming() | ||||||
: num_features(0), num_timesteps(0), jump(1), min_size(3), cost_matrix(0) {} | ||||||
|
||||||
|
||||||
DynamicProgramming::DynamicProgramming(const Eigen::MatrixXd &data, | ||||||
int jump_, int min_size_) | ||||||
: data(data), jump(jump_), min_size(min_size_), cost_matrix(data.rows()) { | ||||||
num_timesteps = data.rows(); | ||||||
num_features = data.cols(); | ||||||
} | ||||||
|
||||||
void DynamicProgramming::scale_data() { | ||||||
Eigen::VectorXd min_val = data.colwise().minCoeff(); | ||||||
Eigen::VectorXd max_val = data.colwise().maxCoeff(); | ||||||
Eigen::VectorXd range = max_val - min_val; | ||||||
|
||||||
for (int j = 0; j <num_features; ++j) { | ||||||
if (range(j) == 0.0) { | ||||||
data.col(j).setZero(); | ||||||
} else { | ||||||
data.col(j) = (data.col(j).array() - min_val(j)) / range(j); | ||||||
} | ||||||
} | ||||||
} | ||||||
void DynamicProgramming::regression_setup(linear_fit_struct &lfit) { | ||||||
lfit.x = Eigen::VectorXd::LinSpaced(num_timesteps, 0, num_timesteps - 1) / | ||||||
(num_timesteps - 1); | ||||||
lfit.y = data; | ||||||
} | ||||||
|
||||||
//work in progress, the rowwise colwise is messing up | ||||||
Eigen::MatrixXd DynamicProgramming::regression_lines(int start, int end, linear_fit_struct &lfit) { | ||||||
int n = end - start; | ||||||
Eigen::VectorXd x = lfit.x.segment(start, n); | ||||||
Eigen::MatrixXd y = lfit.y.block(start, 0, n, num_features); | ||||||
|
||||||
// Ensure x is in a two-dimensional form for broadcasting | ||||||
Eigen::MatrixXd x_matrix = x.replicate(1, num_features); | ||||||
|
||||||
// Calculate means | ||||||
double x_mean = x.mean(); | ||||||
Eigen::VectorXd y_mean = y.colwise().mean(); | ||||||
|
||||||
// Center the data around 0 | ||||||
Eigen::MatrixXd x_centered = x_matrix.colwise() - Eigen::VectorXd::Constant(n, x_mean); | ||||||
Eigen::MatrixXd y_centered = y.rowwise() - y_mean.transpose(); | ||||||
|
||||||
// Calculate slopes for each feature | ||||||
Eigen::VectorXd slope = (x_centered.array() * y_centered.array()).colwise().sum() / x_centered.array().square().sum(); | ||||||
|
||||||
// Calculate intercepts for each feature | ||||||
Eigen::VectorXd intercept = y_mean.array() - slope.array() * x_mean; | ||||||
|
||||||
// everything till this line is functioning fine; I might be overcomplicating it | ||||||
Eigen::MatrixXd regression_lines = (x_matrix.array().colwise() - x_mean).colwise() * slope.array() + intercept.transpose().array(); | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you just need to do something like |
||||||
|
||||||
return regression_lines; | ||||||
} | ||||||
|
||||||
double DynamicProgramming::l2_cost(const Eigen::MatrixXd &predicted_y, int start, int end) { | ||||||
Eigen::MatrixXd diff = predicted_y.block(start, 0, end - start, num_features) - | ||||||
data.block(start, 0, end - start, num_features); | ||||||
return std::sqrt(diff.array().square().sum()); | ||||||
} | ||||||
|
||||||
void DynamicProgramming::predicted(int start, int end, linear_fit_struct &lfit, | ||||||
Eigen::MatrixXd &predicted_y) { | ||||||
predicted_y.block(start, 0, end - start, num_features) = regression_lines(start, end, lfit); | ||||||
} | ||||||
|
||||||
double DynamicProgramming::cost_function(int start, int end) { | ||||||
linear_fit_struct lfit; | ||||||
regression_setup(lfit); | ||||||
Comment on lines
+86
to
+87
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make the constructor for |
||||||
|
||||||
Eigen::MatrixXd predicted_y(num_timesteps, num_features); | ||||||
predicted(start, end, lfit, predicted_y); // Fill the predicted_y matrix | ||||||
|
||||||
return l2_cost(predicted_y, start, end); | ||||||
} | ||||||
|
||||||
void DynamicProgramming::initialize_cost_matrix() { | ||||||
scale_data(); | ||||||
tbb::parallel_for(tbb::blocked_range<int>(0, num_timesteps), | ||||||
[&](const tbb::blocked_range<int> &r) { | ||||||
for (int i = r.begin(); i < r.end(); ++i) { | ||||||
for (int j = i + min_size; j < num_timesteps; ++j) { | ||||||
cost_matrix(i, j) = cost_function(i, j); | ||||||
} | ||||||
} | ||||||
}); | ||||||
cost_computed = true; | ||||||
} | ||||||
|
||||||
std::pair<double, std::vector<int>> DynamicProgramming::seg(int start, int end, | ||||||
int num_bkps) { | ||||||
MemoKey key = {start, end, num_bkps}; | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
auto it = memo.find(key); | ||||||
if (it != memo.end()) { | ||||||
return it->second; | ||||||
} | ||||||
if (num_bkps == 0) { | ||||||
return {cost_matrix(start, end), {end}}; | ||||||
} | ||||||
|
||||||
std::pair<double, std::vector<int>> best = {std::numeric_limits<double>::infinity(), {}}; | ||||||
|
||||||
for (int bkp = start + min_size; bkp < end; bkp++) { | ||||||
if ((bkp - start) < min_size || (end - bkp) < min_size) { | ||||||
continue; | ||||||
} | ||||||
auto left = seg(start, bkp, num_bkps - 1); | ||||||
auto right = seg(bkp, end, 0); | ||||||
double cost = left.first + right.first; | ||||||
if (cost < best.first) { | ||||||
best.first = cost; | ||||||
best.second = left.second; | ||||||
best.second.push_back(bkp); | ||||||
best.second.insert(best.second.end(), right.second.begin(), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't right always be one element long? If so, we could just push back. |
||||||
right.second.end()); | ||||||
} | ||||||
} | ||||||
} | ||||||
|
||||||
memo[key] = best; | ||||||
return best; | ||||||
} | ||||||
|
||||||
std::vector<int> DynamicProgramming::compute_breakpoints(int num_bkps) { | ||||||
auto result = seg(0, num_timesteps - 1, num_bkps); | ||||||
std::vector<int> breakpoints = result.second; | ||||||
return breakpoints; | ||||||
} | ||||||
|
||||||
std::vector<int> DynamicProgramming::fit(int num_bkps){ | ||||||
if (!cost_computed){ | ||||||
initialize_cost_matrix(); | ||||||
} | ||||||
return compute_breakpoints(num_bkps); | ||||||
} | ||||||
|
||||||
void set_parallelization(int num_threads) { | ||||||
static tbb::global_control gc(tbb::global_control::max_allowed_parallelism, | ||||||
num_threads); | ||||||
} | ||||||
|
||||||
DynamicProgramming::UpperTriangularMatrix & | ||||||
DynamicProgramming::getCostMatrix() { | ||||||
return cost_matrix; | ||||||
} | ||||||
|
||||||
void DynamicProgramming::setCostMatrix( | ||||||
const DynamicProgramming::UpperTriangularMatrix &value) { | ||||||
cost_matrix = value; | ||||||
} | ||||||
Comment on lines
+165
to
+168
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should anyone be setting the full cost matrix? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nope! honestly the setter functions aren't needed anymore. I can probably delete them and just set the getters as actual functions rather than properties. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Resolved but not deleted. |
||||||
|
||||||
int main() { return 0; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this class is supposed to be used by users, Perhaps some additional documentation would be useful here alongside a usage example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something that goes into the comments of this class or in the documentation/tutorial for dupin? I can add this in comments if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation here. A tutorial using
DynP
may be useful in the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh, so check for example SweepDetector class in: https://github.com/glotzerlab/dupin/blob/main/dupin/detect/detect.py
You might want to write what this class is used for and what implementation does it use in 2-3 short sentences.
Additionally, if this class should be used differently to the most of the other classes you might want to add a very short example of how to use it. Not sure if thats the case? For example: