-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Programming c++ implementation #9
base: main
Are you sure you want to change the base?
Conversation
I did not delete the pull_request file from .github; I misunderstood and edited it and added it to my my own root directory. The original file is unchancged. |
Also format C++ files using clang-format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add dupin/detect/dynp.py
file that creates wrapper around DynamicProgramming
class.
Sorry, I don't have a more complete review, but I will continue it later this week. I wanted to go ahead and give you tasks that you could start on.
Also, let me know if you have questions on the new structure of the package.
using namespace std; | ||
using namespace Eigen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use the namespace, explicitly scope them. It makes the code more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using namespace std; | |
using namespace Eigen; |
If you use std::
and Eigen::
then you don't need these lines.
src/dupin.h
Outdated
memo; | ||
|
||
int num_bkps; // Number of breakpoints to detect. | ||
int num_parameters; // Number of features in the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be known from the passed matrix no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this was just for testing/setting back in ye old days of input.txt. I can just make it use data.size now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still to be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this we should be more elaborate and use num_features
and num_breakpoints
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if possible, expand on the variable names throughout. It will help future developers work with the code.
src/dupin.h
Outdated
Eigen::VectorXd x; // z Independent variable (time steps). | ||
}; | ||
|
||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these should be private or protected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of these methods either don't need to exist the n_timesteps
n_parameters
stuff or should be private/protected. The constructor, setters, getters, and anything exported to Python should remain here.
still have the new function to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments for now. Thanks for the quick work.
src/dupin.h
Outdated
void initialize(int n) { | ||
length = n; | ||
matrix.resize(n * (n + 1) / 2, 0.0); | ||
row_indices.resize(n); | ||
for (int row = 0; row < n; ++row) { | ||
row_indices[row] = row * (2 * length - row + 1) / 2; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think yes because without a global cost_matrix variable, I will have to pass by reference the entire cost_matrix through the recursion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean can't we just initialize the matrix and discard the initialization function. Why do we create it and then size it?
src/dupin.h
Outdated
memo; | ||
|
||
int num_bkps; // Number of breakpoints to detect. | ||
int num_parameters; // Number of features in the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still to be done.
src/dupin.h
Outdated
Eigen::VectorXd x; // z Independent variable (time steps). | ||
}; | ||
|
||
public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of these methods either don't need to exist the n_timesteps
n_parameters
stuff or should be private/protected. The constructor, setters, getters, and anything exported to Python should remain here.
void DynamicProgramming::setCostMatrix( | ||
const DynamicProgramming::UpperTriangularMatrix &value) { | ||
cost_matrix = value; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should anyone be setting the full cost matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope! honestly the setter functions aren't needed anymore. I can probably delete them and just set the getters as actual functions rather than properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved but not deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments as you address my other ones (I you have resolved one of my points mark so on the comments page and it will be easier for my to know what you have handled.
dupin/detect/dynp.py
Outdated
""" | ||
self.dynp.set_threads(num_threads) | ||
|
||
def return_breakpoints(self) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify which part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant return_breakpoints, I removed it and just kept fit.
…ommit, rearranged class to private
New commit should address most of the changes you requested: let me know if the private/public structure works or else I can rewrite some of the functions |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9 +/- ##
==========================================
- Coverage 76.08% 0.00% -76.09%
==========================================
Files 18 19 +1
Lines 1104 1113 +9
Branches 234 0 -234
==========================================
- Hits 840 0 -840
- Misses 221 1113 +892
+ Partials 43 0 -43 ☔ View full report in Codecov by Sentry. |
class DynP: | ||
"""Detects the change points in a time series. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this class is supposed to be used by users, Perhaps some additional documentation would be useful here alongside a usage example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something that goes into the comments of this class or in the documentation/tutorial for dupin? I can add this in comments if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation here. A tutorial using DynP
may be useful in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh, so check for example SweepDetector class in: https://github.com/glotzerlab/dupin/blob/main/dupin/detect/detect.py
You might want to write what this class is used for and what implementation does it use in 2-3 short sentences.
Additionally, if this class should be used differently to the most of the other classes you might want to add a very short example of how to use it. Not sure if thats the case? For example:
def compute(self, orientations):
"""Calculates the per-particle and global order parameter.
Example::
>>> orientations = np.array([[1, 0, 0, 0]] * 100)
>>> director = np.array([1, 1, 0])
>>> nematic = freud.order.Nematic(director)
>>> nematic.compute(orientations)
freud.order.Nematic(u=[...])
Args:
orientations (:math:`\left(N_{particles}, 4 \right)` :class:`numpy.ndarray`):
Orientations to calculate the order parameter.
"""
"""Initialize the DynamicProgramming instance with given parameters.""" | ||
self.dynp = _DynP.DynamicProgramming(data, num_bkps, jump, min_size) | ||
|
||
def set_num_threads(self, num_threads: int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long term dev wise, if additional CPP methods will be added we will most likely want num_threads to be controlled on the level of the whole module? Would preprocessor be a good place to have this? @b-butler thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this to as a _DynP
module function and exporting it to dupin/util.py
is probably the best solution in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I would just move this function to util.py right? would just have to import _DynP in util.py?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Installs: - TBB - Eigen3 - Ninja
Cannot use a reusable workflow as a step only as a job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is looking good. I tried to give a thorough review this time since I am going on parental leave tomorrow. No rush on getting this done, as I won't be looking at GitHub for a couple of weeks, likely.
Also, I switched the build system to https://scikit-build-core.readthedocs.io/en/latest/getting_started.html. If you have problems building the package look there.
We need tests. See example tests in tests/detect/test_detect.py
.
class DynP: | ||
"""Detects the change points in a time series. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation here. A tutorial using DynP
may be useful in the future.
src/dupin.h
Outdated
void initialize(int n) { | ||
length = n; | ||
matrix.resize(n * (n + 1) / 2, 0.0); | ||
row_indices.resize(n); | ||
for (int row = 0; row < n; ++row) { | ||
row_indices[row] = row * (2 * length - row + 1) / 2; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean can't we just initialize the matrix and discard the initialization function. Why do we create it and then size it?
src/dupin.h
Outdated
memo; | ||
|
||
int num_bkps; // Number of breakpoints to detect. | ||
int num_parameters; // Number of features in the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep this we should be more elaborate and use num_features
and num_breakpoints
.
src/dupin.h
Outdated
memo; | ||
|
||
int num_bkps; // Number of breakpoints to detect. | ||
int num_parameters; // Number of features in the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if possible, expand on the variable names throughout. It will help future developers work with the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npkamath Thank you for all the hard work. This is probably the last thorough review I will give. I am happy to give it a quick glance though if you'd like and feel free to reach out to me.
@DomFijan is taking over the day-to-day management of the package and can help you in the finishing touches. @joaander is also a great resource for specific questions on coding practices. If either of them advise you a different way than I did feel free to disregard my prior comments.
I am approving as to not block any future developments, but please don't merge until @DomFijan approves.
Methods | ||
------- | ||
__init__(self, data: np.ndarray, num_bkps: int, jump: int, min_size: int) | ||
Initializes the DynamicProgramming instance with the time series data | ||
and parameters. | ||
set_num_threads(self, num_threads: int) | ||
Sets the number of threads to be used for parallel computation. | ||
fit(self, num_bkps: int) -> list | ||
Calculates the cost matrix and identifies the optimal breakpoints in | ||
the time series data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These get documented by the method's docstring itself.
def __init__( | ||
self, data: np.ndarray, num_bkps: int, jump: int, min_size: int | ||
): | ||
"""Initialize the DynamicProgramming instance with given parameters.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you document a class's __init__.py
method, you need to document its parameters here as well. The alternative is to document both parameters and attributes in the class's docstring.
Parameters | ||
---------- | ||
num_threads: int | ||
The number of threads to use during computation. Default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a parameter default. If you want to add a comment about the default parallelization it should go above this section. Also, the message should state that we use all available cores unless set otherwise.
using namespace std; | ||
using namespace Eigen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using namespace std; | |
using namespace Eigen; |
If you use std::
and Eigen::
then you don't need these lines.
double slope = x_centered.dot(y_centered) / x_centered.squaredNorm(); | ||
double intercept = y_mean - slope * x_mean; | ||
// everything till this line is functioning fine; I might be overcomplicating it | ||
Eigen::MatrixXd regression_lines = (x_matrix.array().colwise() - x_mean).colwise() * slope.array() + intercept.transpose().array(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just need to do something like x_matrix * slope.replicate(x.size(), 1) + interept.replicate(x.size(), 1)
though I am not too familiar with Eigen.
// Structure for storing linear regression parameters. | ||
struct linear_fit_struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be a good idea, and hopefully not too much work as the code is pretty modular already.
src/dupin.cpp
Outdated
double x_mean = x.mean(); | ||
double y_mean = y.mean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What were your thoughts @npkamath?
Description
Added a c++ implementation of the dynamic programming class. Header, cpp, dependencies and requirements are all in the the Dynp folder.
Motivation
Optimizes the dynp class to use significantly less time and memory; optimized using c++ eigen and intel tbb libraries.
Do Changes Introduce Breaking Changes
Yes.
Have you (if appropriate)