-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor TANE-based algorithms #378
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
c70c5bc
to
0a0ef54
Compare
59a8991
to
98159e0
Compare
13016d9
to
84cf071
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, I would like to suggest getting rid of the second pull request and doing everything within this one pull request. That is, adding the refactoring from the second pull request to this pull request.
I also have some suggestions regarding a small change in the code structure, I suggest splitting the files as follows:
├── tane
│ ├── enums.h
│ ├── model
│ │ ├── lattice_level.cpp
│ │ ├── lattice_level.h
│ │ ├── lattice_vertex.cpp
│ │ └── lattice_vertex.h
│ ├── pfdtane.cpp
│ ├── pfdtane.h
│ ├── tane_common.cpp
│ ├── tane_common.h
│ ├── tane.cpp
│ └── tane.h
That is, we will separate the base class into a separate file and the implementation of the algorithms into a separate file.
a7b89f6
to
0b202ac
Compare
0b202ac
to
5ea3e11
Compare
@vs9h I've fixed the issues with this PR. You mentioned another PR #396 , but that PR is still a draft and it rather introduces a few performance enhances into the algorithm and does not affect the architecture. The current PR blocks some other PRs, that's why I've kept only changes that are related to this PR (refactoring) for this moment. What do you think? |
5ea3e11
to
57905a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, let's deal with this pull request first. It's probably better this way.
Also, split commits into at least two (tests in a separate commit) |
628ce89
to
646bc0f
Compare
@vs9h I've fixed these issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think we've come to something pretty good, I just left one small comment about public method in TaneCommon
(after fixing that I will immediately click approve).
But I have some more interesting ideas (probably for the future) to improve the current solution a bit from an architectural point of view.
Still, I really don't like the idea of adding such public methods to the PFDTane
class. I think it would be much better to try to somehow avoid using them.
I don't like that we put some information about the implementation in the public part, although we could avoid it somehow. All we need is to be able to get error information for a particular dependency.
For example, we could try to add a method that would allow the user to get information about the error of a particular functional dependency. (user can pass RawFD
and get the information about the error of this dependency: static double PFDTane::CalculatePFDError(const RawFD& rawfd)
).
Or another interesting idea is to add a separate class for pfd (which will probably inherit from fd) and in this class we will also store information about the error level of the dependency (there will probably be a lot of challenges when trying to add this, how well it fits into the existing code).
In this case, of course, we will have to add a new PFDAlgorithm
class. This will add several other problems that will need to be solved. For example, we will need to do something with the PliBasedFDAlgorithm
class, since it inherits from FDAlgorithm
, and so on.
And we will also have to change the inheritance hierarchy, we will have to try to take out the common part with the implementation of algorithms. We can certainly avoid inheritance from the base class (TaneCommon
) if we take a close look at the existing implementations of common parts for some algorithms (for example, for Pyro
and HyFD
).
At the same time, we will most likely be able to completely get rid of the use of virtual methods.
All the thoughts that I wrote above are just reflections on possible improvements and, in my opinion, they will help us get a better solution from an architectural point of view. If we can also give the user precise information about the error, then this will be very convenient for the user. But at the same time, the current solution is acceptable.
646bc0f
to
400f9e8
Compare
400f9e8
to
250f0ba
Compare
Generalize Tane and PFDTane, add additional tests.
In order to check if the refactoring caused any performance loss, following experiments were performed.
The discovery task was run as
cli.py --task=afd --algo=tane --error=0.05 --table=...
with new and original versions of TANE implementation. Following heavy datasets were utilized: EpicMeds.csv, adult.csv, EpicVitals.csv.Following list demonstrates measured running time of the old and new algorithms, correspondingly (confidence intervals of 95%, with 10 iterations):