Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor TANE-based algorithms #378

Merged
merged 2 commits into from
Sep 23, 2024

Conversation

iliya-b
Copy link
Contributor

@iliya-b iliya-b commented Mar 22, 2024

Generalize Tane and PFDTane, add additional tests.

In order to check if the refactoring caused any performance loss, following experiments were performed.
The discovery task was run as cli.py --task=afd --algo=tane --error=0.05 --table=... with new and original versions of TANE implementation. Following heavy datasets were utilized: EpicMeds.csv, adult.csv, EpicVitals.csv.

Following list demonstrates measured running time of the old and new algorithms, correspondingly (confidence intervals of 95%, with 10 iterations):

  1. EpicMeds.csv (old) 59.715925465099986 +- 0.1869874511220996
  2. EpicMeds.csv (new) 59.5840122977 +- 0.06763601341304505
  3. adult.csv (old) 24.654166058699996 +- 0.06323832294394492
  4. adult.csv (new) 24.76226707977778 +- 0.09297212157319155
  5. EpicVitals.csv (old) 10.6707755998 +- 0.11612311140862534
  6. EpicVitals.csv (new) 10.7569084586 +- 0.0103879548810794

@iliya-b iliya-b changed the title Generalize TANE-based algorithms, Refactor TANE-based algorithms Mar 22, 2024
@iliya-b iliya-b marked this pull request as draft March 22, 2024 22:38
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/core/algorithms/fd/tane/tane.h Show resolved Hide resolved
@iliya-b iliya-b force-pushed the pfdtane-generalize branch 2 times, most recently from c70c5bc to 0a0ef54 Compare March 23, 2024 13:08
@iliya-b iliya-b force-pushed the pfdtane-generalize branch 11 times, most recently from 59a8991 to 98159e0 Compare April 17, 2024 20:04
@iliya-b iliya-b marked this pull request as ready for review April 17, 2024 20:24
@iliya-b iliya-b marked this pull request as draft April 18, 2024 13:38
@iliya-b iliya-b force-pushed the pfdtane-generalize branch 4 times, most recently from 13016d9 to 84cf071 Compare April 18, 2024 22:25
@iliya-b iliya-b marked this pull request as ready for review April 19, 2024 07:57
@iliya-b
Copy link
Contributor Author

iliya-b commented Apr 19, 2024

@vs9h I've fixed the architectural issues with Tane and PFDTane algorithms as you suggested in PR #300

Copy link
Collaborator

@vs9h vs9h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, I would like to suggest getting rid of the second pull request and doing everything within this one pull request. That is, adding the refactoring from the second pull request to this pull request.

I also have some suggestions regarding a small change in the code structure, I suggest splitting the files as follows:

├── tane
│ ├── enums.h
│ ├── model
│ │ ├── lattice_level.cpp
│ │ ├── lattice_level.h
│ │ ├── lattice_vertex.cpp
│ │ └── lattice_vertex.h
│ ├── pfdtane.cpp
│ ├── pfdtane.h
│ ├── tane_common.cpp
│ ├── tane_common.h
│ ├── tane.cpp
│ └── tane.h

That is, we will separate the base class into a separate file and the implementation of the algorithms into a separate file.

src/core/algorithms/fd/tane/tane.h Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.h Outdated Show resolved Hide resolved
@iliya-b iliya-b force-pushed the pfdtane-generalize branch 5 times, most recently from a7b89f6 to 0b202ac Compare September 10, 2024 13:44
@iliya-b
Copy link
Contributor Author

iliya-b commented Sep 10, 2024

@vs9h I've fixed the issues with this PR. You mentioned another PR #396 , but that PR is still a draft and it rather introduces a few performance enhances into the algorithm and does not affect the architecture. The current PR blocks some other PRs, that's why I've kept only changes that are related to this PR (refactoring) for this moment. What do you think?

@iliya-b iliya-b requested a review from vs9h September 10, 2024 15:02
@egshnov egshnov mentioned this pull request Sep 10, 2024
Copy link
Collaborator

@vs9h vs9h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's deal with this pull request first. It's probably better this way.

src/core/algorithms/fd/tane/tane.h Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.h Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.cpp Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.h Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/pfdtane.cpp Outdated Show resolved Hide resolved
src/core/algorithms/fd/tane/tane_common.cpp Outdated Show resolved Hide resolved
@vs9h
Copy link
Collaborator

vs9h commented Sep 15, 2024

Also, split commits into at least two (tests in a separate commit)

@iliya-b iliya-b force-pushed the pfdtane-generalize branch 3 times, most recently from 628ce89 to 646bc0f Compare September 16, 2024 20:14
@iliya-b
Copy link
Contributor Author

iliya-b commented Sep 16, 2024

@vs9h I've fixed these issues.

@iliya-b iliya-b requested a review from vs9h September 16, 2024 21:05
Copy link
Collaborator

@vs9h vs9h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think we've come to something pretty good, I just left one small comment about public method in TaneCommon (after fixing that I will immediately click approve).

But I have some more interesting ideas (probably for the future) to improve the current solution a bit from an architectural point of view.

Still, I really don't like the idea of ​​adding such public methods to the PFDTane class. I think it would be much better to try to somehow avoid using them.

I don't like that we put some information about the implementation in the public part, although we could avoid it somehow. All we need is to be able to get error information for a particular dependency.

For example, we could try to add a method that would allow the user to get information about the error of a particular functional dependency. (user can pass RawFD and get the information about the error of this dependency: static double PFDTane::CalculatePFDError(const RawFD& rawfd)).

Or another interesting idea is to add a separate class for pfd (which will probably inherit from fd) and in this class we will also store information about the error level of the dependency (there will probably be a lot of challenges when trying to add this, how well it fits into the existing code).

In this case, of course, we will have to add a new PFDAlgorithm class. This will add several other problems that will need to be solved. For example, we will need to do something with the PliBasedFDAlgorithm class, since it inherits from FDAlgorithm, and so on.

And we will also have to change the inheritance hierarchy, we will have to try to take out the common part with the implementation of algorithms. We can certainly avoid inheritance from the base class (TaneCommon) if we take a close look at the existing implementations of common parts for some algorithms (for example, for Pyro and HyFD).

At the same time, we will most likely be able to completely get rid of the use of virtual methods.

All the thoughts that I wrote above are just reflections on possible improvements and, in my opinion, they will help us get a better solution from an architectural point of view. If we can also give the user precise information about the error, then this will be very convenient for the user. But at the same time, the current solution is acceptable.

src/core/algorithms/fd/tane/tane_common.h Outdated Show resolved Hide resolved
@chernishev chernishev merged commit 53a86ea into Desbordante:main Sep 23, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants