Refactor TANE-based algorithms #378

iliya-b · 2024-03-22T22:31:48Z

Generalize Tane and PFDTane, add additional tests.

To check whether the refactoring caused any performance loss, the following experiments were performed.
The discovery task was run as cli.py --task=afd --algo=tane --error=0.05 --table=... with the new and original versions of the TANE implementation. The following heavy datasets were used: EpicMeds.csv, adult.csv, EpicVitals.csv.

The following list shows the measured running time of the old and new implementations, respectively (95% confidence intervals over 10 iterations):

EpicMeds.csv   (old)  59.715925465099986 +- 0.1869874511220996
EpicMeds.csv   (new)  59.5840122977      +- 0.06763601341304505
adult.csv      (old)  24.654166058699996 +- 0.06323832294394492
adult.csv      (new)  24.76226707977778  +- 0.09297212157319155
EpicVitals.csv (old)  10.6707755998      +- 0.11612311140862534
EpicVitals.csv (new)  10.7569084586      +- 0.0103879548810794
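
The PR does not say how the intervals were computed. As a rough sketch of one common approach (an assumption, not necessarily what was used here), the mean and a 95% interval over 10 runs can be derived with the Student's t critical value for 9 degrees of freedom. The names Interval, ConfidenceInterval95 and Overlaps are made up for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Mean and half-width of a two-sided 95% confidence interval.
struct Interval {
    double mean;
    double half_width;
};

// Assumption: exactly 10 samples, so the Student's t critical value for
// 9 degrees of freedom (about 2.262) applies. Other sample counts need
// a different critical value.
Interval ConfidenceInterval95(std::vector<double> const& samples) {
    constexpr std::size_t kExpectedRuns = 10;
    constexpr double kT95Df9 = 2.262;
    if (samples.size() != kExpectedRuns) {
        throw std::invalid_argument("expected exactly 10 samples");
    }
    double sum = 0.0;
    for (double s : samples) sum += s;
    double const mean = sum / static_cast<double>(samples.size());

    double squared_deviations = 0.0;
    for (double s : samples) squared_deviations += (s - mean) * (s - mean);
    // Sample standard deviation (divides by n - 1).
    double const stddev =
            std::sqrt(squared_deviations / static_cast<double>(samples.size() - 1));

    double const half_width =
            kT95Df9 * stddev / std::sqrt(static_cast<double>(samples.size()));
    return {mean, half_width};
}

// True if the two intervals share at least one point.
bool Overlaps(Interval const& a, Interval const& b) {
    return std::fabs(a.mean - b.mean) <= a.half_width + b.half_width;
}
```

Reading the reported numbers as mean +- half-width, the old and new intervals overlap for all three datasets.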

github-actions

clang-tidy made some suggestions

src/core/algorithms/fd/tane/tane.h

iliya-b · 2024-04-19T08:51:21Z

@vs9h I've fixed the architectural issues with the Tane and PFDTane algorithms, as you suggested in PR #300.

vs9h

First, I would like to suggest dropping the second pull request and doing everything within this one, that is, moving the refactoring from the second pull request into this pull request.

I also suggest a small change to the code structure. The files could be split as follows:

├── tane
│ ├── enums.h
│ ├── model
│ │ ├── lattice_level.cpp
│ │ ├── lattice_level.h
│ │ ├── lattice_vertex.cpp
│ │ └── lattice_vertex.h
│ ├── pfdtane.cpp
│ ├── pfdtane.h
│ ├── tane_common.cpp
│ ├── tane_common.h
│ ├── tane.cpp
│ └── tane.h

That is, the base class goes into its own files (tane_common), and the implementation of each algorithm goes into its own files.
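
A minimal sketch of how the proposed split could map onto classes. Only the class and file names TaneCommon, Tane and PFDTane come from this PR. CandidateStats, Accepts, CalculateError and the two error fields are placeholders invented for the illustration and do not reflect the actual Desbordante code.

```cpp
// Placeholder for whatever per-candidate data the real lattice code produces.
struct CandidateStats {
    double tane_error;  // placeholder: error under Tane's measure
    double pfd_error;   // placeholder: error under PFDTane's measure
};

// Would live in tane_common.h / tane_common.cpp: the part both algorithms share.
class TaneCommon {
public:
    explicit TaneCommon(double max_error) : max_error_(max_error) {}
    virtual ~TaneCommon() = default;

    // Shared logic: keep a candidate if its error does not exceed the threshold.
    bool Accepts(CandidateStats const& stats) const {
        return CalculateError(stats) <= max_error_;
    }

private:
    // The only algorithm-specific step in this sketch.
    virtual double CalculateError(CandidateStats const& stats) const = 0;

    double max_error_;
};

// Would live in tane.h / tane.cpp.
class Tane final : public TaneCommon {
public:
    using TaneCommon::TaneCommon;

private:
    double CalculateError(CandidateStats const& stats) const override {
        return stats.tane_error;
    }
};

// Would live in pfdtane.h / pfdtane.cpp.
class PFDTane final : public TaneCommon {
public:
    using TaneCommon::TaneCommon;

private:
    double CalculateError(CandidateStats const& stats) const override {
        return stats.pfd_error;
    }
};
```

With this layout, each derived class overrides only the hook that differs, and the shared traversal logic stays in one place.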

src/core/algorithms/fd/tane/tane.h

src/core/algorithms/fd/tane/tane.cpp

src/core/algorithms/fd/tane/tane.h

iliya-b · 2024-09-10T15:02:10Z

@vs9h I've fixed the issues with this PR. You mentioned another PR, #396, but that PR is still a draft; it introduces a few performance enhancements to the algorithm and does not affect the architecture. The current PR blocks some other PRs, which is why I've kept only the changes related to this PR (the refactoring) for now. What do you think?

vs9h

Okay, let's deal with this pull request first. It's probably better this way.

src/core/algorithms/fd/tane/tane.h

src/core/algorithms/fd/tane/tane.cpp

src/core/algorithms/fd/tane/pfdtane.h

src/core/algorithms/fd/tane/pfdtane.cpp

src/core/algorithms/fd/tane/pfdtane.h

src/core/algorithms/fd/tane/pfdtane.cpp

src/core/algorithms/fd/tane/tane_common.cpp

vs9h · 2024-09-15T20:06:46Z

Also, split commits into at least two (tests in a separate commit)

iliya-b · 2024-09-16T21:05:50Z

@vs9h I've fixed these issues.

vs9h

Okay, I think we've arrived at something pretty good. I left just one small comment about a public method in TaneCommon (once that is fixed, I will approve immediately).

But I have some more interesting ideas (probably for the future) to improve the current solution a bit from an architectural point of view.

Still, I really don't like the idea of adding such public methods to the PFDTane class. I think it would be much better to avoid them if we can.

I don't like that we expose implementation details in the public part of the class when we could avoid it. All we need is a way to get error information for a particular dependency.

For example, we could add a method that lets the user get the error of a particular functional dependency: the user passes a RawFD and gets back the error of that dependency (static double PFDTane::CalculatePFDError(const RawFD& rawfd)).

Another interesting idea is to add a separate class for pfd (which would probably inherit from fd) that also stores the error level of the dependency. There will probably be a lot of challenges in adding this, in particular how well it fits into the existing code.

In that case we will, of course, have to add a new PFDAlgorithm class, which brings several other problems that will need to be solved. For example, we will need to do something with the PliBasedFDAlgorithm class, since it inherits from FDAlgorithm, and so on.

We will also have to change the inheritance hierarchy and factor out the part of the implementation that the algorithms share. We can certainly avoid inheriting from the base class (TaneCommon) if we take a close look at how the common parts are implemented for some other algorithms (for example, Pyro and HyFD).

At the same time, we will most likely be able to completely get rid of the use of virtual methods.
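
One possible way to drop virtual methods, sketched here purely as an illustration, is static polymorphism: the algorithm-specific step becomes a template parameter that is resolved at compile time. This is not necessarily the approach meant above, and every name in the sketch (CandidateStats, TaneErrorPolicy, PfdErrorPolicy, LatticeSearch) is hypothetical.

```cpp
// Placeholder for whatever per-candidate data the real lattice code produces.
struct CandidateStats {
    double tane_error;  // placeholder: error under Tane's measure
    double pfd_error;   // placeholder: error under PFDTane's measure
};

// Each policy supplies the one step that differs between the algorithms.
struct TaneErrorPolicy {
    static double Error(CandidateStats const& stats) { return stats.tane_error; }
};

struct PfdErrorPolicy {
    static double Error(CandidateStats const& stats) { return stats.pfd_error; }
};

// Shared logic, parameterized by the policy. There is no base class and
// no virtual call: ErrorPolicy::Error is bound at compile time.
template <typename ErrorPolicy>
class LatticeSearch {
public:
    explicit LatticeSearch(double max_error) : max_error_(max_error) {}

    bool Accepts(CandidateStats const& stats) const {
        return ErrorPolicy::Error(stats) <= max_error_;
    }

private:
    double max_error_;
};

using TaneSketch = LatticeSearch<TaneErrorPolicy>;
using PfdTaneSketch = LatticeSearch<PfdErrorPolicy>;
```

The trade-off is that the two instantiations are unrelated types, so code that needs to treat them uniformly at run time would need another mechanism.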

All the thoughts above are just reflections on possible improvements that, in my opinion, would give us a better solution from an architectural point of view. If we can also give the user precise information about the error, that will be very convenient for them. That said, the current solution is acceptable.

src/core/algorithms/fd/tane/tane_common.h

iliya-b changed the title ~~Generalize TANE-based algorithms,~~ Refactor TANE-based algorithms Mar 22, 2024

iliya-b marked this pull request as draft March 22, 2024 22:38

github-actions bot reviewed Mar 22, 2024

src/core/algorithms/fd/tane/tane.h

iliya-b force-pushed the pfdtane-generalize branch 2 times, most recently from c70c5bc to 0a0ef54 Compare March 23, 2024 13:08

iliya-b force-pushed the pfdtane-generalize branch 11 times, most recently from 59a8991 to 98159e0 Compare April 17, 2024 20:04

iliya-b marked this pull request as ready for review April 17, 2024 20:24

iliya-b marked this pull request as draft April 18, 2024 13:38

iliya-b force-pushed the pfdtane-generalize branch 4 times, most recently from 13016d9 to 84cf071 Compare April 18, 2024 22:25

iliya-b marked this pull request as ready for review April 19, 2024 07:57

vs9h requested changes Aug 17, 2024

iliya-b force-pushed the pfdtane-generalize branch 5 times, most recently from a7b89f6 to 0b202ac Compare September 10, 2024 13:44

iliya-b force-pushed the pfdtane-generalize branch from 0b202ac to 5ea3e11 Compare September 10, 2024 13:47

iliya-b requested a review from vs9h September 10, 2024 15:02

egshnov mentioned this pull request Sep 10, 2024

Add new measures to Tane #458

Merged

iliya-b force-pushed the pfdtane-generalize branch from 5ea3e11 to 57905a7 Compare September 14, 2024 19:44

vs9h requested changes Sep 15, 2024

iliya-b force-pushed the pfdtane-generalize branch 3 times, most recently from 628ce89 to 646bc0f Compare September 16, 2024 20:14

iliya-b requested a review from vs9h September 16, 2024 21:05

vs9h reviewed Sep 20, 2024

src/core/algorithms/fd/tane/tane_common.h (outdated)

iliya-b force-pushed the pfdtane-generalize branch from 646bc0f to 400f9e8 Compare September 23, 2024 10:54

vs9h approved these changes Sep 23, 2024

iliya-b added 2 commits September 23, 2024 21:19

Generalize TANE and PFDTANE algorithms

7e05324

Add tests of PFDTane algorithm

250f0ba

chernishev force-pushed the pfdtane-generalize branch from 400f9e8 to 250f0ba Compare September 23, 2024 18:19

chernishev merged commit 53a86ea into Desbordante:main Sep 23, 2024
20 checks passed
