[2012-10-29]
BUG-FIXES:
** Fix a bad_cast error arising when mining rules with different
head/body types. See /home/minerule/selex/minerules for an example.
-- IMPLEMENT A SAME_TYPE (or CHECK_TYPE) FUNCTION to check whether
two itemtypes point to the same type of elements. Use this
function in BodyMap::add before comparing head and body items.
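A minimal sketch of such a check, assuming the item types derive from a
common polymorphic base class. ItemType, IntItem, StringItem and same_type
are hypothetical names used only for illustration, not the actual Minerule
classes.

```cpp
#include <typeinfo>

// Hypothetical stand-ins for the real item type hierarchy.
struct ItemType {
    virtual ~ItemType() = default;
};
struct IntItem : ItemType { int value = 0; };
struct StringItem : ItemType { const char* value = ""; };

// Returns true when both references have the same dynamic type, so that a
// later dynamic_cast from one to the other's type cannot throw std::bad_cast.
inline bool same_type(const ItemType& a, const ItemType& b) {
    return typeid(a) == typeid(b);
}
```

A caller such as BodyMap::add could test same_type(head, body) first and
treat items of different types as simply not equal.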
** Insert statements are not compatible with postgres syntax. It should
be fixed so that the code can run on both postgres and mysql.
[older]
BUG-FIXES:
* Fix a bad_cast error arising when mining rules with different
head/body types. See /home/minerule/selex/minerules for an example.
* Bug: IDIncrementalAlgorithm does not filter out rules having
small confidence/support values.
* Bug that occurs when processing queries which imply null
constraints (for instance, if I ask A>'z' and there is no
A satisfying such a constraint, we remove the constraint).
Unfortunately, it seems that we do not update the list properly
(we obtain something like A>'').
* Bug in the MRDATABASE library (reported by Careggio via email).
Support evaluation bug... (it is likely to be related to the
sql types in the table. For instance, columns of type varchar
(instead of char) may cause this problem.)
HIGH PRIORITY TO-DO:
OPTIMIZER:
Implement the complete dominance recognition functionality.
(Up to now, it implements the dominances due to logic
implications, but not the ones due to the constants in
the predicates).
Implement a ``greedy covering'' based optimization step.
This should become the "core" of the optimizer: it should
try to recover new results from past ones by means of
intersections (xor unions) of past result sets.
INCREMENTAL ALGORITHMS:
Implement non-item-dependent incremental algorithms.
(Partially done: the CDIncrementalAlgorithm has been written.)
Subtasks: integrate Marco's Mining Indexes into the Minerule
and implement bitmap-based ones.
NEW ALGORITHMS:
New mining algorithms which exploit item dependent
properties? They may exploit, for instance, pre-built
aggregate functions.
TO-DO plan for version 1.0.3(final):
OPTION HANDLING:
v1.0.3 added the possibility of using modifiers
of the form "%m" inside logfile expressions. The modifiers
are replaced with the name of the file from which the minerule
is read, or with "mr" if no filename has been specified at
the time the option is parsed. In order to
support this feature better, the parsing of the options
should be improved. In particular, all options involving
the text of the minerule (i.e., -i and -m) should be
processed before any other, and the parsing of the minerule
should be moved earlier. In this way the parameter substitution
would work regardless of the ordering of the options (which
is otherwise important since it defines the order of
option overriding).
DONE[
Moreover, a new option which supports
substituting the name of the minerule instead of
the name of the file in which it is contained may be
easily introduced (in this case it would probably be
better to use "%i" for the modifier which refers to the
filename and "%m" for the one which refers to the minerule
name)].
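The substitution described above could look roughly like the following
sketch. The function name expand_log_name and its parameters are
hypothetical; the "%i"/"%m" split follows the proposal in the DONE note, and
falling back to "mr" for an empty filename is an assumption carried over
from the v1.0.3 behaviour.

```cpp
#include <string>

// Expands "%i" to the input file name and "%m" to the minerule name.
// A '%' followed by any other character (or at the end) is copied as-is.
// Hypothetical helper, not the actual MineruleOptions code.
std::string expand_log_name(const std::string& pattern,
                            const std::string& file_name,
                            const std::string& minerule_name) {
    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (pattern[i] == '%' && i + 1 < pattern.size()) {
            const char modifier = pattern[i + 1];
            if (modifier == 'i') {
                // Assumption: "mr" is used when no file name is known yet.
                out += file_name.empty() ? std::string("mr") : file_name;
                ++i;
                continue;
            }
            if (modifier == 'm') {
                out += minerule_name;
                ++i;
                continue;
            }
        }
        out += pattern[i];
    }
    return out;
}
```

Because the function takes both names as arguments, it only gives the right
answer once -i and -m have been processed, which is why those options need
to be handled before the logfile options.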
TO-DO plan for version 1.2
BUG-FIXES: (relax-predicate-bug)
It seems that the optimizer does not relax the predicates
correctly when the predicate is of the form 'A<x' and
there are no values in the db less than 'x' (it is likely
that something similar happens for the predicate '>'). At this
time it seems that the optimizer substitutes the predicate
with 'A<=', which is clearly wrong. One possible fix is
to find the min(y) in the DB and substitute the predicate with
'A<y'. Another fix is to substitute the whole predicate with
'1=0' (maybe this last option is best, since it also works
when the DB is empty).
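A small sketch of the '1=0' fallback, under the assumption that the
attribute values are integers already loaded in memory (the real code reads
them from the db and may deal with strings). relax_less_than is a
hypothetical name.

```cpp
#include <string>
#include <vector>

// Rewrites the predicate "attr < bound" using the values stored for attr.
// If some value satisfies it, the result is "attr<=y", where y is the
// largest stored value below bound. If none does (including the case of an
// empty db), the whole predicate becomes "1=0".
std::string relax_less_than(const std::string& attr, int bound,
                            const std::vector<int>& values) {
    bool found = false;
    int best = 0;
    for (int v : values) {
        if (v < bound && (!found || v > best)) {
            best = v;
            found = true;
        }
    }
    if (!found) return "1=0";
    return attr + "<=" + std::to_string(best);
}
```

The point of the sketch is that the empty case is decided before any value
is formatted, so a dangling 'A<=' can never be produced.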
OPTIMIZER CATALOG:
Split mr_query into two tables. The idea is to keep separate
the list of queries which have actually been executed, since
it is not useful to run the optimizer on queries which
have been found to be equivalent to some previous query.
We could have two tables:
mr_all_queries
mr_opt_queries
mr_all_queries will store the original text of
all queries which have been issued by the user, while
mr_opt_queries will store the normalized text of the
queries which have actually been executed.
As a side effect, we can rebuild the query
database by deleting mr_opt_queries and rerunning all queries
in mr_all_queries (this is useful when the underlying
database changes, but the user wants to use an updated
version of all past results).
OPTIMIZER:
Extend the predicate relaxation function so that it
works with EVERY predicate, not only with < and >.
For instance, if the user writes a condition like A<=100,
but 100 CANNOT be found in the db, the value should
be substituted with the maximal value which satisfies
A<=100.
ALGORITHMS:
Handle the head/body cardinality parameters.
partitionWithClusters - DONE
partitionBase - DONE (*)
fpGrowth - DONE (*)
(*) - At the moment this is handled through a filter applied when
the rules are written. It would probably be MUCH more
efficient to do something along the lines of what was done for
partitionWithClusters.
In particular: in partitionBase, one can stop
descending as soon as levelIn (see the code in
partitionBaseLauncher) is greater than or equal to the maximum
of bodyCardinalities().getMax() and
headCardinalities().getMax().
In addition, in the procedure that actually generates the
rules, one can avoid going any further whenever a rule is
generated that does not satisfy the conditions.
Something similar can be done for fpGrowth as well.
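The depth cut-off can be illustrated with a plain depth-first itemset
enumeration. This is only a sketch: enumerate and bounded_itemsets are
hypothetical names, the size of the current itemset plays the role of
levelIn, and support counting is left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Itemset = std::vector<int>;

// Depth-first enumeration of the non-empty subsets of `items`, never
// growing an itemset beyond `max_size` items.
void enumerate(const std::vector<int>& items, std::size_t start,
               std::size_t max_size, Itemset& current,
               std::vector<Itemset>& out) {
    // Prune: larger itemsets can serve neither as a body nor as a head.
    if (current.size() >= max_size) return;
    for (std::size_t i = start; i < items.size(); ++i) {
        current.push_back(items[i]);
        out.push_back(current);
        enumerate(items, i + 1, max_size, current, out);
        current.pop_back();
    }
}

// The bound is the larger of the two maximum cardinalities, as in the note.
std::vector<Itemset> bounded_itemsets(const std::vector<int>& items,
                                      std::size_t body_max,
                                      std::size_t head_max) {
    std::vector<Itemset> out;
    Itemset current;
    enumerate(items, 0, std::max(body_max, head_max), current, out);
    return out;
}
```

With three items and maximum cardinalities 2 and 1, only the itemsets of
size one and two are visited, instead of all seven non-empty subsets.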
LOGS:
This is a low low low low priority task... it would
be nice to add colors to the log output. ANSI colors
can be displayed using the escape sequence \e[x;ym
where x and y are foreground and background colors.
For instance printf("\e[33;40mprova\e[0m")
prints the string
"prova" in brown on black and then resets the colors
to their default values.
Note that one should be aware of terminal support for
colors before trying anything like this.
In order to implement color support, the following
has to be done:
* update MRLogger in order to support colors
* update MineruleOptions with an option which
disables/enables the use of colors
* check everywhere whether the terminal supports colors.
Here follows a table containing the ANSI codes
which may be helpful:
Code   Effect
--------------------------------------------------
0      Reset to default, as set in Terminal Preferences.
1      Bold
4      Underline
5      Blink
7      Reverse video, swaps the foreground (text) and
       background colors.

Color     Text   Background
--------------------------------------------------
Black     30     40
Red       31     41
Green     32     42
Yellow    33     43
Blue      34     44
Magenta   35     45
Cyan      36     46
White     37     47
Prefix a color with 1; in order to obtain its highlighted
version (for instance \e[33;40m actually displays brown on
black, because brown is the non-highlighted version
of yellow; \e[1;33;40m displays yellow on black).
Checking for terminal capabilities seems to be hard; maybe
it is better to use a user pref (set to false by
default... which is the safest setting).
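A possible shape for the logger helper, as a sketch only: colorize and its
parameters are hypothetical names, and the use_colors flag stands in for the
user pref proposed above. Note that \e is a compiler extension, so the
sketch spells the escape character as \033.

```cpp
#include <string>

// Wraps `text` in an ANSI color sequence. `fg` is a text code (30-37) and
// `bg` a background code (40-47) from the table above. When `use_colors`
// is false (the proposed safe default) the text is returned unchanged.
std::string colorize(const std::string& text, int fg, int bg,
                     bool highlight = false, bool use_colors = false) {
    if (!use_colors) return text;
    std::string seq = "\033[";
    if (highlight) seq += "1;";  // "1;" selects the highlighted variant
    seq += std::to_string(fg) + ";" + std::to_string(bg) + "m";
    return seq + text + "\033[0m";  // code 0 resets to the defaults
}
```

For example, colorize("prova", 33, 40, true, true) would produce the
yellow-on-black variant, while the default arguments leave the log line
untouched.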
TO-DO plan for version 1.4
MRDATABASE:
Add support for smart pointers:
Rationale: the HEAD BODY GROUP CID etc. etc. are
read-only values -> they can be shared by all objects
which refer to them. Using smart pointers should
greatly improve the memory footprint of the program
(actually this is true for string values and composite
attributes).
Add support for on-the-fly coding.
If we add support for smart pointers we could
wrap those objects within a new one which can hold
an integer value which encodes the mutual ordering
of values. The idea is: after a first db read has been
done, we sort the values and assign an integer to each
of them according to its position in the sorted
structure. Once this is done, we can compare the values
using this integer value instead of doing time-consuming
comparisons among strings or structures.
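The coding step itself is small. The sketch below assumes string values and
uses the hypothetical name build_ranks; how the rank would be attached to
the wrapped objects is left open.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assigns to each distinct value its position in sorted order, so that
// comparing two ranks gives the same answer as comparing the two values.
std::map<std::string, int> build_ranks(std::vector<std::string> values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
    std::map<std::string, int> ranks;
    for (std::size_t i = 0; i < values.size(); ++i) {
        ranks[values[i]] = static_cast<int>(i);
    }
    return ranks;
}
```

After this pass the algorithms would only handle the integers, and the
string comparisons happen once, during the sort.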
[new 13/2/2017] A possible approach may be to
use a hash table to hold a single representative for
each value loaded from the db. To do so, each value will
need to be wrapped into a hashable structure specifying
how to hash it. The idea is that the value used by the
algorithms would be the pointer to these structures, which
can then be compared directly for equality (which is the
only operation needed by most/all algorithms).
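A minimal sketch of the hash table idea, assuming plain string values, for
which the standard hash is enough. ValuePool and intern are hypothetical
names; composite attributes would need their own hasher passed to the set.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Keeps a single representative for each distinct value loaded from the db.
class ValuePool {
public:
    // Returns a pointer to the unique stored copy of `value`. Pointers to
    // elements of a std::unordered_set remain valid when the set grows.
    const std::string* intern(const std::string& value) {
        return &*pool_.insert(value).first;
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::unordered_set<std::string> pool_;
};
```

Two values read from different rows then compare equal exactly when their
pointers are equal, and each distinct string is stored only once.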
PREPROCESSING:
Merge the preprocessor (i.e., the classes in PreProcessor)
into the project (???) (note that if support for on-the-fly
coding is added, preprocessing should rarely be useful!)
ALGORITHMS:
Implement a better version of partition and FpGrowth
Implement the incremental algorithms (partially done
since v1.0.6 which adds the ID and CD incremental
algorithms)
OPTIMIZER:
Recognize Dominance relations
DONE since v1.0.1! Substitute A<x with A<=y
This is needed in order to complete the 'canonization'
of minerules. The y value is the first value in the db
which satisfies A<x. Not sure about this: we
introduce another access to the db each time we start
a new minerule. Note that we need to do that only
for the current minerule (not for the ones in the
catalog, since we can assume that they are already
in canonical form).
LOGS:
Implement log onto a database table (???)
MAKEFILES:
Switch to autoconf/automake (???)
TO-DO plan for version 2.0
Clean the source code. In particular, get rid of the
old C structures which have been inherited from Ortale's
predicate parser. This may be hard since those structures
are nowadays used by many algorithms (most notably: parts
of the optimizer).