KGpip is evaluated against state-of-the-art systems on the 121 benchmark datasets shown below:

ID | Dataset | Rows | Columns | Classes | Numerical | Categorical | Textual | Size (MB) | Task | Source | Papers |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | adult | 48842 | 14 | 2 | 6 | 8 | 0 | 5.7 | binary | AutoML | FLAML, AL |
2 | airlines | 539383 | 7 | 2 | 4 | 3 | 0 | 18.3 | binary | AutoML | FLAML |
3 | albert | 425240 | 78 | 2 | 78 | 0 | 0 | 155.4 | binary | AutoML | FLAML |
4 | Amazon_employee_access | 32769 | 9 | 2 | 9 | 0 | 0 | 1.9 | binary | AutoML | FLAML |
5 | APSFailure | 76000 | 170 | 2 | 170 | 0 | 0 | 74.8 | binary | AutoML | FLAML |
6 | Australian | 690 | 14 | 2 | 14 | 0 | 0 | 0 | binary | AutoML | FLAML |
7 | bank-marketing | 45211 | 16 | 2 | 7 | 9 | 0 | 3.5 | binary | AutoML | FLAML |
8 | blood-transfusion-service-center | 748 | 4 | 2 | 4 | 0 | 0 | 0 | binary | AutoML | FLAML |
9 | christine | 5418 | 1636 | 2 | 1636 | 0 | 0 | 31.4 | binary | AutoML | FLAML |
10 | credit-g | 1000 | 20 | 2 | 7 | 13 | 0 | 0.1 | binary | AutoML | FLAML |
11 | guillermo | 20000 | 4296 | 2 | 4296 | 0 | 0 | 424.5 | binary | AutoML | FLAML |
12 | higgs | 98050 | 28 | 2 | 28 | 0 | 0 | 43.3 | binary | AutoML | FLAML, VolcanoML |
13 | jasmine | 2984 | 144 | 2 | 144 | 0 | 0 | 1.7 | binary | AutoML | FLAML |
14 | kc1 | 2109 | 21 | 2 | 21 | 0 | 0 | 0.1 | binary | AutoML | FLAML, VolcanoML |
15 | KDDCup09_appetency | 50000 | 230 | 2 | 192 | 38 | 0 | 32.8 | binary | AutoML | FLAML |
16 | kr-vs-kp | 3196 | 36 | 2 | 0 | 36 | 0 | 0.5 | binary | AutoML | FLAML |
17 | MiniBooNE | 130064 | 50 | 2 | 50 | 0 | 0 | 69.4 | binary | AutoML | FLAML |
18 | nomao | 34465 | 118 | 2 | 118 | 0 | 0 | 19.3 | binary | AutoML | FLAML |
19 | numerai28.6 | 96320 | 21 | 2 | 21 | 0 | 0 | 34.9 | binary | AutoML | FLAML |
20 | phoneme | 5404 | 5 | 2 | 5 | 0 | 0 | 0.3 | binary | AutoML | FLAML, VolcanoML |
21 | riccardo | 20000 | 4296 | 2 | 4296 | 0 | 0 | 414 | binary | AutoML | FLAML |
22 | sylvine | 5124 | 20 | 2 | 20 | 0 | 0 | 0.4 | binary | AutoML | FLAML |
23 | car | 1728 | 6 | 4 | 0 | 6 | 0 | 0.1 | multi-class | AutoML | FLAML |
24 | cnae-9 | 1080 | 856 | 9 | 856 | 0 | 0 | 1.8 | multi-class | AutoML | FLAML |
25 | connect-4 | 67557 | 42 | 3 | 42 | 0 | 0 | 5.5 | multi-class | AutoML | FLAML |
26 | covertype | 581012 | 54 | 7 | 54 | 0 | 0 | 71.7 | multi-class | AutoML | FLAML, AL |
27 | dilbert | 10000 | 2000 | 5 | 2000 | 0 | 0 | 176 | multi-class | AutoML | FLAML |
28 | dionis | 416188 | 60 | 355 | 60 | 0 | 0 | 110.1 | multi-class | AutoML | FLAML |
29 | fabert | 8237 | 800 | 7 | 800 | 0 | 0 | 13 | multi-class | AutoML | FLAML |
30 | Fashion-MNIST | 70000 | 784 | 10 | 784 | 0 | 0 | 148 | multi-class | AutoML | FLAML |
31 | helena | 65196 | 27 | 100 | 27 | 0 | 0 | 14.6 | multi-class | AutoML | FLAML |
32 | jannis | 83733 | 54 | 4 | 54 | 0 | 0 | 36.7 | multi-class | AutoML | FLAML |
33 | jungle_chess_2pcs_raw_endgame_complete | 44819 | 6 | 3 | 6 | 0 | 0 | 0.6 | multi-class | AutoML | FLAML |
34 | mfeat-factors | 2000 | 216 | 10 | 216 | 0 | 0 | 1.4 | multi-class | AutoML | FLAML |
35 | robert | 10000 | 7200 | 10 | 7200 | 0 | 0 | 268.1 | multi-class | AutoML | FLAML |
36 | segment | 2310 | 19 | 7 | 19 | 0 | 0 | 0.3 | multi-class | AutoML | FLAML, VolcanoML |
37 | shuttle | 58000 | 9 | 7 | 9 | 0 | 0 | 1.5 | multi-class | AutoML | FLAML |
38 | vehicle | 846 | 18 | 4 | 18 | 0 | 0 | 0.1 | multi-class | AutoML | FLAML |
39 | volkert | 58310 | 180 | 10 | 180 | 0 | 0 | 65.1 | multi-class | AutoML | FLAML |
40 | 2dplanes | 40768 | 10 | - | 10 | 0 | 0 | 2.4 | regression | PMLB | FLAML |
41 | bng_breastTumor | 116640 | 9 | - | 9 | 0 | 0 | 6 | regression | PMLB | FLAML |
42 | bng_echomonths | 17496 | 9 | - | 9 | 0 | 0 | 2.3 | regression | PMLB | FLAML |
43 | bng_lowbwt | 31104 | 9 | - | 9 | 0 | 0 | 2.4 | regression | PMLB | FLAML |
44 | bng_pbc | 1000000 | 18 | - | 18 | 0 | 0 | 220.8 | regression | PMLB | FLAML |
45 | bng_pharynx | 1000000 | 10 | - | 10 | 0 | 0 | 68.6 | regression | PMLB | FLAML |
46 | bng_pwLinear | 177147 | 10 | - | 10 | 0 | 0 | 10.6 | regression | PMLB | FLAML |
47 | fried | 40768 | 10 | - | 10 | 0 | 0 | 8.1 | regression | PMLB | FLAML |
48 | house_16H | 22784 | 16 | - | 16 | 0 | 0 | 5.8 | regression | PMLB | FLAML |
49 | house_8L | 22784 | 8 | - | 8 | 0 | 0 | 2.8 | regression | PMLB | FLAML |
50 | houses | 20640 | 8 | - | 8 | 0 | 0 | 1.8 | regression | PMLB | FLAML |
51 | mv | 40768 | 11 | - | 11 | 0 | 0 | 5.9 | regression | PMLB | FLAML |
52 | poker | 1025010 | 10 | - | 10 | 0 | 0 | 23 | regression | PMLB | FLAML |
53 | pol | 15000 | 48 | - | 48 | 0 | 0 | 3 | regression | PMLB | FLAML |
54 | breast_cancer_wisconsin | 569 | 30 | 2 | 30 | 0 | 0 | 0.1 | binary | PMLB | AL |
55 | detecting-insults-in-social-commentary | 3947 | 2 | 2 | 0 | 1 | 1 | 0.8 | binary | Kaggle | AL |
56 | fri_c1_1000_25 | 1000 | 25 | 2 | 25 | 0 | 0 | 0.2 | binary | OpenML | AL |
57 | Hill_Valley_with_noise | 1212 | 100 | 2 | 100 | 0 | 0 | 0.8 | binary | PMLB | AL |
58 | Hill_Valley_without_noise | 1212 | 100 | 2 | 100 | 0 | 0 | 1.5 | binary | PMLB | AL |
59 | ionosphere | 351 | 34 | 2 | 34 | 0 | 0 | 0.1 | binary | PMLB | AL |
60 | MagicTelescope | 19020 | 11 | 2 | 11 | 0 | 0 | 1.5 | binary | OpenML | AL |
61 | OVA_Breast | 1545 | 10936 | 2 | 10936 | 0 | 0 | 103.3 | binary | OpenML | AL |
62 | pc4 | 1458 | 37 | 2 | 37 | 0 | 0 | 0.2 | binary | OpenML | AL, VolcanoML |
63 | quake | 2178 | 3 | 2 | 3 | 0 | 0 | 0 | binary | OpenML | AL, VolcanoML |
64 | sick | 3772 | 29 | 2 | 7 | 22 | 0 | 0.3 | binary | OpenML | AL, VolcanoML |
65 | spambase | 4601 | 57 | 2 | 57 | 0 | 0 | 1.1 | binary | PMLB | AL, VolcanoML |
66 | titanic | 891 | 11 | 2 | 6 | 4 | 1 | 0.1 | binary | Kaggle | AL |
67 | car_evaluation | 1728 | 21 | 4 | 21 | 0 | 0 | 0.1 | multi-class | PMLB | AL |
68 | glass | 205 | 9 | 5 | 9 | 0 | 0 | 0 | multi-class | PMLB | AL |
69 | kropt | 28056 | 6 | 18 | 3 | 3 | 0 | 0.5 | multi-class | OpenML | AL, VolcanoML |
70 | mnist_784 | 70000 | 784 | 10 | 784 | 0 | 0 | 122 | multi-class | OpenML | AL, VolcanoML |
71 | sentiment-analysis-on-movie-reviews | 156060 | 3 | 5 | 2 | 0 | 1 | 8.1 | multi-class | Kaggle | AL |
72 | splice | 3190 | 61 | 3 | 0 | 61 | 0 | 0.4 | multi-class | OpenML | AL |
73 | spooky-author-identification | 19579 | 2 | 3 | 0 | 1 | 1 | 3.1 | multi-class | Kaggle | AL |
74 | wine_quality_red | 1599 | 11 | 6 | 11 | 0 | 0 | 0.1 | multi-class | PMLB | AL |
75 | wine_quality_white | 4898 | 11 | 7 | 11 | 0 | 0 | 0.3 | multi-class | PMLB | AL |
76 | housing-prices | 1460 | 80 | - | 37 | 43 | 0 | 0.4 | regression | Kaggle | AL |
77 | mercedes-benz-greener-manufacturing | 4209 | 377 | - | 369 | 8 | 0 | 3.1 | regression | Kaggle | AL |
78 | ailerons | 13750 | 40 | 2 | 40 | 0 | 0 | 2.2 | binary | OpenML | VolcanoML |
79 | analcatdata_supreme | 4052 | 7 | 2 | 7 | 0 | 0 | 0.1 | binary | OpenML | VolcanoML |
80 | bank32nh_833 | 8192 | 32 | 2 | 32 | 0 | 0 | 2.1 | binary | OpenML | VolcanoML |
81 | cpu_act_761 | 8192 | 21 | 2 | 21 | 0 | 0 | 0.7 | binary | OpenML | VolcanoML |
82 | cpu_small_735 | 8192 | 12 | 2 | 12 | 0 | 0 | 0.4 | binary | OpenML | VolcanoML |
83 | delta_ailerons | 7129 | 5 | 2 | 5 | 0 | 0 | 0.3 | binary | OpenML | VolcanoML |
84 | delta_elevators | 9517 | 6 | 2 | 6 | 0 | 0 | 0.3 | binary | OpenML | VolcanoML |
85 | eeg-eye-state | 14980 | 14 | 2 | 14 | 0 | 0 | 1.6 | binary | OpenML | VolcanoML |
86 | electricity | 45312 | 8 | 2 | 8 | 0 | 0 | 2.9 | binary | OpenML | VolcanoML |
87 | jm1 | 10885 | 21 | 2 | 21 | 0 | 0 | 0.8 | binary | OpenML | VolcanoML |
88 | kin8nm_807 | 8192 | 8 | 2 | 8 | 0 | 0 | 0.6 | binary | OpenML | VolcanoML |
89 | mammography | 11183 | 6 | 2 | 6 | 0 | 0 | 0.8 | binary | OpenML | VolcanoML |
90 | mc1 | 9466 | 38 | 2 | 38 | 0 | 0 | 1 | binary | OpenML | VolcanoML |
91 | ozone-level-8hr | 2534 | 72 | 2 | 72 | 0 | 0 | 0.9 | binary | OpenML | VolcanoML |
92 | page-blocks | 5473 | 10 | 2 | 10 | 0 | 0 | 0.2 | binary | OpenML | VolcanoML |
93 | pollen_871 | 3848 | 5 | 2 | 5 | 0 | 0 | 0.1 | binary | OpenML | VolcanoML |
94 | puma32H_752 | 8192 | 32 | 2 | 32 | 0 | 0 | 2.3 | binary | OpenML | VolcanoML |
95 | puma8NH_816 | 8192 | 8 | 2 | 8 | 0 | 0 | 0.6 | binary | OpenML | VolcanoML |
96 | space_ga_737 | 3107 | 6 | 2 | 6 | 0 | 0 | 0.2 | binary | OpenML | VolcanoML |
97 | waveform-5000 | 5000 | 40 | 2 | 40 | 0 | 0 | 1 | binary | OpenML | VolcanoML |
98 | wind_847 | 6574 | 14 | 2 | 14 | 0 | 0 | 0.4 | binary | OpenML | VolcanoML |
99 | abalone | 4177 | 8 | 28 | 7 | 1 | 0 | 0.2 | multi-class | OpenML | VolcanoML |
100 | optdigits | 5620 | 64 | 10 | 64 | 0 | 0 | 0.8 | multi-class | OpenML | VolcanoML |
101 | pendigits | 10992 | 16 | 10 | 16 | 0 | 0 | 0.7 | multi-class | OpenML | VolcanoML |
102 | satimage | 6430 | 36 | 6 | 36 | 0 | 0 | 2.1 | multi-class | OpenML | VolcanoML |
103 | bank32nh_558 | 8192 | 32 | - | 32 | 0 | 0 | 2.4 | regression | OpenML | VolcanoML |
104 | bank8FM | 8192 | 8 | - | 8 | 0 | 0 | 0.6 | regression | OpenML | VolcanoML |
105 | cpu_act_573 | 8192 | 21 | - | 21 | 0 | 0 | 1 | regression | OpenML | VolcanoML |
106 | cpu_small_227 | 8192 | 12 | - | 12 | 0 | 0 | 0.6 | regression | OpenML | VolcanoML |
107 | debutanizer | 2394 | 7 | - | 7 | 0 | 0 | 0.2 | regression | OpenML | VolcanoML |
108 | kin8nm_189 | 8192 | 8 | - | 8 | 0 | 0 | 1.1 | regression | OpenML | VolcanoML |
109 | Moneyball | 1232 | 14 | - | 12 | 2 | 0 | 0.1 | regression | OpenML | VolcanoML |
110 | pollen_529 | 3848 | 5 | - | 5 | 0 | 0 | 0.2 | regression | OpenML | VolcanoML |
111 | puma32H_308 | 8192 | 32 | - | 32 | 0 | 0 | 2.7 | regression | OpenML | VolcanoML |
112 | puma8NH_225 | 8192 | 8 | - | 8 | 0 | 0 | 0.7 | regression | OpenML | VolcanoML |
113 | rainfall_bangladesh | 16755 | 3 | - | 1 | 2 | 0 | 0.4 | regression | OpenML | VolcanoML |
114 | socmob | 1156 | 5 | - | 1 | 4 | 0 | 0.1 | regression | OpenML | VolcanoML |
115 | space_ga_507 | 3107 | 6 | - | 6 | 0 | 0 | 0.5 | regression | OpenML | VolcanoML |
116 | stock | 950 | 9 | - | 9 | 0 | 0 | 0.1 | regression | OpenML | VolcanoML |
117 | sulfur | 10081 | 6 | - | 6 | 0 | 0 | 0.6 | regression | OpenML | VolcanoML |
118 | us_crime | 1994 | 127 | - | 126 | 1 | 0 | 1.1 | regression | OpenML | VolcanoML |
119 | weather_izmir | 1461 | 9 | - | 9 | 0 | 0 | 0.1 | regression | OpenML | VolcanoML |
120 | wind_503 | 6574 | 14 | - | 14 | 0 | 0 | 0.5 | regression | OpenML | VolcanoML |
121 | witmer_census_1980 | 50 | 5 | - | 4 | 1 | 0 | 0 | regression | OpenML | VolcanoML |
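
The table above can also be consumed programmatically, for example to count datasets per task or to select those used by a given paper. The following is a minimal sketch, not part of KGpip itself: `TABLE_EXCERPT` and `parse_table` are hypothetical names introduced for illustration, and only four rows copied from the table are included to keep the example short.

```python
from collections import Counter

# Hypothetical excerpt: four rows copied verbatim from the table above.
# Paste the full table here to summarize all 121 datasets.
TABLE_EXCERPT = """\
ID | Dataset | Rows | Columns | Classes | Numerical | Categorical | Textual | Size (MB) | Task | Source | Papers |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | adult | 48842 | 14 | 2 | 6 | 8 | 0 | 5.7 | binary | AutoML | FLAML, AL |
23 | car | 1728 | 6 | 4 | 0 | 6 | 0 | 0.1 | multi-class | AutoML | FLAML |
40 | 2dplanes | 40768 | 10 | - | 10 | 0 | 0 | 2.4 | regression | PMLB | FLAML |
66 | titanic | 891 | 11 | 2 | 6 | 4 | 1 | 0.1 | binary | Kaggle | AL |
"""


def parse_table(text):
    """Parse a pipe-delimited Markdown table into a list of dicts (all values kept as strings)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line is the header; second line is the ---|--- delimiter row.
    header = [c.strip() for c in lines[0].strip().strip("|").split("|")]
    rows = []
    for ln in lines[2:]:
        cells = [c.strip() for c in ln.strip().strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(TABLE_EXCERPT)

# Number of datasets per task type in the excerpt.
task_counts = Counter(r["Task"] for r in rows)

# Datasets whose "Papers" column lists AL.
al_datasets = [r["Dataset"] for r in rows if "AL" in r["Papers"].split(", ")]

print(task_counts)
print(al_datasets)
```

Values are kept as strings because regression rows use `-` in the Classes column; convert numeric columns explicitly where needed.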