Reducing HoeffdingTree Time & Memory Consumption #1074
-
Hello online-ml community,

I am evaluating an extremely imbalanced dataset with a RandomSampling model that uses a HoeffdingTree as its base learner. However, the memory and time required for evaluation are extremely high (roughly 100,000×) compared to UOB and OOB on the same dataset, and I was curious about the reasons for this. UOB uses resampling with replacement, while RandomSampling uses resampling with rejection. I then thought about checking the parameters of the base learner (HoeffdingTree). My question is: which parameters affect memory and time consumption the most? I am now experimenting with fine-tuning several HoeffdingTree parameters, chosen based on my modest understanding of Hoeffding tree algorithms. What do you think? Did I choose the wrong approach?

Thanks a lot in advance,
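For reference, a minimal sketch of what such a setup might look like in River; this assumes the `imblearn.RandomUnderSampler` wrapper and `tree.HoeffdingTreeClassifier`, and the class distribution and seed below are placeholders rather than the values from the original model:

```python
from river import imblearn, tree

# Hedged sketch of the setup described above, not the exact original model:
# a Hoeffding tree wrapped in rejection-based random under-sampling.
model = imblearn.RandomUnderSampler(
    classifier=tree.HoeffdingTreeClassifier(),
    desired_dist={0: 0.5, 1: 0.5},  # placeholder: desired class distribution
    seed=42,                        # placeholder seed for reproducibility
)
```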
-
Hi @ZahirBilal, nice strategy to compose trees, btw! I recently published a paper where random sampling is applied for regression, and the results are promising indeed :)
If you want to control memory and time, your best bet is to start with `max_depth`. This is a straightforward way to constrain the trees and reduce memory usage, though it might lead to underfitting if the tree depth is too shallow. Next, I would try `grace_period` to avoid attempting to split too often. Increasing this parameter will give you some speed-ups at the cost of …
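In code, the two knobs mentioned so far might look like this; a sketch assuming River's `tree.HoeffdingTreeClassifier`, with illustrative values rather than recommendations (the defaults are `max_depth=None`, i.e. unbounded, and `grace_period=200`):

```python
from river import tree

# Illustrative settings only; tune them against your own stream.
model = tree.HoeffdingTreeClassifier(
    max_depth=10,      # hard cap on tree depth, which bounds memory growth
    grace_period=500,  # instances a leaf observes between split attempts;
                       # larger values mean fewer split evaluations, i.e. faster
)
```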