Are there good choices of reward functions for one-step updating of the RL policy? #220
-
That's a good and open question. It's true that the number of nodes will increase by one on every branching step, so on its own it is not a very informative per-step signal. That leaves us with things such as primal/dual bound improvement, so perhaps the Primal and Dual Integrals. Alternatively, there are expert policies that give scores for each decision (Pseudo-costs, Strong Branching), so they could be adapted into a reward, but I'm not sure what the advantage of doing this over imitation learning would be.
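To make the bound-improvement idea concrete, here is a minimal sketch of a per-step reward defined as the change in the dual bound between consecutive branching decisions. It assumes Ecole's interface for user-defined reward functions (a class with `before_reset` and `extract` methods), `ecole.scip.Model.as_pyscipopt()`, and PySCIPOpt's `getDualbound()`; please check these names against the versions you are using.

```python
import ecole


class DualBoundImprovement:
    """Per-step reward: how much the dual bound moved since the last decision."""

    def before_reset(self, model):
        # Called at the start of every episode (new instance).
        self.last_dual_bound = None

    def extract(self, model, done):
        # Read the current dual bound through PySCIPOpt (assumed interface).
        dual_bound = model.as_pyscipopt().getDualbound()
        if self.last_dual_bound is None:
            reward = 0.0  # first extraction right after reset
        else:
            reward = abs(dual_bound - self.last_dual_bound)
        self.last_dual_bound = dual_bound
        return reward


# Plug the custom reward into a branching environment.
env = ecole.environment.Branching(reward_function=DualBoundImprovement())
```

The same pattern could wrap an expert score (e.g. the strong-branching or pseudo-cost score of the chosen candidate) into a reward, though, as said above, imitation learning may then be the more direct route.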
-
We can solve an instance and collect getNNnodes() or Solvingtime() as the episode reward. Are there good choices of reward functions for one-step updating of the RL policy?
I'm new to the field of B&B. Can anybody give some advice?
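For what it's worth, here is a rough sketch of the episodic setup described above, assuming Ecole's `Branching` environment, its built-in `ecole.reward.NNodes` / `ecole.reward.SolvingTime` reward functions, and that reward functions support arithmetic such as negation; the per-step rewards are summed into an episode return that an RL algorithm could use.

```python
import ecole

# Negate the node count so that fewer nodes means a higher reward
# (assumption: reward functions support arithmetic like this).
env = ecole.environment.Branching(reward_function=-1.0 * ecole.reward.NNodes())

instance = "instance.lp"  # path to a MILP file; an ecole.scip.Model should also work

obs, action_set, reward, done, info = env.reset(instance)
episode_return = reward
while not done:
    action = action_set[0]  # placeholder policy: take the first branching candidate
    obs, action_set, reward, done, info = env.step(action)
    episode_return += reward  # accumulate the episode return

print(episode_return)
```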