Weights optimization #8

cspatharis · 2020-09-25T09:05:18Z

Hello, I wanted to ask a quick question.

From what I understand from the original paper and your blog, we assign a label of +1 to the expert policy and a label of -1 to all the other policies. In the optimization function of your implementation, I assume that h_list holds these label values. My question is: why does the expert policy get the same label as all the other policies (+1, which later becomes -1 in the h matrix)?

Thanks in advance.
