Hello, I wanted to ask a quick question.
As I understand it from the original paper and your blog, we assign a +1 label to the expert policy and a -1 label to all the other policies. In the optimization function of your implementation, I assume `h_list` holds these label values. My question is: why does the expert policy get the same value as all the other policies (+1, which later becomes -1 in the `h` matrix)?
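For reference, here is a minimal sketch of one way an identical value in `h` could arise. It assumes the constraints are passed to a QP solver in the `G w <= h` form used by, e.g., `cvxopt.solvers.qp`, and that each policy contributes a hard-margin SVM constraint `y_i * (w . mu_i) >= 1` with no bias term. The function name `build_constraints` and its arguments are hypothetical and not taken from the implementation in question. Under these assumptions the label `y_i` ends up in the rows of `G`, while every entry of `h` is the same margin value of -1.

```python
# Hypothetical sketch: build inequality constraints G w <= h for a
# hard-margin SVM, in the form a solver such as cvxopt.solvers.qp expects.
# Assumption: constraints are y_i * (w . mu_i) >= 1 (no bias term), with
# y_i = +1 for the expert's feature expectations and -1 for every other policy.

def build_constraints(mu_expert, mu_others):
    """Return (G, h): one row of G and one entry of h per policy."""
    samples = [(mu_expert, +1)] + [(mu, -1) for mu in mu_others]
    G, h = [], []
    for mu, y in samples:
        # y * (w . mu) >= 1  is rewritten as  -y * (w . mu) <= -1,
        # so the label is folded into the row of G ...
        G.append([-y * m for m in mu])
        # ... and the right-hand side is -1 for every policy.
        h.append(-1.0)
    return G, h


# Usage: one expert and two other policies with 2-D feature expectations.
G, h = build_constraints([1.0, 2.0], [[0.5, 0.5], [0.0, 1.0]])
print(G)  # expert row has its sign flipped, other rows keep theirs
print(h)  # the same value for every policy
```

If the implementation follows this pattern, the label would be visible in the sign of each row of `G` rather than in `h`.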
Thanks in advance.