Could you please explain the reason behind a line of code #7

zhoujingzhe · 2018-10-17T09:12:11Z

q = np.sum(np.multiply(z_concat, np.array(self.z)), axis=1)
What is the purpose of this line when determining which action is optimal?

olavbm · 2018-10-20T17:13:50Z

AFAIK, this line computes the expected value of the output distribution (z_concat) over the distribution atoms (self.z): each atom's probability is multiplied by the atom's value, and the products are summed along axis 1.
The result is the Q-value for an action in a state.
Because this is done in vectorized NumPy code, the resulting variable, q, holds the Q-values for all actions in all states in the batch.
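
To make the multiply-and-sum concrete, here is a minimal sketch with toy numbers. The atom values, the number of atoms, and the contents of `z_concat` are made up for illustration and are not taken from the repository; the sketch only assumes that each row of `z_concat` is a probability distribution over the atoms for one (state, action) pair.

```python
import numpy as np

# Hypothetical support: 5 evenly spaced atoms between -2 and 2.
# In the original code this role is played by self.z.
atoms = np.linspace(-2.0, 2.0, 5)  # [-2., -1., 0., 1., 2.]

# Hypothetical network output: one row per (state, action) pair,
# each row a probability distribution over the atoms (rows sum to 1).
z_concat = np.array([
    [0.0, 0.0, 0.0, 0.0, 1.0],  # all mass on the atom with value 2
    [0.2, 0.2, 0.2, 0.2, 0.2],  # uniform over the atoms
    [0.0, 0.0, 0.0, 1.0, 0.0],  # all mass on the atom with value 1
    [0.0, 0.5, 0.5, 0.0, 0.0],  # split between the atoms -1 and 0
])

# Same operation as the line in question: weight each atom by its
# probability (broadcast across rows), then sum over the atom axis.
q = np.sum(np.multiply(z_concat, atoms), axis=1)

# Equivalent formulation: a matrix-vector product.
q_dot = z_concat @ atoms

print(q)  # [ 2.   0.   1.  -0.5]
```

Each entry of `q` is the mean of one predicted return distribution, which is the quantity compared across actions when picking the greedy one.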
