AFAIK, this line of code takes the weighted sum of the distribution atoms (self.z) under the output distribution (z_concat), i.e. the expected value of the distribution,
resulting in the Q-value for an action in a state.
As this is done in vectorized NumPy code, the resulting variable, q, holds the Q-values for all actions in all states in the batch.
q = np.sum(np.multiply(z_concat, np.array(self.z)), axis=1)
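To make my reading concrete, here is a small sketch of what I believe that line computes. The atom values, the probabilities, and the layout of `z_concat` (one row per state-action pair, one column per atom) are assumptions made up for illustration, not taken from the repository.

```python
import numpy as np

# Hypothetical support of the value distribution (stands in for self.z).
z_atoms = np.array([-1.0, 0.0, 1.0])

# Assumed layout: one row per (state, action) pair, one column per atom.
# Each row is a probability distribution over the atoms and sums to 1.
z_concat = np.array([
    [0.2, 0.3, 0.5],   # distribution for pair 0
    [0.6, 0.3, 0.1],   # distribution for pair 1
])

# Same expression as in the issue: elementwise product with the atoms,
# then a sum over the atom axis. This is the expected value of each row.
q = np.sum(np.multiply(z_concat, z_atoms), axis=1)

# Equivalent formulation as a matrix-vector product.
q_dot = z_concat @ z_atoms

# The row with the largest expected value is the greedy choice.
best_row = int(np.argmax(q))
```

If that reading is right, q is just the mean of each predicted distribution, and the greedy action is the one whose distribution has the largest mean.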
What is the point of this code when you decide which action is optimal?