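import tensorflow as tf
import tf_agents.networks
from tf_agents.networks import actor_distribution_network, value_network
from tf_agents.networks.mask_splitter_network import MaskSplitterNetwork

# This is the secret sauce to make Actor & Value Network accept dict inputs
def get_dict_combiner():
    # Flatten the observation dict, then concatenate its tensors along the last axis.
    combiner_seq = tf_agents.networks.Sequential(
        [tf_agents.networks.NestFlatten(), tf.keras.layers.Concatenate(axis=-1)])
    # Sequential returns (output, state); keep only the output.
    combiner = tf.keras.layers.Lambda(lambda x: combiner_seq(x)[0])
    return combiner

# input_spec and train_tf_env are defined elsewhere (not shown here).
actor_net = actor_distribution_network.ActorDistributionNetwork(
    input_spec, train_tf_env.action_spec(),
    preprocessing_combiner=get_dict_combiner(), fc_layer_params=(256, 128, 64))
value_net = value_network.ValueNetwork(
    input_spec, preprocessing_combiner=get_dict_combiner(), fc_layer_params=(256, 128, 64))

def splitter_fn(obs_dict):  # NOTE: don't modify obs_dict, it will cause bugs!
    observation = {k: v for k, v in obs_dict.items() if k != 'possible_moves_mask'}
    return observation, obs_dict['possible_moves_mask']

# The actor receives the mask; the value network only sees the observation.
actor_net = MaskSplitterNetwork(splitter_fn, actor_net, passthrough_mask=True)
value_net = MaskSplitterNetwork(splitter_fn, value_net, passthrough_mask=False)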
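As a quick sanity check on the splitter, here is a minimal sketch that runs the same splitter_fn on a plain Python dict. The 'board' key and the sample values are hypothetical, and plain lists stand in for tensors; only the 'possible_moves_mask' key comes from the code above.

```python
def splitter_fn(obs_dict):
    # Build a new dict without the mask instead of mutating obs_dict.
    observation = {k: v for k, v in obs_dict.items() if k != 'possible_moves_mask'}
    return observation, obs_dict['possible_moves_mask']


# Hypothetical observation: 'board' is an illustrative key, not from the original post.
obs = {'board': [0, 1, 0], 'possible_moves_mask': [1, 0, 1]}
observation, mask = splitter_fn(obs)

print(observation)  # the observation with the mask entry removed
print(mask)         # the mask on its own
print(obs)          # the input dict, still containing both keys
```

Because the comprehension builds a new dict, the original observation still holds its mask entry after the call, which is the property the NOTE in the comment asks for.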
Yet the actor still sometimes selects actions that are disallowed by 'possible_moves_mask'. How is this possible? Do I need to apply the mask explicitly somewhere to prevent invalid moves from being chosen? I was under the impression that this happened automatically.
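For reference, a minimal sketch of the behavior that action masking is generally expected to produce: disallowed actions end up with zero probability, so they can never be sampled. This is plain Python for illustration only, not the TF-Agents implementation; masked_softmax, is_action_allowed, and the sample logits are hypothetical names and values.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits in which entries with a falsy mask get probability 0."""
    if not any(mask):
        raise ValueError("mask must allow at least one action")
    # Subtract the largest allowed logit for numerical stability.
    top = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def is_action_allowed(action, mask):
    """Diagnostic helper: True if the chosen action index is permitted by the mask."""
    return bool(mask[action])


probs = masked_softmax([2.0, 1.0, 0.5], [1, 0, 1])
print(probs)                               # the masked entry (index 1) has probability 0
print(is_action_allowed(1, [1, 0, 1]))     # index 1 is disallowed by this mask
```

A helper like is_action_allowed can be run on each sampled action and the mask from the same time step to pin down exactly when an invalid move is produced.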
For context, the setup at the top of this post is how I've tried using the mask splitter network.