-
I am able to run skrl with an Isaac Gym env, and everything works great. I tried to log variables from the action space to TensorBoard using the agent.track_data API, like this:

agent = TRPO(models=models_trpo,
             memory=memory,
             cfg=cfg_trpo,
             observation_space=env.observation_space,
             action_space=env.action_space,
             device=device)
print(type(env.action_space))
agent.track_data("target velocity", env.action_space[0])

Judging by the print output, env.action_space is inherited from Isaac Gym and is a gym.spaces.Box. Is this the right way to log scalar variables from the action space?
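For reference, a gym.spaces.Box only describes the bounds, shape, and dtype of the actions the environment expects; it does not hold the values the policy actually produces, so indexing it is not a way to obtain per-step scalars. A minimal sketch (assuming the standard gym API) of what the space exposes versus what agent.track_data expects:

import gym

# the Box space is just the action definition: bounds, shape and dtype
space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,))
print(space.shape)  # (1,)
print(space.low)    # [-1.]
print(space.high)   # [1.]

# agent.track_data(tag, value) logs one scalar per call, so the values to log
# are the actions sampled by the policy at each timestep (see the reply below),
# not the space definition itself.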
Replies: 2 comments
-
Hi @maths4513

If you want to log the actions taken by the policy (rather than the action space, i.e. the definition of the action expected by the environment), currently you need to modify the agent or the trainer to include the agent.track_data(...) call.

Also, you can run the training manually and log the variables you want as follows:

...
import torch
import tqdm

agent = TRPO(models=models_trpo,
             memory=memory,
             cfg=cfg_trpo,
             observation_space=env.observation_space,
             action_space=env.action_space,
             device=device)

# Run training manually

# initialize agent
agent.init()

# reset environment
states, infos = env.reset()

timesteps = 1600
for timestep in tqdm.tqdm(range(timesteps)):
    # pre-interaction
    agent.pre_interaction(timestep=timestep, timesteps=timesteps)

    # compute actions
    with torch.no_grad():
        actions = agent.act(states, timestep=timestep, timesteps=timesteps)[0]

        # track actions
        agent.track_data("Actions / Action (max)", torch.max(actions).item())
        agent.track_data("Actions / Action (min)", torch.min(actions).item())
        agent.track_data("Actions / Action (mean)", torch.mean(actions).item())

    # step the environments
    next_states, rewards, terminated, truncated, infos = env.step(actions)

    # record the environments' transitions
    with torch.no_grad():
        agent.record_transition(states=states,
                                actions=actions,
                                rewards=rewards,
                                next_states=next_states,
                                terminated=terminated,
                                truncated=truncated,
                                infos=infos,
                                timestep=timestep,
                                timesteps=timesteps)

    # post-interaction
    agent.post_interaction(timestep=timestep, timesteps=timesteps)

    # update states (environments are reset internally)
    states = next_states

This will generate the following TensorBoard cards (example for the Isaac Gym preview 4 Cartpole environment with PPO):
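If you prefer to keep using the built-in trainer instead of the manual loop, a sketch of the other option mentioned above (modifying the agent) is to subclass it and add the track_data calls inside record_transition. The class name below is hypothetical, and the method signature follows the one used in the manual loop:

import torch
from skrl.agents.torch.trpo import TRPO

class TRPOWithActionLogging(TRPO):
    # TRPO agent that also logs per-timestep action statistics to TensorBoard
    def record_transition(self, states, actions, rewards, next_states,
                          terminated, truncated, infos, timestep, timesteps):
        # store the transition in memory as the base agent normally does
        super().record_transition(states, actions, rewards, next_states,
                                  terminated, truncated, infos, timestep, timesteps)
        # log action statistics for this timestep
        self.track_data("Actions / Action (max)", torch.max(actions).item())
        self.track_data("Actions / Action (min)", torch.min(actions).item())
        self.track_data("Actions / Action (mean)", torch.mean(actions).item())

Instantiating TRPOWithActionLogging in place of TRPO should then work with the regular trainer without any other changes.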
-
Thanks, it works great!