Hello,

When running the code on deepmind/pendulum-swingup, training crashes because the action becomes nan. I attach the stack trace below (I added some extra logging to catch exactly which part of the agent produces the nan action; the original error occurred later, when interacting with the environment, but the cause is here). I believe more envs share this problem, since I also experienced it in previous runs, mostly for the dog tasks. Back then I was using my custom wrapper instead of shimmy, so I thought the problem might be in my wrapper; now it happens with shimmy, so it is not the wrapper but probably some numerical instability (maybe with BatchNorm?).
```
Traceback (most recent call last):
  File "/home/src/crossq/train.py", line 264, in <module>
    model.learn(total_timesteps=total_timesteps, progress_bar=True, callback=callback_list)
  File "/home/src/crossq/sbx/sac/sac.py", line 187, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 312, in learn
    rollout = self.collect_rollouts(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 541, in collect_rollouts
    actions, buffer_actions = self._sample_action(learning_starts, action_noise, env.num_envs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 373, in _sample_action
    unscaled_action, _ = self.predict(self._last_obs, deterministic=False)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/crossq/lib/python3.11/site-packages/stable_baselines3/common/base_class.py", line 555, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/src/crossq/sbx/common/policies.py", line 64, in predict
    actions = self._predict(observation, deterministic=deterministic)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/src/crossq/sbx/sac/policies.py", line 482, in _predict
    self.debug_log_action(observation, action, "_predict")
  File "/home/src/crossq/sbx/sac/policies.py", line 531, in debug_log_action
    raise ValueError("Action is None")
ValueError: Action is None
```
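For reference, the check behind this trace is roughly the following (a minimal sketch of my own added helper in sbx/sac/policies.py; the exact body here is illustrative):

```python
import jax.numpy as jnp

def debug_log_action(self, observation, action, where: str) -> None:
    # Fail fast as soon as the actor emits a bad action, so the error
    # points at the agent instead of at the later environment step.
    if action is None or bool(jnp.isnan(action).any()):
        print(f"{where}: invalid action {action} for observation {observation}")
        raise ValueError("Action is None")
```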
When the error happens, I also print the state of the actor and the observation. The nan values are mostly present in BatchRenorm. (The full log is more than 100 KB, so it is not complete; I attach just the beginning.)
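To locate the nan values, a tree scan like the following can be used (a sketch; the `actor_state.params` / `actor_state.batch_stats` attribute names are my assumptions about the sbx policy internals and may differ in your checkout):

```python
import jax
import jax.numpy as jnp

def report_nans(tree, label: str) -> None:
    # Walk every array leaf of a nested Flax params / batch-stats dict
    # and print the path of any leaf that contains nan entries.
    leaves, _ = jax.tree_util.tree_flatten_with_path(tree)
    for path, leaf in leaves:
        if bool(jnp.isnan(leaf).any()):
            print(f"[{label}] nan in {jax.tree_util.keystr(path)}, shape={leaf.shape}")

# Assumed usage:
# report_nans(model.policy.actor_state.params, "actor params")
# report_nans(model.policy.actor_state.batch_stats, "actor batch_stats")
```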
@adityab @danielpalen, in order to make this error reproducible I provide more information below.
Tested tasks and seeds that crashed:

- pendulum-swingup, seeds 0 to 9 (inclusive)
- dog-stand, seed 0
Wandb charts for dog-stand, seed 0 (training crashed after 400k steps):
The action values were nan, and nan values appeared mostly in the BatchRenorm layers, but also in some dense layers, similar to the pendulum-swingup log above.
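For completeness, this is roughly how the environment is created through shimmy (a sketch; the repo's train.py may wrap the env differently, and the registered Gymnasium id in the comment is my assumption):

```python
from dm_control import suite
from shimmy import DmControlCompatibilityV0

# Build the dm_control pendulum-swingup task and wrap it so it exposes
# the Gymnasium API that the off-policy training loop expects.
env = DmControlCompatibilityV0(suite.load("pendulum", "swingup"))
obs, info = env.reset(seed=0)

# Alternatively (assumed id): gymnasium.make("dm_control/pendulum-swingup-v0")
```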