Describe the bug
I used the "CarRacing-v3" environment with `continuous=False` and created a Stable-Baselines3 DQN agent to train on it, and I got an error message like this:
According to the error message, I checked the source code of `class CarRacing` and noticed that the line

`action = action.astype(np.float64)`

is executed even when the action is sampled from the discrete action space, i.e. when the action is an integer.
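For illustration, here is a minimal sketch of the kind of guard that would keep the cast on the continuous (Box) path only; the helper name `_coerce_action` is hypothetical and not part of Gymnasium:

```python
import numpy as np


def _coerce_action(action, continuous: bool):
    """Hypothetical helper: cast only continuous (Box) actions to float64
    and keep discrete actions as plain ints, so Discrete.contains() still passes."""
    if continuous:
        return np.asarray(action, dtype=np.float64)
    return int(action)


# A discrete action keeps an integer type instead of becoming float64.
assert isinstance(_coerce_action(np.int64(2), continuous=False), int)
# A continuous action is still cast to float64 as before.
assert _coerce_action(np.array([0.1, 0.5, 0.0], dtype=np.float32), continuous=True).dtype == np.float64
```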
Code example
```python
import gymnasium as gym
from gymnasium.spaces import Tuple, Discrete, Box
from stable_baselines3 import PPO, DQN
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecVideoRecorder
import numpy as np
config = {
    "policy_type": "MlpPolicy",
    "total_timesteps": 25000,
    "learning_rate": 0.01,
    "gamma": 0.95,
    "env_name": "CarRacing-v3",
}
def make_env():
    env = gym.make(config["env_name"], render_mode="rgb_array", continuous=False)
    return Monitor(env)  # record stats such as returns

env = DummyVecEnv([make_env])
# Initialize the DQN model
model = DQN(
    "MlpPolicy",  # use a multi-layer perceptron policy
    env,
    verbose=1,
    learning_rate=config["learning_rate"],
    # n_steps=512,  # number of steps to run for each environment per update
    gamma=config["gamma"],  # discount factor
)

# Train the DQN agent
model.learn(total_timesteps=config["total_timesteps"])
```
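For completeness, a smaller reproduction that does not involve Stable-Baselines3 at all; assuming the unconditional cast described above, stepping the environment with a NumPy integer action (which is how vectorized agents typically emit discrete actions) should hit the same error:

```python
import gymnasium as gym
import numpy as np

env = gym.make("CarRacing-v3", continuous=False)
env.reset(seed=0)

# A discrete action the way an agent/vec-env would pass it: a NumPy integer scalar.
action = np.int64(env.action_space.sample())

# If step() casts the action to float64 unconditionally, the Discrete
# action-space check rejects the now-float action and this call raises.
obs, reward, terminated, truncated, info = env.step(action)
```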
System Info
Describe the characteristics of your environment:
Checklist