Chapter2 Application Demo #4

Open
PaParaZz1 opened this issue Dec 22, 2022 · 19 comments
Labels
application Application analysis or extension discussion Topic discussion

Comments

@PaParaZz1
Member

PaParaZz1 commented Dec 22, 2022

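In this issue, we will keep updating all application demo materials related to Lecture 2 of the course.

Training code link

  • Rocket recovery (discrete action space)

    rocket.mp4
  • Drone attitude control (continuous action space)

    drone.mp4
  • Traffic signal control (multi-dimensional discrete action space)

    cityflow_tiny.mp4
  • Navigation control (hybrid action space: parameterized action space)

    out.mp4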
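As a rough illustration of how the four action-space types listed above differ, the sketch below samples one batch of actions for each using `torch.distributions`. The batch size, the number of choices per dimension, and the dictionary keys `action_type` and `action_args` are assumptions made for this example only; they are not taken from the demo environments.

```python
import torch
from torch.distributions import Categorical, Normal

torch.manual_seed(0)
B = 4  # batch size (assumed)

# Discrete: one categorical choice per sample.
discrete_action = Categorical(logits=torch.randn(B, 3)).sample()  # shape (B,)

# Continuous: one real-valued vector per sample, here from a diagonal Gaussian.
continuous_action = Normal(torch.zeros(B, 4), torch.ones(B, 4)).sample()  # shape (B, 4)

# Multi-dimensional discrete: one independent categorical choice per dimension.
multi_logits = [torch.randn(B, 6), torch.randn(B, 3)]
multi_discrete_action = torch.stack(
    [Categorical(logits=logit).sample() for logit in multi_logits], dim=-1
)  # shape (B, 2)

# Hybrid (parameterized): a discrete action type plus continuous arguments.
hybrid_action = {
    "action_type": Categorical(logits=torch.randn(B, 3)).sample(),  # shape (B,)
    "action_args": Normal(torch.zeros(B, 2), torch.ones(B, 2)).sample(),  # shape (B, 2)
}
```

In a real policy the logits, means, and standard deviations would come from the network output rather than from random tensors.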
@PaParaZz1 PaParaZz1 added application Application analysis or extension discussion Topic discussion labels Dec 22, 2022
@PaParaZz1 PaParaZz1 pinned this issue Dec 22, 2022
@EasonQYS

Looking forward to the code.

@PaParaZz1 PaParaZz1 unpinned this issue Feb 24, 2023
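@jianzuo

jianzuo commented Mar 12, 2023

Is there a detailed line-by-line walkthrough for the MultiDiscrete action space? I checked the annotated code documentation tutorials, and they seem to cover only the ordinary discrete action case.
Thanks!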

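@PaParaZz1
Member Author

Is there a detailed line-by-line walkthrough for the MultiDiscrete action space? I checked the annotated code documentation tutorials, and they seem to cover only the ordinary discrete action case. Thanks!

It is essentially implemented with the MultiHead feature in DI-engine. You can look at the source code here first; we will update the annotated code documentation in the course repo within this week.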
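To make the multi-head idea concrete for PPO, here is a minimal sketch of one common way to score a multi-discrete action: treat each head as an independent categorical distribution and sum the per-head log-probabilities and entropies. The helper name `multi_discrete_log_prob` and the head sizes `[6, 3]` are assumptions for illustration, and DI-engine's internal implementation may combine the heads differently.

```python
from typing import List, Tuple

import torch
from torch.distributions import Categorical


def multi_discrete_log_prob(
    logits: List[torch.Tensor], actions: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Joint log-probability and entropy, assuming the heads are independent.

    logits: one (B, N_i) tensor per action dimension.
    actions: (B, num_heads) integer tensor, one column per head.
    """
    dists = [Categorical(logits=logit) for logit in logits]
    log_prob = sum(dist.log_prob(actions[:, i]) for i, dist in enumerate(dists))
    entropy = sum(dist.entropy() for dist in dists)
    return log_prob, entropy


torch.manual_seed(0)
B = 4
new_logits = [torch.randn(B, 6), torch.randn(B, 3)]
old_logits = [logit + 0.1 * torch.randn_like(logit) for logit in new_logits]
# Actions are sampled from the old (behaviour) policy, one column per head.
actions = torch.stack([Categorical(logits=logit).sample() for logit in old_logits], dim=-1)

new_log_prob, entropy = multi_discrete_log_prob(new_logits, actions)
old_log_prob, _ = multi_discrete_log_prob(old_logits, actions)
ratio = torch.exp(new_log_prob - old_log_prob)  # PPO importance ratio, shape (B,)
```

The resulting `ratio` can be plugged into the usual clipped PPO surrogate exactly as in the single-head discrete case.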

@jianzuo

jianzuo commented Mar 15, 2023

Got it, thanks!

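@jianzuo

jianzuo commented Mar 25, 2023

Hello,
Where can I find the updated code annotations for multihead that you mentioned in your reply? I have recently been trying to implement multi-dimensional action output with PPO,
but I still have not figured it out. Thanks!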

@jianzuo

jianzuo commented Mar 25, 2023

I tried multihead following the explanation, but got an error:

import torch
import torch.nn as nn
import torch.nn.functional as F
class DiscretePolicyNetMultiHead(nn.Module):
    def __init__(self, obs_dim, hidden_dim, action_dim) -> None:
        super(DiscretePolicyNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dim])
        
        
    def forward(self, x: torch.Tensor)->torch.Tensor:
        x = self.encoder(x)
        logit = [self.head(x) for head in self.heads]
        return logits
    
def sample_act(logit: torch.Tensor) -> torch.Tensor:
    probs = torch.softmax(logit, dim=-1)
    dists = [torch.distributions.Categorical(probs=prob) for prob in probs]
    return [dist.sample() for dist in dists]

def test_action_multihead():
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
    logit = policy_net(state)
    assert logit.shape == (B, action_shape)
    action = sample_act(logit)
    assert action.shape == (B,)
    return action

test_action_multihead()
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_27/530012604.py in <module>
----> 1 test_action_multihead()

/tmp/ipykernel_27/2493506364.py in test_action_multihead()
      2     B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
      3     state = torch.rand(B, obs_shape)
----> 4     policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
      5     logit = policy_net(state)
      6     assert logit.shape == (B, action_shape)

/tmp/ipykernel_27/2688308212.py in __init__(self, obs_dim, hidden_dim, action_dim)
      6             nn.ReLU(),
      7         )
----> 8         self.head = nn.Linear(hidden_dim, action_dim)
      9 
     10     def forward(self, x: torch.Tensor)->torch.Tensor:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py in __init__(self, in_features, out_features, bias, device, dtype)
     94         self.in_features = in_features
     95         self.out_features = out_features
---> 96         self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
     97         if bias:
     98             self.bias = Parameter(torch.empty(out_features, **factory_kwargs))

TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of SymInts size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

@PaParaZz1
Member Author

You can now refer to this example: https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py#L58
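For readers following the thread, the traceback in the question shows that the call actually reached the single-head `DiscretePolicyNet`, whose `nn.Linear(hidden_dim, action_dim)` received the list `[6, 3]` as its output size. The posted multi-head snippet also has a few slips of its own: `super(DiscretePolicyNet, self)` names the wrong class, `self.head(x)` should be `head(x)`, `logit` is returned as `logits`, and `torch.softmax` is applied to a list. The sketch below is one possible repair of that snippet, not the code from the linked tutorial; the argument name `action_dims` and the function name `sample_action` are chosen here for illustration.

```python
from typing import List

import torch
import torch.nn as nn


class DiscretePolicyNetMultiHead(nn.Module):
    """Shared encoder followed by one linear head per action dimension."""

    def __init__(self, obs_dim: int, hidden_dim: int, action_dims: List[int]) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        # One output layer per action dimension, each with its own number of choices.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dims])

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        x = self.encoder(x)
        # Returns a list of logits, one (B, dim) tensor per head.
        return [head(x) for head in self.heads]


def sample_action(logits: List[torch.Tensor]) -> List[torch.Tensor]:
    # Build one categorical distribution per head and sample each independently.
    dists = [torch.distributions.Categorical(logits=logit) for logit in logits]
    return [dist.sample() for dist in dists]


def test_action_multihead() -> List[torch.Tensor]:
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNetMultiHead(obs_shape, hidden_shape, action_shape)
    logits = policy_net(state)
    for logit, dim in zip(logits, action_shape):
        assert logit.shape == (B, dim)
    actions = sample_action(logits)
    for action, dim in zip(actions, action_shape):
        assert action.shape == (B,)
        assert int(action.max()) < dim
    return actions


actions = test_action_multihead()
```

Because the heads have different sizes, the logits stay a Python list rather than a single stacked tensor, which is why the shape checks loop over the heads.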

@jianzuo

jianzuo commented Mar 28, 2023

Thanks! I will rewrite it based on your example.

@lz-8713

lz-8713 commented Aug 1, 2023

Could you share the PPO code for the MultiDiscrete and Discrete action spaces, as well as the complete code for traffic signal control?

@zhixiongzh

Hello, I pulled the latest opendilab/ding:nightly-mujoco image with docker pull, then ran pip install git+https://github.com/zjowowen/gym-pybullet-drones@master inside it to try the drones example, but it failed with an error:

root@BF4-C-008T7:/workspaces/PPOxFamily# pip install git+https://github.com/zjowowen/gym-pybullet-drones@master
Collecting git+https://github.com/zjowowen/gym-pybullet-drones@master
  Cloning https://github.com/zjowowen/gym-pybullet-drones (to revision master) to /tmp/pip-req-build-wy0jagd4
  Running command git clone --filter=blob:none --quiet https://github.com/zjowowen/gym-pybullet-drones /tmp/pip-req-build-wy0jagd4
  Resolved https://github.com/zjowowen/gym-pybullet-drones to commit b35eed32c251cc69c2d7b0de74dd9a66ca1357b1
  Installing build dependencies ... error
  error: subprocess-exited-with-error
  
  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Collecting poetry-core@ git+https://github.com/python-poetry/poetry-core.git@master
        Cloning https://github.com/python-poetry/poetry-core.git (to revision master) to /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        Running command git clone --filter=blob:none --quiet https://github.com/python-poetry/poetry-core.git /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        WARNING: Did not find branch or tag 'master', assuming revision or ref.
        Running command git checkout -q master
        error: pathspec 'master' did not match any file(s) known to git.
        error: subprocess-exited-with-error
      
        × git checkout -q master did not run successfully.
        │ exit code: 1
        ╰─> See above for output.
      
        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: subprocess-exited-with-error
      
      × git checkout -q master did not run successfully.
      │ exit code: 1
      ╰─> See above for output.
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Manually installing poetry-core did not help either. I suspect the branch name master needs to be changed to main?
@PaParaZz1 Do you have any suggestions?

@zhixiongzh

Solved. You need to clone the whole drones repository with git clone https://github.com/zjowowen/gym-pybullet-drones.git, change master to main in the line requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"], and then run pip install -e . manually inside that repository to install it.
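The text edit described above can also be scripted. The sketch below applies the same master-to-main substitution to the `requires` line quoted in this comment; the helper name `patch_build_requires` is made up for illustration, and which file in the clone holds that line is not stated in the thread (it is typically `pyproject.toml`). The maintainer's later reply says the fork has since been updated, so this patch may no longer be needed.

```python
def patch_build_requires(text: str) -> str:
    """Point the poetry-core build requirement at the main branch instead of master."""
    return text.replace("poetry-core.git@master", "poetry-core.git@main")


# The line quoted in the comment above, before and after patching.
line = 'requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"]'
patched = patch_build_requires(line)
print(patched)
```

To apply it to a local clone, read the file's text, pass it through `patch_build_requires`, write it back, and then run `pip install -e .` in the repository as described above.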

@zjowowen

Hi,

This repo [https://github.com/zjowowen/gym-pybullet-drones.git] has been updated to match the original repo [https://github.com/utiasDSL/gym-pybullet-drones].

Thanks for reminding us!

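@zhixiongzh

zhixiongzh commented Aug 29, 2023

@zjowowen
After getting the code to run, I still cannot reproduce this drones_fly_demo. After training for 5e6 steps with the default parameters, the return does not look very good. I then loaded the best saved model and recorded a video, and found that the drone flies over the top of the gate instead of passing through underneath it. Are there any other settings needed to achieve the result shown in your demo?
[Image: return]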

@rokey0001

Hello, I keep running into the following problem when running the demo. Has anyone else run into the same issue?
Traceback (most recent call last):
File "", line 1, in
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\utils\compression_helper.py", line 24, in setstate
self.data = cloudpickle.loads(data)
TypeError: _generator_ctor() takes from 0 to 1 positional arguments but 2 were given

[10-20 22:34:24] WARNING subprocess reset set seed failed, ignore and continue... subprocess_env_manager.py:263
subprocess exception traceback:
Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 312, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] 管道已结束。

Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\envs\env_manager\subprocess_env_manager.py", line
259, in reset
ret = self._pipe_parents.recv()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 250, in recv
buf = self._recv_bytes()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 321, in _recv_bytes
raise EOFError
EOFError

wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
[10-20 22:34:26] ERROR Env 2 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 1 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 3 reset has exceeded max retries(5) subprocess_env_manager.py:317
wandb: View run dutiful-pond-1 at: https://wandb.ai/anony-mouse-788424711663011732/bipedalwalker_demo/runs/uomu1uw0?apiKey=dc8282c6be97b578e2fa87aac8b882089ab2adaf
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20231020_223406-uomu1uw0\logs

@huang312

huang312 commented Jan 8, 2024

Does the repository contain the complete code and training procedure for multi-discrete PPO?

@Billchan9711

Requesting the multi-discrete + PPO code for traffic light control.

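@PaParaZz1
Member Author

Does the repository contain the complete code and training procedure for multi-discrete PPO?

You can refer to the relevant examples in DI-smartcross. Because the cityflow environment is fairly complex, we did not integrate it directly into the course repository, so please head over to DI-smartcross. Link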

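@JBGZ-XXB

JBGZ-XXB commented Apr 3, 2024

Hello, is the environment code for the drone attitude control (continuous action space) example available? I would like to see how reinforcement learning is used on top of a PID controller.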

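@PaParaZz1
Member Author

Hello, is the environment code for the drone attitude control (continuous action space) example available? I would like to see how reinforcement learning is used on top of a PID controller.

All of the code can be found in the code examples of this repository; the drone attitude control code is at this link.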

10 participants