
Latest DeepSpeed not supported #1306

Open
jahatef opened this issue Oct 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments


jahatef commented Oct 15, 2024

Describe the bug
When using the latest DeepSpeed with GPT-NeoX, training fails with the following traceback:

Traceback (most recent call last):
  File "/home/hatef.4/neox/gpt-neox/train.py", line 35, in <module>
    main()
  File "/home/hatef.4/neox/gpt-neox/train.py", line 31, in main
    pretrain(neox_args=neox_args)
  File "/home/hatef.4/neox/gpt-neox/megatron/training.py", line 296, in pretrain
    iteration = train(
  File "/home/hatef.4/neox/gpt-neox/megatron/training.py", line 1465, in train
    loss_dict, skipped_iter = train_step(
  File "/home/hatef.4/neox/gpt-neox/megatron/training.py", line 1277, in train_step
    reduced_loss = train_step_pipe(
  File "/home/hatef.4/neox/gpt-neox/megatron/training.py", line 1374, in train_step_pipe
    loss = model.train_batch(data_iter=data_iterator)
  File "/home/hatef.4/neox/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 388, in train_batch
    self._exec_schedule(sched)
  File "/home/hatef.4/neox/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 1420, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
  File "/home/hatef.4/neox/DeepSpeed/deepspeed/runtime/pipe/engine.py", line 1236, in _exec_optimizer_step
    if self.global_steps % self.steps_per_print() == 0:
TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'

(The same traceback is printed by each rank, so the output above appears interleaved when running on multiple processes.)
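The failure mode reduces to `steps_per_print()` returning `None`, which then hits the modulo in `_exec_optimizer_step`. A minimal standalone sketch (not the actual DeepSpeed code; `steps_per_print` and `should_print` here are illustrative stand-ins) showing both the crash and a `None` guard that would avoid it:

```python
def steps_per_print(config: dict):
    # Stand-in for a getter that returns None when the key is absent
    # from the config (assumed to mirror the reported behavior).
    return config.get("steps_per_print")

def should_print(global_steps: int, config: dict) -> bool:
    interval = steps_per_print(config)
    # Guarding against None avoids
    # "unsupported operand type(s) for %: 'int' and 'NoneType'".
    if interval is None:
        return False
    return global_steps % interval == 0

print(should_print(10, {"steps_per_print": 5}))  # True
print(should_print(10, {}))                      # False, instead of raising

# Without the guard, the interleaved tracebacks above boil down to:
try:
    _ = 10 % None
except TypeError as e:
    print(e)  # unsupported operand type(s) for %: 'int' and 'NoneType'
```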

To Reproduce
Steps to reproduce the behavior:

  1. Use the latest DeepSpeed
  2. Run training with the 1-3B.yml config
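A possible workaround (an assumption on my part, not verified) is to set `steps_per_print` explicitly in the training config so the getter never returns `None`, e.g. in the NeoX YAML:

```yaml
# Hypothetical config snippet; key name assumed from the failing call site.
"steps_per_print": 10
```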