
Llama 3.1 8B training issue #129

Open
dhananjaybhandiwad opened this issue Sep 5, 2024 · 2 comments

Comments

@dhananjaybhandiwad

Hello authors, I am trying to train the draft head for Llama-3.1-8B-Instruct, but it fails with the error below despite my best efforts at updating all the relevant libraries.

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Traceback (most recent call last):
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/main.py", line 72, in <module>
    baseconfig = AutoConfig.from_pretrained(args.basepath)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
Traceback (most recent call last):
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/python', '-m', 'eagle.train.main', '--basepath', '/data/horse/ws/dhra414f-dhra414f/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16', '--tmpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/EAGLE/eagle/data_eagle/sharegpt_0_67999_mufp16', '--cpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/checkpoints', '--configpath', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/EAGLE-LLaMA3.1-Instruct-8B.json']' returned non-zero exit status 1.

Do you have a solution for this? Please let me know if I am doing something wrong.

@Liyuhui-12
Collaborator

It might be an issue with the version of the transformers package.
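The version hint fits the traceback: older `transformers` releases validate `rope_scaling` as exactly the two fields `type` and `factor` (the `_rope_scaling_validation` frame above), while Llama 3.1 configs carry the extra `llama3` rope keys. Upgrading `transformers` is the clean fix; as a stopgap, one can rewrite `rope_scaling` in the local `config.json`. Below is a minimal sketch of that rewrite; the helper name and the `"dynamic"` type choice are assumptions for illustration, not from this thread or the EAGLE code:

```python
def legacy_rope_scaling(rope_scaling):
    """Collapse a Llama 3.1 style rope_scaling dict to the two-field
    {"type", "factor"} shape that older transformers releases accept.

    Hypothetical stopgap helper; upgrading transformers is cleaner.
    """
    if rope_scaling is None:
        return None
    # Keep only the scaling factor; "dynamic" is a legacy type the old
    # validator allows (an assumption about what works for training).
    return {"type": "dynamic", "factor": float(rope_scaling["factor"])}


# The exact dict from the ValueError in the traceback above:
cfg = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
print(legacy_rope_scaling(cfg))  # → {'type': 'dynamic', 'factor': 8.0}
```

Note that collapsing the keys changes the RoPE behavior relative to the true `llama3` scaling, so this is only a workaround to get past config validation, not a faithful reproduction of the model.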

@870572761

In fact, even if you solve this problem, you will find that the checkpoint does not contain the key "lm_head.weight". So this code may be custom-made for certain models.
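The missing-key claim can be checked directly against the checkpoint's `model.safetensors.index.json` weight map; if `lm_head.weight` is absent, the checkpoint may tie the LM head to `model.embed_tokens.weight`, in which case the training code would need to handle tied weights. A small sketch of the check (the function name is made up for illustration):

```python
import json


def find_missing(weight_map, required=("lm_head.weight",)):
    """Return the required tensor names absent from a checkpoint's
    weight map (the "weight_map" dict of model.safetensors.index.json)."""
    return [name for name in required if name not in weight_map]


# Hypothetical weight map of a checkpoint with tied embeddings:
tied = {"model.embed_tokens.weight": "model-00001-of-00004.safetensors"}
print(find_missing(tied))  # → ['lm_head.weight']

# Against a real checkpoint directory this would look like:
# with open(f"{basepath}/model.safetensors.index.json") as f:
#     weight_map = json.load(f)["weight_map"]
# print(find_missing(weight_map))
```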
