
Llama 3.1 8B training issue #129

Open
dhananjaybhandiwad opened this issue Sep 5, 2024 · 2 comments

Comments

@dhananjaybhandiwad

Hello authors, I am trying to train the draft head for Llama-3.1-8B-Instruct, but it fails with the error below despite my best efforts at updating all the relevant libraries.

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Detected kernel version 4.18.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Traceback (most recent call last):
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/software/rome/r23.04/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/main.py", line 72, in <module>
    baseconfig = AutoConfig.from_pretrained(args.basepath)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
Traceback (most recent call last):
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle_env/bin/python', '-m', 'eagle.train.main', '--basepath', '/data/horse/ws/dhra414f-dhra414f/models--meta-llama--Meta-Llama-3.1-8B-Instruct/snapshots/8c22764a7e3675c50d4c7c9a4edb474456022b16', '--tmpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/EAGLE/eagle/data_eagle/sharegpt_0_67999_mufp16', '--cpdir', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/checkpoints', '--configpath', '/data/horse/ws/dhra414f-dhra414f/eagle/EAGLE/eagle/train/EAGLE-LLaMA3.1-Instruct-8B.json']' returned non-zero exit status 1.

Do you have a solution for this? Please let me know if I am doing something wrong.

@Liyuhui-12
Collaborator

It might be an issue with the version of the transformers package.
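The version hint fits the traceback: older `transformers` releases validate `rope_scaling` as exactly the two fields `type` and `factor` (the `_rope_scaling_validation` frame above), while Llama 3.1 configs carry the extra `llama3` rope keys. Upgrading `transformers` is the clean fix; as a stopgap, one can rewrite `rope_scaling` in the local `config.json`. Below is a minimal sketch of that rewrite; the helper name and the `"dynamic"` type choice are assumptions for illustration, not from this thread or the EAGLE code:

```python
def legacy_rope_scaling(rope_scaling):
    """Collapse a Llama 3.1 style rope_scaling dict to the two-field
    {"type", "factor"} shape that older transformers releases accept.

    Hypothetical stopgap helper; upgrading transformers is cleaner.
    """
    if rope_scaling is None:
        return None
    # Keep only the scaling factor; "dynamic" is a legacy type the old
    # validator allows (an assumption about what works for training).
    return {"type": "dynamic", "factor": float(rope_scaling["factor"])}


# The exact dict from the ValueError in the traceback above:
cfg = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
print(legacy_rope_scaling(cfg))  # → {'type': 'dynamic', 'factor': 8.0}
```

Note that collapsing the keys changes the RoPE behavior relative to the true `llama3` scaling, so this is only a workaround to get past config validation, not a faithful reproduction of the model.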

@870572761

In fact, even if you solve this problem, you will find that the checkpoint does not contain the key "lm_head.weight". So this code may be custom-made for certain models.
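The missing-key claim can be checked directly against the checkpoint's `model.safetensors.index.json` weight map; if `lm_head.weight` is absent, the checkpoint may tie the LM head to `model.embed_tokens.weight`, in which case the training code would need to handle tied weights. A small sketch of the check (the function name is made up for illustration):

```python
import json


def find_missing(weight_map, required=("lm_head.weight",)):
    """Return the required tensor names absent from a checkpoint's
    weight map (the "weight_map" dict of model.safetensors.index.json)."""
    return [name for name in required if name not in weight_map]


# Hypothetical weight map of a checkpoint with tied embeddings:
tied = {"model.embed_tokens.weight": "model-00001-of-00004.safetensors"}
print(find_missing(tied))  # → ['lm_head.weight']

# Against a real checkpoint directory this would look like:
# with open(f"{basepath}/model.safetensors.index.json") as f:
#     weight_map = json.load(f)["weight_map"]
# print(find_missing(weight_map))
```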
