Replies: 6 comments
-
Try pip uninstall transformer-engine.
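For what it's worth, a quick way to check whether a stale or partial transformer-engine install is the culprit (generic shell commands, not specific to this repo):

$ pip show transformer-engine
$ python -c "import transformer_engine; print(transformer_engine.__file__)"
$ python -c "import transformer_engine.pytorch"

If the second command succeeds but the third fails, the package is present but its PyTorch extension is missing or broken, which is exactly the case where uninstalling helps.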
-
I faced the same issue when converting a Hugging Face checkpoint to Megatron format. When I run the util.py script, I get an error like this:

$ tools/checkpoint/util.py --model-type GPT {blah blah....... }

I added an error print to check the module name, and it reports that megatron.arguments does not exist, as shown above. However, if I additionally import megatron, it shows that there is no transformer engine, as below:

$ python

The method suggested by CaesarWWK, "pip uninstall transformer-engine," does not work because transformer-engine is not installed in my environment. Could you suggest a better solution?
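One thing worth ruling out for the megatron.arguments error: that module only resolves when the Megatron-LM repository root is on PYTHONPATH. A minimal check, with the path as a placeholder:

$ cd /path/to/Megatron-LM
$ export PYTHONPATH=$PWD:$PYTHONPATH
$ python -c "import megatron.arguments; print('ok')"

If this still fails, the error is coming from megatron's own transitive imports (such as the transformer-engine import discussed in this thread) rather than from the module search path.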
-
I ran into the same problem, and all the details are identical. Have you solved it yet?
-
I got the same problem. Have you solved it?
-
Marking as stale. No activity in 60 days.
-
What is the proper way to convert the Llama-2 Hugging Face checkpoint format to Megatron? I followed the instructions in docs/llama2.md but got the following errors. What confuses me is that transformer_engine.py in megatron/core/transformer/custom_layers does import transformer_engine as te at line 6, yet that module supposedly has no pytorch attribute.
MODEL_SIZE=7B
TP=1
TOP=/mnt
MEGATRON_DIR=$TOP/Megatron/Megatron-LM
HF_FORMAT_DIR=$TOP/LLaMa/llama_workarea/hf_llama_models/$MODEL_SIZE
MEGATRON_FORMAT_DIR=$TOP/Megatron/workspace.Megatron-LM/weights/$MODEL_SIZE
TOKENIZER_MODEL=$TOP/LLaMa/llama_workarea/hf_llama_models/7B/$MODEL_SIZE/tokenizer.model
export PYTHONPATH="$PWD:$PWD/tools/checkpoint"
echo $PYTHONPATH
python3 tools/checkpoint/util.py \
    --model-type GPT \
    --loader llama2_hf \
    --saver megatron \
    --target-tensor-parallel-size ${TP} \
    --load-dir ${HF_FORMAT_DIR} \
    --save-dir ${MEGATRON_FORMAT_DIR} \
    --tokenizer-model ${TOKENIZER_MODEL}
--
Loaded loader_llama2_hf as the loader.
Loaded saver_megatron as the saver.
Starting saver...
Starting loader...
Zarr-based strategies will not be registered because of missing packages
Zarr-based strategies will not be registered because of missing packages
File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/models/gpt/gpt_model.py", line 15, in
from megatron.core.transformer.transformer_block import TransformerBlock
File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/transformer/transformer_block.py", line 13, in
from megatron.core.transformer.custom_layers.transformer_engine import TENorm
File "/home/aae14935wb/Share/Megatron/Megatron-LM/megatron/core/transformer/custom_layers/transformer_engine.py", line 71, in
class TELinear(te.pytorch.Linear):
AttributeError: module 'transformer_engine' has no attribute 'pytorch'
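The failing line class TELinear(te.pytorch.Linear) means that import transformer_engine succeeded but the package exposes no pytorch submodule, which typically happens when Transformer Engine was installed without its PyTorch extension or a partial install is being picked up. A minimal sketch to confirm this, where HAVE_TE_PYTORCH is just an illustrative name:

try:
    import transformer_engine as te
    import transformer_engine.pytorch  # raises ImportError if the PyTorch extension is absent
    HAVE_TE_PYTORCH = True
except ImportError:
    HAVE_TE_PYTORCH = False

print("transformer_engine with PyTorch support:", HAVE_TE_PYTORCH)

If this prints False, reinstalling Transformer Engine with a build that includes PyTorch support (recent releases accept pip install transformer_engine[pytorch]) is the likely fix, since this code path imports te.pytorch unconditionally.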