
not able to build directory using build.py #3

Open
nihalkumar2k21 opened this issue Mar 21, 2024 · 11 comments

Comments

@nihalkumar2k21

(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0Traceback (most recent call last):
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in
args = parse_arguments()
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'
(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0Traceback (most recent call last):
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in
args = parse_arguments()
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'

@c6du

c6du commented Mar 22, 2024

I think you can try setting it to an empty dictionary, like:
lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())

If you check the LoraConfig class, you'll notice that from_hf actually calls the __init__ function, and this argument's default value is an empty dictionary.
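In context, the patched call in build.py would look something like this (a sketch only, assuming the 0.8.0 signature where trtllm_modules_to_hf_modules is the third positional parameter):

# build.py, parse_arguments() -- sketch of the workaround above, assuming
# LoraConfig.from_hf(hf_lora_dir, hf_modules_to_trtllm_modules, trtllm_modules_to_hf_modules)
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
                                 hf_modules_to_trtllm_modules,
                                 dict())  # empty mapping; from_hf forwards it to __init__, which defaults to {}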

@sugar5727

You need to use tensorrt-llm==0.7.1.

@Vishwa0703

I think you can try setting it to an empty dictionary, like: lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())

If you check the LoraConfig class, you'll notice that from_hf actually calls the __init__ function, and this argument's default value is an empty dictionary.

After setting an empty dict and running build.sh, I get:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0[03/22/2024-19:03:13] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:03:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +4032, GPU +0, now: CPU 5647, GPU 1383 (MiB)
[03/22/2024-19:03:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +316, now: CPU 7581, GPU 1699 (MiB)
[03/22/2024-19:03:16] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:03:16] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:03:17] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 8.5100 (GiB) Device 1.6595 (GiB)
Traceback (most recent call last):
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 908, in
build(0, args)
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 852, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 613, in build_rank_engine
tensorrt_llm_llama = tensorrt_llm.models.LLaMAForCausalLM(
File "/home/vishwajeet/miniconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 284, in call
obj = type.call(cls, *args, **kwargs)
TypeError: LLaMAForCausalLM.init() got an unexpected keyword argument 'num_layers'

@sugar5727

TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'

I ran into the same thing, so you can try installing tensorrt-llm==0.7.1.

@Vishwa0703

@sugar5727 Downgraded to tensorrt-llm==0.7.1 and I am no longer facing those issues. I have an RTX 4060 Laptop GPU (8 GB); when I run build-llama.sh it starts but gets killed:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[03/22/2024-19:41:34] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:41:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-19:41:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-19:41:37] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:41:37] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:41:38] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1123 (GiB) Device 1.3216 (GiB)
build-llama.sh: line 1: 41317 Killed python build.py --model_dir './model/llama/llama13_hf' --quant_ckpt_path './model/llama/llama13_int4_awq_weights/llama_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/llama/llama13_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 3900 --max_batch_size 1 --max_output_len 1024
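A bare "Killed" like this usually means the Linux OOM killer terminated the process because host RAM (not VRAM) ran out; building an engine for a 13B checkpoint can need a lot of host memory. Assuming a standard Linux setup, you can confirm with:

dmesg -T | grep -i -E 'killed process|out of memory'   # kernel log: did the OOM killer fire?
free -h                                                # check available RAM and swap before re-running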

@sugar5727


Sorry, I haven't faced that before.

@Vishwa0703

Vishwa0703 commented Mar 22, 2024

@sugar5727 which GPU do you have?

@sugar5727

@sugar5727 which GPU do you have?

RTX 4090

@nihalkumar2k21
Author


pip uninstall tensorrt_llm

then re-install:

pip3 install tensorrt_llm==0.7.1 -U --pre --extra-index-url https://pypi.nvidia.com --log=debug.txt
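To confirm which version is actually active afterwards:

pip show tensorrt_llm              # should now report 0.7.1
python3 -c "import tensorrt_llm"   # the import banner also prints the TensorRT-LLM version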

@nihalkumar2k21
Author

new error:

(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ python3 app.py
Invalid MIT-MAGIC-COOKIE-1 key[anil-gpu2:45735] *** Process received signal ***
[anil-gpu2:45735] Signal: Segmentation fault (11)
[anil-gpu2:45735] Signal code: Address not mapped (1)
[anil-gpu2:45735] Failing at address: 0x440000e9
[anil-gpu2:45735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f206b1b2420]
[anil-gpu2:45735] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f1e0f681fc7]
[anil-gpu2:45735] [ 2] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f1dea220bf0]
[anil-gpu2:45735] [ 3] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f1dea1b3ecf]
[anil-gpu2:45735] [ 4] python3(PyModule_ExecDef+0x70)[0x597d40]
[anil-gpu2:45735] [ 5] python3[0x5990c9]
[anil-gpu2:45735] [ 6] python3[0x4fd37b]
[anil-gpu2:45735] [ 7] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[anil-gpu2:45735] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[anil-gpu2:45735] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [17] python3[0x4fd514]
[anil-gpu2:45735] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[anil-gpu2:45735] [20] python3[0x517454]
[anil-gpu2:45735] [21] python3[0x4fd907]
[anil-gpu2:45735] [22] python3(PyObject_Call+0x209)[0x50a259]
[anil-gpu2:45735] [23] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [27] python3[0x4fd514]
[anil-gpu2:45735] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[anil-gpu2:45735] *** End of error message ***
Segmentation fault (core dumped)

(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ conda list

# packages in environment at /home/anil/miniconda3/envs/trtllm:
#
# Name                    Version                   Build  Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
accelerate 0.20.3 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
alembic 1.13.1 pypi_0 pypi
altair 5.2.0 pypi_0 pypi
annotated-types 0.6.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
blas 1.0 mkl
build 1.1.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.2.2 hbcca054_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colored 2.2.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
ctransformers 0.2.26 pypi_0 pypi
cuda-cudart 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.99 0 nvidia
cuda-python 12.2.0 pypi_0 pypi
cuda-runtime 12.1.0 0 nvidia
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
dataclasses-json 0.6.4 pypi_0 pypi
datasets 2.14.6 pypi_0 pypi
deprecated 1.2.14 pypi_0 pypi
diffusers 0.15.0 pypi_0 pypi
dill 0.3.7 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
docx2txt 0.8 pypi_0 pypi
environs 9.5.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
faiss-cpu 1.7.4 pypi_0 pypi
fastapi 0.110.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 py310h06a4308_0
flask 2.2.3 pypi_0 pypi
flask-marshmallow 0.15.0 pypi_0 pypi
flask-migrate 4.0.4 pypi_0 pypi
flask-sqlalchemy 3.0.3 pypi_0 pypi
flatbuffers 24.3.7 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.1 pypi_0 pypi
fsspec 2023.10.0 pypi_0 pypi
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py310heeb90bb_0
gnutls 3.6.15 he1e5248_0
gradio 4.14.0 pypi_0 pypi
gradio-client 0.8.0 pypi_0 pypi
greenlet 3.0.3 pypi_0 pypi
grpcio 1.56.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.4 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.4 py310h06a4308_0
importlib-metadata 7.1.0 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
itsdangerous 2.1.2 pypi_0 pypi
janus 1.0.0 pypi_0 pypi
jinja2 3.1.3 py310h06a4308_0
joblib 1.3.2 pypi_0 pypi
jpeg 9e h5eee18b_1
jsonpatch 1.33 pypi_0 pypi
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h7b6447c_0
langchain 0.0.310 pypi_0 pypi
langsmith 0.0.43 pypi_0 pypi
lark 1.1.9 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufile 1.9.0.20 0 nvidia
libcurand 10.3.5.119 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp-base 1.3.2 h5eee18b_0
llama-index 0.9.27 pypi_0 pypi
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
mako 1.3.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py310h5eee18b_0
marshmallow 3.21.1 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py310h5eee18b_1
mkl_fft 1.3.8 py310h5eee18b_0
mkl_random 1.2.4 py310hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpi 1.0 mpich conda-forge
mpi4py 3.1.4 py310hfc96bbd_0
mpich 3.3.2 hc856adb_0
mpmath 1.3.0 py310h06a4308_0
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.15 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
nettle 3.7.3 hbbd107a_1
networkx 3.1 py310h06a4308_0
ninja 1.11.1.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numpy 1.24.0 pypi_0 pypi
nvidia-ammo 0.7.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
onnx 1.14.1 pypi_0 pypi
onnx-graphsurgeon 0.3.27 pypi_0 pypi
onnxruntime 1.16.3 pypi_0 pypi
openai 1.14.2 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.13 h7f8727e_0
optimum 1.17.1 pypi_0 pypi
orjson 3.9.15 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 py310h5eee18b_0
pip 23.3.1 py310h06a4308_0
polygraphy 0.49.0 pypi_0 pypi
protobuf 5.26.0 pypi_0 pypi
psutil 5.9.7 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 15.0.2 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pydantic 2.3.0 pypi_0 pypi
pydantic-core 2.6.3 pypi_0 pypi
pydantic-settings 2.0.3 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pymilvus 2.3.0 pypi_0 pypi
pynvml 11.5.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pypdf 3.15.5 pypi_0 pypi
pypdf2 3.0.1 pypi_0 pypi
pyproject-hooks 1.0.0 pypi_0 pypi
python 3.10.14 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch-cuda 12.1 ha16c6d3_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytube 15.0.0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 py310h5eee18b_0
readline 8.2 h5eee18b_0
referencing 0.34.0 pypi_0 pypi
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py310h06a4308_1
responses 0.18.0 pypi_0 pypi
rich 13.7.1 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
scikit-learn 1.4.1.post1 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentence-transformers 2.2.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
shellingham 1.5.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soupsieve 2.5 pypi_0 pypi
sqlalchemy 2.0.28 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
starlette 0.36.3 pypi_0 pypi
sympy 1.12 py310h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tenacity 8.2.3 pypi_0 pypi
tensorrt 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-bindings 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-libs 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-llm 0.7.1 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tiktoken 0.3.3 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.4rc3 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 2.1.2 pypi_0 pypi
torchaudio 2.2.1 py310_cu121 pytorch
torchvision 0.17.1 py310_cu121 pytorch
tqdm 4.66.2 pypi_0 pypi
transformers 4.33.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
typing_extensions 4.9.0 py310h06a4308_1
tzdata 2024.1 pypi_0 pypi
ujson 5.9.0 pypi_0 pypi
urllib3 2.1.0 py310h06a4308_0
uvicorn 0.29.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.9.4 pypi_0 pypi
youtube-transcript-api 0.6.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

Please suggest a solution.
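One way to narrow this down: the crash happens while importing mpi4py (frames [1]-[3] go through libmpi.so.40 and MPI.cpython-310-x86_64-linux-gnu.so), and the conda list above shows mpi4py built against conda's mpich; a mismatch between that build and the MPI runtime actually loaded is one plausible cause. Reproducing the crash in isolation would confirm it:

# if this one-liner also segfaults, the problem is the MPI stack, not tensorrt-llm itself
python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"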

@Vishwa0703

Vishwa0703 commented Apr 2, 2024

@sugar5727
If you have a single 4090, then when you run build_llama.sh/build_mistral.sh it builds the TensorRT engine serially, right?
Can you share the CPU and GPU usage while building the llama/mistral engine?
Because when I run build_mistral.sh, the CPU is being consumed instead of the GPU; attaching a screenshot.

Screenshot from 2024-04-02 11-48-31
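For capturing that usage, standard tools in two terminals should be enough:

watch -n 1 nvidia-smi   # GPU utilization and memory during the build
htop                    # CPU and host RAM; checkpoint loading/conversion runs on the CPU, so some CPU load is expected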
