
not able to build directory using build.py #3

Open
nihalkumar2k21 opened this issue Mar 21, 2024 · 11 comments

Comments

@nihalkumar2k21

(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0Traceback (most recent call last):
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in
args = parse_arguments()
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'
(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0Traceback (most recent call last):
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in
args = parse_arguments()
File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'

@c6du

c6du commented Mar 22, 2024

I think you can try setting it to an empty dictionary, like:
lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())

If you check the LoraConfig class, you'll notice that from_hf actually calls the __init__ function, and this argument's default value is an empty dictionary.
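In context, the patched call in build.py would look something like this (a sketch only, assuming the 0.8.0 signature where trtllm_modules_to_hf_modules is the third positional parameter):

# build.py, parse_arguments() -- sketch of the workaround above, assuming
# LoraConfig.from_hf(hf_lora_dir, hf_modules_to_trtllm_modules, trtllm_modules_to_hf_modules)
lora_config = LoraConfig.from_hf(args.hf_lora_dir,
                                 hf_modules_to_trtllm_modules,
                                 dict())  # empty mapping; from_hf forwards it to __init__, which defaults to {}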

@sugar5727

You need to use tensorrt-llm==0.7.1.

@Vishwa0703

I think you can try setting it to an empty dictionary, like: lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())

If you check the LoraConfig class, you'll notice that from_hf actually calls the __init__ function, and this argument's default value is an empty dictionary.

After setting an empty dict and running build.sh, I get:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0[03/22/2024-19:03:13] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:03:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +4032, GPU +0, now: CPU 5647, GPU 1383 (MiB)
[03/22/2024-19:03:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +316, now: CPU 7581, GPU 1699 (MiB)
[03/22/2024-19:03:16] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:03:16] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:03:17] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 8.5100 (GiB) Device 1.6595 (GiB)
Traceback (most recent call last):
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 908, in
build(0, args)
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 852, in build
engine = build_rank_engine(builder, builder_config, engine_name,
File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 613, in build_rank_engine
tensorrt_llm_llama = tensorrt_llm.models.LLaMAForCausalLM(
File "/home/vishwajeet/miniconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 284, in call
obj = type.call(cls, *args, **kwargs)
TypeError: LLaMAForCausalLM.init() got an unexpected keyword argument 'num_layers'

@sugar5727

TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'

I ran into the same thing, so you can try installing tensorrt-llm==0.7.1.

@Vishwa0703

@sugar5727 Downgraded to tensorrt-llm==0.7.1 and I am no longer facing those issues. I have an RTX 4060 Laptop GPU (8 GB); when I run build-llama.sh it starts but gets killed:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[03/22/2024-19:41:34] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:41:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-19:41:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-19:41:37] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:41:37] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:41:38] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1123 (GiB) Device 1.3216 (GiB)
build-llama.sh: line 1: 41317 Killed python build.py --model_dir './model/llama/llama13_hf' --quant_ckpt_path './model/llama/llama13_int4_awq_weights/llama_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/llama/llama13_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 3900 --max_batch_size 1 --max_output_len 1024
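A bare "Killed" like this usually means the Linux OOM killer terminated the process because host RAM (not VRAM) ran out; building an engine for a 13B checkpoint can need a lot of host memory. Assuming a standard Linux setup, you can confirm with:

dmesg -T | grep -i -E 'killed process|out of memory'   # kernel log: did the OOM killer fire?
free -h                                                # check available RAM and swap before re-running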

@sugar5727


Sorry, I haven't faced that before.

@Vishwa0703

Vishwa0703 commented Mar 22, 2024

@sugar5727 which GPU do you have?

@sugar5727

@sugar5727 which GPU do you have?

RTX 4090

@nihalkumar2k21
Author


pip uninstall tensorrt_llm

then re-install:

pip3 install tensorrt_llm==0.7.1 -U --pre --extra-index-url https://pypi.nvidia.com --log=debug.txt
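To confirm which version is actually active afterwards:

pip show tensorrt_llm              # should now report 0.7.1
python3 -c "import tensorrt_llm"   # the import banner also prints the TensorRT-LLM version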

@nihalkumar2k21
Author

new error:

(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ python3 app.py
Invalid MIT-MAGIC-COOKIE-1 key[anil-gpu2:45735] *** Process received signal ***
[anil-gpu2:45735] Signal: Segmentation fault (11)
[anil-gpu2:45735] Signal code: Address not mapped (1)
[anil-gpu2:45735] Failing at address: 0x440000e9
[anil-gpu2:45735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f206b1b2420]
[anil-gpu2:45735] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f1e0f681fc7]
[anil-gpu2:45735] [ 2] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f1dea220bf0]
[anil-gpu2:45735] [ 3] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f1dea1b3ecf]
[anil-gpu2:45735] [ 4] python3(PyModule_ExecDef+0x70)[0x597d40]
[anil-gpu2:45735] [ 5] python3[0x5990c9]
[anil-gpu2:45735] [ 6] python3[0x4fd37b]
[anil-gpu2:45735] [ 7] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[anil-gpu2:45735] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[anil-gpu2:45735] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [17] python3[0x4fd514]
[anil-gpu2:45735] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[anil-gpu2:45735] [20] python3[0x517454]
[anil-gpu2:45735] [21] python3[0x4fd907]
[anil-gpu2:45735] [22] python3(PyObject_Call+0x209)[0x50a259]
[anil-gpu2:45735] [23] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [27] python3[0x4fd514]
[anil-gpu2:45735] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[anil-gpu2:45735] *** End of error message ***
Segmentation fault (core dumped)

(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ conda list

# packages in environment at /home/anil/miniconda3/envs/trtllm:
#
# Name                    Version                   Build  Channel

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
accelerate 0.20.3 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
alembic 1.13.1 pypi_0 pypi
altair 5.2.0 pypi_0 pypi
annotated-types 0.6.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
blas 1.0 mkl
build 1.1.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.2.2 hbcca054_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colored 2.2.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
ctransformers 0.2.26 pypi_0 pypi
cuda-cudart 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.99 0 nvidia
cuda-python 12.2.0 pypi_0 pypi
cuda-runtime 12.1.0 0 nvidia
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
dataclasses-json 0.6.4 pypi_0 pypi
datasets 2.14.6 pypi_0 pypi
deprecated 1.2.14 pypi_0 pypi
diffusers 0.15.0 pypi_0 pypi
dill 0.3.7 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
docx2txt 0.8 pypi_0 pypi
environs 9.5.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
faiss-cpu 1.7.4 pypi_0 pypi
fastapi 0.110.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 py310h06a4308_0
flask 2.2.3 pypi_0 pypi
flask-marshmallow 0.15.0 pypi_0 pypi
flask-migrate 4.0.4 pypi_0 pypi
flask-sqlalchemy 3.0.3 pypi_0 pypi
flatbuffers 24.3.7 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.1 pypi_0 pypi
fsspec 2023.10.0 pypi_0 pypi
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py310heeb90bb_0
gnutls 3.6.15 he1e5248_0
gradio 4.14.0 pypi_0 pypi
gradio-client 0.8.0 pypi_0 pypi
greenlet 3.0.3 pypi_0 pypi
grpcio 1.56.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.4 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.4 py310h06a4308_0
importlib-metadata 7.1.0 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
itsdangerous 2.1.2 pypi_0 pypi
janus 1.0.0 pypi_0 pypi
jinja2 3.1.3 py310h06a4308_0
joblib 1.3.2 pypi_0 pypi
jpeg 9e h5eee18b_1
jsonpatch 1.33 pypi_0 pypi
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h7b6447c_0
langchain 0.0.310 pypi_0 pypi
langsmith 0.0.43 pypi_0 pypi
lark 1.1.9 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufile 1.9.0.20 0 nvidia
libcurand 10.3.5.119 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp-base 1.3.2 h5eee18b_0
llama-index 0.9.27 pypi_0 pypi
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
mako 1.3.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py310h5eee18b_0
marshmallow 3.21.1 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py310h5eee18b_1
mkl_fft 1.3.8 py310h5eee18b_0
mkl_random 1.2.4 py310hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpi 1.0 mpich conda-forge
mpi4py 3.1.4 py310hfc96bbd_0
mpich 3.3.2 hc856adb_0
mpmath 1.3.0 py310h06a4308_0
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.15 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
nettle 3.7.3 hbbd107a_1
networkx 3.1 py310h06a4308_0
ninja 1.11.1.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numpy 1.24.0 pypi_0 pypi
nvidia-ammo 0.7.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
onnx 1.14.1 pypi_0 pypi
onnx-graphsurgeon 0.3.27 pypi_0 pypi
onnxruntime 1.16.3 pypi_0 pypi
openai 1.14.2 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.13 h7f8727e_0
optimum 1.17.1 pypi_0 pypi
orjson 3.9.15 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 py310h5eee18b_0
pip 23.3.1 py310h06a4308_0
polygraphy 0.49.0 pypi_0 pypi
protobuf 5.26.0 pypi_0 pypi
psutil 5.9.7 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 15.0.2 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pydantic 2.3.0 pypi_0 pypi
pydantic-core 2.6.3 pypi_0 pypi
pydantic-settings 2.0.3 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pymilvus 2.3.0 pypi_0 pypi
pynvml 11.5.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pypdf 3.15.5 pypi_0 pypi
pypdf2 3.0.1 pypi_0 pypi
pyproject-hooks 1.0.0 pypi_0 pypi
python 3.10.14 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch-cuda 12.1 ha16c6d3_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytube 15.0.0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 py310h5eee18b_0
readline 8.2 h5eee18b_0
referencing 0.34.0 pypi_0 pypi
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py310h06a4308_1
responses 0.18.0 pypi_0 pypi
rich 13.7.1 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
scikit-learn 1.4.1.post1 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentence-transformers 2.2.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
shellingham 1.5.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soupsieve 2.5 pypi_0 pypi
sqlalchemy 2.0.28 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
starlette 0.36.3 pypi_0 pypi
sympy 1.12 py310h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tenacity 8.2.3 pypi_0 pypi
tensorrt 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-bindings 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-libs 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-llm 0.7.1 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tiktoken 0.3.3 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.4rc3 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 2.1.2 pypi_0 pypi
torchaudio 2.2.1 py310_cu121 pytorch
torchvision 0.17.1 py310_cu121 pytorch
tqdm 4.66.2 pypi_0 pypi
transformers 4.33.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
typing_extensions 4.9.0 py310h06a4308_1
tzdata 2024.1 pypi_0 pypi
ujson 5.9.0 pypi_0 pypi
urllib3 2.1.0 py310h06a4308_0
uvicorn 0.29.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.9.4 pypi_0 pypi
youtube-transcript-api 0.6.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

Please suggest a solution.
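One way to narrow this down: the crash happens while importing mpi4py (frames [1]-[3] go through libmpi.so.40 and MPI.cpython-310-x86_64-linux-gnu.so), and the conda list above shows mpi4py built against conda's mpich; a mismatch between that build and the MPI runtime actually loaded is one plausible cause. Reproducing the crash in isolation would confirm it:

# if this one-liner also segfaults, the problem is the MPI stack, not tensorrt-llm itself
python3 -c "from mpi4py import MPI; print(MPI.Get_library_version())"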

@Vishwa0703

Vishwa0703 commented Apr 2, 2024

@sugar5727
If you have a single 4090, then when you run build_llama.sh/build_mistral.sh it builds the TensorRT engine serially, right?
Can you share the CPU and GPU usage while building the llama/mistral engine?
Because when I run build_mistral.sh, the CPU is being consumed instead of the GPU; attaching a screenshot.

Screenshot from 2024-04-02 11-48-31
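For capturing that usage, standard tools in two terminals should be enough:

watch -n 1 nvidia-smi   # GPU utilization and memory during the build
htop                    # CPU and host RAM; checkpoint loading/conversion runs on the CPU, so some CPU load is expected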
