System Info
AWS EC2 g6e.xlarge instance (1x L40S GPU), Linux
Latest LoRAX version as of today, using the Docker command
Python 3.12.6
PyTorch version: 2.4.0+cu121
CUDA version: 12.1
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Run the following command on an appropriate machine:
docker run --gpus all \
--shm-size 1g \
-p 8080:80 \
-v $PWD/data:/data \
ghcr.io/predibase/lorax:main \
--model-id google/gemma-2-2b
This is the standard command given in the README; I was able to reproduce the error with other base models as well.
See this error log:
2024-10-24T01:16:07.062275Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:
2024-10-24 01:15:59.960 | INFO | lorax_server.utils.state:<module>:19 - Backend = fa2
2024-10-24 01:15:59.961 | INFO | lorax_server.utils.state:<module>:21 - Prefix caching = False
2024-10-24 01:15:59.961 | INFO | lorax_server.utils.state:<module>:22 - Chunked prefill = False
/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'lorax_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
return func(*args, **kwargs)
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 91, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 428, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 289, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 174, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_llama.py", line 40, in __init__
super().__init__(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 989, in __init__
raise ValueError(
ValueError: Model does not have lm head so it is presumed to be for embeddings.No embedding_dim was provided so we cannot load the model.Please pass in an embedding_dim to the model.
rank=0
2024-10-24T01:16:07.150703Z ERROR lorax_launcher: Shard 0 failed to start
2024-10-24T01:16:07.150737Z INFO lorax_launcher: Shutting down shards
Error: ShardCannotStart
Expected behavior
A causal language model should not be misinterpreted as an embedding model.
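For reference, the model's own config identifies it as a causal LM with tied word embeddings. A quick check (a sketch, assuming access to the gated Gemma repo and a transformers version with Gemma 2 support):

from transformers import AutoConfig

# Assumes you have accepted the Gemma license on the Hub and are logged in.
config = AutoConfig.from_pretrained("google/gemma-2-2b")
print(config.architectures)        # ['Gemma2ForCausalLM'] -> a causal LM, not an embedding model
print(config.tie_word_embeddings)  # True -> the output head shares weights with the input embeddings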
The issue may be caused by this change: https://github.com/predibase/lorax/pull/653/files#diff-d3148674a22b702ae6d1d9a2f9c4e3a1bc550bea2050d44e803a36faa1a681daR994
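If that check is the culprit, the likely trigger is that gemma-2-2b ties its output head to the input embeddings, so the checkpoint on the Hub contains no lm_head tensor at all and the new code concludes the model has no LM head. A rough way to confirm (a sketch, assuming the usual safetensors layout on the Hub):

import json
from huggingface_hub import hf_hub_download, list_repo_files
from safetensors import safe_open

repo = "google/gemma-2-2b"  # gated: requires accepting the Gemma license on the Hub

files = list_repo_files(repo)
if "model.safetensors.index.json" in files:
    # Sharded checkpoint: the index maps every tensor name to its shard file.
    with open(hf_hub_download(repo, "model.safetensors.index.json")) as f:
        tensor_names = json.load(f)["weight_map"].keys()
else:
    # Single-file checkpoint: read tensor names straight from the safetensors header.
    with safe_open(hf_hub_download(repo, "model.safetensors"), framework="pt") as f:
        tensor_names = list(f.keys())

# Expected: False -- the head is tied to embed_tokens, so no lm_head tensor is stored on disk.
print(any("lm_head" in name for name in tensor_names))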