System Info
AWS EC2 g6e.xlarge instance (1x L40S GPU), Linux
Latest LoRAX version as of today, using the Docker command
Python 3.12.6
PyTorch version: 2.4.0+cu121
CUDA version: 12.1
Information
Docker
The CLI directly
Tasks
An officially supported command
My own modifications
Reproduction
Run the following command on an appropriate machine:
docker run --gpus all \
--shm-size 1g \
-p 8080:80 \
-v $PWD/data:/data \
ghcr.io/predibase/lorax:main \
--model-id google/gemma-2-2b
This is the standard command given in the README; I was able to reproduce the error with other base models as well.
See this error log:
2024-10-24T01:16:07.062275Z ERROR shard-manager: lorax_launcher: Shard complete standard error output:
2024-10-24 01:15:59.960 | INFO | lorax_server.utils.state:<module>:19 - Backend = fa2
2024-10-24 01:15:59.961 | INFO | lorax_server.utils.state:<module>:21 - Prefix caching = False
2024-10-24 01:15:59.961 | INFO | lorax_server.utils.state:<module>:22 - Chunked prefill = False
/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py:79: FutureWarning: You are using a Backend <class 'lorax_server.utils.dist.FakeGroup'> as a ProcessGroup. This usage is deprecated since PyTorch 2.0. Please use a public API of PyTorch Distributed instead.
return func(*args, **kwargs)
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 91, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 428, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 289, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 174, in get_model
return FlashLlama(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_llama.py", line 40, in __init__
super().__init__(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 989, in __init__
raise ValueError(
ValueError: Model does not have lm head so it is presumed to be for embeddings.No embedding_dim was provided so we cannot load the model.Please pass in an embedding_dim to the model.
rank=0
2024-10-24T01:16:07.150703Z ERROR lorax_launcher: Shard 0 failed to start
2024-10-24T01:16:07.150737Z INFO lorax_launcher: Shutting down shards
Error: ShardCannotStart
Expected behavior
A causal language model should not be misinterpreted as an embedding model.
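For reference, the model's own config identifies it as a causal LM with tied word embeddings. A quick check (a sketch, assuming access to the gated Gemma repo and a transformers version with Gemma 2 support):

from transformers import AutoConfig

# Assumes you have accepted the Gemma license on the Hub and are logged in.
config = AutoConfig.from_pretrained("google/gemma-2-2b")
print(config.architectures)        # ['Gemma2ForCausalLM'] -> a causal LM, not an embedding model
print(config.tie_word_embeddings)  # True -> the output head shares weights with the input embeddings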
The issue may be caused by this change: https://github.com/predibase/lorax/pull/653/files#diff-d3148674a22b702ae6d1d9a2f9c4e3a1bc550bea2050d44e803a36faa1a681daR994
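If that check is the culprit, the likely trigger is that gemma-2-2b ties its output head to the input embeddings, so the checkpoint on the Hub contains no lm_head tensor at all and the new code concludes the model has no LM head. A rough way to confirm (a sketch, assuming the usual safetensors layout on the Hub):

import json
from huggingface_hub import hf_hub_download, list_repo_files
from safetensors import safe_open

repo = "google/gemma-2-2b"  # gated: requires accepting the Gemma license on the Hub

files = list_repo_files(repo)
if "model.safetensors.index.json" in files:
    # Sharded checkpoint: the index maps every tensor name to its shard file.
    with open(hf_hub_download(repo, "model.safetensors.index.json")) as f:
        tensor_names = json.load(f)["weight_map"].keys()
else:
    # Single-file checkpoint: read tensor names straight from the safetensors header.
    with safe_open(hf_hub_download(repo, "model.safetensors"), framework="pt") as f:
        tensor_names = list(f.keys())

# Expected: False -- the head is tied to embed_tokens, so no lm_head tensor is stored on disk.
print(any("lm_head" in name for name in tensor_names))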