llava model added to readme gaudi
adobrzyniewicz-habana committed Feb 5, 2025
1 parent 85eb147 commit 6a3147d
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README_GAUDI.md
@@ -141,6 +141,7 @@ The following configurations have been validated to function with Gaudi2 devices
- [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) with tensor parallelism on 8x HPU, BF16 datatype with random or greedy sampling
- [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) on single HPU or with tensor parallelism on 2x HPU, BF16 datatype with random or greedy sampling
- [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) with tensor parallelism on 2x HPU, BF16 datatype with random or greedy sampling
- [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) on single HPU or with tensor parallelism on 8x HPU, BF16 datatype
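
As an illustrative sketch (not part of this diff), one of the validated configurations above could be exercised with vLLM's offline Python API roughly as follows; the prompt and sampling settings are arbitrary examples, and only the model name, BF16 datatype, and greedy sampling come from the list:

```python
from vllm import LLM, SamplingParams

# mistralai/Mistral-7B-Instruct-v0.3 on a single HPU, BF16 datatype.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3", dtype="bfloat16")

# temperature=0.0 gives greedy sampling.
sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Explain tensor parallelism in one sentence."], sampling)
print(outputs[0].outputs[0].text)
```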

# Performance Tuning

@@ -368,6 +369,7 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM

- `PT_HPU_LAZY_MODE`: if `0`, the PyTorch Eager backend for Gaudi will be used; if `1`, the PyTorch Lazy backend for Gaudi will be used. `1` is the default.
- `PT_HPU_ENABLE_LAZY_COLLECTIVES` must be set to `true` for tensor parallel inference with HPU Graphs.
- `PT_HPUGRAPH_DISABLE_TENSOR_CACHE` must be set to `false` for the llava model.
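
For illustration only (again not part of this diff), the variables above might be exported before vLLM initializes the HPU backend, e.g. when running the llava model with tensor parallelism; the launch parameters are assumptions based on the validated-configurations list, not a command from the README:

```python
import os

# Set the HPU PyTorch Bridge variables before vLLM/PyTorch initialize the HPU backend.
os.environ["PT_HPU_LAZY_MODE"] = "1"                       # Lazy backend (the default)
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"      # required for tensor parallel inference with HPU Graphs
os.environ["PT_HPUGRAPH_DISABLE_TENSOR_CACHE"] = "false"   # required for the llava model

from vllm import LLM

llm = LLM(model="llava-hf/llava-1.5-7b-hf",
          dtype="bfloat16",
          tensor_parallel_size=8)
```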

# Quantization, FP8 Inference and Model Calibration Process

