Update part of Quickstart guide in mddocs (1/2)
* Quickstart index.rst -> index.md

* Update for Linux Install Quickstart

* Update md docs for Windows Install QuickStart

* Small fix

* Add blank lines

* Update mddocs for llama cpp quickstart

* Update mddocs for llama3 llama-cpp and ollama quickstart

* Update mddocs for ollama quickstart

* Update mddocs for openwebui quickstart

* Update mddocs for privateGPT quickstart

* Update mddocs for vllm quickstart

* Small fix

* Update mddocs for text-generation-webui quickstart

* Update for video links
Oscilloscope98 authored Jun 20, 2024
1 parent f0fdfa0 commit 8c9f877
Showing 11 changed files with 607 additions and 824 deletions.
26 changes: 26 additions & 0 deletions docs/mddocs/Quickstart/index.md
@@ -0,0 +1,26 @@
# IPEX-LLM Quickstart

> [!NOTE]
> We are adding more Quickstart guides.

This section includes concise guides showing you how to:

- [`bigdl-llm` Migration Guide](./bigdl_llm_migration.md)
- [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.md)
- [Install IPEX-LLM on Windows with Intel GPU](./install_windows_gpu.md)
- [Install IPEX-LLM in Docker on Windows with Intel GPU](./docker_windows_gpu.md)
- [Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL)](./docker_benchmark_quickstart.md)
- [Run Performance Benchmarking with IPEX-LLM](./benchmark_quickstart.md)
- [Run Local RAG using Langchain-Chatchat on Intel GPU](./chatchat_quickstart.md)
- [Run Text Generation WebUI on Intel GPU](./webui_quickstart.md)
- [Run Open WebUI on Intel GPU](./open_webui_with_ollama_quickstart.md)
- [Run PrivateGPT with IPEX-LLM on Intel GPU](./privateGPT_quickstart.md)
- [Run Coding Copilot (Continue) in VSCode with Intel GPU](./continue_quickstart.md)
- [Run Dify on Intel GPU](./dify_quickstart.md)
- [Run llama.cpp with IPEX-LLM on Intel GPU](./llama_cpp_quickstart.md)
- [Run Ollama with IPEX-LLM on Intel GPU](./ollama_quickstart.md)
- [Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM](./llama3_llamacpp_ollama_quickstart.md)
- [Run IPEX-LLM Serving with FastChat](./fastchat_quickstart.md)
- [Run IPEX-LLM Serving with vLLM on Intel GPU](./vLLM_quickstart.md)
- [Finetune LLM with Axolotl on Intel GPU](./axolotl_quickstart.md)
- [Run IPEX-LLM serving on Multiple Intel GPUs using DeepSpeed AutoTP and FastApi](./deepspeed_autotp_fastapi_quickstart.md)
33 changes: 0 additions & 33 deletions docs/mddocs/Quickstart/index.rst

This file was deleted.

131 changes: 63 additions & 68 deletions docs/mddocs/Quickstart/install_linux_gpu.md
@@ -2,7 +2,7 @@

This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPU.

IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates IPEX-LLM with PyTorch 2.1. Check the [Installation](../Overview/install_gpu.md#linux) page for more details.
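If you are not sure which Ubuntu release you are running, here is a quick check (standard Ubuntu tooling, not specific to this guide):

```bash
lsb_release -a   # prints distributor, description, and release number
```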

## Install Prerequisites

@@ -98,7 +98,7 @@ IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and sup
For Intel Core™ Ultra integrated GPU, please make sure level_zero version >= 1.3.28717. The level_zero version can be checked with `sycl-ls`, and the version will be tagged behind `[ext_oneapi_level_zero:gpu]`.
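For example, you can filter the `sycl-ls` output for that tag (a minimal sketch; device indices and driver naming may differ on your system):

```bash
sycl-ls | grep "ext_oneapi_level_zero:gpu"
```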

Here is a sample output of `sycl-ls`:
```bash
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 5 125H OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.09.28717.12]
```

@@ -118,7 +118,7 @@

```bash
sudo dpkg -i *.deb
```

### Install oneAPI
```bash
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
```

@@ -163,43 +163,38 @@ Download and install the Miniforge as follows if you don't have conda installed
You can use `conda --version` to verify your conda installation.
After installation, create a new python environment `llm`:
```bash
conda create -n llm python=3.11
```
Activate the newly created environment `llm`:
```bash
conda activate llm
```
## Install `ipex-llm`
With the `llm` environment active, use `pip` to install `ipex-llm` for GPU. Choose either US or CN website for `extra-index-url`:

- For **US**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
  ```

- For **CN**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
  ```
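Once the installation completes, an optional metadata check confirms which build was installed (a quick sanity check, not part of the original guide):

```bash
pip show ipex-llm   # prints the installed version, location, and dependencies
```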
> [!NOTE]
> If you encounter network issues while installing IPEX, refer to [this guide](../Overview/install_gpu.md#install-ipex-llm-from-wheel-1) for troubleshooting advice.
## Verify Installation
- You can verify if `ipex-llm` is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
```bash
source /opt/intel/oneapi/setvars.sh
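# (A sketch of such an import check, assuming the usual ipex-llm module path;
# the collapsed portion of this diff contains the original command:)
python -c "from ipex_llm.transformers import AutoModel, AutoModelForCausalLM"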
```

@@ -210,61 +205,59 @@ Choose either US or CN website for `extra-index-url`:
## Runtime Configurations
To use GPU acceleration on Linux, several environment variables are required or recommended before running a GPU example. Choose the corresponding configuration based on your GPU device:
- For **Intel Arc™ A-Series and Intel Data Center GPU Flex**:

  For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

  ```bash
  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export USE_XETLA=OFF
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  export SYCL_CACHE_PERSISTENT=1
  ```

- For **Intel Data Center GPU Max**:

  For Intel Data Center GPU Max Series, we recommend:

  ```bash
  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  export SYCL_CACHE_PERSISTENT=1
  export ENABLE_SDP_FUSION=1
  ```

  Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
> [!NOTE]
> Please refer to [this guide](../Overview/install_gpu.md#runtime-configuration-1) for more details regarding runtime configuration.
## A Quick Example
Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface.co/microsoft/phi-1_5) model, a 1.3 billion parameter LLM, for this demonstration. Follow the steps below to set up and run the model, and observe how it responds to the prompt "What is AI?".
- Step 1: Activate the Python environment `llm` you previously created:
```bash
conda activate llm
```
- Step 2: Follow [Runtime Configurations Section](#runtime-configurations) above to prepare your runtime environment.
- Step 3: Create a new file named `demo.py` and insert the code snippet below.
```python
# Copy/Paste the contents to a new file demo.py
import torch
```

@@ -290,21 +283,23 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface

```python
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_str)
```
> **Note**: When running LLMs on Intel iGPUs with limited memory size, we recommend setting `cpu_embedding=True` in the `from_pretrained` function (see the sketch below).
> This will allow the memory-intensive embedding layer to utilize the CPU instead of GPU.
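For instance, the loading call in `demo.py` might look like this (a sketch; the exact arguments used by the guide sit in the collapsed portion of the diff above):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# cpu_embedding=True keeps the memory-hungry embedding layer on the CPU,
# which helps on iGPUs with limited memory.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5",
                                             load_in_4bit=True,
                                             cpu_embedding=True,
                                             trust_remote_code=True)
```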
- Step 5: Run `demo.py` within the activated Python environment using the following command:
```bash
python demo.py
```
### Example output
Example output on a system equipped with an 11th Gen Intel Core i7 CPU and Iris Xe Graphics iGPU:
```
Question:What is AI?
Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```
## Tips & Troubleshooting
