Update part of Quickstart guide in mddocs (1/2)
* Quickstart index.rst -> index.md

* Update for Linux Install Quickstart

* Update md docs for Windows Install QuickStart

* Small fix

* Add blank lines

* Update mddocs for llama cpp quickstart

* Update mddocs for llama3 llama-cpp and ollama quickstart

* Update mddocs for ollama quickstart

* Update mddocs for openwebui quickstart

* Update mddocs for privateGPT quickstart

* Update mddocs for vllm quickstart

* Small fix

* Update mddocs for text-generation-webui quickstart

* Update for video links
Oscilloscope98 authored Jun 20, 2024
1 parent f0fdfa0 commit 8c9f877
Showing 11 changed files with 607 additions and 824 deletions.
26 changes: 26 additions & 0 deletions docs/mddocs/Quickstart/index.md
@@ -0,0 +1,26 @@
# IPEX-LLM Quickstart

> [!NOTE]
> We are adding more Quickstart guides.

This section includes concise guides showing you how to:

- [`bigdl-llm` Migration Guide](./bigdl_llm_migration.md)
- [Install IPEX-LLM on Linux with Intel GPU](./install_linux_gpu.md)
- [Install IPEX-LLM on Windows with Intel GPU](./install_windows_gpu.md)
- [Install IPEX-LLM in Docker on Windows with Intel GPU](./docker_windows_gpu.md)
- [Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL)](./docker_benchmark_quickstart.md)
- [Run Performance Benchmarking with IPEX-LLM](./benchmark_quickstart.md)
- [Run Local RAG using Langchain-Chatchat on Intel GPU](./chatchat_quickstart.md)
- [Run Text Generation WebUI on Intel GPU](./webui_quickstart.md)
- [Run Open WebUI on Intel GPU](./open_webui_with_ollama_quickstart.md)
- [Run PrivateGPT with IPEX-LLM on Intel GPU](./privateGPT_quickstart.md)
- [Run Coding Copilot (Continue) in VSCode with Intel GPU](./continue_quickstart.md)
- [Run Dify on Intel GPU](./dify_quickstart.md)
- [Run llama.cpp with IPEX-LLM on Intel GPU](./llama_cpp_quickstart.md)
- [Run Ollama with IPEX-LLM on Intel GPU](./ollama_quickstart.md)
- [Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM](./llama3_llamacpp_ollama_quickstart.md)
- [Run IPEX-LLM Serving with FastChat](./fastchat_quickstart.md)
- [Run IPEX-LLM Serving with vLLM on Intel GPU](./vLLM_quickstart.md)
- [Finetune LLM with Axolotl on Intel GPU](./axolotl_quickstart.md)
- [Run IPEX-LLM serving on Multiple Intel GPUs using DeepSpeed AutoTP and FastApi](./deepspeed_autotp_fastapi_quickstart.md)
33 changes: 0 additions & 33 deletions docs/mddocs/Quickstart/index.rst

This file was deleted.

131 changes: 63 additions & 68 deletions docs/mddocs/Quickstart/install_linux_gpu.md
@@ -2,7 +2,7 @@

This guide demonstrates how to install IPEX-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPU.

IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates IPEX-LLM with PyTorch 2.1. Check the [Installation](../Overview/install_gpu.md#linux) page for more details.
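If you are not sure which Ubuntu release you are running, here is a quick check (standard Ubuntu tooling, not specific to this guide):

```bash
lsb_release -a   # prints distributor, description, and release number
```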

## Install Prerequisites

@@ -98,7 +98,7 @@ IPEX-LLM currently supports the Ubuntu 20.04 operating system and later, and sup
For Intel Core™ Ultra integrated GPU, please make sure level_zero version >= 1.3.28717. The level_zero version can be checked with `sycl-ls`, and the version will be tagged behind `[ext_oneapi_level_zero:gpu]`.
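For example, you can filter the `sycl-ls` output for that tag (a minimal sketch; device indices and driver naming may differ on your system):

```bash
sycl-ls | grep "ext_oneapi_level_zero:gpu"
```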

Here is a sample output of `sycl-ls`:
```bash
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 5 125H OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.09.28717.12]
```

@@ -118,7 +118,7 @@

```bash
sudo dpkg -i *.deb
```

### Install oneAPI
```bash
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
```

@@ -163,43 +163,38 @@ Download and install the Miniforge as follows if you don't have conda installed
You can use `conda --version` to verify your conda installation.
After installation, create a new python environment `llm`:
```bash
conda create -n llm python=3.11
```
Activate the newly created environment `llm`:
```bash
conda activate llm
```
## Install `ipex-llm`
With the `llm` environment active, use `pip` to install `ipex-llm` for GPU. Choose either US or CN website for `extra-index-url`:

- For **US**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
  ```

- For **CN**:

  ```bash
  pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
  ```
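Once the installation completes, an optional metadata check confirms which build was installed (a quick sanity check, not part of the original guide):

```bash
pip show ipex-llm   # prints the installed version, location, and dependencies
```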
> [!NOTE]
> If you encounter network issues while installing IPEX, refer to [this guide](../Overview/install_gpu.md#install-ipex-llm-from-wheel-1) for troubleshooting advice.
## Verify Installation
- You can verify if `ipex-llm` is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
```bash
source /opt/intel/oneapi/setvars.sh
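# (A sketch of such an import check, assuming the usual ipex-llm module path;
# the collapsed portion of this diff contains the original command:)
python -c "from ipex_llm.transformers import AutoModel, AutoModelForCausalLM"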
```

@@ -210,61 +205,59 @@ Choose either US or CN website for `extra-index-url`:
## Runtime Configurations
To use GPU acceleration on Linux, several environment variables are required or recommended before running a GPU example. Choose the corresponding configuration based on your GPU device:
- For **Intel Arc™ A-Series and Intel Data Center GPU Flex**:

  For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

  ```bash
  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export USE_XETLA=OFF
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  export SYCL_CACHE_PERSISTENT=1
  ```

- For **Intel Data Center GPU Max**:

  For Intel Data Center GPU Max Series, we recommend:

  ```bash
  # Configure oneAPI environment variables. Required step for APT or offline installed oneAPI.
  # Skip this step for PIP-installed oneAPI since the environment has already been configured in LD_LIBRARY_PATH.
  source /opt/intel/oneapi/setvars.sh

  # Recommended Environment Variables for optimal performance
  export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
  export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
  export SYCL_CACHE_PERSISTENT=1
  export ENABLE_SDP_FUSION=1
  ```

  Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
> [!NOTE]
> Please refer to [this guide](../Overview/install_gpu.md#runtime-configuration-1) for more details regarding runtime configuration.
## A Quick Example
Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface.co/microsoft/phi-1_5) model, a 1.3 billion parameter LLM, for this demonstration. Follow the steps below to set up and run the model, and observe how it responds to the prompt "What is AI?".
- Step 1: Activate the Python environment `llm` you previously created:
```bash
conda activate llm
```
- Step 2: Follow [Runtime Configurations Section](#runtime-configurations) above to prepare your runtime environment.
- Step 3: Create a new file named `demo.py` and insert the code snippet below.
```python
# Copy/Paste the contents to a new file demo.py
import torch
```

@@ -290,21 +283,23 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface

```python
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_str)
```
> **Note**: When running LLMs on Intel iGPUs with limited memory size, we recommend setting `cpu_embedding=True` in the `from_pretrained` function (see the sketch below).
> This will allow the memory-intensive embedding layer to utilize the CPU instead of GPU.
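For instance, the loading call in `demo.py` might look like this (a sketch; the exact arguments used by the guide sit in the collapsed portion of the diff above):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# cpu_embedding=True keeps the memory-hungry embedding layer on the CPU,
# which helps on iGPUs with limited memory.
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5",
                                             load_in_4bit=True,
                                             cpu_embedding=True,
                                             trust_remote_code=True)
```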
- Step 5: Run `demo.py` within the activated Python environment using the following command:
```bash
python demo.py
```
### Example output
Example output on a system equipped with an 11th Gen Intel Core i7 CPU and Iris Xe Graphics iGPU:
```
Question:What is AI?
Answer: AI stands for Artificial Intelligence, which is the simulation of human intelligence in machines.
```
## Tips & Troubleshooting
