LLM GPT-J (Large Language Model) based on Intel Extension for PyTorch

We provide Dockerfile.llm for building a Docker image with PyTorch, IPEX, and a patched Transformers installed. The GPT-J model used is EleutherAI/gpt-j-6B.

File structure

  • Dockerfile.llm: The Dockerfile used to install dependencies.
  • run_gptj.py: The inference benchmark script with the IPEX optimization path enabled, used to measure the performance of GPT-J.
  • prompt.json: The input prompt file used by run_gptj.py.
  • transformers.patch: The patch file for transformers v4.26.1, which enables the IPEX optimizations for Transformers and GPT-J (see the sketch after this list).
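
For reference, the following is a minimal sketch of how transformers.patch could be applied to a Transformers v4.26.1 source checkout. Dockerfile.llm is expected to take care of this during the image build, so the repository URL and relative paths below are illustrative assumptions only.

# Illustrative only: fetch transformers v4.26.1 and apply the patch
git clone --branch v4.26.1 https://github.com/huggingface/transformers.git
cd transformers
git apply ../transformers.patch
pip install -e .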

Environment setup

Build Docker image

  • Option 1 (default): you can use docker build to build the Docker image in your environment.
docker build ./ -f Dockerfile.llm -t llm_centos8:latest
  • Option 2: If you need to use a proxy, use the following command instead:
docker build ./ --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -f Dockerfile.llm -t llm_centos8:latest
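
After the build finishes, you can confirm the image exists locally. This check is not part of the original instructions, just a quick sanity step.

# Confirm the freshly built image is available
docker images llm_centos8:latest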

Run GPT-j script

  • Step 1 Docker run: Use docker run to start a container from the Docker image built previously and run our script inside it (a quick check of the mounted workspace follows the command below).
# Start a container from the image built above, mounting the current directory at /root/workspace
docker run --privileged -v `pwd`:/root/workspace -it llm_centos8:latest
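
Because the current working directory is mounted at /root/workspace by the -v flag above, the repository files are visible inside the container. A quick check (illustrative; assumes docker run was launched from this folder):

# Inside the container: switch to the mounted workspace and confirm the files are present
cd /root/workspace
ls Dockerfile.llm run_gptj.py prompt.json run.sh transformers.patch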
  • Step 2 Environment config: Set the threading and allocator environment variables below. TCMalloc is a recommended malloc implementation that emphasizes fragmentation avoidance and scalable concurrency support (a check that the preloaded libraries exist follows these commands).
# Activate conda env
source activate llm

# Env config
export KMP_BLOCKTIME=1
export KMP_AFFINITY="granularity=fine,compact,1,0"
# IOMP & TcMalloc
export LD_PRELOAD=/root/anaconda3/envs/llm/lib/libiomp5.so:/root/anaconda3/envs/llm/lib/libtcmalloc.so:${LD_PRELOAD}
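
To confirm that the libraries referenced by LD_PRELOAD actually exist (the paths depend on the conda env created by Dockerfile.llm), a quick check:

# Verify the IOMP and TCMalloc shared libraries are present in the conda env
ls -l /root/anaconda3/envs/llm/lib/libiomp5.so /root/anaconda3/envs/llm/lib/libtcmalloc.so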
  • Step 3 Run GPT-J script: With the conda env activated, run the GPT-J script with the following configuration: max-new-tokens=32, num_beams=4 (see the NUMA topology check after the commands below).
export CORES=$(lscpu | grep "Core(s) per socket" | awk '{print $NF}')

# Run GPT-j workload
bash run.sh

# Or
OMP_NUM_THREADS=${CORES} numactl -N 0 -m 0 python run_gptj.py --ipex --jit --dtype bfloat16 --max-new-tokens 32

# Run GPT-j workload with TPP
OMP_NUM_THREADS=${CORES} numactl -N 0 -m 0 python run_gptj.py --use-tpp --jit --dtype bfloat16 --max-new-tokens 32

# Note: the numactl parameters above should be used for HBM-cache mode.
# For flat configuration in quad mode use '-N 0 -m 2' to use the HBM memory.
# IPEX 
OMP_NUM_THREADS=${CORES} numactl -N 0 -m 2 python run_gptj.py --ipex --jit --dtype bfloat16 --max-new-tokens 32
# TPP
OMP_NUM_THREADS=${CORES} numactl -N 0 -m 2 python run_gptj.py --use-tpp --jit --dtype bfloat16 --max-new-tokens 32
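
The right numactl arguments depend on how the machine exposes HBM (cache mode vs. flat/quad mode), so it can help to inspect the NUMA topology first. This is a generic check, not specific to this repository:

# Show NUMA nodes and their memory sizes to pick the right '-m' target
numactl -H
lscpu | grep -i numa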