Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChatGLM3-6B LoRA Fine-tuning Demo #11450

Merged
merged 6 commits into from
Jul 1, 2024

Conversation

Uxito-Ada
Copy link
Contributor

@Uxito-Ada Uxito-Ada commented Jun 27, 2024

Description

Port ChatGLM repo's finetune demo to ipex-llm, running on Arc XPU.

  • a single Arc card
  • two Arc cards (needs deepspeed)

1. Why the change?

as above

2. User API changes

will replace the old chatglm lora finetuning.

3. Summary of the change

ChatGLM3-6B LoRA Fine-tuning Demo

4. How to test?

  • N/A
  • Unit test
  • Application test
  • Document test
  • ...

5. New dependencies

pip install "jieba>=0.42.1"
pip install "ruamel_yaml>=0.18.6"
pip install "rouge_chinese>=1.0.3"
pip install "jupyter>=1.0.0"
pip install "typer"
pip install "nltk"

@Uxito-Ada Uxito-Ada requested review from qiyuangong and glorysdj June 27, 2024 07:30
@Uxito-Ada
Copy link
Contributor Author

The README.md is WIP, which will append sections of 2-cards finetuning and how to serve with the outputs.

pip install "numpy<2.0.0"
# below command will install intel_extension_for_pytorch==2.1.10+xpu as default
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install oneccl_bind_pt==2.1.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single ARC doesn't need oneccl

Copy link
Contributor Author

@Uxito-Ada Uxito-Ada Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary, as XPU accelerator needs CCL. Without CCL, accelerator will switch to CUDA, and trainer will schedule model to CPU rather than XPU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

@qiyuangong qiyuangong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other, LGTM

# See the License for the specific language governing permissions and
# limitations under the License.
#
# This is ported from https://github.com/THUDM/ChatGLM3/blob/main/finetune_demo/lora_finetune.ipynb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please highlight in the code which lines need to be changed for running on Arc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, as well as lora_finetune_chatglm.py.

Comment on lines 414 to 426
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model
import os
os.environ["ACCELERATE_USE_XPU"] = "true"
model = AutoModelForCausalLM.from_pretrained(
model_dir,
trust_remote_code=True,
load_in_low_bit="bf16",
optimize_model=False,
empty_init=False,
use_cache=False,
torch_dtype=torch.bfloat16
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Make it clear that L414-426 have been modified (e.g., add begin and end of change comments)
  2. do we need to use load_in_low_bit="bf16"?
  3. Is it possible to use llm_patch to minimize the changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned to use llm_patch.

)
if peft_config.peft_type.name == "LORA":
# Add below L417 to enable accelerator to schedule model to Intel Arc XPU
os.environ["ACCELERATE_USE_XPU"] = "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add this to llm_patch? @qiyuangong @plusbang ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can use llm_patch to minimize changes, then yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, maybe we could add this to llm_patch just in this PR.

python process_advertise_gen_dataset.py
```

Then, './AdvertiseGen' will be converted to './AdvertiseGen_fix'. Now, we have prepared the dataset, and are going to start LoRA fine-tuning on ChatGLM3-6B.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

' changes to ` will high this work

@Uxito-Ada Uxito-Ada merged commit 07362ff into intel-analytics:main Jul 1, 2024
26 checks passed
@Uxito-Ada Uxito-Ada deleted the heyang_24_6_27 branch July 1, 2024 01:18
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024
* ChatGLM3-6B LoRA Fine-tuning Demo

* refine

* refine

* add 2-card deepspeed

* refine format

* add mpi4py and deepspeed install
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants