v1.10: SDXL, Textual-Inversion, TRL, SynapseAI v1.14
SynapseAI v1.14
The codebase is fully validated for the latest version of the Habana SDK, SynapseAI v1.14.0.
Stable Diffusion XL
SDXL is now supported and optimized for Gaudi.
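A minimal way to try it, sketched under the assumption that the Gaudi-optimized pipeline class and the `Habana/stable-diffusion` Gaudi configuration on the Hugging Face Hub are used; the base model choice is illustrative:

```python
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

# Load SDXL with the Gaudi-optimized pipeline. HPU graphs are enabled to
# reduce host overhead; the base model below is an illustrative choice.
pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

image = pipeline(prompt="A futuristic city at sunset").images[0]
image.save("sdxl_gaudi.png")
```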
Textual inversion fine-tuning
An example of textual-inversion fine-tuning has been added.
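Once fine-tuning has produced learned embeddings, they can be loaded back into a Gaudi pipeline. A minimal sketch, assuming the Gaudi pipeline inherits diffusers' `load_textual_inversion`; the output directory and placeholder token are hypothetical:

```python
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

# Load a base model on Gaudi, then attach the embeddings produced by the
# textual-inversion example. "./textual_inversion_output" and "<my-token>"
# are placeholders for the script's output directory and placeholder token.
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
pipeline.load_textual_inversion("./textual_inversion_output")

image = pipeline(prompt="A painting of <my-token> in the rain").images[0]
image.save("textual_inversion_gaudi.png")
```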
TRL
The 🤗 TRL library is now supported on Gaudi for performing DPO and SFT training; a usage sketch follows the PR list below.
- Add support for TRL's DPO and SFT on Gaudi with an example #601
- Restructure example/trl/stack_llama_2 for generic DPO #635 @libinta
- Add TRL DPO to README.md #652 @libinta
- Add a seed to DPO to make training results reproducible #646 @sywangyi
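A minimal SFT sketch, assuming the integration exposes a `GaudiSFTTrainer` mirroring TRL's `SFTTrainer` (see `examples/trl` in the repo for the exact entry points); the model, dataset, and Gaudi configuration choices are illustrative:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainingArguments
from optimum.habana.trl import GaudiSFTTrainer  # assumed entry point, see examples/trl

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

args = GaudiTrainingArguments(
    output_dir="./sft_llama2",
    use_habana=True,
    use_lazy_mode=True,
    bf16=True,
)

trainer = GaudiSFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",  # column holding the raw training text
    max_seq_length=512,
    gaudi_config=GaudiConfig.from_pretrained("Habana/llama"),  # assumed Hub config
)
trainer.train()
```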
Full bf16 evaluation
Full bf16 evaluation inside the trainer can now be performed as in Transformers; see the sketch below.
- Add support for bf16_full_eval #610 @bhargaveede
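`bf16_full_eval` comes straight from `transformers.TrainingArguments`, which `GaudiTrainingArguments` extends: the whole evaluation loop runs in bf16 instead of fp32. A minimal evaluation-only sketch; the model, dataset, and Gaudi configuration choices are illustrative:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("glue", "sst2", split="validation")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, padding="max_length"),
    batched=True,
)

# bf16_full_eval=True makes the trainer run evaluation fully in bf16.
args = GaudiTrainingArguments(
    output_dir="./eval_out",
    use_habana=True,
    use_lazy_mode=True,
    bf16_full_eval=True,
    per_device_eval_batch_size=8,
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    eval_dataset=dataset,
    gaudi_config=GaudiConfig.from_pretrained("Habana/distilbert-base-uncased"),  # assumed Hub config
)
print(trainer.evaluate())
```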
Text-generation pipeline
A text-generation pipeline fully optimized for Gaudi has been added (sketched below).
- Text-Generation Pipeline Example #526 @sjagtap1803
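The example lives under `examples/text-generation`. Conceptually, it builds on patching Transformers with the Gaudi-optimized model implementations and running generation on the HPU device; a rough sketch of that idea, not the example's exact code, with an illustrative model choice:

```python
import torch
from transformers import pipeline
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Swap in the Gaudi-optimized model implementations before building the pipeline.
adapt_transformers_to_gaudi()

generator = pipeline(
    "text-generation",
    model="gpt2",               # illustrative model choice
    torch_dtype=torch.bfloat16,
    device="hpu",
)
print(generator("Gaudi is", max_new_tokens=32)[0]["generated_text"])
```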
Model optimizations
- Enhance Llama performance by removing the 'cast_f32_to_bf16' operation #564 @kalyanjk
- Refactor Llama attention and MLP layers #589 @bgoldberg-habana
- Support FlashAttention in Llama2 (usage sketched after this list) #584 @wszczurekhabana
- Integrate Habana flash attention into Llama2-70B fine-tuning #596 @mandy-li
- Enable T5ForConditionalGeneration inference using static shapes #425 @bhargaveede
- Avoid Falcon performance drop from PR #607 when batch size is 1 @schoi-habana
- Enable fused RMSNorm in bf16 for Llama #621 @puneeshkhanna
- Flash attention enhancement for repeat KV #626 @puneeshkhanna
- Update Llama repeat-KV logic for better TP-4 performance #639 @puneeshkhanna
- Falcon changes for v1.14.0 release #654 @schoi-habana
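For the flash attention usage referenced in the list above, a rough sketch of enabling it at generation time; it assumes `use_flash_attention` and `lazy_mode` are accepted as generation kwargs for Llama, as in the repo's text-generation example, and the model choice is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Apply the Gaudi-optimized implementations (fused RMSNorm, repeat-KV logic,
# Habana flash attention) before loading the model.
adapt_transformers_to_gaudi()

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).to("hpu")

inputs = tokenizer("Habana Gaudi is", return_tensors="pt").to("hpu")
# use_flash_attention toggles the Habana fused-attention kernel; lazy mode is
# assumed to be required for it, as in the text-generation example.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    use_flash_attention=True,
    lazy_mode=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```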
TGI
TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi
- Update tokenizer for TGI #572 @hsubramony
- Remove redundant requirements #575 @hsubramony
- Change next_token_chooser to HeterogeneousNextTokenChooser for TGI #574 @yeonsily
- Remove TGI folder from Optimum Habana #597 @regisss
Various fixes
- Fix broken README for Llama2-70B #571 @mandy-li
- Fix Diffusers tests #570 @ssarkar2
- Fix fp8 command in text-generation README #586 @regisss
- Fix wav2vec inference bug #588 @skaulintel
- Fix hash_with_views error #587 @bgoldberg-habana
- Add handling of the b-mc2/sql-create-context dataset for CodeGen and fix a ZeRO-3 LoRA saving issue #552 @sywangyi
- Fix GPT-J training issue #594 @BaihuiJin
- Fix DataLoaderDispatcher issue in Gaudi #600 @sywangyi
- Fix for Falcon error from PR #587 #608 @schoi-habana
- Fix Falcon graph compilation error when batch size > 1 #607 @regisss
- Fix crash if gaudi_config is not passed to GaudiTrainer #613 @sywangyi
- Fix flash attention output for Llama with padded batched inputs #623 @puneeshkhanna
- Fix backward error in DDP when fine-tuning a reward model in RLHF #507 @sywangyi
- Fix DPO graph compilation error in evaluation #630 @sywangyi
- Fix error in run_image_classification.py #631 @regisss
- Fix backward issue in RLHF Llama reward modeling #612 @sywangyi
- Fix SD example so that custom bf16 ops can be used #642 @regisss
- Fix SD2 test #647 @regisss
- Fix typo in README #656 @yeonsily
- Fix error introduced in PR #654 #661 @schoi-habana
- Fix torch.compile compilation error for Llama #662 @jiminha
- Fix SDXL test #666 @regisss
Others
- Remove red crosses in model table #577 @regisss
- Misc changes for transformers tests #581 @ankurneog
- Remove delete_doc_comment workflows #582 @regisss
- Pin PEFT for the language-modeling example #591 @regisss
- Remove workarounds to have causal_mask in uint8 for GPT2, GPT-J and CodeGen #592 @regisss
- Update the validated SynapseAI version in README #603 @regisss
- Dynamic prompts after refactor #543 @ssarkar2
- In PEFT, save only the trainable parameters #576 @sywangyi
- Add inheritance in Diffusers pipelines #611 @regisss
- Update generation config to enable flash attention for inference #609 @puneeshkhanna
- Remove setting of PT_HPU_LAZY_MODE=2 in training_args.py #625 @vivekgoe
- Remove hpu:X notation until fully supported by the bridge #637 @hsubramony
- Add use_flash_attention to Llama2-70B finetuning command in README #640 @mandy-li
- Enable master_port selection for DeepSpeed and MPI #641 @yangulei
- Enable HPU graphs in Wav2Vec AC training #622 @bhargaveede
- Add changes to support FSDP #598 @vivekgoe
- Run Llama2 with torch.compile on Gaudi2 #616 @kausikmaiti
- Add HQT (Habana Quantization Toolkit) support #648 @bgoldberg-habana