v1.10: SDXL, Textual-Inversion, TRL, SynapseAI v1.14
SynapseAI v1.14
The codebase is fully validated for the latest version of the Habana SDK, SynapseAI v1.14.0.
Stable Diffusion XL
SDXL is now supported and optimized for Gaudi.
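A minimal way to try it, sketched under the assumption that the Gaudi-optimized pipeline class and the `Habana/stable-diffusion` Gaudi configuration on the Hugging Face Hub are used; the base model choice is illustrative:

```python
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

# Load SDXL with the Gaudi-optimized pipeline. HPU graphs are enabled to
# reduce host overhead; the base model below is an illustrative choice.
pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

image = pipeline(prompt="A futuristic city at sunset").images[0]
image.save("sdxl_gaudi.png")
```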
Textual inversion fine-tuning
An example of textual-inversion fine-tuning has been added.
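Once fine-tuning has produced learned embeddings, they can be loaded back into a Gaudi pipeline. A minimal sketch, assuming the Gaudi pipeline inherits diffusers' `load_textual_inversion`; the output directory and placeholder token are hypothetical:

```python
from optimum.habana.diffusers import GaudiStableDiffusionPipeline

# Load a base model on Gaudi, then attach the embeddings produced by the
# textual-inversion example. "./textual_inversion_output" and "<my-token>"
# are placeholders for the script's output directory and placeholder token.
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
pipeline.load_textual_inversion("./textual_inversion_output")

image = pipeline(prompt="A painting of <my-token> in the rain").images[0]
image.save("textual_inversion_gaudi.png")
```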
TRL
The 🤗 TRL library is now supported on Gaudi for performing DPO and SFT training; a usage sketch follows the PR list below.
- Add support for TRL's DPO and SFT on Gaudi with an example #601
- Restructure example/trl/stack_llama_2 for generic DPO #635 @libinta
- Add TRL DPO to README.md #652 @libinta
- Add a seed to DPO to make training results reproducible #646 @sywangyi
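A minimal SFT sketch, assuming the integration exposes a `GaudiSFTTrainer` mirroring TRL's `SFTTrainer` (see `examples/trl` in the repo for the exact entry points); the model, dataset, and Gaudi configuration choices are illustrative:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainingArguments
from optimum.habana.trl import GaudiSFTTrainer  # assumed entry point, see examples/trl

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

args = GaudiTrainingArguments(
    output_dir="./sft_llama2",
    use_habana=True,
    use_lazy_mode=True,
    bf16=True,
)

trainer = GaudiSFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",  # column holding the raw training text
    max_seq_length=512,
    gaudi_config=GaudiConfig.from_pretrained("Habana/llama"),  # assumed Hub config
)
trainer.train()
```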
Full bf16 evaluation
Full bf16 evaluation inside the trainer can now be performed as in Transformers; see the sketch below.
- Add support for bf16_full_eval #610 @bhargaveede
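`bf16_full_eval` comes straight from `transformers.TrainingArguments`, which `GaudiTrainingArguments` extends: the whole evaluation loop runs in bf16 instead of fp32. A minimal evaluation-only sketch; the model, dataset, and Gaudi configuration choices are illustrative:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("glue", "sst2", split="validation")
dataset = dataset.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, padding="max_length"),
    batched=True,
)

# bf16_full_eval=True makes the trainer run evaluation fully in bf16.
args = GaudiTrainingArguments(
    output_dir="./eval_out",
    use_habana=True,
    use_lazy_mode=True,
    bf16_full_eval=True,
    per_device_eval_batch_size=8,
)

trainer = GaudiTrainer(
    model=model,
    args=args,
    eval_dataset=dataset,
    gaudi_config=GaudiConfig.from_pretrained("Habana/distilbert-base-uncased"),  # assumed Hub config
)
print(trainer.evaluate())
```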
Text-generation pipeline
A text-generation pipeline fully optimized for Gaudi has been added (sketched below).
- Text-Generation Pipeline Example #526 @sjagtap1803
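The example lives under `examples/text-generation`. Conceptually, it builds on patching Transformers with the Gaudi-optimized model implementations and running generation on the HPU device; a rough sketch of that idea, not the example's exact code, with an illustrative model choice:

```python
import torch
from transformers import pipeline
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Swap in the Gaudi-optimized model implementations before building the pipeline.
adapt_transformers_to_gaudi()

generator = pipeline(
    "text-generation",
    model="gpt2",               # illustrative model choice
    torch_dtype=torch.bfloat16,
    device="hpu",
)
print(generator("Gaudi is", max_new_tokens=32)[0]["generated_text"])
```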
Model optimizations
- Enhance Llama performance by removing the 'cast_f32_to_bf16' operation #564 @kalyanjk
- Refactor Llama attention and MLP layers #589 @bgoldberg-habana
- Support FlashAttention in Llama2 (usage sketched after this list) #584 @wszczurekhabana
- Integrate Habana flash attention into Llama2-70B fine-tuning #596 @mandy-li
- Enable T5ForConditionalGeneration inference using static shapes #425 @bhargaveede
- Avoid Falcon performance drop from PR #607 when batch size is 1 @schoi-habana
- Enable fused RMSNorm in bf16 for Llama #621 @puneeshkhanna
- Flash attention enhancement for repeat KV #626 @puneeshkhanna
- Update Llama repeat-KV logic for better TP-4 performance #639 @puneeshkhanna
- Falcon changes for v1.14.0 release #654 @schoi-habana
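For the flash attention usage referenced in the list above, a rough sketch of enabling it at generation time; it assumes `use_flash_attention` and `lazy_mode` are accepted as generation kwargs for Llama, as in the repo's text-generation example, and the model choice is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Apply the Gaudi-optimized implementations (fused RMSNorm, repeat-KV logic,
# Habana flash attention) before loading the model.
adapt_transformers_to_gaudi()

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).to("hpu")

inputs = tokenizer("Habana Gaudi is", return_tensors="pt").to("hpu")
# use_flash_attention toggles the Habana fused-attention kernel; lazy mode is
# assumed to be required for it, as in the text-generation example.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    use_flash_attention=True,
    lazy_mode=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```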
TGI
TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi
- Update tokenizer for TGI #572 @hsubramony
- Remove redundant requirements #575 @hsubramony
- Change next_token_chooser to HeterogeneousNextTokenChooser for TGI #574 @yeonsily
- Remove TGI folder from Optimum Habana #597 @regisss
Various fixes
- Fix broken README for Llama2-70B #571 @mandy-li
- Fix Diffusers tests #570 @ssarkar2
- Fix fp8 command in text-generation README #586 @regisss
- Fix wav2vec inference bug #588 @skaulintel
- Fix hash_with_views error #587 @bgoldberg-habana
- Add handling of the b-mc2/sql-create-context dataset for CodeGen and fix a ZeRO-3 LoRA saving issue #552 @sywangyi
- Fix GPT-J training issue #594 @BaihuiJin
- Fix DataLoaderDispatcher issue in Gaudi #600 @sywangyi
- Fix for Falcon error from PR #587 #608 @schoi-habana
- Fix Falcon graph compilation error when batch size > 1 #607 @regisss
- Fix crash if gaudi_config is not passed to GaudiTrainer #613 @sywangyi
- Fix flash attention output for Llama with padded batched inputs #623 @puneeshkhanna
- Fix backward error in DDP when fine-tuning a reward model in RLHF #507 @sywangyi
- Fix DPO graph compilation error in evaluation #630 @sywangyi
- Fix error in run_image_classification.py #631 @regisss
- Fix backward issue in RLHF Llama reward modeling #612 @sywangyi
- Fix SD example so that custom bf16 ops can be used #642 @regisss
- Fix SD2 test #647 @regisss
- Fix typo in README #656 @yeonsily
- Fix error introduced in PR #654 #661 @schoi-habana
- Fix torch.compile compilation error for Llama #662 @jiminha
- Fix SDXL test #666 @regisss
Others
- Remove red crosses in model table #577 @regisss
- Misc changes for transformers tests #581 @ankurneog
- Remove delete_doc_comment workflows #582 @regisss
- Pin PEFT for the language-modeling example #591 @regisss
- Remove workarounds to have causal_mask in uint8 for GPT2, GPT-J and CodeGen #592 @regisss
- Update the validated SynapseAI version in README #603 @regisss
- Dynamic prompts after refactor #543 @ssarkar2
- In PEFT, save only the trainable parameters #576 @sywangyi
- Add inheritance in Diffusers pipelines #611 @regisss
- Update generation config to enable flash attention for inference #609 @puneeshkhanna
- Remove setting of PT_HPU_LAZY_MODE=2 in training_args.py #625 @vivekgoe
- Remove hpu:X notation until fully supported by the bridge #637 @hsubramony
- Add use_flash_attention to Llama2-70B finetuning command in README #640 @mandy-li
- Enable master_port selection for DeepSpeed and MPI #641 @yangulei
- Enable HPU graphs in Wav2Vec AC training #622 @bhargaveede
- Add changes to support FSDP #598 @vivekgoe
- Run Llama2 with torch.compile on Gaudi2 #616 @kausikmaiti
- Add HQT (Habana Quantization Toolkit) support #648 @bgoldberg-habana