Skip to content

v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision!

Compare
Choose a tag to compare
@ByronHsu ByronHsu released this 05 Nov 22:15
· 86 commits to main since this release
e985195

Highlights

  1. AMD GPU: We have partnered with Embedding LLM to adjust the Triton configuration to fully support AMD! With version 0.4.0, you can run multi-GPU training with 26% higher speed and 60% lower memory usage on AMD. See the full blogpost from https://embeddedllm.com/blog/cuda-to-rocm-portability-case-study-liger-kernel. @Edenzzzz @DocShotgun @tjtanaa

  2. Technical Report: We have published a technical report on arXiv (https://arxiv.org/pdf/2410.10989) with abundant details.

  3. Modal CI: We have moved our entire GPU CI stack to Modal! Thanks to intelligent Docker layer caching and blazingly fast container startup time and scheduling, we have reduced the CI overhead by over 10x (from minutes to seconds).

  4. LLaMA 3.2-Vision Model: We have added kernel support for the LLaMA 3.2-Vision model. You can easily use liger_kernel.transformers.apply_liger_kernel_to_mllama to patch the model. @tyler-romero @shivam15s

  5. JSD Kernel: We have added the JSD kernel for distillation, which also comes with a chunking version! @Tcc0403 @yundai424 @qingquansong

  6. HuggingFace Gradient Accumulation Fixes: We have fixed the notorious HuggingFace gradient accumulation issue (huggingface/transformers#34191) by carefully adjusting the cross entropy scalar. You can now safely use v0.4.0 with the latest HuggingFace gradient accumulation fixes (transformers>=4.46.2)!

What's Changed

New Contributors

Full Changelog: v0.3.1...v0.4.0