[NVIDIA] Added transformer engine support and GPU optimizations #1385
terrykong commented Aug 26, 2023
- Added Transformer Engine + FP8 support
- Updated T5x and jax version=0.4.11
- A100 Perf gains!
  - 80% speedup - T5-small
  - 23% speedup - T5-large
  - 18% speedup - T5-xl
  - 40% speedup - T5-xxl
- H100 support, with gains over A100
  - 2.08x faster - T5-large
  - 2.24x faster - T5-xl
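The A100 figures above are given as percentages while the H100 figures are given as multipliers. A minimal sketch for putting them on the same scale, assuming "N% speedup" means throughput of (1 + N/100) times the baseline (the PR does not state the baseline or the exact metric); the helper name `percent_to_factor` is hypothetical:

```python
# Assumption: "N% speedup" means new throughput = (1 + N/100) x baseline.
# The PR description does not define the baseline, so treat this as illustrative.

def percent_to_factor(percent_speedup: float) -> float:
    """Convert a percent speedup into a multiplicative factor."""
    return 1.0 + percent_speedup / 100.0


# A100 percentages as listed in the PR description.
a100_percent = {"T5-small": 80, "T5-large": 23, "T5-xl": 18, "T5-xxl": 40}

# The same figures expressed as "x faster" factors.
a100_factor = {model: percent_to_factor(p) for model, p in a100_percent.items()}

for model, factor in a100_factor.items():
    print(f"{model}: {factor:.2f}x")
```

Note that the H100 multipliers are stated relative to A100, so they are not directly comparable to the A100 factors computed here without knowing both baselines.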
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Co-authored-by: Sahil Jain <[email protected]>
Co-authored-by: Terry Kong <[email protected]>
Co-authored-by: Yu-Hang Tang <[email protected]>
Co-authored-by: Ming Huang <[email protected]>
Co-authored-by: Frederic Bastien <[email protected]>
Co-authored-by: Sharath Turuvekere Sreenivas <[email protected]>
Co-authored-by: Xiaowei Ren <[email protected]>
Co-authored-by: Ryan Jeng <[email protected]>
Co-authored-by: Reese Wang <[email protected]>
configs use packing (CV/Multimodal)
Updated T5x-large MNLI and SQUAD baselines
80ae059 to 1fa57af
Hello, out of curiosity (I understand it may not be tested): would this, in theory, support training/fine-tuning for models built on top of t5x, such as Flan-UL2? I guess yes, since it is simply a t5x model with a specific config?
@jon-chuang Yes, I believe that's correct, given my understanding of the follow-up architectures to T5 (UL2, Flan-T5, Flan-UL2). As long as the core model is the same and only the objective or the inputs and targets change, fine-tuning those models should also benefit.
Closing in favor of #1391