chore(deps): update dependency accelerate to v0.34.2 #158
This PR contains the following updates:

- `accelerate`: `==0.30.0` -> `==0.34.2`
Release Notes
huggingface/accelerate (accelerate)
v0.34.2
Compare Source

v0.34.1: Patchfix
Compare Source

Bug fixes
- `DataLoaders` could no longer be pickled; fixed in #3074 thanks to @byi8220
- `default_transformers_cls_names_to_wrap` would separate `_no_split_modules` by characters instead of keeping it as a list of layer names; fixed in #3075

Full Changelog: huggingface/accelerate@v0.34.0...v0.34.1
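The `_no_split_modules` bug above is an instance of a common Python pitfall: iterating over a string yields its characters, so code expecting a list of layer names silently degrades when handed a bare string. A minimal, hypothetical illustration of the failure mode (not accelerate's actual code; the function name is made up):

```python
def expand_no_split_modules(no_split_modules):
    """Collect wrap targets; expects an iterable of layer-name strings."""
    return [name for name in no_split_modules]

# Passing a bare string instead of a list splits it into characters:
broken = expand_no_split_modules("T5Block")   # ['T', '5', 'B', 'l', 'o', 'c', 'k']

# Keeping it as a list of layer names preserves the intent:
fixed = expand_no_split_modules(["T5Block"])  # ['T5Block']
```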
v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!
Compare Source

Dependency Changes
- `safetensors` version 0.4.3
- `numpy` 2.0.0

Core

New Script Behavior Changes
- The `accelerate` library will handle destroying the process group automatically with `accelerator.end_training()`, or you can do it manually using `PartialState().destroy_process_group()`.
- `transfer_to_npu` is now used, ensuring better performance and compatibility.

DataLoader Enhancements
- Accelerate now supports `StatefulDataLoader` from `torchdata`, allowing better handling of data loading states. Enable it by passing `use_stateful_dataloader=True` to the `DataLoaderConfiguration`; when calling `load_state()` the `DataLoader` will automatically be resumed from its last step, no more having to iterate through passed batches.
- The `prepare_data_loader()` function is now independent of the `Accelerator`, giving you more flexibility towards which API levels you would like to use.
- Improved handling of `DataLoader` states, ensuring smoother training sessions.
- Added a `set_epoch` function for `MpDeviceLoaderWrapper`.
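Conceptually, a stateful loader records how many batches it has yielded so that restoring a checkpoint can resume mid-epoch without replaying already-seen batches. A pure-Python sketch of that idea (hypothetical class; the real feature uses `torchdata.stateful_dataloader.StatefulDataLoader`):

```python
class TinyStatefulLoader:
    """Minimal sketch: yields items and can checkpoint/restore its position."""

    def __init__(self, data):
        self.data = list(data)
        self.position = 0  # index of the next item to yield

    def __iter__(self):
        while self.position < len(self.data):
            item = self.data[self.position]
            self.position += 1
            yield item

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


loader = TinyStatefulLoader(range(5))
it = iter(loader)
first_two = [next(it), next(it)]   # consume two batches, then "checkpoint"
saved = loader.state_dict()        # {"position": 2}

resumed = TinyStatefulLoader(range(5))
resumed.load_state_dict(saved)     # resume picks up at batch index 2
rest = list(resumed)               # [2, 3, 4]
```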
FP8 Training Improvements
- Improved `TransformerEngine` FP8 training, including better defaults for the quantized FP8 weights.
- New tests verify the `TransformerEngine` integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration, the other with raw `TransformersEngine`, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks down over time. Find them here.
- Docker images now include `TransformerEngine` and `accelerate` as well. Use `docker pull huggingface/accelerate@gpu-fp8-transformerengine` to quickly get an environment going.

`torchpippy` no more, long live `torch.distributed.pipelining`
- `torchpippy` is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on.
- Shapes are now `[1, n, n]` rather than `[2, n, n]` as before.
- `pipelining` no longer supports encoder/decoder models, so the `t5` example has been removed.
- You can still use `torchpippy` potentially if needed.

Fully Sharded Data Parallelism (FSDP)
- You can now create the `FullyShardedDataParallelPlugin` yourself manually, with no need for environment patching.
- When using `accelerate launch`, you need to ensure the env variables are set up properly for model loading.

New Examples
- New examples built on the `axolotl` library, so very big kudos to their wonderful work.

Bug Fixes
- Fixed `step` when loading the state by @muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fixed `find_tied_params` for models with shared layers by @qubvel in https://github.com/huggingface/accelerate/pull/2986
- Fixed `transformer_engine` on import by @oraluben in https://github.com/huggingface/accelerate/pull/3056
- Added `skip_first_batches` support for StatefulDataloader and fixed all the tests by @muellerzr in https://github.com/huggingface/accelerate/pull/3068

New Contributors
Full Changelog:
- Fixed `step` when loading the state by @muellerzr in https://github.com/huggingface/accelerate/pull/2992
- Fixed `find_tied_params` for models with shared layers by @qubvel in https://github.com/huggingface/accelerate/pull/2986
- `end_training` update by @SunMarc in https://github.com/huggingface/accelerate/pull/3012
- Support `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator` by @byi8220 in https://github.com/huggingface/accelerate/pull/2895
- Decoupled `prepare_data_loader()` from Accelerator by @siddk in https://github.com/huggingface/accelerate/pull/3047
- Fixed `transformer_engine` on import by @oraluben in https://github.com/huggingface/accelerate/pull/3056
- Added `skip_first_batches` support for StatefulDataloader and fixed all the tests by @muellerzr in https://github.com/huggingface/accelerate/pull/3068

Detailed Full Changelog:
v0.33.0: MUSA backend support and bugfixes
Compare Source

Small release this month, with key focuses on some added support for backends and bugs:
- Support for the `torch.float8_e4m3fn` format in `dtype_byte_size` by @SunMarc in https://github.com/huggingface/accelerate/pull/2945

What's Changed
- `device_map="auto"` update by @muellerzr in https://github.com/huggingface/accelerate/pull/2914
- `multi_gpu` was being set and a warning being printed even with `num_processes=1`; fixed by @HarikrishnanBalagopal in https://github.com/huggingface/accelerate/pull/2921
- `pip` caching in CI by @SauravMaheshkar in https://github.com/huggingface/accelerate/pull/2952

New Contributors
Full Changelog: huggingface/accelerate@v0.32.1...v0.33.0
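For context on the `torch.float8_e4m3fn` entry above: a `dtype_byte_size`-style helper maps a dtype to its per-element storage size, and FP8 formats occupy a single byte. A torch-free, hypothetical illustration (the table and function here are assumptions for the sketch, not accelerate's code):

```python
# Per-element sizes in bytes for a few common dtypes (illustrative only).
DTYPE_BYTES = {
    "float32": 4,
    "float16": 2,
    "bfloat16": 2,
    "float8_e4m3fn": 1,  # FP8: 1 sign + 4 exponent + 3 mantissa bits = 8 bits
}

def dtype_byte_size(dtype_name):
    return DTYPE_BYTES[dtype_name]

def tensor_nbytes(num_elements, dtype_name):
    return num_elements * dtype_byte_size(dtype_name)

# An FP8 tensor needs a quarter of the memory of the same tensor in float32.
```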
v0.32.1
Compare Source

v0.32.0: Profilers, new hooks, speedups, and more!
Compare Source

Core
- Now uses `huggingface_hub` rather than our own implementation (https://github.com/huggingface/accelerate/pull/2795)
- `dispatch_model` speedup (https://github.com/huggingface/accelerate/pull/2855)
- The `Accelerator.step` number is now restored when using `save_state` and `load_state` (https://github.com/huggingface/accelerate/pull/2765)
- Reduced the time for `import accelerate` and any other major core import by 68%; it should now be only slightly longer than doing `import torch` (https://github.com/huggingface/accelerate/pull/2845)
- Fixed `get_backend` and added a `clear_device_cache` utility (https://github.com/huggingface/accelerate/pull/2857)

Distributed Data Parallelism
- `allreduce` improvements (https://github.com/huggingface/accelerate/pull/2841)
- Made `log_line_prefix_template` optional in the `notebook_launcher` (https://github.com/huggingface/accelerate/pull/2888)

FSDP
- When using `accelerate merge-weights`, one will be automatically created (https://github.com/huggingface/accelerate/pull/2854)
- `safetensors` support (https://github.com/huggingface/accelerate/pull/2853)

XPU
- Support for `torch>=2.4` (https://github.com/huggingface/accelerate/pull/2825)
- Added a `@require_triton` test decorator and enabled `test_dynamo` to work on xpu (https://github.com/huggingface/accelerate/pull/2878)
- Fixed `load_state_dict` not working on `xpu` and refined the xpu `safetensors` version check (https://github.com/huggingface/accelerate/pull/2879)

XLA

Examples
- `accelerate launch` example updates (https://github.com/huggingface/accelerate/pull/2902)

Full Changelog
- `dispatch_model` speedup by @panjd123 in https://github.com/huggingface/accelerate/pull/2855
- `test_tracking.ClearMLTest` fix by @faaany in https://github.com/huggingface/accelerate/pull/2863
- Use `torch_device` instead of `0` for device check by @faaany in https://github.com/huggingface/accelerate/pull/2861
- `test_zero3_integration` fix by @faaany in https://github.com/huggingface/accelerate/pull/2864
- Make `log_line_prefix_template` optional in Elastic Launcher for backward compatibility by @yhna940 in https://github.com/huggingface/accelerate/pull/2888
- Add `require_triton` and enable `test_dynamo` to work on xpu by @faaany in https://github.com/huggingface/accelerate/pull/2878
- Fix `load_state_dict` for xpu and refine xpu safetensor version check by @faaany in https://github.com/huggingface/accelerate/pull/2879

New Contributors
Full Changelog: huggingface/accelerate@v0.31.0...v0.32.0
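Regarding the restored `Accelerator.step` counter in v0.32.0 above: the idea is simply that the step count travels with the checkpoint, so resuming does not reset it to zero. A minimal stand-alone sketch of that pattern (hypothetical class and method names, not accelerate's API):

```python
class TrainerState:
    """Toy checkpointable trainer: only tracks how many steps have run."""

    def __init__(self):
        self.step = 0

    def train_steps(self, n):
        self.step += n

    def save_state(self):
        # In a real trainer this dict would be serialized to disk
        # alongside model and optimizer state.
        return {"step": self.step}

    def load_state(self, state):
        self.step = state["step"]  # restoring now also restores `step`


trainer = TrainerState()
trainer.train_steps(100)
checkpoint = trainer.save_state()

restored = TrainerState()
restored.load_state(checkpoint)  # restored.step == 100, not 0
```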
v0.31.0: Better support for sharded state dict with FSDP and Bugfixes
Compare Source

Core
- Set the `timeout` default to PyTorch defaults based on backend by @muellerzr in https://github.com/huggingface/accelerate/pull/2758
- `notebook_launcher` fix by @yhna940 in https://github.com/huggingface/accelerate/pull/2788

FSDP

Megatron

What's Changed
- Fixed `logging` to log the actual user call site (instead of the call site inside the logger wrapper) of log functions by @luowyang in https://github.com/huggingface/accelerate/pull/2730
- `notebook_launcher` fix by @yhna940 in https://github.com/huggingface/accelerate/pull/2788
- `get_balanced_memory` fix by @faaany in https://github.com/huggingface/accelerate/pull/2826
- Cast the `stage3_prefetch_bucket_size` value to an integer by @adk9 in https://github.com/huggingface/accelerate/pull/2814

New Contributors
Full Changelog: huggingface/accelerate@v0.30.1...v0.31.0
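The `logging` fix above (PR 2730) concerns wrapped loggers reporting the wrapper's own frame as the call site. In the standard library this is exactly what the `stacklevel` argument addresses; a self-contained sketch of the technique (not accelerate's code, which wraps loggers its own way):

```python
import logging

logger = logging.getLogger("demo")

def log_info(message):
    # stacklevel=2 makes the record point at log_info's caller,
    # not at this wrapper function itself.
    logger.info(message, stacklevel=2)

def user_function(records):
    handler = logging.Handler()
    handler.emit = records.append  # capture emitted records for inspection
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    log_info("hello from the user's code")
    logger.removeHandler(handler)

records = []
user_function(records)
# records[0].funcName reports "user_function", the actual user call site
```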
v0.30.1: Bugfixes
Compare Source

Patchfix
Full Changelog: huggingface/accelerate@v0.30.0...v0.30.1
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.