Redirect duplicate save_load tutorials to the main save_load tutorial #3134

Closed · wants to merge 9 commits
conf.py: 20 changes (8 additions & 12 deletions)
@@ -45,6 +45,7 @@
 import pandocfilters
 import pypandoc
 import plotly.io as pio
+from pathlib import Path
 pio.renderers.default = 'sphinx_gallery'


@@ -140,22 +141,17 @@ def reset_seeds(gallery_conf, fname):
 sphinx_gallery_conf['ignore_pattern'] = r'/(?!' + re.escape(os.getenv('GALLERY_PATTERN')) + r')[^/]+$'
 
 for i in range(len(sphinx_gallery_conf['examples_dirs'])):
-    gallery_dir = sphinx_gallery_conf['gallery_dirs'][i]
-    source_dir = sphinx_gallery_conf['examples_dirs'][i]
-    # Create gallery dirs if it doesn't exist
-    try:
-        os.mkdir(gallery_dir)
-    except OSError:
-        pass
+    gallery_dir = Path(sphinx_gallery_conf["gallery_dirs"][i])
+    source_dir = Path(sphinx_gallery_conf["examples_dirs"][i])
 
     # Copy rst files from source dir to gallery dir
-    for f in glob.glob(os.path.join(source_dir, '*.rst')):
-        distutils.file_util.copy_file(f, gallery_dir, update=True)
+    for f in source_dir.rglob("*.rst"):
+        f_dir = Path(f).parent
+        gallery_subdir_path = gallery_dir / f_dir.relative_to(source_dir)
+        gallery_subdir_path.mkdir(parents=True, exist_ok=True)
+        distutils.file_util.copy_file(f, gallery_subdir_path, update=True)
 
 # Add any paths that contain templates here, relative to this directory.
 
 
 templates_path = ['_templates']
 
 # The suffix(es) of source filenames.
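The important behavior in the new loop is that ``relative_to`` preserves each ``.rst`` file's position under ``source_dir`` when it is copied into ``gallery_dir``. A tiny standalone illustration, using made-up paths:

.. code:: python

    from pathlib import Path

    source_dir = Path("recipes_source")
    gallery_dir = Path("recipes")

    f = source_dir / "recipes" / "intro.rst"               # hypothetical nested file
    subdir = gallery_dir / f.parent.relative_to(source_dir)
    print(subdir)                                          # -> recipes/recipes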
intermediate_source/dist_tuto.rst: 58 changes (36 additions & 22 deletions)
@@ -38,7 +38,7 @@ simultaneously. If you have access to compute cluster you should check
 with your local sysadmin or use your favorite coordination tool (e.g.,
 `pdsh <https://linux.die.net/man/1/pdsh>`__,
 `clustershell <https://cea-hpc.github.io/clustershell/>`__, or
-`others <https://slurm.schedmd.com/>`__). For the purpose of this
+`slurm <https://slurm.schedmd.com/>`__). For the purpose of this
 tutorial, we will use a single machine and spawn multiple processes using
 the following template.

@@ -64,11 +64,11 @@ the following template.


     if __name__ == "__main__":
-        size = 2
+        world_size = 2
         processes = []
         mp.set_start_method("spawn")
-        for rank in range(size):
-            p = mp.Process(target=init_process, args=(rank, size, run))
+        for rank in range(world_size):
+            p = mp.Process(target=init_process, args=(rank, world_size, run))
             p.start()
             processes.append(p)
@@ -125,7 +125,7 @@ process 0 increments the tensor and sends it to process 1 so that they
 both end up with 1.0. Notice that process 1 needs to allocate memory in
 order to store the data it will receive.
 
-Also notice that ``send``/``recv`` are **blocking**: both processes stop
+Also notice that ``send/recv`` are **blocking**: both processes block
 until the communication is completed. On the other hand immediates are
 **non-blocking**; the script continues its execution and the methods
 return a ``Work`` object upon which we can choose to
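To make the blocking/non-blocking distinction concrete, here is a minimal sketch of the immediate (non-blocking) variant, assuming two ranks have already been initialized:

.. code:: python

    import torch
    import torch.distributed as dist

    def run(rank, world_size):
        tensor = torch.zeros(1)
        if rank == 0:
            tensor += 1
            req = dist.isend(tensor=tensor, dst=1)   # returns immediately with a request handle
        else:
            req = dist.irecv(tensor=tensor, src=0)   # returns immediately with a request handle
        req.wait()                                   # block here until the communication completes
        print(f"Rank {rank} has data {tensor[0]}")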
@@ -219,16 +219,23 @@ to obtain the sum of all tensors on all processes, we can use the
 Since we want the sum of all tensors in the group, we use
 ``dist.ReduceOp.SUM`` as the reduce operator. Generally speaking, any
 commutative mathematical operation can be used as an operator.
-Out-of-the-box, PyTorch comes with 4 such operators, all working at the
+Out-of-the-box, PyTorch comes with many such operators, all working at the
 element-wise level:
 
 - ``dist.ReduceOp.SUM``,
 - ``dist.ReduceOp.PRODUCT``,
 - ``dist.ReduceOp.MAX``,
-- ``dist.ReduceOp.MIN``.
+- ``dist.ReduceOp.MIN``,
+- ``dist.ReduceOp.BAND``,
+- ``dist.ReduceOp.BOR``,
+- ``dist.ReduceOp.BXOR``,
+- ``dist.ReduceOp.PREMUL_SUM``.
 
-In addition to ``dist.all_reduce(tensor, op, group)``, there are a total
-of 6 collectives currently implemented in PyTorch.
+The full list of supported operators is available
+`here <https://pytorch.org/docs/stable/distributed.html#torch.distributed.ReduceOp>`__.
+
+In addition to ``dist.all_reduce(tensor, op, group)``, there are many other collectives
+currently implemented in PyTorch. Here are a few of the supported collectives:
 
 - ``dist.broadcast(tensor, src, group)``: Copies ``tensor`` from
   ``src`` to all other processes.
@@ -244,6 +251,12 @@ of 6 collectives currently implemented in PyTorch.
 - ``dist.all_gather(tensor_list, tensor, group)``: Copies ``tensor``
   from all processes to ``tensor_list``, on all processes.
 - ``dist.barrier(group)``: Blocks all processes in `group` until each one has entered this function.
+- ``dist.all_to_all(output_tensor_list, input_tensor_list, group)``: Scatters the list of input
+  tensors to all processes in a group and returns the gathered list of tensors in the output list.
+
+The full list of supported collectives can be found in the latest documentation for PyTorch Distributed
+`(link) <https://pytorch.org/docs/stable/distributed.html>`__.

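As a concrete illustration of the collectives listed above, a minimal sketch that exercises a few of them, assuming the process group has already been initialized as in the earlier template:

.. code:: python

    import torch
    import torch.distributed as dist

    def run(rank, world_size):
        # all_reduce: every rank ends up with the element-wise sum.
        t = torch.ones(1) * rank
        dist.all_reduce(t, op=dist.ReduceOp.SUM)

        # broadcast: every rank receives rank 0's tensor.
        b = torch.arange(3, dtype=torch.float32) if rank == 0 else torch.zeros(3)
        dist.broadcast(b, src=0)

        # all_gather: collect one tensor per rank, on every rank.
        gathered = [torch.zeros(1) for _ in range(world_size)]
        dist.all_gather(gathered, torch.ones(1) * rank)

        print(f"rank {rank}: sum={t.item()}, bcast={b.tolist()}, "
              f"gathered={[x.item() for x in gathered]}")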

Distributed Training
--------------------
@@ -275,7 +288,7 @@ gradients of their model on their batch of data and then average their
 gradients. In order to ensure similar convergence results when changing
 the number of processes, we will first have to partition our dataset.
 (You could also use
-`tnt.dataset.SplitDataset <https://github.com/pytorch/tnt/blob/master/torchnet/dataset/splitdataset.py#L4>`__,
+`torch.utils.data.random_split <https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split>`__,
 instead of the snippet below.)

.. code:: python
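The partitioning snippet itself is collapsed in this view; as a rough sketch of the ``torch.utils.data.random_split`` alternative mentioned above (dataset choice and split sizes are illustrative only, and the process group is assumed to be initialized):

.. code:: python

    import torch
    import torch.distributed as dist
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    dataset = datasets.MNIST("./data", train=True, download=True,
                             transform=transforms.ToTensor())
    world_size = dist.get_world_size()
    part_len = len(dataset) // world_size
    lengths = [part_len] * world_size
    lengths[-1] += len(dataset) - sum(lengths)        # absorb any remainder in the last split
    partitions = random_split(dataset, lengths,
                              generator=torch.Generator().manual_seed(1234))
    local_dataset = partitions[dist.get_rank()]       # each rank trains on its own shard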
@@ -389,7 +402,7 @@ could train any model on a large computer cluster.
 lot more tricks <https://seba-1511.github.io/dist_blog>`__ required to
 implement a production-level implementation of synchronous SGD. Again,
 use what `has been tested and
-optimized <https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel>`__.
+optimized <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel>`__.
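For reference, wrapping a model in ``DistributedDataParallel`` takes only a few lines; a minimal sketch, assuming the process group is already initialized and each rank owns one GPU:

.. code:: python

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    rank = dist.get_rank()                        # assumes init_process_group was already called
    model = nn.Linear(10, 10).to(rank)            # assumes rank doubles as the GPU index
    ddp_model = DDP(model, device_ids=[rank])     # gradients are averaged across ranks automatically
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    outputs = ddp_model(torch.randn(20, 10).to(rank))
    outputs.sum().backward()                      # backward() triggers the gradient all-reduce
    optimizer.step()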

Our Own Ring-Allreduce
~~~~~~~~~~~~~~~~~~~~~~
@@ -451,8 +464,9 @@ Communication Backends
 
 One of the most elegant aspects of ``torch.distributed`` is its ability
 to abstract and build on top of different backends. As mentioned before,
-there are currently three backends implemented in PyTorch: Gloo, NCCL, and
-MPI. They each have different specifications and tradeoffs, depending
+there are multiple backends implemented in PyTorch.
+Some of the most popular ones are Gloo, NCCL, and MPI.
+They each have different specifications and tradeoffs, depending
 on the desired use case. A comparative table of supported functions can
 be found
 `here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
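Selecting a backend is a single argument at initialization time; a minimal sketch, assuming NCCL for GPU training with Gloo as the CPU fallback, and that ``rank``/``world_size`` come from your launcher:

.. code:: python

    import torch
    import torch.distributed as dist

    def init_process(rank, world_size):
        # Prefer NCCL when CUDA is available; fall back to Gloo on CPU-only machines.
        # MASTER_ADDR and MASTER_PORT are assumed to be set in the environment.
        backend = "nccl" if torch.cuda.is_available() else "gloo"
        dist.init_process_group(backend, rank=rank, world_size=world_size)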
@@ -544,15 +558,15 @@ NCCL backend is included in the pre-built binaries with CUDA support.
 Initialization Methods
 ~~~~~~~~~~~~~~~~~~~~~~
 
-To finish this tutorial, let's talk about the very first function we
-called: ``dist.init_process_group(backend, init_method)``. In
-particular, we will go over the different initialization methods which
-are responsible for the initial coordination step between each process.
-Those methods allow you to define how this coordination is done.
-Depending on your hardware setup, one of these methods should be
-naturally more suitable than the others. In addition to the following
-sections, you should also have a look at the `official
-documentation <https://pytorch.org/docs/stable/distributed.html#initialization>`__.
+To conclude this tutorial, let's examine the initial function we invoked:
+``dist.init_process_group(backend, init_method)``. Specifically, we will discuss the various
+initialization methods responsible for the preliminary coordination step between each process.
+These methods enable you to define how this coordination is accomplished.
+
+The choice of initialization method depends on your hardware setup, and one method may be more
+suitable than others. In addition to the following sections, please refer to the `official
+documentation <https://pytorch.org/docs/stable/distributed.html#initialization>`__ for further information.


**Environment Variable**

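The body of this section is collapsed above. As a rough illustration, the environment-variable method reads the rendezvous information from ``MASTER_ADDR``, ``MASTER_PORT``, ``RANK``, and ``WORLD_SIZE``; the addresses below are placeholders:

.. code:: python

    import os
    import torch.distributed as dist

    # Placeholder values; in practice these are exported by your launcher or job scheduler.
    os.environ.setdefault("MASTER_ADDR", "10.1.1.20")
    os.environ.setdefault("MASTER_PORT", "29500")

    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))

    # "env://" is the default init_method and pulls everything from the variables above.
    dist.init_process_group("gloo", init_method="env://", rank=rank, world_size=world_size)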
recipes_source/recipes/README.txt: 26 changes (5 additions & 21 deletions)
@@ -25,38 +25,22 @@ PyTorch Recipes
 Dynamic Quantization
 https://pytorch.org/tutorials/recipes/recipes/dynamic_quantization.html
 
-7. save_load_across_devices.py
-Saving and loading models across devices in PyTorch
-https://pytorch.org/tutorials/recipes/recipes/save_load_across_devices.html
-
-8. saving_and_loading_a_general_checkpoint.py
-Saving and loading a general checkpoint in PyTorch
-https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
-
-9. saving_and_loading_models_for_inference.py
-Saving and loading models for inference in PyTorch
-https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_models_for_inference.html
-
-10. saving_multiple_models_in_one_file.py
-Saving and loading multiple models in one file using PyTorch
-https://pytorch.org/tutorials/recipes/recipes/saving_multiple_models_in_one_file.html
-
-11. warmstarting_model_using_parameters_from_a_different_model.py
+7. warmstarting_model_using_parameters_from_a_different_model.py
 Warmstarting models using parameters from different model
 https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html
 
-12. zeroing_out_gradients.py
+8. zeroing_out_gradients.py
 Zeroing out gradients
 https://pytorch.org/tutorials/recipes/recipes/zeroing_out_gradients.html
 
-13. mobile_perf.py
+9. mobile_perf.py
 PyTorch Mobile Performance Recipes
 https://pytorch.org/tutorials/recipes/mobile_perf.html
 
-14. amp_recipe.py
+10. amp_recipe.py
 Automatic Mixed Precision
 https://pytorch.org/tutorials/recipes/amp_recipe.html
 
-15. regional_compilation.py
+11. regional_compilation.py
 Reducing torch.compile cold start compilation time with regional compilation
 https://pytorch.org/tutorials/recipes/regional_compilation.html
recipes_source/recipes/save_load_across_devices.py: 181 changes (0 additions & 181 deletions)

This file was deleted.
