New Features
Grid-based Sharding For EBC
Grid sharding first applies CW sharding and then TWRW-shards each of the resulting CW shards. One of the key changes is how the metadata from sharding placements is constructed in grid sharding: we leverage the per_node concept from TWRW and combine it with the permutations and concatenation required in CW. Pull Request #2445
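A minimal sketch of how one might request grid sharding for an EBC table through the planner. The enum value ShardingType.GRID_SHARD and the exact constraint wiring are assumptions based on the description above, not a verified snippet from the PR:

```python
# Sketch: constrain the planner to grid-shard a large EBC table
# (CW-split the embedding dim, then TWRW-shard each CW shard).
# ShardingType.GRID_SHARD is assumed to be the enum exposed for this feature.
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig
from torchrec.distributed.model_parallel import get_default_sharders
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="large_table",
            embedding_dim=1024,
            num_embeddings=100_000_000,
            feature_names=["f1"],
        )
    ],
    device=torch.device("meta"),
)

constraints = {
    "large_table": ParameterConstraints(
        sharding_types=[ShardingType.GRID_SHARD.value],
    )
}
planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=16, local_world_size=8, compute_device="cuda"),
    constraints=constraints,
)
plan = planner.plan(ebc, get_default_sharders())
```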
Re-shardable Hash ZCH
Fully re-shardable ZCH: resharding works for any world size compatible with the default value (768), e.g. WS 1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128, etc., scaling both up and down. Pull Request #2538
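For illustration, every world size listed above evenly divides the default value of 768, which (by this reading) is what makes the buckets redistributable across those world sizes. A small sketch checking that property:

```python
# Sketch: each supported world size listed above evenly divides the
# default value of 768, so buckets redistribute cleanly when resharding.
DEFAULT_VALUE = 768
listed_world_sizes = [1, 2, 4, 8, 16, 24, 32, 48, 64, 96, 128]

for ws in listed_world_sizes:
    assert DEFAULT_VALUE % ws == 0, f"{ws} does not divide {DEFAULT_VALUE}"
    print(f"WS={ws}: {DEFAULT_VALUE // ws} buckets per rank")
```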
TorchRec 2D Parallel
We introduce a new parallelism strategy for scaling recommendation model training called 2D parallel, which scales model parallelism through data parallelism (hence the name). The new entry point, DMPCollection, subclasses DMP and is meant to be a drop-in replacement for integrating 2D parallelism into distributed training. By setting the total number of GPUs to train across and the number of GPUs to shard across locally (i.e. one replication group), users can keep the same training loop while training over a larger number of GPUs. The current implementation shards the model such that, for a given shard, its replicated shards lie on ranks within the same node. This significantly improves the performance of the all-reduce communication (parameter sync) by utilizing intra-node bandwidth. Under this scheme the supported sharding types are RW, CW, and GRID; TWRW is not supported because it can no longer take advantage of intra-node bandwidth in the 2D scheme. Pull Request #2554
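A minimal sketch of wrapping a model with DMPCollection as a drop-in replacement for DistributedModelParallel. The constructor argument names (world_size, sharding_group_size) are assumptions based on the description above, not a verified signature:

```python
# Sketch: 2D parallel via DMPCollection. Replicas of a given shard stay
# within a node, so the parameter-sync all-reduce uses intra-node bandwidth.
import os
import torch
import torch.distributed as dist
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig
from torchrec.distributed.model_parallel import DMPCollection

dist.init_process_group(backend="nccl")
device = torch.device(f"cuda:{os.environ['LOCAL_RANK']}")
torch.cuda.set_device(device)

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t1",
            embedding_dim=128,
            num_embeddings=1_000_000,
            feature_names=["f1"],
        )
    ],
    device=torch.device("meta"),
)

model = DMPCollection(
    module=ebc,
    device=device,
    world_size=128,          # total number of GPUs to train across (assumed kwarg)
    sharding_group_size=8,   # GPUs to locally shard across, one replication group (assumed kwarg)
)
# Training then proceeds in the same loop as with plain DMP.
```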
Changelog
torch.compile compatibility support: #2381, #2475, #2583
torch.export module support: #2388, #2390, #2393,