Results from distributed inference/training (torchrun launcher) are currently only saved on rank 0. This can be wrong, misleading, or ambiguous, especially when processes are not necessarily synchronized (DP/TP inference) and one process/device might be affected by communication overhead. A better approach would be to have each process report its own results and let the launcher merge them appropriately (e.g. sum throughputs, average latencies?). This would also remove the `world_size` logic from the benchmarks.
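As a rough illustration of the proposed aggregation, here is a minimal sketch of gathering per-process results under torchrun and merging them on rank 0. The result keys (`throughput`, `latency`) and the merge rules (sum throughputs, average latencies) are assumptions for illustration, not the actual benchmark report schema:

```python
# Sketch: per-process result collection + aggregation under torchrun.
import torch.distributed as dist


def benchmark_rank() -> dict:
    # Placeholder for the real per-process benchmark; returns dummy numbers.
    rank = dist.get_rank()
    return {"rank": rank, "throughput": 100.0 + rank, "latency": 0.01 * (rank + 1)}


def main() -> None:
    # torchrun sets RANK / WORLD_SIZE / MASTER_ADDR, so env:// init works.
    dist.init_process_group(backend="gloo")
    world_size = dist.get_world_size()

    local_result = benchmark_rank()

    # Gather every rank's result dict (gather_object could be used instead
    # to collect only on rank 0).
    all_results = [None] * world_size
    dist.all_gather_object(all_results, local_result)

    if dist.get_rank() == 0:
        merged = {
            # assumed merge rules: sum throughputs, average latencies
            "throughput": sum(r["throughput"] for r in all_results),
            "latency": sum(r["latency"] for r in all_results) / world_size,
            # keep per-process results alongside the aggregate
            "per_rank": all_results,
        }
        print(merged)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Running this with e.g. `torchrun --nproc_per_node=2 aggregate_results.py` would print both the aggregated numbers and the per-rank results, which is the kind of output this issue is asking for.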
IlyasMoutawwakil changed the title from "Saving results from each process and an aggregated output" to "Saving results from each process and aggregating distributed output" on Jan 12, 2024.