Results from distributed inference/training (torchrun launcher) are currently only saved on rank 0. This can be wrong, misleading, or ambiguous, especially when processes are not necessarily synchronized (DP/TP inference) and one process/device might be affected by communication overhead. A better approach would be to have each process report its own results and let the launcher merge them appropriately (e.g. sum throughputs, average latencies?). This would also remove the `world_size` logic from the benchmarks.
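As a rough illustration of the proposed aggregation, here is a minimal sketch of gathering per-process results under torchrun and merging them on rank 0. The result keys (`throughput`, `latency`) and the merge rules (sum throughputs, average latencies) are assumptions for illustration, not the actual benchmark report schema:

```python
# Sketch: per-process result collection + aggregation under torchrun.
import torch.distributed as dist


def benchmark_rank() -> dict:
    # Placeholder for the real per-process benchmark; returns dummy numbers.
    rank = dist.get_rank()
    return {"rank": rank, "throughput": 100.0 + rank, "latency": 0.01 * (rank + 1)}


def main() -> None:
    # torchrun sets RANK / WORLD_SIZE / MASTER_ADDR, so env:// init works.
    dist.init_process_group(backend="gloo")
    world_size = dist.get_world_size()

    local_result = benchmark_rank()

    # Gather every rank's result dict (gather_object could be used instead
    # to collect only on rank 0).
    all_results = [None] * world_size
    dist.all_gather_object(all_results, local_result)

    if dist.get_rank() == 0:
        merged = {
            # assumed merge rules: sum throughputs, average latencies
            "throughput": sum(r["throughput"] for r in all_results),
            "latency": sum(r["latency"] for r in all_results) / world_size,
            # keep per-process results alongside the aggregate
            "per_rank": all_results,
        }
        print(merged)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Running this with e.g. `torchrun --nproc_per_node=2 aggregate_results.py` would print both the aggregated numbers and the per-rank results, which is the kind of output this issue is asking for.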
IlyasMoutawwakil changed the title from "Saving results from each process and an aggregated output" to "Saving results from each process and aggregating distributed output" on Jan 12, 2024.