This repository contains Jupyter Notebook scripts for analyzing Microsoft's Philly trace, which is described in "Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads" (ATC’19).
The official public data can be downloaded from philly-traces.
We provide detailed scripts to analyze the trace; however, some of our results conflict with those provided by Microsoft. Refer to the issue for more detailed information.
Description: Code for generating time sequence trace data.
Input: cluster_job_log, revised_machine_list.csv
Output: timeseq.csv
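As a rough illustration of what generating time sequence data from job records can look like, the sketch below samples a fixed time grid and sums the GPUs held by jobs running at each sample. It is not the notebook's actual code: the function name `build_time_sequence`, the job fields `start`, `end`, and `gpus`, and the toy jobs are all hypothetical, and the real column layout of timeseq.csv is not specified here.

```python
from datetime import datetime, timedelta


def build_time_sequence(jobs, start, end, step):
    """Return [(timestamp, gpus_in_use)] sampled every `step` in [start, end)."""
    rows = []
    t = start
    while t < end:
        # A job counts as running at t if start <= t < end (half-open interval).
        busy = sum(j["gpus"] for j in jobs if j["start"] <= t < j["end"])
        rows.append((t, busy))
        t += step
    return rows


# Hypothetical toy jobs; real records would be parsed from cluster_job_log.
jobs = [
    {"start": datetime(2017, 10, 1, 0, 0), "end": datetime(2017, 10, 1, 0, 30), "gpus": 4},
    {"start": datetime(2017, 10, 1, 0, 10), "end": datetime(2017, 10, 1, 0, 20), "gpus": 2},
]

seq = build_time_sequence(
    jobs,
    start=datetime(2017, 10, 1, 0, 0),
    end=datetime(2017, 10, 1, 0, 40),
    step=timedelta(minutes=10),
)
for timestamp, busy in seq:
    print(timestamp.isoformat(), busy)
```

The linear scan over all jobs at every sample is simple but slow on a large trace; an event-based sweep over sorted start and end times would scale better.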
Description: Derive the corrected machine list of the Philly cluster from the GPU utilization file.
Input: cluster_gpu_util
Output: revised_machine_list.csv
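One way to recover a machine list from utilization records is to count, for each machine, how many GPU columns ever report a numeric value. The sketch below shows that idea under stated assumptions: the column names `machineId` and `gpuN_util`, the `NA` marker for absent GPUs, and the helper names are illustrative and may not match the actual file or the notebook's logic.

```python
import csv
import io
import math


def is_number(value):
    """True if value parses as a finite float (so 'NA', '' and None are rejected)."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def infer_machine_list(util_csv):
    """Map machineId -> largest number of GPU columns reporting in any one row."""
    gpus_per_machine = {}
    for row in csv.DictReader(util_csv):
        reporting = sum(
            1 for name, value in row.items()
            if name.startswith("gpu") and is_number(value)
        )
        machine = row["machineId"]
        gpus_per_machine[machine] = max(gpus_per_machine.get(machine, 0), reporting)
    return gpus_per_machine


# Hypothetical sample in an assumed layout; not taken from the real trace.
sample = io.StringIO(
    "time,machineId,gpu0_util,gpu1_util,gpu2_util,gpu3_util\n"
    "2017-10-01 00:00:00,m1,10.0,20.0,NA,NA\n"
    "2017-10-01 00:01:00,m1,NA,30.0,NA,NA\n"
    "2017-10-01 00:00:00,m2,0.0,5.0,7.5,99.0\n"
)
machine_list = infer_machine_list(sample)
print(machine_list)
```

Taking the maximum across rows keeps a machine's GPU count stable even when individual samples are missing.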
Description: Plot the machine distribution of the cluster and the cluster GPU utilization over time.
Input: revised_machine_list.csv, timeseq.csv
Output: figures in the imgs folder
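The data preparation behind the two plots can be sketched with the standard library alone. This is an assumption-laden illustration rather than the notebook's code: the function names and inputs are hypothetical, and "utilization" is taken here to mean the share of cluster GPUs in use, which may differ from the metric the notebook plots.

```python
from collections import Counter


def machine_distribution(gpus_per_machine):
    """Count machines per GPU count, e.g. {2: 1, 8: 2}."""
    return dict(Counter(gpus_per_machine.values()))


def utilization_series(busy_gpus, total_gpus):
    """Convert GPUs-in-use samples to a percentage of cluster capacity."""
    return [100.0 * busy / total_gpus for busy in busy_gpus]


# Hypothetical inputs standing in for revised_machine_list.csv and timeseq.csv.
machines = {"m1": 2, "m2": 8, "m3": 8}
dist = machine_distribution(machines)

total = sum(machines.values())
series = utilization_series([9, 18, 0], total)

print(dist)
print(series)
```

The resulting dictionary and list can be passed to any plotting library, for example as bar heights and line values.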