Automation and reproducibility for MLPerf Inference v3.1 #1052
We received feedback from submitters asking us to create a GUI that can generate CM commands. I opened a ticket: #1070
[20240130] We had lots of great feedback and improved both the generic CM automation recipes and the CM workflows for MLPerf inference.
We started discussing a proposal for MLPerf reproducibility badges similar to those used at ACM/IEEE/NeurIPS conferences: #1080 - feedback is welcome!
Following the feedback from the MLPerf submitters, we have developed a prototype of a GUI that generates a command line to run MLPerf inference benchmarks for all main implementations (reference, Intel, Nvidia, Qualcomm, MIL and DeepSparse) and automates submissions. You can check it here. The long-term goal is to aggregate and encode all MLPerf submission rules and notes for all models, categories and divisions in this GUI.

We have also developed a prototype of a reproducibility infrastructure to keep track of successful MLPerf inference benchmark configurations across different MLPerf versions, hardware, implementations, models and backends, based on the ACM/IEEE/cTuning reproducibility methodology and badging. You can see the latest results here - we will continue adding more tests based on your suggestions, including GPT-J, LLAMA2 and Stable Diffusion.

Our goal is to test as many v4.0 submissions as possible and add them to the above GUI to make it easier for the community to rerun experiments after the publication date. If some configurations do not work, we plan to help submitters fix the issues.
We improved the CM automation for Intel, Nvidia and Qualcomm and added it to the GUI: https://access.cknowledge.org/playground/?action=howtorun&bench_uid=39877bb63fb54725 . We can now re-run most of these submissions.
We now have a relatively stable common CM interface to rerun the above submissions and reproduce the key results. I am closing this ticket - we will open a similar ticket for inference v4.0 after publication. Huge thanks to our colleagues from Intel, Qualcomm and Nvidia for their help and suggestions!
The MLCommons taskforce on automation and reproducibility is helping the community, vendors and submitters check whether it is possible to re-run MLPerf inference v3.1 submissions, fix any issues encountered and add their implementations to the MLCommons CM automation so that all MLPerf benchmark implementations can be run in a unified way.
Note that CM is a collaborative project to run all MLPerf inference benchmarks on any platform with any software/hardware stack using a unified interface. The MLCommons CM interface and automations are being developed based on feedback from MLPerf users and submitters - if you encounter any issues or have suggestions and feature requests, please report them via GitHub issues, via our Discord channel or by providing a patch to the CM automations. Thank you and we look forward to collaborating with you!
Intel, Nvidia, AMD, ARM: ✅ - via CM
Intel, ARM, AMD: ✅ - via CM
1 - the original Docker container fails because of an incompatibility with the latest pip packages: see the GitHub issue. We are collaborating with Intel to integrate their patch into the CM automation and re-run their submissions - this is mostly done.
2 ❌ - it was not possible to rerun and reproduce the performance numbers due to missing configuration files: see the GitHub issue. After discussing this issue with the submitters, we helped them generate the missing configuration files using the MLCommons CM automation for QAIC and matched the QAIC performance numbers from the v3.1 submission. It should be possible to use CM for QAIC MLPerf inference v4.0 submissions.
MLCommons CM interface
You should be able to run MLPerf inference benchmarks via the unified CM interface and a portable workflow that can run natively or inside an automatically generated Docker container:
pip install cmind
cm pull repo mlcommons@ck
cmr "run common mlperf inference" --implementation=nvidia --model=bert-99
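For reference, cmr is a shorthand for "cm run script" with the quoted, space-separated tags, so the benchmark command above should be equivalent to the longer form below (a sketch; check your CM version if the tag resolution differs). The second command swaps in the reference implementation, whose flag value is an assumption for illustration - only nvidia appears in this issue:

# equivalent long form of the cmr shorthand shown above
cm run script --tags=run,common,mlperf,inference --implementation=nvidia --model=bert-99
# assumed example: same workflow with the reference implementation
cmr "run common mlperf inference" --implementation=reference --model=bert-99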
Prepare an official submission for the Edge category:
cmr "run common mlperf inference _submission _full" --implementation=nvidia --model=bert-99
Prepare an official submission for the DataCenter category:
cmr "run common mlperf inference _submission _full" --implementation=nvidia \ --model=bert-99 --category=datacenter --division=closed