Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

nvashutoshd · 2025-02-04T10:21:04Z

This PR addresses and clarifies the rules for dataset sorting in LLM benchmarks, specifically focusing on the offline scenario. The changes aim to ensure fair and consistent performance evaluation across submissions.

Proposed changes for LLM benchmarks-

Dataset Sorting Limitation: Sorting is restricted to within the boundaries of a single dataset copy. This prevents the creation of batches with identical queries from multiple dataset copies, which could lead to artificial performance advantages.
Clarification on Runtime: Submitters are still allowed to run beyond the minimum duration time, and sorting of queries within a dataset copy is permitted.
Addressing Performance Bias: The new rule eliminates the potential unfair advantage gained from batching identical queries with identical sequence lengths across multiple dataset copies. The refined rules better reflect real-world scenarios, where identical queries are not typically grouped together

github-actions · 2025-02-04T10:21:17Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

nvashutoshd · 2025-02-04T10:21:37Z

cc @nv-ananjappa @nvzhihanj

mrmhodak · 2025-02-04T17:21:30Z

WG Notes:

Sorting outside of the dataset boundary prohibited (regardless of runtime).
LLMs instead of list of benchmarks

nvashutoshd · 2025-02-04T17:32:05Z

recheck

mrmhodak · 2025-02-04T21:57:15Z

@nvashutoshd @ashwin:
This seems to be impractical to implement if somebody wants to sort. Loadgen sends one query with all the data, there are no markers where dataset ends and begins. To sort, one would need to hardcode how long the dataset is (or similar), which is explicitly prohibited.

nvzhihanj · 2025-02-04T22:17:01Z

This seems to be impractical to implement if somebody wants to sort.

@mrmhodak the scenario you mentioned only happens when: 1) offline scenarios, where all queries are dumped to the SUT in one shot; 2) requested query count is larger than the dataset size (e.g. if you specify a larger min_duration or min_query_count).

I do agree it's impractical to implement sorting in such scenario, but sorting across boundary will enable exploitative actions like batching same queries which should not be allowed. In the future, LoadGen should be the one implementing and accepting sorting lambdas. @pgmpablo157321 FYI.

nvashutoshd · 2025-02-05T01:17:10Z

@mrmhodak @nv-ananjappa @nvzhihanj - I've updated the proposed rule change based on our discussion in the WG meeting this morning.

The rules now cover any/all LLMs, and limits sorting outside dataset boundary irrespective of runtime.

mrmhodak · 2025-02-11T07:07:58Z

I agree with the stated purpose, but it is problematic as a general rule. It currently affects Llama2-70b and MoE, where multiple copies of dataset are sent, but with no way for SUT to determine where dataset begins and ends, it effectively disallows sorting. This is confusing and can lead to the issues down the line.

It might be cleaner to just disallow it for these 2 models for 5.0 and revisit as a general rule for 5.1.

nvzhihanj · 2025-02-11T07:12:13Z

@mrmhodak I think whether to send multiple copies of data is independent of the benchmarks - remember you can increase the min_duration or min_query_count to request more copies of the dataset.
Essentially this rule is disallowing sorting when requesting multiple copies of dataset in offline mode. It impacts all network with non-uniform input and output distribution.

mrmhodak · 2025-02-11T07:19:15Z

@nvzhihanj: Good point, but we still have issue that anytime an SUT receives more that 1 copy of the dataset, there is no practical way to sort. That is my issue with the rule as currently - it seems to provide a way for sorting when more than a 1 copy of the dataset is received, but it does not exist.

This creates a confusing situation - we need rules that work within the capabilities of tools we have.

nvzhihanj · 2025-02-11T19:04:55Z

but we still have issue that anytime an SUT receives more that 1 copy of the dataset, there is no practical way to sort

@mrmhodak That is true. As I mentioned above, we shouldn't allow sorting under such scenario (offlien + more than 1 copy) until it's handled properly by loadgen. We can limit the application to the LLM workloads for this round and patch it in loadgen next round. What do you think?

Rules clarification to prevent sorting beyond datset boundaries in LLMs

daddb65

nvashutoshd requested a review from a team as a code owner February 4, 2025 10:21

Update rules to limit sorting in all LLM workloads and durations

88a21a1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

nvashutoshd commented Feb 4, 2025

github-actions bot commented Feb 4, 2025 •

edited

Loading

nvashutoshd commented Feb 4, 2025

mrmhodak commented Feb 4, 2025

nvashutoshd commented Feb 4, 2025

mrmhodak commented Feb 4, 2025

nvzhihanj commented Feb 4, 2025 •

edited

Loading

nvashutoshd commented Feb 5, 2025

mrmhodak commented Feb 11, 2025

nvzhihanj commented Feb 11, 2025

mrmhodak commented Feb 11, 2025

nvzhihanj commented Feb 11, 2025

Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

Are you sure you want to change the base?

Rules clarification to prevent sorting beyond datset boundaries in LLMs #309

Conversation

nvashutoshd commented Feb 4, 2025

github-actions bot commented Feb 4, 2025 • edited Loading

nvashutoshd commented Feb 4, 2025

mrmhodak commented Feb 4, 2025

nvashutoshd commented Feb 4, 2025

mrmhodak commented Feb 4, 2025

nvzhihanj commented Feb 4, 2025 • edited Loading

nvashutoshd commented Feb 5, 2025

mrmhodak commented Feb 11, 2025

nvzhihanj commented Feb 11, 2025

mrmhodak commented Feb 11, 2025

nvzhihanj commented Feb 11, 2025

github-actions bot commented Feb 4, 2025 •

edited

Loading

nvzhihanj commented Feb 4, 2025 •

edited

Loading