Benchmark

Introduction
Supported Matrix
Usage

Introduction

Intel Neural Compressor provides the incbench command to launch performance benchmarks on Intel CPUs.

To get peak performance on Intel Xeon CPUs, a single instance should not span more than one NUMA node. Therefore, by default, incbench launches one instance on the first NUMA node (NUMA:0).
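
On Linux, you can check which CPUs belong to a NUMA node through sysfs. The sketch below reads /sys/devices/system/node/node0/cpulist and expands it into a list of logical CPU ids. This shows what "the first NUMA node" covers on a given machine. The helpers parse_cpulist and numa_node_cores are hypothetical; this is not how incbench itself selects cores. Also note that sysfs lists logical CPUs, so hyper-threaded siblings appear in the list as well.

```python
from pathlib import Path
from typing import List


def parse_cpulist(cpulist: str) -> List[int]:
    """Expand a Linux cpulist string such as "0-3,8-11" into individual CPU ids."""
    cpus: List[int] = []
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_node_cores(node: int = 0) -> List[int]:
    """Hypothetical helper: logical CPUs that the Linux kernel reports for one NUMA node."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())
```

For example, parse_cpulist("0-3,8-11") expands to [0, 1, 2, 3, 8, 9, 10, 11]. numa_node_cores(0) only works on Linux systems that expose this sysfs path.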

Supported Matrix

Platform	Status
Linux	✔
Windows	✔

Usage

Parameter	Default	Comment
num_instances	1	Number of instances
num_cores_per_instance	None	Number of cores in each instance
C, cores	0-${num_cores_on_NUMA-1}	Range of visible cores
cross_memory	False	Whether to allocate memory across NUMA nodes

Note: set cross_memory to True only when the local NUMA node does not have enough memory.
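
The following sketch illustrates what cross_memory controls by building a numactl command prefix for one instance (Linux only). It assumes two mappings. cross_memory=False corresponds to binding memory to the local node with --membind. cross_memory=True corresponds to interleaving allocations over all nodes with --interleave=all. The numactl options are real, but numactl_prefix is a hypothetical helper, and incbench may launch instances differently, particularly on Windows.

```python
import shlex
from typing import List


def numactl_prefix(cores: List[int], node: int = 0, cross_memory: bool = False) -> List[str]:
    """Hypothetical helper: build a numactl prefix that pins one instance to `cores`.

    Assumed mapping: cross_memory=False binds all allocations to `node`
    (--membind); cross_memory=True interleaves them over every NUMA node
    (--interleave=all).
    """
    mem_arg = "--interleave=all" if cross_memory else f"--membind={node}"
    cpu_arg = "--physcpubind=" + ",".join(str(core) for core in cores)
    return ["numactl", mem_arg, cpu_arg]


# Print the command for one instance on cores 0-3 of NUMA node 0.
cmd = numactl_prefix([0, 1, 2, 3]) + ["python", "main.py"]
print(shlex.join(cmd))
```

With the defaults, this prints numactl --membind=0 --physcpubind=0,1,2,3 python main.py. Because --membind keeps every allocation on one node, running out of local memory is the situation in which spreading memory across nodes becomes necessary.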

General Use Cases

incbench main.py: run 1 instance on NUMA:0.
incbench --num_i 2 main.py: run 2 instances on NUMA:0.
incbench --num_c 2 main.py: run multiple instances with 2 cores per instance on NUMA:0.
incbench -C 24-47 main.py: run 1 instance on cores 24-47.
incbench -C 24-47 --num_c 4 main.py: run multiple instances with 4 cores per instance on cores 24-47.

Note:
- num_i works the same as num_instances
- num_c works the same as num_cores_per_instance
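
The use cases above imply that the visible core range is split into per-instance groups. The sketch below shows one plausible way to compute that split. parse_core_range and split_cores are hypothetical helpers, and parse_core_range only handles the simple "start-end" form. The rules for leftover cores and for passing both num_instances and num_cores_per_instance are assumptions, not documented incbench behavior.

```python
from typing import List, Optional


def parse_core_range(spec: str) -> List[int]:
    """Expand a range such as "24-47" (the form passed to -C) into core ids."""
    start, _, end = spec.partition("-")
    return list(range(int(start), int(end or start) + 1))


def split_cores(cores: List[int], num_instances: int = 1,
                num_cores_per_instance: Optional[int] = None) -> List[List[int]]:
    """Partition visible cores into contiguous per-instance groups.

    Assumptions: num_cores_per_instance, when given, takes precedence and
    determines the instance count; otherwise the cores are divided evenly
    among num_instances. Cores that do not fill a whole group stay idle.
    """
    if num_cores_per_instance is not None:
        if not 1 <= num_cores_per_instance <= len(cores):
            raise ValueError("num_cores_per_instance is out of range")
        per_instance = num_cores_per_instance
        num_instances = len(cores) // per_instance
    else:
        if not 1 <= num_instances <= len(cores):
            raise ValueError("num_instances is out of range")
        per_instance = len(cores) // num_instances
    return [cores[i * per_instance:(i + 1) * per_instance] for i in range(num_instances)]
```

Under these assumptions, -C 24-47 --num_c 4 yields 6 instances on cores 24-27, 28-31, 32-35, 36-39, 40-43 and 44-47.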

Dump Throughput and Latency Summary

To merge benchmark results from multiple instances, incbench automatically scans the log files for throughput and latency messages that match the following patterns:

throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
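
The sketch below applies these two patterns to per-instance log text and merges the results. It takes the last match in each log. The merge rule is an assumption made only for illustration: throughput is summed over instances and latency is averaged, and incbench's own aggregation may differ. last_match and summarize are hypothetical helpers.

```python
import re
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Patterns copied verbatim from the section above.
throughput_pattern = r"[T,t]hroughput:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"
latency_pattern = r"[L,l]atency:\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z/]*)"


def last_match(pattern: str, log_text: str) -> Optional[Tuple[float, str]]:
    """Return (value, unit) for the last match of pattern in one log, or None."""
    matches = re.findall(pattern, log_text)
    if not matches:
        return None
    value, unit = matches[-1]
    return float(value), unit


def summarize(logs: List[str]) -> Dict[str, Tuple[float, str]]:
    """Merge per-instance logs, one string per instance.

    Assumed merge rule (for illustration only): throughput is summed over
    instances and latency is averaged; units are taken from the first match.
    """
    summary: Dict[str, Tuple[float, str]] = {}
    throughputs = [m for m in (last_match(throughput_pattern, log) for log in logs) if m is not None]
    latencies = [m for m in (last_match(latency_pattern, log) for log in logs) if m is not None]
    if throughputs:
        summary["throughput"] = (sum(v for v, _ in throughputs), throughputs[0][1])
    if latencies:
        summary["latency"] = (mean([v for v, _ in latencies]), latencies[0][1])
    return summary
```

For two instance logs reporting 100.000 and 50.500 samples/sec, this sketch reports a combined throughput of 150.5 samples/sec.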

Demo usage

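In the benchmark script (main.py in the examples above), print the results in a format these patterns match, for example:

# latency is in seconds here, so it is multiplied by 10**3 to report milliseconds
print("Throughput: {:.3f} samples/sec".format(throughput))
print("Latency: {:.3f} ms".format(latency * 10**3))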
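
The two print statements above assume that throughput and latency were already measured. The following self-contained sketch shows one way a main.py might measure them before printing. run_inference is a hypothetical placeholder for a real model's forward pass. Latency is taken as the average time of one iteration (one batch) in seconds, which is an assumption; your workload may define it per sample instead.

```python
import time


def run_inference(batch):
    """Hypothetical placeholder workload; replace with your model's forward pass."""
    return sum(x * x for x in batch)


def benchmark(batch_size: int = 32, warmup: int = 5, iterations: int = 50):
    """Return (throughput in samples/sec, latency in seconds per batch)."""
    batch = list(range(batch_size))
    for _ in range(warmup):  # warm-up runs are excluded from timing
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iterations):
        run_inference(batch)
    elapsed = time.perf_counter() - start  # seconds
    latency = elapsed / iterations
    throughput = batch_size * iterations / elapsed
    return throughput, latency


if __name__ == "__main__":
    throughput, latency = benchmark()
    # Same format strings as the demo, so the throughput/latency patterns can match them.
    print("Throughput: {:.3f} samples/sec".format(throughput))
    print("Latency: {:.3f} ms".format(latency * 10**3))
```

Launched as incbench main.py, each instance would print these two lines into its log, where the patterns can find them.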
