Skip to content

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.

License

Notifications You must be signed in to change notification settings

XuehaoSun/neural-compressor

This branch is 4 commits ahead of, 473 commits behind intel/neural-compressor:master.

Folders and files

NameName
Last commit message
Last commit date

Latest commit

5abfac4 · Feb 4, 2024
Jan 31, 2024
Feb 4, 2024
Dec 29, 2023
Dec 4, 2023
Jan 31, 2024
Jan 30, 2024
Jan 25, 2024
Feb 4, 2024
Feb 2, 2024
Feb 1, 2024
Feb 2, 2024
Jan 23, 2024
Feb 1, 2024
Jun 16, 2020
Dec 29, 2023
Oct 23, 2022
Sep 13, 2023
Aug 28, 2023
Jan 22, 2024
Nov 14, 2023
Nov 28, 2023
Oct 24, 2023
Jan 23, 2024
Sep 15, 2023

Repository files navigation

Intel® Neural Compressor

An open-source Python library supporting popular model compression techniques on all mainstream deep learning frameworks (TensorFlow, PyTorch, ONNX Runtime, and MXNet)

python version license coverage Downloads

Architecture   |   Workflow   |   LLMs Recipes   |   Results   |   Documentations


Intel® Neural Compressor aims to provide popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks such as TensorFlow, PyTorch, ONNX Runtime, and MXNet, as well as Intel extensions such as Intel Extension for TensorFlow and Intel Extension for PyTorch. In particular, the tool provides the key features, typical examples, and open collaborations as below:

Installation

Install from pypi

pip install neural-compressor

Note: More installation methods can be found at Installation Guide. Please check out our FAQ for more details.

Getting Started

Quantization with Python API

# Install Intel Neural Compressor and TensorFlow
pip install neural-compressor
pip install tensorflow
# Prepare fp32 model
wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_6/mobilenet_v1_1.0_224_frozen.pb
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.config import PostTrainingQuantConfig

dataset = Datasets("tensorflow")["dummy"](shape=(1, 224, 224, 3))
dataloader = DataLoader(framework="tensorflow", dataset=dataset)

from neural_compressor.quantization import fit

q_model = fit(
    model="./mobilenet_v1_1.0_224_frozen.pb",
    conf=PostTrainingQuantConfig(),
    calib_dataloader=dataloader,
)

Documentation

Overview
Architecture Workflow APIs LLMs Recipes Examples
Python-based APIs
Quantization Advanced Mixed Precision Pruning (Sparsity) Distillation
Orchestration Benchmarking Distributed Compression Model Export
Neural Coder (Zero-code Optimization)
Launcher JupyterLab Extension Visual Studio Code Extension Supported Matrix
Advanced Topics
Adaptor Strategy Distillation for Quantization SmoothQuant
Weight-Only Quantization (INT8/INT4/FP4/NF4) FP8 Quantization Layer-Wise Quantization
Innovations for Productivity
Neural Insights Neural Solution

Note: More documentations can be found at User Guide.

Selected Publications/Events

Note: View Full Publication List.

Additional Content

Communication

  • GitHub Issues: mainly for bug reports, new feature requests, question asking, etc.
  • Email: welcome to raise any interesting research ideas on model compression techniques by email for collaborations.
  • Discord Channel: join the discord channel for more flexible technical discussion.
  • WeChat group: scan the QA code to join the technical discussion.

About

Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.

Resources

License

Security policy

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 91.5%
  • JavaScript 5.1%
  • Shell 1.2%
  • TypeScript 1.0%
  • Jupyter Notebook 0.6%
  • CSS 0.4%
  • Other 0.2%