Update XNNPACK to latest version #18038

Merged: 28 commits, Nov 3, 2023

skottmckay (Contributor) commented Oct 20, 2023

Description

Update XNNPACK to latest version

  • adds fp16 kernels and various other improvements
  • requires pthreadpool update as well

Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API

  • 'setup' is split into 'reshape' and 'setup'
  • some ops use a workspace buffer
    • copied workspace allocation from XNNPACK unit test code
  • some suffixes changed
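
The reshape/setup split and the workspace handling described above can be sketched as follows. This is a minimal illustration only: `MockOp`, `Reshape`, `Setup`, `Run` and `Compute` are hypothetical names invented for this sketch, not XNNPACK or ONNX Runtime functions, and the real signatures differ per operator. The point is the flow: the shape-dependent phase reports how much scratch memory is needed, the caller allocates it, and the second phase binds the buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an operator with a two-phase API.
// None of these names come from XNNPACK; they only mirror the flow.
struct MockOp {
  size_t element_count = 0;
  bool reshaped = false;
  float* workspace = nullptr;
  const float* input = nullptr;
  float* output = nullptr;
};

// Phase 1 ("reshape"): shape-dependent work only. Reports the scratch space
// the operator needs so the caller can allocate it.
bool Reshape(MockOp& op, size_t batch, size_t channels, size_t& workspace_elements) {
  op.element_count = batch * channels;
  op.reshaped = true;
  workspace_elements = op.element_count;  // this mock stages the input in scratch
  return true;
}

// Phase 2 ("setup"): binds caller-owned buffers. Fails if reshape was skipped.
bool Setup(MockOp& op, float* workspace, const float* input, float* output) {
  if (!op.reshaped) return false;
  op.workspace = workspace;
  op.input = input;
  op.output = output;
  return true;
}

// Run: copies the input into the workspace, then writes doubled values out.
void Run(const MockOp& op) {
  for (size_t i = 0; i < op.element_count; ++i) op.workspace[i] = op.input[i];
  for (size_t i = 0; i < op.element_count; ++i) op.output[i] = 2.0f * op.workspace[i];
}

// Full flow as a kernel's compute function might drive it.
std::vector<float> Compute(const std::vector<float>& input, size_t batch, size_t channels) {
  assert(input.size() == batch * channels);
  MockOp op;
  size_t workspace_elements = 0;
  if (!Reshape(op, batch, channels, workspace_elements)) return {};
  std::vector<float> workspace(workspace_elements);  // caller owns the scratch buffer
  std::vector<float> output(input.size());
  if (!Setup(op, workspace.data(), input.data(), output.data())) return {};
  Run(op);
  return output;
}
```

Keeping the workspace owned by the caller means the allocation can be redone only when the reported size changes, which is presumably the reason for splitting the phases.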

Added wrapper for XNNPACK caches to base XNNPACK EP kernel

  • simplifies usage
  • XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API
    • we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build.
    • using XNNPACK internals would also mean we would not be able to support using a pre-built XNNPACK package
      • not an issue currently

Fixed opset registration for internal NHWC domain

  • was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset
  • a number of other places needed updating once this issue was fixed
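
As a rough sketch of what "tied to the ONNX version" means here: the internal NHWC domain takes whatever opset the model declares for the ONNX domain, so a node moved into the NHWC domain by layout transformation keeps the same operator version. The helper name `WithNhwcDomain` is hypothetical, and the two domain strings are my understanding of the ONNX Runtime constants, so treat both as assumptions rather than the actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Domain names as I understand them from ONNX Runtime; treat as assumptions.
const std::string kOnnxDomain = "";
const std::string kInternalNhwcDomain = "com.ms.internal.nhwc";

using DomainToVersion = std::map<std::string, int>;

// Hypothetical helper: returns a copy of the model's domain-to-opset map with
// the internal NHWC domain pinned to the model's ONNX opset. If the model does
// not import the ONNX domain, the map is returned unchanged.
DomainToVersion WithNhwcDomain(const DomainToVersion& model_opsets) {
  DomainToVersion result = model_opsets;
  auto onnx = model_opsets.find(kOnnxDomain);
  if (onnx != model_opsets.end()) {
    assert(onnx->second > 0);  // opset versions start at 1
    result[kInternalNhwcDomain] = onnx->second;
  }
  return result;
}
```

With a model that imports ONNX opset 17, the NHWC domain would also resolve to 17 instead of a fixed value.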

Remove support for NCHW Resize from XNNPACK EP so it's NHWC only

  • we only supported NCHW for fp32
    • doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization)
    • unclear if that complexity provides any benefit. can add back if required by production scenario

Motivation and Context

We're looking at enabling fp16 support for CoreML and NNAPI. If we do that, we need a good fallback story for cases where the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that.

NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate PRs and should be relatively simple to do.

Pending some updates to the cmake config there to work with FetchContent so patch file is WIP.
…internal cache.h

Fix Resize registrations
Fix opset of internal NHWC domain not matching the ONNX opset for the model
…hanges that required extra patching.

Update dependency artifacts in az.
…s. As that EP explicitly registers ops in the old opsets I'm assuming they need these parallel schemas.
@skottmckay skottmckay marked this pull request as ready for review October 25, 2023 05:17
@skottmckay skottmckay requested a review from a team as a code owner October 25, 2023 05:17
snnn (Member) commented Oct 26, 2023

/azp run Windows CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@skottmckay skottmckay requested a review from wejoncy October 27, 2023 05:48
@skottmckay (Contributor, Author)

I'll address the deps.txt conflict once everyone is happy with the changes as it requires updating the dependencies package in the CI to match the latest main.

wejoncy (Contributor) commented Oct 27, 2023

LGTM!

- changes to check whether an EP has a kernel (presumably to speed up unit tests) didn't take into account that some EPs only have NHWC versions of kernels
- update xnnpack kernels to cover earlier opsets
  - had to add schemas for earlier versions for jsep, and the operator unit tests also only have good coverage for earlier schemas
    - if we didn't do this we would lose a lot of test coverage
- move some files to the correct directories for the operator
- fix usage of workspace in a few places
- allow zero-size allocation without throwing. Sometimes the workspace has a size of zero (e.g. Resize) when it's not needed
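
A minimal sketch of the zero-size case, assuming a helper of my own invention (`AllocateWorkspace` and `WorkspacePtr` are hypothetical names, not the ONNX Runtime allocator API): a request for zero bytes yields an empty pointer instead of an error, while a genuine allocation failure still throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter so the buffer is released with std::free.
struct FreeDeleter {
  void operator()(void* p) const { std::free(p); }
};
using WorkspacePtr = std::unique_ptr<void, FreeDeleter>;

// Hypothetical helper: a zero-size request means the operator needs no
// workspace, so return an empty pointer rather than treating it as a failure.
WorkspacePtr AllocateWorkspace(size_t size) {
  if (size == 0) return WorkspacePtr{};
  void* p = std::malloc(size);
  if (p == nullptr) throw std::bad_alloc();  // real failures still surface
  return WorkspacePtr{p};
}
```

Callers can then pass the possibly null pointer straight through to the operator, since a workspace of size zero is never dereferenced.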
@snnn snnn merged commit 4f2096b into main Nov 3, 2023
89 of 91 checks passed
@snnn snnn deleted the skottmckay/UpdateXNNPACK branch November 3, 2023 16:04
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024