Use extract_delegate_segments
Summary:
Before this diff, exporting the 7B Llama model with XNNPACK was failing during flatc serialization because of a too-large offset.

I noticed that the `to_executorch()` call wasn't enabling the `extract_delegate_segments` flag, which meant:
- The gigabytes of delegate data were being inlined into the intermediate .json file used for flatc serialization. In this case, that added up to 15GiB of ASCII numbers, accounting for 99.98% of the overall .json file's size. Parsing it caused `flatc` to consume an enormous amount of memory and ultimately fail because it couldn't handle such a large array.
- At runtime, when `XnnpackBackend::init()` called `processed->Free()`, the data couldn't actually be freed, dramatically increasing peak memory use during execution.

Before setting this flag, exporting the model on my devvm failed after ~43 minutes. With the flag set, the export succeeded in 19 minutes.
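For context, here is a minimal sketch of what enabling the flag looks like in an end-to-end export flow. This is illustrative only: TinyModel, the example inputs, and the output filename are stand-ins rather than the actual llama2 export code, and import paths may differ across ExecuTorch versions.

import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import ExecutorchBackendConfig, to_edge

class TinyModel(torch.nn.Module):  # stand-in for the real llama model
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 8),)
edge_manager = to_edge(torch.export.export(TinyModel(), example_inputs))
edge_manager = edge_manager.to_backend(XnnpackPartitioner())

export_program = edge_manager.to_executorch(
    ExecutorchBackendConfig(
        extract_constant_segment=True,
        # Store delegate blobs as separate segments instead of inlining
        # them into the flatbuffer, so serialization stays small and the
        # runtime can free the data after the backend consumes it.
        extract_delegate_segments=True,
    )
)

with open("model_xnnpack.pte", "wb") as f:
    f.write(export_program.buffer)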

Reviewed By: kimishpatel

Differential Revision: D53738848

fbshipit-source-id: a47cc46f2941e74693487aad12621a62b003a3be
dbort authored and facebook-github-bot committed Feb 14, 2024
1 parent d8d60a0 commit c4e0185
Showing 1 changed file with 1 addition and 0 deletions.
examples/models/llama2/export_llama_lib.py

@@ -398,6 +398,7 @@ def _export_llama(modelname, args) -> str:  # noqa: C901
     export_program = edge_manager.to_executorch(
         ExecutorchBackendConfig(
             extract_constant_segment=True,
+            extract_delegate_segments=True,
             passes=[
                 QuantFusionPass(),
             ],
