Commit
Summary: Before this diff, exporting the 7b llama model with xnnpack was failing during flatc serialization because of a too-large offset. I noticed that the `to_executorch()` call wasn't enabling the `extract_delegate_segments` flag, which meant:

- The gigabytes of delegate data were getting inlined in the intermediate .json file used for flatc serialization. In this case, that added up to 15GiB of ASCII numbers, accounting for 99.98% of the overall .json file's size. This caused `flatc` to consume a huge amount of memory when parsing it, and ultimately to fail because it couldn't handle such a large array.
- At runtime, when `XnnpackBackend::init()` calls `processed->Free()`, the data couldn't actually be freed, dramatically increasing peak memory use during execution.

Before setting this flag, exporting the model on my devvm failed after ~43 minutes. After setting this flag, it succeeded after 19 minutes.

Reviewed By: kimishpatel

Differential Revision: D53738848

fbshipit-source-id: a47cc46f2941e74693487aad12621a62b003a3be
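For reference, a minimal sketch of how the flag can be enabled when calling `to_executorch()`, assuming the `ExecutorchBackendConfig` API from `executorch.exir` and a hypothetical `edge_program` already lowered to the XNNPACK delegate; this is illustrative, not the exact export script from this diff:

```python
# Sketch only: module path, config class, and the `edge_program` variable are
# assumptions about the surrounding export code, not taken from this commit.
from executorch.exir import ExecutorchBackendConfig

# ... after lowering the model to the XNNPACK delegate ...
executorch_program = edge_program.to_executorch(
    ExecutorchBackendConfig(
        # Store delegate blobs as separate segments of the output file instead
        # of inlining them in the flatbuffer, so flatc never has to parse the
        # multi-GiB payload and the runtime can release it after init().
        extract_delegate_segments=True,
    )
)
```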