forked from pytorch/executorch
Halve model loading time for llama demo on iOS (pytorch#4032)
Summary:
Pull Request resolved: pytorch#4032

mmap is not recommended for large sequential workloads -- you have to take a bunch of page faults. I originally assumed this would hurt peak memory usage (we read all the weights into memory at once and then pack them; packing is basically copying them), but it doesn't. In retrospect, this makes sense because we actually operate on one weights tensor at a time, and the individual tensors aren't gigantic; there are just a lot of them.

Reviewed By: larryliu0820

Differential Revision: D58826044

fbshipit-source-id: 1a4e8a3522d5e24b56687bebd1f0f92e5514aff2
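The diff itself is not shown here, but the idea described above is to replace mmap-backed loading with a single sequential read of the weights file into a heap buffer, so the kernel can stream the file instead of faulting in one page at a time. Below is a minimal, hypothetical C++ sketch of that approach; `load_file` and the usage are placeholders for illustration and are not the actual ExecuTorch data-loader API.

```cpp
#include <cstdio>
#include <fstream>
#include <memory>
#include <vector>

// Read an entire file into a heap buffer with one sequential read,
// instead of mmap-ing it and taking a page fault per touched page.
// Hypothetical helper for illustration only.
std::unique_ptr<std::vector<char>> load_file(const char* path) {
  std::ifstream in(path, std::ios::binary | std::ios::ate);
  if (!in) {
    return nullptr;
  }
  const std::streamsize size = in.tellg();
  in.seekg(0, std::ios::beg);
  auto buffer = std::make_unique<std::vector<char>>(static_cast<size_t>(size));
  if (!in.read(buffer->data(), size)) {
    return nullptr;
  }
  return buffer;
}

int main(int argc, char** argv) {
  if (argc < 2) {
    std::fprintf(stderr, "usage: %s <model file>\n", argv[0]);
    return 1;
  }
  auto data = load_file(argv[1]);
  if (!data) {
    std::fprintf(stderr, "failed to load %s\n", argv[1]);
    return 1;
  }
  std::printf("loaded %zu bytes\n", data->size());
  return 0;
}
```

As the commit message notes, peak memory is not hurt in practice because the weights are packed one tensor at a time, so buffering the file up front does not meaningfully add to the working set.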
1 parent 01a1b28 · commit 8740c69
Showing 3 changed files with 16 additions and 4 deletions.