fix: split tensor uma estimate result
Signed-off-by: thxCode <[email protected]>
thxCode committed Aug 23, 2024
1 parent 405be43 commit ff62abf
Showing 2 changed files with 7 additions and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -48,7 +48,7 @@ GGUF Parser helps in reviewing and estimating the usage of a GGUF format model w

- Since v0.7.2, GGUF Parser supports retrieving the model's metadata via a split file,
which is suffixed with something like `-00001-of-00009.gguf`.
- The table result `UMA` indicates the memory usage of Apple MacOS only.
- The table result `UMA` indicates the memory usage of Apple macOS only.
- Since v0.7.0, GGUF Parser is going to support estimating the usage of multiple GPUs.
+ The table result `RAM` means the system memory usage when
running [LLaMA.Cpp](https://github.com/ggerganov/llama.cpp) or a LLaMA.Cpp-like application.
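The split-file naming scheme mentioned above can be recognized with a small helper. The following is a minimal sketch, not GGUF Parser's actual implementation; the `splitInfo` function and its regexp are assumptions of this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitSuffix matches the split-model naming scheme noted in the README,
// e.g. "-00001-of-00009.gguf". This pattern is an illustrative assumption,
// not the parser's real detection logic.
var splitSuffix = regexp.MustCompile(`-(\d{5})-of-(\d{5})\.gguf$`)

// splitInfo reports whether name looks like one shard of a split model,
// returning the shard index and the total shard count as written in the name.
func splitInfo(name string) (index, total string, ok bool) {
	m := splitSuffix.FindStringSubmatch(name)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	idx, total, ok := splitInfo("model-00001-of-00009.gguf")
	fmt.Println(idx, total, ok) // 00001 00009 true
}
```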
6 changes: 6 additions & 0 deletions file_estimate.go
@@ -641,6 +641,12 @@ func (e LLaMACppUsageEstimate) SummarizeMemory(mmap bool, nonUMARamFootprint, no
ems.VRAMs[i].UMA = fp + wg + kv + /* cp */ 0
if !e.NoMMap && mmap {
ems.VRAMs[i].UMA -= wg
// NB(thxCode): the weight is added back for the following reasons:
// - UMA is treated as a single device.
// - the RPC server loads all weights and computation buffers itself.
if i > 0 {
ems.VRAMs[i].UMA += wg + cp
}
}

// NonUMA.
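The adjustment in this hunk can be sketched in isolation: with mmap enabled, the primary device's weights are backed by the file mapping and excluded from the UMA estimate, but devices past index 0 (RPC servers) must load their weights and computation buffers themselves, so those are added back. The `device` struct and `estimateUMA` function below are illustrative assumptions, not GGUF Parser's real types.

```go
package main

import "fmt"

// device holds the per-device byte counts used in this sketch
// (hypothetical stand-ins for the estimator's internal fields).
type device struct {
	footprint uint64 // fixed runtime footprint
	weight    uint64 // model weight bytes assigned to this device
	kvCache   uint64 // KV cache bytes
	compute   uint64 // computation buffer bytes
}

// estimateUMA mirrors the logic of the diff above: start from
// footprint + weight + KV cache (compute buffer omitted by default),
// subtract mmapped weights, then add weight and compute back for
// non-primary (RPC) devices, which load everything themselves.
func estimateUMA(devs []device, mmap bool) []uint64 {
	out := make([]uint64, len(devs))
	for i, d := range devs {
		uma := d.footprint + d.weight + d.kvCache
		if mmap {
			uma -= d.weight
			if i > 0 {
				uma += d.weight + d.compute
			}
		}
		out[i] = uma
	}
	return out
}

func main() {
	devs := []device{
		{footprint: 128, weight: 4096, kvCache: 512, compute: 256}, // primary device
		{footprint: 64, weight: 2048, kvCache: 256, compute: 128},  // RPC device
	}
	fmt.Println(estimateUMA(devs, true))  // [640 2496]
	fmt.Println(estimateUMA(devs, false)) // [4736 2368]
}
```

Note how only the first device benefits from the mmap exclusion, which is exactly what the `i > 0` guard in the commit enforces.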
