Calc llm gpu memory and upgrade to docusaurus v3 (#80)
samos123 authored Nov 18, 2023
1 parent 2ba5179 commit 2ee757f
Showing 5 changed files with 3,528 additions and 1,514 deletions.
53 changes: 53 additions & 0 deletions blog/2023-11-16-calculating-gpu-memory-for-llm.md
@@ -0,0 +1,53 @@
---
slug: calculating-gpu-memory-for-llm
title: "Calculating GPU memory for LLMs"
authors:
- name: Sam Stoelinga
title: Engineer
url: https://github.com/samos123
tags: [llm, gpu, memory]
---

How many GPUs do I need to be able to serve Llama 70B? In order
to answer that, you need to know how much GPU memory will be required by
the Large Language Model.

The formula is simple:
$$
M = \dfrac{(P * 4\mathrm{B})}{(32 / Q)} + O
$$
| Symbol | Description |
| ----------- | ----------- |
| M | GPU memory, expressed in gigabytes (GB) |
| P | The number of parameters in the model, e.g. a 7B model has 7 billion parameters. |
| 4B | 4 bytes, the number of bytes used for each parameter |
| 32 | There are 32 bits in 4 bytes |
| Q | The number of bits the model should be loaded in, e.g. 16 bits, 8 bits or 4 bits. |
| O | Overhead of loading anything else into GPU memory, e.g. inputs or batches |

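If it helps to see the formula as code, here is a minimal Python sketch of the same calculation; the `gpu_memory_gb` name and its arguments are illustrative, not part of the post:

```python
# Minimal sketch of the formula above. P is given in billions of parameters,
# so P * 4 bytes comes out directly in gigabytes (treating 1 GB as 10^9 bytes).
def gpu_memory_gb(params_billion: float, bits: int, overhead_gb: float = 0.0) -> float:
    """Estimate M, the GPU memory in GB needed to serve a model.

    params_billion -- P, the number of parameters in billions
    bits           -- Q, the precision used to load the model (16, 8 or 4)
    overhead_gb    -- O, extra memory for inputs, batches, etc.
    """
    bytes_per_param = 4  # 4B: bytes per parameter at 32-bit precision
    return (params_billion * bytes_per_param) / (32 / bits) + overhead_gb
```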
Now let's try out some examples.

### GPU memory required for serving Llama 70B
Let's try it out for Llama 2 70B, which we will load in 16-bit precision with 10GB of overhead.
The model has 70 billion parameters.
$$
\dfrac{70 * 4 \mathrm{bytes}}{32 / 16} + 10\mathrm{GB} = 150\mathrm{GB}
$$
That's quite a lot of memory. A single A100 80GB wouldn't be enough, but
2x A100 80GB would be enough to serve the Llama 2 70B model in 16-bit precision.

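As a sanity check, the hypothetical `gpu_memory_gb` helper sketched above reproduces this number:

```python
print(gpu_memory_gb(70, bits=16, overhead_gb=10))  # 150.0
```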
How can we further reduce the GPU memory required for Llama 2 70B? Quantization is a method to reduce the memory footprint. It works by reducing the precision of the model's parameters from floating point to lower-bit representations, such as 8-bit integers. This significantly decreases the memory and computational requirements, enabling more efficient deployment of the model, particularly on devices with limited resources. However, it requires careful management to maintain the model's performance, as reducing precision can impact the accuracy of the outputs.

In general, the consensus seems to be that 8-bit quantization achieves performance similar to 16-bit, whereas 4-bit quantization can have a noticeable impact on model performance.

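For a concrete picture of what quantized loading can look like in practice, here is a minimal sketch using Hugging Face transformers with bitsandbytes 8-bit loading. This is an illustration rather than part of the post, and the exact arguments may vary between library versions:

```python
# Sketch only: assumes the transformers, accelerate and bitsandbytes packages
# are installed and that you have access to the gated Llama 2 repository.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    device_map="auto",  # spread layers across the available GPUs
)
```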
Let's do another example: Llama 2 70B with 4-bit quantization and 1GB of overhead:
$$
\dfrac{70 * 4 \mathrm{bytes}}{32 / 4} + 1\mathrm{GB} = 36\mathrm{GB}
$$
This is something you could easily run on 2 x L4 24GB GPUs.

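The same sketch reproduces this as well:

```python
print(gpu_memory_gb(70, bits=4, overhead_gb=1))  # 36.0
```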
Got more questions? Don't hesitate to join our Discord and ask away.

<a href="https://discord.gg/JeXhcmjZVm">
<img alt="discord-invite" src="https://dcbadge.vercel.app/api/server/JeXhcmjZVm?style=flat" />
</a>
2 changes: 1 addition & 1 deletion docs/installation/gcp.md
@@ -208,7 +208,7 @@ echo "PRINCIPAL: ${SERVICE_ACCOUNT}"
```

Run the following to create the ConfigMap:
[embedmd]:# (https://raw.githubusercontent.com/substratusai/substratus/main/install/gcp/up.sh bash /kubectl apply -f - << EOF/ /\nEOF$/)
[embedmd]:# (https://raw.githubusercontent.com/substratusai/substratus/main/install/gcp/up.sh bash /kubectl apply -f -/ /\nEOF$/)
```bash
kubectl apply -f - << EOF
apiVersion: v1
29 changes: 24 additions & 5 deletions docusaurus.config.js
@@ -1,8 +1,15 @@
// @ts-check
// Note: type annotations allow type checking and IDEs autocompletion

const lightCodeTheme = require("prism-react-renderer/themes/github");
const darkCodeTheme = require("prism-react-renderer/themes/dracula");
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';

import { themes } from "prism-react-renderer"
const lightCodeTheme = themes.github;
const darkCodeTheme = themes.dracula;

// const lightCodeTheme = require("prism-react-renderer/themes/github");
// const darkCodeTheme = require("prism-react-renderer/themes/dracula");

/** @type {import('@docusaurus/types').Config} */
const config = {
@@ -47,26 +54,38 @@ const config = {
/** @type {import('@docusaurus/preset-classic').Options} */
({
docs: {
sidebarPath: require.resolve("./sidebars.js"),
sidebarPath: "./sidebars.js",
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl:
"https://github.com/substratusai/substratusai.github.io/tree/main/",
},
blog: {
remarkPlugins: [remarkMath],
rehypePlugins: [rehypeKatex],
showReadingTime: true,
// Please change this to your repo.
// Remove this to remove the "edit this page" links.
editUrl:
"https://github.com/substratusai/substratusai.github.io/tree/main/",
},
theme: {
customCss: require.resolve("./src/css/custom.css"),
customCss: ["./src/css/custom.css"],
},
}),
],
],

stylesheets: [
{
href: 'https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css',
type: 'text/css',
integrity:
'sha384-odtC+0UGzzFL/6PNoE8rX/SPcQDXBJ+uRepguP4QkPCm2LBxH3FA3y+fKSiJ+AmM',
crossorigin: 'anonymous',
},
],

themeConfig:
/** @type {import('@docusaurus/preset-classic').ThemeConfig} */
({
@@ -150,4 +169,4 @@ const config = {
}),
};

module.exports = config;
export default config;
20 changes: 11 additions & 9 deletions package.json
@@ -14,18 +14,20 @@
"write-heading-ids": "docusaurus write-heading-ids"
},
"dependencies": {
"@docusaurus/core": "2.4.1",
"@docusaurus/preset-classic": "2.4.1",
"@mdx-js/react": "^1.6.22",
"@docusaurus/core": "^3.0.0",
"@docusaurus/preset-classic": "^3.0.0",
"@mdx-js/react": "^3.0.0",
"clsx": "^1.2.1",
"prism-react-renderer": "^1.3.5",
"react": "^17.0.2",
"react-dom": "^17.0.2",
"prism-react-renderer": "^2.1.0",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"react-github-btn": "^1.4.0",
"react-player": "^2.12.0"
"react-player": "^2.12.0",
"rehype-katex": "7",
"remark-math": "6"
},
"devDependencies": {
"@docusaurus/module-type-aliases": "2.4.1"
"@docusaurus/module-type-aliases": "^3.0.0"
},
"browserslist": {
"production": [
@@ -40,6 +42,6 @@
]
},
"engines": {
"node": ">=16.14"
"node": ">=18.0"
}
}