Commit

Merge branch 'master' of github.com:farleylai/notes

Farley Lai committed Sep 15, 2022
2 parents 6bcf9c9 + 61eb407 commit 541bc9a
Showing 2 changed files with 29 additions and 196 deletions.
19 changes: 11 additions & 8 deletions _notebooks/2022-07-29-generative.ipynb
@@ -4,8 +4,7 @@
"metadata": {
"colab": {
"name": "2022-07-29-generative.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyMW/NI7AfluzKqKSZtGsNEI"
"provenance": []
},
"kernelspec": {
"name": "python3",
@@ -19,7 +18,7 @@
{
"cell_type": "markdown",
"source": [
"# Generative AI with Diffusion Models\n",
"# Generative Diffusion Models for Image Synthesis\n",
"> Implication of memory mapped storage for super large data access.\n",
"\n",
"- hide: true\n",
@@ -35,10 +34,14 @@
{
"cell_type": "markdown",
"source": [
"TL; DR\t\n",
"- SLURM monitors the total resident memory (RSS) consumed by all the task processes (incl. dataloader workers)\t\n",
"- `pin_memory=True` increases RSS significantly and may cause leaks with mmap based LMDB, pushing to the memory limit sooner\n",
"- PyTorch `FastDataLoader` or `DataLoader` created with `persistent_workers=True` is going to accumulate RSS with workers that never reset MMAP based storage such as `LMDB` env across epochs"
"TL; DR\n",
"\n",
"This post compares and highlights the progress in recent generative text-image diffusion models as follows:\n",
"\n",
"- Diffusion Models Beat GANs (OpenAI)\n",
"- GLIDE (OpenAI)\n",
"- DALLE·2 (OpenAI)\n",
"- Imagen (Google Brain)"
],
"metadata": {
"id": "xbz6IUa-1749"
@@ -243,4 +246,4 @@
}
}
]
}
}
206 changes: 18 additions & 188 deletions _notebooks/2022-07-30-transformers.ipynb
@@ -4,8 +4,7 @@
"metadata": {
"colab": {
"name": "2022-07-30-transformers.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyMW/NI7AfluzKqKSZtGsNEI"
"provenance": []
},
"kernelspec": {
"name": "python3",
@@ -26,7 +25,7 @@
"- toc: false\n",
"- badges: true\n",
"- comments: true\n",
"- categories: [blog, generative, diffusion, deep learning]"
"- categories: [blog, transformer, computer vision, deep learning, multimodal]"
],
"metadata": {
"id": "xBiIWit9fTB0"
@@ -35,10 +34,21 @@
{
"cell_type": "markdown",
"source": [
"TL; DR\t\n",
"- SLURM monitors the total resident memory (RSS) consumed by all the task processes (incl. dataloader workers)\t\n",
"- `pin_memory=True` increases RSS significantly and may cause leaks with mmap based LMDB, pushing to the memory limit sooner\n",
"- PyTorch `FastDataLoader` or `DataLoader` created with `persistent_workers=True` is going to accumulate RSS with workers that never reset MMAP based storage such as `LMDB` env across epochs"
"TL; DR\n",
"\n",
"\n",
"This post goes through transformer based architectures in various novel applications.\n",
"\n",
"- Transformer for End-to-End Object Detection - 🔗 Zhu et al. (2021)\n",
"- Transformer for 3D Object Detection - 🔗 Bhattacharyya et al. (2021)\n",
"- Transformer for Multi-Object Tracking - 🔗 Sun et al. (2020)\n",
"- Transformer for Lane Shape Prediction - 🔗 Liu et al. (2020)\n",
"- Transformer for Vision-Language Modeling - 🔗 Zhang et al. (2021)\n",
"- Transformer for Image Synthesis - 🔗 Esser et al. (2020)\n",
"- Transformer for Music Generation - 🔗 Hsiao et al. (2021)\n",
"- Transformer for Dance Generation with Music - 🔗 Huang et al. (2021)\n",
"- Transformer for Point-Cloud Processing - 🔗 Guo et al. (2020)\n",
"- Transformer for Time-Series Forecasting - 🔗 Lim et al. (2020)"
],
"metadata": {
"id": "xbz6IUa-1749"
@@ -61,186 +71,6 @@
"metadata": {
"id": "nVWrtfItgu6z"
}
},
{
"cell_type": "code",
"source": [
"import os\n",
"import sys\n",
"import random\n",
"import pytest\n",
"import torch\n",
"\n",
"from torch.utils.data import DataLoader\n",
"from time import time\n",
"from tqdm import tqdm\n",
"\n",
"print()\n",
"\n",
"KB = 2**10\n",
"MB = 2**10 * KB\n",
"GB = 2**10 * MB\n",
"\n",
"def rss_usage(breakdown=False):\n",
" import psutil\n",
" proc = psutil.Process(os.getpid())\n",
" RSS = []\n",
" RSS.append((os.getpid(), proc.memory_info().rss))\n",
" for child in proc.children(recursive=True):\n",
" RSS.append((child.pid, child.memory_info().rss))\n",
" \n",
" rss = sum(mem for _, mem in RSS)\n",
" return (rss, RSS) if breakdown else rss\n",
"\n",
"def test_rss():\n",
" print(sys.argv)\n",
" argv = sys.argv\n",
" sys.argv = [argv[1]]\n",
" import lmdb\n",
" from utils.utils import FastDataLoader\n",
" from dataset.lmdb_dataset import UCF101LMDB_2CLIP\n",
" from main_nce import parse_args, get_transform\n",
" args = parse_args()\n",
" sys.argv = argv\n",
" lmdb_root = \"/mnt/ssd/dataset/ucf101/lmdb\"\n",
" lmdb_path = f\"{lmdb_root}/UCF101/ucf101_frame.lmdb\"\n",
" trans = get_transform('train', args)\n",
" ucf101 = UCF101LMDB_2CLIP(db_path=lmdb_path, mode='train', transform=trans, num_frames=32, ds=1, return_label=True)\n",
" print(f\"Created UCF101 2clip dataset of size {len(ucf101)}\")\n",
"\n",
" dataloader = FastDataLoader(ucf101, \n",
" batch_size=32, shuffle=True,\n",
" num_workers=4, persistent_workers=False, \n",
" pin_memory=not True, sampler=None, drop_last=True)\n",
" batches = 8\n",
" for epoch in range(3):\n",
" rss = rss_usage()\n",
" print(f\"[e{epoch:02d}] RSS: {rss / GB:.2f} GB\")\n",
" for idx, (input_seq, label) in tqdm(enumerate(dataloader), total=len(dataloader), disable=True):\n",
" if idx % 4 == 0:\n",
" rss, RSS = rss_usage(True)\n",
" for pid, mem in RSS:\n",
" print(f\"[e{epoch:02d}][b{idx:02d}][{pid}] consumes {mem / GB:.2f} GB\")\n",
" print(f\"[e{epoch:02d}][b{idx:02d}] RSS: {rss / GB:.2f} GB\")\n",
" if idx == batches:\n",
" break"
],
"metadata": {
"id": "DZvUrWjofNHQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"[e00] RSS: 2.56 GB\n",
"[e00][b00][14023] consumes 1.08 GB\n",
"[e00][b00][14055] consumes 0.49 GB\n",
"[e00][b00][14071] consumes 0.81 GB\n",
"[e00][b00][14087] consumes 0.83 GB\n",
"[e00][b00][14103] consumes 0.84 GB\n",
"[e00][b00] RSS: 4.07 GB\n",
"[e00][b04][14023] consumes 1.08 GB\n",
"[e00][b04][14055] consumes 0.78 GB\n",
"[e00][b04][14071] consumes 0.90 GB\n",
"[e00][b04][14087] consumes 0.64 GB\n",
"[e00][b04][14103] consumes 0.80 GB\n",
"[e00][b04] RSS: 4.20 GB\n",
"[e00][b08][14023] consumes 1.08 GB\n",
"[e00][b08][14055] consumes 0.97 GB\n",
"[e00][b08][14071] consumes 1.00 GB\n",
"[e00][b08][14087] consumes 0.66 GB\n",
"[e00][b08][14103] consumes 1.24 GB\n",
"[e00][b08] RSS: 4.95 GB\n",
"[e01] RSS: 4.97 GB\n",
"[e01][b00][14023] consumes 1.08 GB\n",
"[e01][b00][14055] consumes 0.66 GB\n",
"[e01][b00][14071] consumes 0.92 GB\n",
"[e01][b00][14087] consumes 0.66 GB\n",
"[e01][b00][14103] consumes 0.80 GB\n",
"[e01][b00] RSS: 4.12 GB\n",
"[e01][b04][14023] consumes 1.08 GB\n",
"[e01][b04][14055] consumes 0.80 GB\n",
"[e01][b04][14071] consumes 0.99 GB\n",
"[e01][b04][14087] consumes 1.04 GB\n",
"[e01][b04][14103] consumes 1.03 GB\n",
"[e01][b04] RSS: 4.93 GB\n",
"[e01][b08][14023] consumes 1.08 GB\n",
"[e01][b08][14055] consumes 0.87 GB\n",
"[e01][b08][14071] consumes 1.05 GB\n",
"[e01][b08][14087] consumes 1.19 GB\n",
"[e01][b08][14103] consumes 1.07 GB\n",
"[e01][b08] RSS: 5.26 GB\n",
"[e02] RSS: 5.29 GB\n",
"[e02][b00][14023] consumes 1.08 GB\n",
"[e02][b00][14055] consumes 0.85 GB\n",
"[e02][b00][14071] consumes 1.06 GB\n",
"[e02][b00][14087] consumes 1.09 GB\n",
"[e02][b00][14103] consumes 1.09 GB\n",
"[e02][b00] RSS: 5.17 GB\n",
"[e02][b04][14023] consumes 1.08 GB\n",
"[e02][b04][14055] consumes 0.92 GB\n",
"[e02][b04][14071] consumes 1.12 GB\n",
"[e02][b04][14087] consumes 0.86 GB\n",
"[e02][b04][14103] consumes 1.14 GB\n",
"[e02][b04] RSS: 5.12 GB\n",
"[e02][b08][14023] consumes 1.08 GB\n",
"[e02][b08][14055] consumes 0.97 GB\n",
"[e02][b08][14071] consumes 1.19 GB\n",
"[e02][b08][14087] consumes 0.93 GB\n",
"[e02][b08][14103] consumes 1.23 GB\n",
"[e02][b08] RSS: 5.39 GB"
],
"metadata": {
"id": "EM7WJme3qSgF"
},
"execution_count": null,
"outputs": []
},
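The log above comes from the repo-specific `test_rss`. As a minimal, self-contained sketch of the same measurement pattern, assuming only `torch` and `psutil`, with a hypothetical `FakeClips` dataset standing in for the LMDB-backed one:

```python
# Hypothetical, self-contained repro of the RSS measurement above.
# FakeClips is a stand-in for the repo's UCF101LMDB_2CLIP dataset.
import os

import psutil
import torch
from torch.utils.data import DataLoader, Dataset

GB = 2**30

class FakeClips(Dataset):
    """Returns random clip tensors shaped like short video batches."""
    def __len__(self):
        return 1024

    def __getitem__(self, index):
        return torch.randn(3, 32, 128, 128), 0  # (clip, label)

def rss_usage():
    # Sum RSS over the main process and all children (dataloader workers).
    proc = psutil.Process(os.getpid())
    return proc.memory_info().rss + sum(
        c.memory_info().rss for c in proc.children(recursive=True))

if __name__ == "__main__":
    loader = DataLoader(FakeClips(), batch_size=32, num_workers=4,
                        persistent_workers=True,  # toggle to compare growth
                        pin_memory=True)
    for epoch in range(3):
        for idx, (clips, labels) in enumerate(loader):
            if idx == 8:
                break
        print(f"[e{epoch:02d}] RSS: {rss_usage() / GB:.2f} GB")
```

Toggling `persistent_workers` and `pin_memory` here makes the RSS difference across epochs directly observable.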
{
"cell_type": "markdown",
"source": [
"The root cause is the `LMDB` Python API to access database records as follows may not release the mapped memory timely on completion to reduce the runtime RSS."
],
"metadata": {
"id": "7wVw3tkF_Jro"
}
},
{
"cell_type": "code",
"source": [
"class UCF101LMDB_2CLIP(object):\n",
" ...\n",
" print('Loading LMDB from %s, split:%d' % (self.db_path, self.which_split))\n",
" self.env = lmdb.open(self.db_path, subdir=os.path.isdir(self.db_path),\n",
" readonly=True, lock=False,\n",
" readahead=False, meminit=False)\n",
" ...\n",
" \n",
" def __getitem__(self, index):\n",
" vpath, vlen, vlabel, vname = self.video_subset.iloc[index]\n",
" env = self.env\n",
" with env.begin(write=False) as txn:\n",
" raw = msgpack.loads(txn.get(self.get_video_id[vname].encode('ascii')))"
],
"metadata": {
"id": "tZIeNR97_4ky"
},
"execution_count": null,
"outputs": []
},
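The commit itself does not change this code, but one common mitigation is to close and reopen the env at epoch boundaries so the mapped pages are returned to the OS. A hypothetical sketch, with a `ResettableLMDB` wrapper reusing the same `lmdb.open` flags:

```python
# Hypothetical mitigation (not in the commit): reset the LMDB env between
# epochs so its mapped pages are unmapped and their RSS share is released.
import lmdb

class ResettableLMDB:
    def __init__(self, db_path):
        self.db_path = db_path
        self.env = None
        self.reset()

    def reset(self):
        # close() unmaps the database file; reopening maps it lazily again.
        if self.env is not None:
            self.env.close()
        self.env = lmdb.open(self.db_path, readonly=True, lock=False,
                             readahead=False, meminit=False)

    def get(self, key):
        with self.env.begin(write=False) as txn:
            return txn.get(key)
```

Since an open env must not be shared across `fork()`, such a reset only helps when workers are recreated each epoch or when `num_workers=0`.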
{
"cell_type": "markdown",
"source": [
"Worse, the [FastLoader](https://bit.ly/3vvTXtj) never recreates dataset iterator workers that involes the `LMDB` env and will grow RSS over epochs due to increasing MMAP access.\n",
"If using the vanilla `DataLoader`, make sure to set `persistent_workers=False` in case of a similar memory leak.\n",
"Nonetheless, sufficient memory must be allocated at least for peak usage in one epoch.\n",
"This serves as the workaround."
],
"metadata": {
"id": "9DiPzglnAA7j"
}
}
]
}
}
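A minimal sketch of this workaround, where `ucf101`, `num_epochs`, and `train_step` are hypothetical placeholders for the setup in the post:

```python
# Sketch of the stated workaround: a vanilla DataLoader with
# persistent_workers=False tears down workers (and their touched LMDB
# mmap pages) at the end of every epoch.
from torch.utils.data import DataLoader

# ucf101, num_epochs, train_step: placeholders for the post's setup.
loader = DataLoader(ucf101,
                    batch_size=32, shuffle=True,
                    num_workers=4,
                    persistent_workers=False,  # workers exit after each epoch
                    pin_memory=False,          # avoid the extra RSS of pinning
                    drop_last=True)

for epoch in range(num_epochs):
    # Each epoch forks fresh workers, so RSS resets to the baseline and
    # only needs to cover the peak usage within a single epoch.
    for input_seq, label in loader:
        train_step(input_seq, label)
```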
