Dear authors, it is nice to see this amazing work. When I run the code, I noticed an interesting phenomenon: loading the model occupies more GPU memory than training does. Once training starts, GPU memory consumption stabilizes at a value slightly lower than it was right after loading.
For example, when I run openlm-research/open_llama_7b with deepspeed --master_port "$port" --include localhost:"$CUDA_VISIBLE_DEVICES" src/train_lomo.py config/args_lomo.yaml on a single V100 GPU with batch_size set to 1 and the other options left at their defaults, GPU memory consumption is 18588 MB before training starts, and during training it stabilizes at 15933 MB. Can you provide more information about this phenomenon? Many thanks!
Hi. It's due to some intermediate variables created when calling AutoModelForCausalLM.from_pretrained(). When the DeepSpeed engine is initialized, torch.cuda.empty_cache() is called, which releases the memory occupied by these intermediate variables.
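If you want to see this effect in isolation, here is a minimal sketch (not code from this repository) that reproduces the pattern using PyTorch's memory stats. The model name and dtype are only for illustration, and empty_cache() is called manually here to mimic what happens during DeepSpeed engine initialization:

```python
# Temporary tensors created while loading the checkpoint are freed by Python,
# but their memory stays in PyTorch's caching allocator (still counted by
# nvidia-smi) until torch.cuda.empty_cache() returns it to the driver.
import torch
from transformers import AutoModelForCausalLM

def report(tag):
    alloc = torch.cuda.memory_allocated() / 2**20    # live tensors
    reserved = torch.cuda.memory_reserved() / 2**20  # roughly what nvidia-smi sees
    print(f"{tag}: allocated={alloc:.0f} MB, reserved={reserved:.0f} MB")

model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_7b", torch_dtype=torch.float16
).cuda()
report("after from_pretrained")  # reserved > allocated: cached blocks from loading

torch.cuda.empty_cache()         # manual stand-in for what engine init triggers
report("after empty_cache")      # reserved shrinks back toward allocated
```

The key point is that nvidia-smi reports the allocator's reserved memory, so the reading stays high until empty_cache() hands the cached blocks back to the driver, which is why the number drops once the engine is set up.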