These scripts check the GPU status of the server.
Because they do not terminate on their own, it is recommended to run them inside `screen` or `tmux` so that they keep running when the terminal session is closed or disconnected.
- screen command
  - outside the screen
    - `screen` to start a new screen
    - `screen -ls` to list all existing screens
    - `screen -r [screen_id]` to resume a screen
  - inside the screen
    - `Ctrl + a` then `d` to detach the screen
    - `Ctrl + a` then `:quit` to terminate the screen
This script uses the Hugging Face gpt2-xl model to generate random text, which checks that the GPU works properly.
To run this script, run `python3 gpu_test.py --device [gpu_id]` in the terminal.
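For orientation, a test of this kind can be sketched as follows. This is a minimal sketch and not the actual contents of `gpu_test.py`: the function names, the prompt, the default device `0`, and the `max_new_tokens` value are all assumptions made for illustration. It assumes `torch` and `transformers` are installed and that `--device` takes an integer GPU index.

```python
import argparse


def parse_args(argv=None):
    """Parse the --device flag (the default of 0 is an assumption)."""
    parser = argparse.ArgumentParser(description="Generate text on one GPU as a smoke test.")
    parser.add_argument("--device", type=int, default=0, help="index of the GPU to test")
    return parser.parse_args(argv)


def run_forever(gpu_id):
    """Sample text from gpt2-xl on the chosen GPU until interrupted with Ctrl + C."""
    # Imported lazily so that argument parsing works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = torch.device(f"cuda:{gpu_id}")
    tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
    model = AutoModelForCausalLM.from_pretrained("gpt2-xl").to(device)
    model.eval()

    # Hypothetical prompt; any short text works as a seed for sampling.
    inputs = tokenizer("The GPU test says:", return_tensors="pt").to(device)
    while True:
        with torch.no_grad():
            output = model.generate(
                **inputs,
                do_sample=True,  # sampling makes each generated text different
                max_new_tokens=64,
                pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
            )
        print(tokenizer.decode(output[0], skip_special_tokens=True))


def main(argv=None):
    """Entry point: parse the arguments, then start the endless generation loop."""
    run_forever(parse_args(argv).device)
```

Calling `main()` at the end of the file starts the loop. Because the loop never exits on its own, it fits the advice above about running inside `screen` or `tmux`.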
This script records the status (fan speed, temperature) of the GPUs every 10 seconds.
To run this script, simply run `python3 record_gpu_status.py` in the terminal.
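One way to collect these readings is to poll `nvidia-smi`, as in the sketch below. It is an illustration only and may differ from what `record_gpu_status.py` actually does: the function names, the log file name `gpu_status.log`, and the log line format are assumptions. The sketch assumes `nvidia-smi` is on the `PATH`.

```python
import subprocess
import time
from datetime import datetime

# Fields requested from nvidia-smi: GPU index, fan speed (%), temperature (C).
QUERY = "index,fan.speed,temperature.gpu"


def parse_status(csv_text):
    """Turn nvidia-smi CSV output into a list of (index, fan, temperature) string tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, fan, temp = [field.strip() for field in line.split(",")]
        # Values stay as strings because some GPUs report "[N/A]" for the fan speed.
        rows.append((index, fan, temp))
    return rows


def read_status():
    """Query every GPU once through nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)


def record(log_path="gpu_status.log", interval=10):
    """Append one timestamped line per GPU every `interval` seconds until interrupted."""
    while True:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a") as log:
            for index, fan, temp in read_status():
                log.write(f"{stamp} gpu={index} fan={fan}% temp={temp}C\n")
        time.sleep(interval)
```

Calling `record()` starts the polling loop with the 10-second interval described above. Keeping `parse_status` separate from the `nvidia-smi` call lets the parsing be checked on a machine without a GPU.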
This is a sample shell script to run the above two scripts together conveniently.