- Connect to the cluster via SSH
ssh <username>@uc2.scc.kit.edu
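Optionally, an entry in your local ~/.ssh/config saves typing the full hostname each time (the alias uc2 is just a suggestion):
Host uc2
    HostName uc2.scc.kit.edu
    User <username>
With that in place, ssh uc2 is sufficient.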
The very first step is to create a shared workspace:
ws_allocate ASR 60   # 60 Days
ws_allocate MT 60    # 60 Days
Add users to the workspace:
module load system/ws_addon
# Example: ws_share -t dir-w -u uxude ASR
ws_share -t dir-w -u <user> <workspace>
setfacl -Rm u:USERNAME:rwX,d:u:USERNAME:rwX $(ws_find ASR)
setfacl -Rm u:USERNAME:rwX,d:u:USERNAME:rwX $(ws_find MT)
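To verify that the permissions were applied, getfacl (the read counterpart of setfacl) prints the ACLs of the workspace directory:
getfacl $(ws_find ASR)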
To access the workspace, run:
cd $(ws_find ASR)
cd $(ws_find MT)
To check the remaining time of the workspace, run:
ws_list
To extend the workspace, run:
ws_extend ASR 30   # 30 Days
ws_extend MT 30    # 30 Days
Note that this is done automatically in the setup.sh script once the workspace is about to expire (see the sketch below).
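A minimal sketch of what such an automatic extension could look like, assuming ws_list accepts a workspace name and reports the remaining time in days (the actual logic in setup.sh may differ):
for ws in ASR MT; do
    # parse the remaining days from the ws_list output; adjust to the site's output format
    days_left=$(ws_list "$ws" 2>/dev/null | grep -oE '[0-9]+ days' | head -n 1 | cut -d' ' -f1)
    if [ -n "$days_left" ] && [ "$days_left" -lt 7 ]; then
        ws_extend "$ws" 30   # extend by another 30 days
    fi
done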
- Download the project
Clone the repository into your home directory so that it ends up at ~/PST, which the later steps assume:
git clone https://github.com/BertilBraun/Advanced-Improvement-in-Speech-Translation.git PST
- Create a virtual environment
First, install Miniconda (the official Miniconda installation instructions describe these steps in more detail):
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
# Initialize conda in your bash shell
~/miniconda3/bin/conda init bash
source ~/.bashrc
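Afterwards, a quick check that conda is available in the new shell:
conda --version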
Then, create a virtual environment and install the required packages:
cd ~/PST
conda create --name pst
conda activate pst
# Install required packages from environment.yml
conda env update -f environment.yml
# Ensure setup completed and install additional packages
./setup.sh
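To confirm that the environment is active and that Python comes from it, the following quick checks can help (the envs path assumes the default Miniconda location from above):
conda env list   # the active environment is marked with an asterisk
which python     # should point into ~/miniconda3/envs/pst (assuming environment.yml installs Python)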
- Running
Before submitting anything, make sure the script at least starts executing on the login node; this catches errors early, before the job is queued on the cluster.
./run_YOUR_SCRIPT.sh
Ensure that the script is executable:
chmod +x run_YOUR_SCRIPT.sh
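To catch plain shell syntax errors without executing anything, bash can also parse the script in no-exec mode:
bash -n run_YOUR_SCRIPT.sh   # parses the script but does not run it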
- Submitting to the cluster
Once you are sure that the script is executable and runs without errors, you can submit it to the cluster.
Make sure to set the correct SBATCH parameters in the script, such as the timeouts, the required cluster cores and GPUs, and the job-name. Also ensure that the correct and required modules are loaded by calling the ~/PST/setup.sh script.
#SBATCH --job-name=process_audio                 # job name
#SBATCH --partition=gpu_4                        # single, gpu_4
#SBATCH --time=02:00:00                          # wall-clock time limit
#SBATCH --mem=200000                             # in MB, check limits per node
#SBATCH --nodes=1                                # number of nodes to be used
#SBATCH --cpus-per-task=1                        # number of CPUs required per MPI task
#SBATCH --ntasks-per-node=1                      # maximum count of tasks per node
#SBATCH --mail-type=ALL                          # notify user by email when certain event types occur
#SBATCH --gres=gpu:4                             # number of GPUs required per node
#SBATCH --output=../../ASR/logs/output_%j.txt    # standard output and error log
#SBATCH --error=../../ASR/logs/error_%j.txt      # %j is the job id, making each log file unique, therefore not overwriting each other
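Putting this together, a minimal job script could be structured as follows; this is only a sketch, and the partition, resources, and the my_task.py entry point are placeholders rather than part of the project:
#!/bin/bash
#SBATCH --job-name=my_test              # placeholder job name
#SBATCH --partition=dev_gpu_4           # small test partition
#SBATCH --time=00:20:00
#SBATCH --gres=gpu:1
#SBATCH --output=logs/output_%j.txt
#SBATCH --error=logs/error_%j.txt

source ~/PST/setup.sh                   # load modules and the environment (sourced here; adjust if setup.sh is meant to be executed)
python my_task.py                       # placeholder for the actual command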
To then submit the script to the cluster, run:
sbatch run_YOUR_SCRIPT.sh
You can use the dev_gpu_4 partition for quick testing, but be aware that its maximum runtime is 30 minutes (see the example below).
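Since options given on the sbatch command line take precedence over the #SBATCH directives in the script, you can test on dev_gpu_4 without editing the script:
sbatch --partition=dev_gpu_4 --time=00:30:00 run_YOUR_SCRIPT.sh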
- Monitoring
To monitor the status of your job, run:
squeue -l   # add -i 2 to update the view every 2 seconds
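To list only your own jobs:
squeue -u $USER -l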
To cancel a job, run:
scancel <job-id>
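Finished jobs no longer appear in squeue; if job accounting is enabled on the cluster (an assumption), sacct can still show their final state:
sacct -j <job-id> --format=JobID,JobName,State,Elapsed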
The logs of the job are mostly stored in the directory of the script, depending on the task. The output and error logs are named output_<job-id>.txt and error_<job-id>.txt, respectively. The job-id is the number that is returned when submitting the job to the cluster.
- Downloading the results
To download the results from the cluster, run:
scp [-r] <username>@uc2.scc.kit.edu:~/<PATH-ON-REMOTE> <LOCAL-PATH>
The -r flag is only required if you want to download a directory.
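Note that results written to a workspace do not live under your home directory. First print the workspace path on the cluster and then use it in the scp call; the outputs subdirectory below is only a hypothetical example:
ws_find ASR   # run on the cluster, prints the absolute workspace path
scp -r <username>@uc2.scc.kit.edu:<WORKSPACE-PATH>/outputs <LOCAL-PATH>   # run on your local machine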