Commit

even better setup
ymahlau committed Dec 5, 2023
1 parent 2b0c68e commit 7ad6cb6
Showing 27 changed files with 56 additions and 55 deletions.
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_aa_0.yaml
@@ -146,10 +146,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_aa_1.yaml
@@ -146,10 +146,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_aa_2.yaml
@@ -146,10 +146,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_aa_3.yaml
@@ -146,10 +146,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_aa_4.yaml
@@ -146,10 +146,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cc_0.yaml
@@ -141,10 +141,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cc_1.yaml
@@ -141,10 +141,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cc_2.yaml
@@ -141,10 +141,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cc_3.yaml
@@ -141,10 +141,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cc_4.yaml
@@ -141,10 +141,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_co_0.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_co_1.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_co_2.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_co_3.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_co_4.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cr_0.yaml
@@ -121,10 +121,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cr_1.yaml
@@ -121,10 +121,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cr_2.yaml
@@ -121,10 +121,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cr_3.yaml
@@ -121,10 +121,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_cr_4.yaml
@@ -121,10 +121,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_fc_0.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_fc_1.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_fc_2.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_fc_3.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions config/cfg_luis_resp_fc_4.yaml
@@ -126,10 +126,10 @@ data:
 worker_episode_bucket_size: 5
 max_batch_size: 15000
 max_cpu_evaluator: 1
-max_cpu_inference_server: 2
+max_cpu_inference_server: 6
 max_cpu_log_dist_save_collect: 1
 max_cpu_updater: 2
-max_cpu_worker: 22
+max_cpu_worker: 32
 max_eval_per_worker: 74
 merge_inference_update_gpu: false
 net_cfg:
4 changes: 2 additions & 2 deletions scripts/training/generate_training_cfg_oc.py
@@ -321,10 +321,10 @@ def generate_training_structured_configs():
 only_generate_buffer=False,
 restrict_cpu=True,  # only works on LINUX
 max_cpu_updater=2,
-max_cpu_worker=22,
+max_cpu_worker=32,
 max_cpu_evaluator=1,
 max_cpu_log_dist_save_collect=1,
-max_cpu_inference_server=2,
+max_cpu_inference_server=6,
 temperature_input=temperature_input,
 single_sbr_temperature=single_temperature,
 compile_model=False,
7 changes: 4 additions & 3 deletions slurm/run_gpu_small.sh
@@ -1,13 +1,14 @@
 #!/bin/bash
 #SBATCH --job-name=gpu_small
-#SBATCH --output=slurm-%j-out.txt
+#SBATCH --output=slurm-%j-%a-out.txt
 #SBATCH --time=24:00:00  # (HH:MM:SS)
 #SBATCH --partition=tnt
-#SBATCH --cpus-per-task=28
+#SBATCH --cpus-per-task=42
 #SBATCH --mem=100G
 #SBATCH --verbose
 #SBATCH --gres=gpu:rtx3090:2
+#SBATCH --array=0
 echo "Hier beginnt die Ausführung/Berechnung"
 module load GCC/11.2.0
 cd ..
-srun -c 28 --gres=gpu:rtx3090:2 -v /bigwork/nhmlmahy/miniforge3/envs/albatross-env/bin/python start_training.py config=cfg_oc_proxy_luis_0 hydra.job.chdir=True
+srun -c 28 --gres=gpu:rtx3090:2 -v /bigwork/nhmlmahy/miniforge3/envs/albatross-env/bin/python start_training.py config=cfg_luis_resp $SLURM_ARRAY_TASK_ID hydra.job.chdir=True
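The config and slurm changes in this commit appear to be consistent with each other: summing the five per-role CPU caps from the YAML files gives exactly the old and new `--cpus-per-task` values. A minimal sketch, assuming the trainer allocates one CPU per role up to each cap (the role names are taken from the configs; the `total_cpus` helper is hypothetical, not part of the repository):

```python
# Per-role CPU caps from the configs, before and after this commit.
old_caps = {
    "max_cpu_worker": 22,
    "max_cpu_inference_server": 2,
    "max_cpu_updater": 2,
    "max_cpu_evaluator": 1,
    "max_cpu_log_dist_save_collect": 1,
}
new_caps = {
    "max_cpu_worker": 32,
    "max_cpu_inference_server": 6,
    "max_cpu_updater": 2,
    "max_cpu_evaluator": 1,
    "max_cpu_log_dist_save_collect": 1,
}

def total_cpus(caps: dict) -> int:
    """Sum the per-role caps to get the CPU budget of one training run."""
    return sum(caps.values())

print(total_cpus(old_caps))  # 28, the previous --cpus-per-task value
print(total_cpus(new_caps))  # 42, the new --cpus-per-task value
```

Under this reading, the extra 10 workers and 4 inference-server CPUs account for the full 28 → 42 increase in the slurm allocation.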
