Updated storage and training files to enable text generation tasks. A… #640

Status: Open. Wants to merge 1 commit into base: main.
1 change: 1 addition & 0 deletions benchmark/mnist/mnist.yaml
@@ -12,6 +12,7 @@ model_storage:
 training:
   gpus: 1
   device: "cuda:0"
+  generative: False
   dataloader_workers: 2
   use_previous_model: True
   initial_model: random
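The new flag is a one-line YAML addition per pipeline. A minimal sketch of how a training loop might branch on it once the config is parsed — `collate_batch` and the dict layout are illustrative stand-ins, not Modyn's actual code:

```python
def collate_batch(samples: list[dict], generative: bool) -> tuple:
    """Build a batch; generative pipelines sample data without expecting labels."""
    data = [s["data"] for s in samples]
    if generative:
        return (data,)  # no label component in the batch
    labels = [s["label"] for s in samples]
    return (data, labels)

# `generative` defaults to False, mirroring the default these YAML files set.
training_cfg = {"gpus": 1, "device": "cuda:0", "generative": False}
batch = collate_batch(
    [{"data": [0.1, 0.2], "label": 3}],
    training_cfg.get("generative", False),
)
```

Defaulting to `False` keeps every existing (supervised) pipeline behaving exactly as before.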
1 change: 1 addition & 0 deletions benchmark/wildtime_benchmarks/example_pipelines/arxiv.yaml
@@ -17,6 +17,7 @@ training:
   initial_model: random
   batch_size: 128
   shuffle: True
+  generative: False
   optimizers:
     - name: "default"
       algorithm: "SGD"
@@ -13,6 +13,7 @@ training:
   gpus: 1
   device: "cuda:0"
   dataloader_workers: 2
+  generative: False
   use_previous_model: True
   initial_model: random
   batch_size: 96
@@ -13,6 +13,7 @@ training:
   gpus: 1
   device: "cuda:0"
   dataloader_workers: 2
+  generative: False
   use_previous_model: True
   initial_model: random
   batch_size: 64
@@ -14,6 +14,7 @@ training:
   gpus: 1
   device: "cuda:0"
   dataloader_workers: 2
+  generative: False
   use_previous_model: True
   initial_model: random
   batch_size: 64
2 changes: 2 additions & 0 deletions benchmark/wildtime_benchmarks/example_pipelines/fmow.yaml
@@ -13,6 +13,8 @@ training:
   gpus: 1
   device: "cuda:0"
   dataloader_workers: 2
+  generative: False
+
   use_previous_model: True
   initial_model: random
   batch_size: 64
@@ -12,6 +12,7 @@ model_storage:
 training:
   gpus: 1
   device: "cuda:0"
+  generative: False
   dataloader_workers: 2
   use_previous_model: True
   initial_model: random
@@ -14,6 +14,7 @@ training:
   gpus: 1
   device: "cuda:0"
   dataloader_workers: 2
+  generative: False
   use_previous_model: True
   initial_model: random
   batch_size: 64
17 changes: 8 additions & 9 deletions environment.yml
@@ -30,7 +30,7 @@ dependencies:
   - psycopg2
   - sqlalchemy>=2.0
   - pyaml
-  - pydantic
+  - pydantic==2.9.2
Contributor: This should be solved with the latest commit on main.
   - numpy==1.26.*
   - pandas
   - bitstring
@@ -43,11 +43,10 @@ dependencies:
   - nltk
   - pytorch::pytorch=2.2.1
   - pytorch::torchvision
-  - pytorch::cpuonly # comment out if commenting in lines below for CUDA
-  # - pytorch::pytorch-cuda=12.1
-  # - nvidia::cuda-libraries-dev=12.1.*
-  # - nvidia::cuda-nvcc=12.1.*
-  # - nvidia::cuda-nvtx=12.1.*
-  # - nvidia::cuda-cupti=12.1.*
-  # - nvidia::cuda-cudart-dev=12.1.*
-  # - nvidia::cuda-profiler-api=12.1.*
+  - pytorch::pytorch-cuda=12.1
+  - nvidia::cuda-libraries-dev=12.1.*
+  - nvidia::cuda-nvcc=12.1.*
+  - nvidia::cuda-nvtx=12.1.*
+  - nvidia::cuda-cupti=12.1.*
+  - nvidia::cuda-cudart-dev=12.1.*
+  - nvidia::cuda-profiler-api==12.1.*
Contributor (on lines -48 to +52): Should be commented out; no changes here should be necessary.
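Per the reviewer, the upstream default keeps the environment CPU-only, with the CUDA pins commented out. A sketch of that intended state, using the versions already in the file:

```yaml
# Intended upstream state (per review): CPU-only by default, CUDA pins commented.
  - pytorch::cpuonly # comment out if commenting in lines below for CUDA
  # - pytorch::pytorch-cuda=12.1
  # - nvidia::cuda-libraries-dev=12.1.*
  # - nvidia::cuda-nvcc=12.1.*
  # - nvidia::cuda-nvtx=12.1.*
  # - nvidia::cuda-cupti=12.1.*
  # - nvidia::cuda-cudart-dev=12.1.*
  # - nvidia::cuda-profiler-api=12.1.*
```

Contributors who need CUDA locally can swap the comments without committing the change.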
1 change: 1 addition & 0 deletions integrationtests/config/dummy.yaml
@@ -12,6 +12,7 @@ model_storage:
 training:
   gpus: 1
   device: "cpu"
+  generative: False
   dataloader_workers: 1
   use_previous_model: True
   initial_model: random
4 changes: 3 additions & 1 deletion integrationtests/config/rho_loss.yaml
@@ -13,6 +13,7 @@ training:
   gpus: 1
   device: "cpu"
   dataloader_workers: 2
+  generative: False
   use_previous_model: False
   initial_model: random
   batch_size: 4
@@ -60,6 +61,7 @@ selection_strategy:
   il_model_config:
     num_classes: 10
   device: "cpu"
+  generative: False
   dataloader_workers: 1
   use_previous_model: False
   batch_size: 2
@@ -75,4 +77,4 @@
     lr: 0.1
     momentum: 0.001
   optimization_criterion:
-    name: "CrossEntropyLoss"
\ No newline at end of file
+    name: "CrossEntropyLoss"
1 change: 1 addition & 0 deletions modyn/common/grpc/grpc_helpers.py
@@ -251,6 +251,7 @@ def prepare_start_training_request(
         enable_accurate_gpu_measurements=training_config.enable_accurate_gpu_measurements,
         record_loss_every=training_config.record_loss_every,
         drop_last_batch=training_config.drop_last_batch,
+        generative=training_config.generative,
     )

 def start_training(
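The gRPC helper change threads the flag from the pipeline config into the start-training request. A stand-in sketch of that pattern — the dataclass here is illustrative, not Modyn's real protobuf message:

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class StartTrainingRequestStub:
    """Stand-in for the real request message."""
    drop_last_batch: bool
    generative: bool = False  # defaulting keeps older callers working unchanged


def prepare_request(training_config) -> StartTrainingRequestStub:
    # Forward the new flag alongside the existing per-training fields.
    return StartTrainingRequestStub(
        drop_last_batch=training_config.drop_last_batch,
        generative=training_config.generative,
    )


cfg = SimpleNamespace(drop_last_batch=True, generative=False)
req = prepare_request(cfg)
```

Because the request field carries a default, only pipelines that explicitly set `generative: True` take the new branch on the trainer server.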
2 changes: 1 addition & 1 deletion modyn/config/examples/modyn_config.yaml
@@ -278,7 +278,7 @@ selector:
   local_storage_directory: "/tmp/local_storage"
   local_storage_max_samples_in_file: 1000000
   cleanup_storage_directories_after_shutdown: true
-  ignore_existing_trigger_samples: false
+  ignore_existing_trigger_samples: true
Contributor: Let's not merge this.

 trainer_server:
   hostname: "trainer_server"
5 changes: 5 additions & 0 deletions modyn/config/schema/pipeline/training/config.py
@@ -119,6 +119,11 @@ class TrainingConfig(ModynBaseModel):
             "we start with random weights. If initial_model is 'pretrained', cannot be False."
         )
     )
+    generative: bool = Field(False,
+        description=(
+            "If True, the training pipeline takes the generative branch; data is sampled without expecting labels."
+        ),
+    )
     seed: int | None = Field(
         None,
         description=(
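Semantically, the new pydantic field just adds an opt-in boolean with a `False` default. A stdlib-only sketch of that contract — not the real `TrainingConfig` class:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfigSketch:
    """Illustrative stand-in for the pydantic model."""
    gpus: int = 1
    generative: bool = False  # opt-in to the generative (label-free) branch

    def __post_init__(self) -> None:
        # pydantic would validate this declaratively; here we enforce it by hand.
        if not isinstance(self.generative, bool):
            raise TypeError("generative must be a bool")


assert TrainingConfigSketch().generative is False
```

Existing pipeline YAMLs that omit the key therefore keep their current supervised behavior without modification.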
2 changes: 1 addition & 1 deletion modyn/config/schema/system/config.py
@@ -255,7 +255,7 @@ class SelectorConfig(HostnamePortMixin):
         ),
     )
     ignore_existing_trigger_samples: bool = Field(
-        False,
+        True,
         description=(
             "Whether to ignore existing trigger samples when starting the selector. If set to false, the trigger "
             "sample directory has to be empty upon startup. May lead to unexpected behaviour if set to true and the "

Contributor: Also shouldn't merge that.