
Updated storage and training files to enable text generation tasks. A… #640

Open
wants to merge 1 commit into base: main
Conversation

@sjohn4 sjohn4 commented Dec 11, 2024

Made minor updates to the environment and ensured everything passes the standard Python compliance tests, maintaining compatibility with Modyn's existing functionality. Added GPT-2 as a usable model, together with the GPT-2 tokenizer. Introduced a generative flag in the Modyn pipeline so it can distinguish between generative and classification tasks, and updated the affected Python test files to set this flag so that no errors are thrown.

…dditionally, made minor updates to the environment and ensured everything passes the standard Python compliance tests, maintaining compatibility with Modyn's existing functionality.

codecov bot commented Dec 11, 2024

Codecov Report

Attention: Patch coverage is 27.29592% with 570 lines in your changes missing coverage. Please review.

Project coverage is 81.72%. Comparing base (b63795f) to head (c08b24e).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modyn/models/GPT2/GPT2_Model_LoRA.py 17.09% 417 Missing ⚠️
.../trainer_server/internal/dataset/online_dataset.py 46.45% 83 Missing ⚠️
modyn/models/GPT2/GPT2.py 33.73% 55 Missing ⚠️
modyn/models/tokenizers/gpt2_tokenizer.py 38.46% 8 Missing ⚠️
...trainer_server/internal/trainer/pytorch_trainer.py 73.07% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #640      +/-   ##
==========================================
- Coverage   85.58%   81.72%   -3.86%     
==========================================
  Files         258      261       +3     
  Lines       11377    12089     +712     
==========================================
+ Hits         9737     9880     +143     
- Misses       1640     2209     +569     

☔ View full report in Codecov by Sentry.

Contributor

@MaxiBoether MaxiBoether left a comment


I briefly skimmed the PR. Thanks for the work! The biggest feedback right now would be that we should not have parallel codepaths which are mostly copy/paste of existing ones but without labels. Rather, let's have clean code where we move labels to be optional, both in the c++/protobuf/python worlds. And before I do the proper review, would be great if you could check the diff for changes that should not go onto main :)

Thanks!
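The suggestion to make labels optional rather than duplicating codepaths could look roughly like this on the Python side (a minimal sketch only; the `Sample` type and its fields are hypothetical, not Modyn's actual data structures):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample container: one type serves both classification
# (label present) and generative (label absent) tasks, instead of two
# parallel copy/pasted codepaths.
@dataclass
class Sample:
    key: int
    data: bytes
    label: Optional[int] = None  # None for generative tasks

def unpack(sample: Sample) -> tuple:
    # Single unpack path: callers branch on the optional field rather
    # than calling a separate label-free variant of the function.
    if sample.label is None:
        return (sample.key, sample.data)
    return (sample.key, sample.data, sample.label)
```

The same idea carries over to protobuf (an `optional` field on the existing response message) and C++ (checking field presence), so the existing `Get` path can serve both task types.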

@@ -30,7 +30,7 @@ dependencies:
- psycopg2
- sqlalchemy>=2.0
- pyaml
- pydantic
- pydantic==2.9.2
Contributor


This should be solved with the latest commit on main

Comment on lines -48 to +52
# - nvidia::cuda-libraries-dev=12.1.*
# - nvidia::cuda-nvcc=12.1.*
# - nvidia::cuda-nvtx=12.1.*
# - nvidia::cuda-cupti=12.1.*
# - nvidia::cuda-cudart-dev=12.1.*
# - nvidia::cuda-profiler-api=12.1.*
- pytorch::pytorch-cuda=12.1
- nvidia::cuda-libraries-dev=12.1.*
- nvidia::cuda-nvcc=12.1.*
- nvidia::cuda-nvtx=12.1.*
- nvidia::cuda-cupti=12.1.*
- nvidia::cuda-cudart-dev=12.1.*
- nvidia::cuda-profiler-api==12.1.*
Contributor


should be commented out. no changes here should be necessary

ignore_existing_trigger_samples: false
ignore_existing_trigger_samples: true
Contributor


let's not merge this

@@ -255,7 +255,7 @@ class SelectorConfig(HostnamePortMixin):
),
)
ignore_existing_trigger_samples: bool = Field(
False,
Contributor


also shouldn't merge that

@@ -0,0 +1,365 @@
#plimport pytorch_lightning as pl
Contributor


this file needs to be cleaned before merging it: remove all unnecessary comments and keep only the necessary functions. e.g. there should be no need for evaluation code in here

@@ -0,0 +1,152 @@
//""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Contributor


draft file?

Comment on lines 75 to +77
Status Get(ServerContext* context, const modyn::storage::GetRequest* request,
ServerWriter<modyn::storage::GetResponse>* writer) override;
Status GetNL(ServerContext* context, const modyn::storage::GetRequest* request,
Contributor


similar to my comment on the proto file, let's not introduce second codepaths as much as possible, because this seems to replicate a lot of logic. rather, let's add labels if we have them in the original code path
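A single streaming handler along those lines might look like this (a Python sketch of the idea only; the real service is C++/gRPC, and the `include_labels` request field is a hypothetical name):

```python
# Hypothetical single Get codepath: labels are included only when the
# request asks for them, instead of a second GetNL RPC with its own
# GetResponseNoLabels message type.
def get(request: dict, samples: list):
    for key, data, label in samples:
        response = {"keys": key, "samples": data}
        if request.get("include_labels", True):
            response["labels"] = label
        yield response
```

With proto3 field presence, the client can simply ignore an unset labels field, so the generative path needs no separate RPC or response type.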

@@ -74,6 +74,8 @@ class StorageServiceImpl final : public modyn::storage::Storage::Service {

Status Get(ServerContext* context, const modyn::storage::GetRequest* request,
ServerWriter<modyn::storage::GetResponse>* writer) override;
Status GetNL(ServerContext* context, const modyn::storage::GetRequest* request,
ServerWriter<modyn::storage::GetResponseNoLabels>* writer) override;
Contributor


also i think this file in general needs to be cleaned

@@ -409,8 +626,45 @@ def _fetch_partition_noprefetch(
container["weights"][idx],
)

def _fetch_partition_noprefetch_generative( # based on the one above
Contributor


similar comments here. I would propose to refactor the PR such that we don't have parallel codepaths for generative
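One way to fold the generative variant back into the original function (a rough sketch; the real `container` layout in online_dataset.py carries more fields, such as weights):

```python
# Hypothetical merged fetcher: rather than _fetch_partition_noprefetch plus
# a copy-pasted *_generative twin, one function yields labels only when the
# container actually carries them.
def fetch_partition(container: dict):
    labels = container.get("labels")
    for idx, (key, sample) in enumerate(zip(container["keys"], container["data"])):
        if labels is None:  # generative task: no labels stored
            yield key, sample
        else:  # classification task
            yield key, sample, labels[idx]
```

The trainer then branches once on tuple length (or on the pipeline's generative flag) instead of maintaining two nearly identical fetch implementations.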

@@ -57,7 +57,8 @@ def __init__(

self.shuffle = request.shuffle
self.enable_accurate_gpu_measurements = request.enable_accurate_gpu_measurements

self.generative=request.generative
print(f"generetive:{self.generative}")
Contributor


debug logs
