-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Model Management Policies #295
Implement Model Management Policies #295
Conversation
* create model storage manager to handle the new strategies * create 2 full model strategies (naive implementation and compressed version thereof) and one incremental strategy (weights difference strategy) with 2 difference operators (sub and xor) * enabled option to verify checksum of downloaded files * add model storage strategy as part of pipeline to the metadata database * rework of model storage component * separate metadata from model information in model storage * add method to (un-)zip files * add various unit tests
…293-implement_model_mgmt_policies # Conflicts: # modyn/protos/trainer_server.proto # modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py # modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.pyi
* solve merge conflicts and edit pipeline schema
This comment was marked as outdated.
This comment was marked as outdated.
…293-implement_model_mgmt_policies # Conflicts: # modyn/config/examples/example-pipeline.yaml # modyn/config/schema/pipeline-schema.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Puh, this is quite a beast to review :D Thanks for the hard work. I did a very first round. In this round, I focused on the integration of the strategies into the system. I did not take a look at all the operators/differences just yet. I will do that when we align on the integration itself.
I think my biggest concern currently is the "ModelStorageManager". I don't understand its implementation really and in general we should try to avoid instantiating the model unless it is necessary (in the very basic setting, where we just do nothing except storing full model, it should not be necessary). But it was hard for me to review that class since I was not able to follow the control flow. See the individual comments for details :) We can chat in the comments here and continue the dicussion in our in person meeting back in Zurich
…293-implement_model_mgmt_policies # Conflicts: # modyn/selector/internal/grpc/generated/selector_pb2.py # modyn/selector/internal/grpc/selector_grpc_servicer.py # modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py # modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py
* make metadata required * rename model id to model class name to mitigate confusion * make model storage folder configurable * add documentation
* fix integration tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much! Looking very good already. I think one more round and we are good
integrationtests/model_storage/integrationtest_model_storage.py
Outdated
Show resolved
Hide resolved
...s/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py
Outdated
Show resolved
Hide resolved
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py
Outdated
Show resolved
Hide resolved
* fix bug in supervisor * remove data types class * add comprehensive docstrings for model storage manager * improve testing of model storage components * test end-to-end pipeline execution
* from 60 to 90 seconds
…293-implement_model_mgmt_policies # Conflicts: # modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Almost there. Thank you so much for your work.
-
The biggest unclear thing is still the parent ID discussion. Maybe my text makes it a bit more clear why I am confused. Then, we can look for a solution. We can also have a call about this.
-
I had a small comment on the base model state.
-
We need to merge main, unfortunately, they are a few conflicts... Your PR will be the next to be merged, so this will be the last time.
Thanks for the thorough tests and documentation, this is looking great.
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py
Outdated
Show resolved
Hide resolved
…293-implement_model_mgmt_policies # Conflicts: # integrationtests/model_storage/integrationtest_model_storage.py # integrationtests/test_docker_compose.py # modyn/metadata_database/metadata_database_connection.py # modyn/metadata_database/models/pipelines.py # modyn/protos/trainer_server.proto # modyn/selector/internal/grpc/selector_grpc_servicer.py # modyn/selector/internal/selector_manager.py # modyn/tests/metadata_database/models/test_pipelines.py # modyn/tests/metadata_database/test_metadata_database_connection.py # modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py # modyn/tests/selector/internal/test_selector_manager.py # modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py # modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.pyi
- Return model state after loading
- add comments to determine_parent_model_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aand we're done. Thank you so much again, congrats for submitting your thesis and all the best!
Currently, we use a naïve approach in order to store our trained models in the model storage. We want to have the possibility to do this more efficiently via pluggable policies. In particular, we want to support the following:
This should enable us to examine basic research questions in multi-model management. Corresponding issues: #293, #265