Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Model Management Policies #295

Merged

Conversation

robin-oester
Copy link
Collaborator

Currently, we use a naïve approach in order to store our trained models in the model storage. We want to have the possibility to do this more efficiently via pluggable policies. In particular, we want to support the following:

  • store first model of a pipeline with all layer information and other metadata
  • store subsequent models more efficiently by just storing weights/biases
  • implement other approach, using xor-difference between successive models and storing float exponent bits separately
  • create infrastructure to make these approaches pluggable and modularly expandable
  • make approaches configurable (with additional options) in pipeline config

This should enable us to examine basic research questions in multi-model management. Corresponding issues: #293, #265

* create model storage manager to handle the new strategies
* create 2 full model strategies (naive implementation and compressed version thereof) and one incremental strategy (weights difference strategy) with 2 difference operators (sub and xor)
* enabled option to verify checksum of downloaded files
* add model storage strategy as part of pipeline to the metadata database
* rework of model storage component
* separate metadata from model information in model storage
* add method to (un-)zip files
* add various unit tests
…293-implement_model_mgmt_policies

# Conflicts:
#	modyn/protos/trainer_server.proto
#	modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py
#	modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.pyi
* solve merge conflicts and edit pipeline schema
@github-actions

This comment was marked as outdated.

…293-implement_model_mgmt_policies

# Conflicts:
#	modyn/config/examples/example-pipeline.yaml
#	modyn/config/schema/pipeline-schema.yaml
Copy link
Contributor

@MaxiBoether MaxiBoether left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Puh, this is quite a beast to review :D Thanks for the hard work. I did a very first round. In this round, I focused on the integration of the strategies into the system. I did not take a look at all the operators/differences just yet. I will do that when we align on the integration itself.

I think my biggest concern currently is the "ModelStorageManager". I don't understand its implementation really and in general we should try to avoid instantiating the model unless it is necessary (in the very basic setting, where we just do nothing except storing full model, it should not be necessary). But it was hard for me to review that class since I was not able to follow the control flow. See the individual comments for details :) We can chat in the comments here and continue the dicussion in our in person meeting back in Zurich

modyn/config/schema/pipeline-schema.yaml Show resolved Hide resolved
modyn/config/schema/pipeline-schema.yaml Show resolved Hide resolved
modyn/config/schema/pipeline-schema.yaml Outdated Show resolved Hide resolved
modyn/config/schema/pipeline-schema.yaml Outdated Show resolved Hide resolved
modyn/config/schema/pipeline-schema.yaml Show resolved Hide resolved
modyn/model_storage/internal/model_storage_manager.py Outdated Show resolved Hide resolved
modyn/model_storage/internal/model_storage_manager.py Outdated Show resolved Hide resolved
modyn/model_storage/internal/model_storage_manager.py Outdated Show resolved Hide resolved
…293-implement_model_mgmt_policies

# Conflicts:
#	modyn/selector/internal/grpc/generated/selector_pb2.py
#	modyn/selector/internal/grpc/selector_grpc_servicer.py
#	modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py
#	modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py
* make metadata required
* rename model id to model class name to mitigate confusion
* make model storage folder configurable
* add documentation
* fix grpc test
* fix integration tests
Copy link
Contributor

@MaxiBoether MaxiBoether left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much! Looking very good already. I think one more round and we are good

modyn/config/schema/pipeline-schema.yaml Show resolved Hide resolved
benchmark/mnist/mnist.yaml Show resolved Hide resolved
modyn/metadata_database/metadata_database_connection.py Outdated Show resolved Hide resolved
modyn/common/ftp/ftp_utils.py Show resolved Hide resolved
modyn/model_storage/model_storage.py Outdated Show resolved Hide resolved
* fix bug in supervisor
* remove data types class
* add comprehensive docstrings for model storage manager
* improve testing of model storage components
* test end-to-end pipeline execution
* make pylint happy
* change ArticleNet to implement all abstract methods
…293-implement_model_mgmt_policies

# Conflicts:
#	modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py
Copy link
Contributor

@MaxiBoether MaxiBoether left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Almost there. Thank you so much for your work.

  1. The biggest unclear thing is still the parent ID discussion. Maybe my text makes it a bit more clear why I am confused. Then, we can look for a solution. We can also have a call about this.

  2. I had a small comment on the base model state.

  3. We need to merge main, unfortunately, they are a few conflicts... Your PR will be the next to be merged, so this will be the last time.

Thanks for the thorough tests and documentation, this is looking great.

modyn/model_storage/internal/model_storage_manager.py Outdated Show resolved Hide resolved
modyn/model_storage/internal/model_storage_manager.py Outdated Show resolved Hide resolved
modyn/models/articlenet/articlenet.py Outdated Show resolved Hide resolved
modyn/supervisor/internal/grpc_handler.py Show resolved Hide resolved
…293-implement_model_mgmt_policies

# Conflicts:
#	integrationtests/model_storage/integrationtest_model_storage.py
#	integrationtests/test_docker_compose.py
#	modyn/metadata_database/metadata_database_connection.py
#	modyn/metadata_database/models/pipelines.py
#	modyn/protos/trainer_server.proto
#	modyn/selector/internal/grpc/selector_grpc_servicer.py
#	modyn/selector/internal/selector_manager.py
#	modyn/tests/metadata_database/models/test_pipelines.py
#	modyn/tests/metadata_database/test_metadata_database_connection.py
#	modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py
#	modyn/tests/selector/internal/test_selector_manager.py
#	modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.py
#	modyn/trainer_server/internal/grpc/generated/trainer_server_pb2.pyi
- Return model state after loading
- add comments to determine_parent_model_id
Copy link
Contributor

@MaxiBoether MaxiBoether left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aand we're done. Thank you so much again, congrats for submitting your thesis and all the best!

@robin-oester robin-oester merged commit 98bd787 into main Oct 25, 2023
8 checks passed
@XianzheMa XianzheMa deleted the robin-oester/feature/#293-implement_model_mgmt_policies branch July 3, 2024 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request model-storage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Model Management Policies Validate whether transferred model equals local model
2 participants