Asynchronous prefetching of data #253

Closed
wants to merge 2 commits into from

Conversation

MaxiBoether
Contributor

We add a separate thread that prefetches data from the selector, so that the latency of transferring data from the selector is hidden during training.

Solves #175.
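
For readers who want a concrete picture of the idea, here is a minimal sketch of a prefetching generator. It is not the code in this PR: `prefetching_iterator`, `fetch_partition`, `num_partitions`, and `prefetch_depth` are hypothetical names chosen for illustration, and `fetch_partition` stands in for whatever call retrieves data from the selector.

```python
import queue
import threading
from typing import Callable, Iterator, List

# Marker object that tells the consumer the producer has finished.
_SENTINEL = object()


def prefetching_iterator(
    fetch_partition: Callable[[int], List[int]],  # hypothetical selector call
    num_partitions: int,
    prefetch_depth: int = 1,
) -> Iterator[List[int]]:
    """Yield partitions in order while a background thread fetches ahead."""
    # Bounded queue: the producer blocks once `prefetch_depth` items are waiting,
    # so memory use stays limited.
    buffer: queue.Queue = queue.Queue(maxsize=prefetch_depth)

    def _producer() -> None:
        try:
            for partition_id in range(num_partitions):
                buffer.put(fetch_partition(partition_id))
        except Exception as exc:  # forward failures to the consuming thread
            buffer.put(exc)
            return
        buffer.put(_SENTINEL)

    # Daemon thread: if the consumer abandons the generator early, the blocked
    # producer does not keep the process alive.
    thread = threading.Thread(target=_producer, daemon=True)
    thread.start()

    while True:
        item = buffer.get()
        if item is _SENTINEL:
            break
        if isinstance(item, Exception):
            raise item
        # While the caller works on this item, the producer can fetch the next one.
        yield item
    thread.join()
```

With `prefetch_depth=1`, at most one partition waits in the queue while the producer holds the next one, which keeps the extra memory small. A larger depth trades memory for more slack when fetch times vary.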


github-actions bot commented May 10, 2023

✅ Result of Pytest Coverage

---------- coverage: platform linux, python 3.11.3-final-0 -----------

Name Stmts Miss Cover
modyn/common/ftp/ftp_server.py 31 0 100%
modyn/common/ftp/ftp_utils.py 31 12 61%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 34 0 100%
modyn/metadata_database/models/pipelines.py 9 1 89%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 45 10 78%
modyn/metadata_database/models/trained_models.py 14 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 29 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 22 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 66 0 100%
modyn/model_storage/model_storage.py 24 5 79%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/dlrm.py 58 9 84%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 55 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/resnet18/resnet18.py 6 2 67%
modyn/selector/internal/grpc/selector_grpc_servicer.py 58 5 91%
modyn/selector/internal/grpc/selector_server.py 26 1 96%
modyn/selector/internal/selector_manager.py 87 26 70%
modyn/selector/internal/selector_strategies/abstract_downsample_strategy.py 74 8 89%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 153 14 91%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 110 8 93%
modyn/selector/internal/selector_strategies/gradnorm_downsampling_strategy.py 6 2 67%
modyn/selector/internal/selector_strategies/loss_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/new_data_strategy.py 90 6 93%
modyn/selector/internal/trigger_sample/trigger_sample_storage.py 76 3 96%
modyn/selector/selector.py 54 4 93%
modyn/selector/selector_entrypoint.py 24 1 96%
modyn/storage/internal/database/models/dataset.py 20 0 100%
modyn/storage/internal/database/models/file.py 17 0 100%
modyn/storage/internal/database/models/sample.py 44 7 84%
modyn/storage/internal/database/storage_base.py 3 0 100%
modyn/storage/internal/database/storage_database_connection.py 53 0 100%
modyn/storage/internal/database/storage_database_utils.py 21 0 100%
modyn/storage/internal/file_watcher/new_file_watcher.py 205 41 80%
modyn/storage/internal/file_watcher/new_file_watcher_watch_dog.py 59 9 85%
modyn/storage/internal/file_wrapper/abstract_file_wrapper.py 23 1 96%
modyn/storage/internal/file_wrapper/binary_file_wrapper.py 49 1 98%
modyn/storage/internal/file_wrapper/file_wrapper_type.py 7 0 100%
modyn/storage/internal/file_wrapper/single_sample_file_wrapper.py 48 2 96%
modyn/storage/internal/filesystem_wrapper/abstract_filesystem_wrapper.py 31 1 97%
modyn/storage/internal/filesystem_wrapper/filesystem_wrapper_type.py 6 0 100%
modyn/storage/internal/filesystem_wrapper/local_filesystem_wrapper.py 52 0 100%
modyn/storage/internal/grpc/grpc_server.py 20 0 100%
modyn/storage/internal/grpc/storage_grpc_servicer.py 123 10 92%
modyn/storage/storage.py 34 1 97%
modyn/storage/storage_entrypoint.py 24 1 96%
modyn/supervisor/entrypoint.py 39 5 87%
modyn/supervisor/internal/grpc_handler.py 222 48 78%
modyn/supervisor/internal/trigger.py 6 0 100%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/timetrigger.py 27 1 96%
modyn/supervisor/supervisor.py 226 22 90%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 33 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 46 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 29 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 22 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 13 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 78 1 99%
modyn/tests/model_storage/test_model_storage.py 35 5 86%
modyn/tests/model_storage/test_model_storage_entrypoint.py 22 0 100%
modyn/tests/models/test_dlrm.py 19 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 114 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 33 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_downsample_strategy.py 255 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 184 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 308 0 100%
modyn/tests/selector/internal/selector_strategies/test_loss_downsampler.py 32 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 519 0 100%
modyn/tests/selector/internal/test_selector_manager.py 116 3 97%
modyn/tests/selector/internal/trigger_sample/test_trigger_sample_storage.py 176 0 100%
modyn/tests/selector/test_selector.py 84 3 96%
modyn/tests/selector/test_selector_entrypoint.py 22 0 100%
modyn/tests/storage/internal/database/models/test_dataset.py 47 0 100%
modyn/tests/storage/internal/database/models/test_file.py 64 0 100%
modyn/tests/storage/internal/database/models/test_sample.py 73 0 100%
modyn/tests/storage/internal/database/test_database_storage_utils.py 21 2 90%
modyn/tests/storage/internal/database/test_storage_database_connection.py 54 3 94%
modyn/tests/storage/internal/file_watcher/test_new_file_watcher.py 377 13 97%
modyn/tests/storage/internal/file_watcher/test_new_file_watcher_watch_dog.py 95 1 99%
modyn/tests/storage/internal/file_wrapper/test_binary_file_wrapper.py 92 0 100%
modyn/tests/storage/internal/file_wrapper/test_file_wrapper_type.py 6 1 83%
modyn/tests/storage/internal/file_wrapper/test_single_sample_file_wrapper.py 90 0 100%
modyn/tests/storage/internal/filesystem_wrapper/test_filesystem_wrapper_type.py 6 1 83%
modyn/tests/storage/internal/filesystem_wrapper/test_local_filesystem_wrapper.py 167 0 100%
modyn/tests/storage/internal/grpc/test_grpc_server.py 11 0 100%
modyn/tests/storage/internal/grpc/test_storage_grpc_servicer.py 239 3 99%
modyn/tests/storage/test_storage.py 42 1 98%
modyn/tests/storage/test_storage_entrypoint.py 21 0 100%
modyn/tests/supervisor/internal/test_grpc_handler.py 259 8 97%
modyn/tests/supervisor/internal/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 30 0 100%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 26 0 100%
modyn/tests/supervisor/test_entrypoint.py 29 0 100%
modyn/tests/supervisor/test_supervisor.py 345 1 99%
modyn/tests/trainer_server/internal/data/test_data_utils.py 19 1 95%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 232 3 99%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 15 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 314 7 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 86 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 78 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 356 43 88%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 22 0 100%
modyn/tests/utils/test_utils.py 55 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 12 0 100%
modyn/trainer_server/internal/dataset/online_dataset.py 133 7 95%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 152 15 90%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 1 93%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 233 38 84%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsample_strategy.py 24 1 96%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsample.py 18 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsample.py 11 0 100%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 39 0 100%
modyn/trainer_server/internal/utils/training_process_info.py 7 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/utils.py 71 7 90%
TOTAL 10021 640 94%
Coverage HTML written to
================== 462 passed, 7

@MaxiBoether MaxiBoether self-assigned this May 10, 2023
Contributor

@fotstrt fotstrt left a comment


Looks great!

Collaborator

@francescodeaglio francescodeaglio left a comment


Looks good to me!

@MaxiBoether
Contributor Author

Note: This PR is on hold until we can measure its impact. Since we use multithreading, we cannot be sure the current implementation actually helps. It should, because after the yield in the generator the worker process should be able to switch to the prefetch thread, but who in the world understands the intricacies of Python multithreading?
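
One way to get a first feel for the question is a small timing experiment. The sketch below is an illustration under loud assumptions, not a measurement of this PR: `time.sleep` stands in for both the selector fetch and the training step, and `run_sequential` and `run_overlapped` are hypothetical helpers. `time.sleep` releases the GIL, so the threads here can overlap. Whether the real fetch path behaves the same way depends on how much of it is blocking I/O versus Python-level CPU work, which only a measurement on the actual system can tell.

```python
import threading
import time


def run_sequential(num_batches: int, fetch_s: float, train_s: float) -> float:
    """Fetch and train strictly one after the other; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(num_batches):
        time.sleep(fetch_s)  # stand-in for fetching from the selector
        time.sleep(train_s)  # stand-in for one training step
    return time.perf_counter() - start


def run_overlapped(num_batches: int, fetch_s: float, train_s: float) -> float:
    """Fetch batch i+1 in a background thread while 'training' on batch i."""
    start = time.perf_counter()
    time.sleep(fetch_s)  # the first fetch cannot be hidden
    for i in range(num_batches):
        prefetch = None
        if i + 1 < num_batches:
            prefetch = threading.Thread(target=time.sleep, args=(fetch_s,))
            prefetch.start()
        time.sleep(train_s)  # training step runs while the prefetch is in flight
        if prefetch is not None:
            prefetch.join()  # wait for the next batch before continuing
    return time.perf_counter() - start


if __name__ == "__main__":
    sequential = run_sequential(4, 0.05, 0.05)
    overlapped = run_overlapped(4, 0.05, 0.05)
    print(f"sequential: {sequential:.3f}s, overlapped: {overlapped:.3f}s")
```

If the stand-ins were replaced by CPU-bound pure-Python work, the two timings would be expected to converge, since only one thread can execute Python bytecode at a time under the GIL.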

@MaxiBoether
Contributor Author

Closing due to #301
