modify statement of proposal
Signed-off-by: wyoung1 <[email protected]>
wyoung1 committed Aug 29, 2024
1 parent 225ce92 commit a6e3ccf
Showing 4 changed files with 21 additions and 33 deletions.
2 changes: 2 additions & 0 deletions .gitignore
```
@@ -2,6 +2,8 @@
 __pycache__/
 *.py[cod]
 *$py.class
+dataset/
+initial_model/

 # C extensions
 *.so
```
@@ -13,16 +13,17 @@ _Heterogeneous Multi-Edge Collaborative Neural Network Inference for High Mobili

The scope of the system includes:

1. Encapsulating the capabilities of automatic computation graph partitioning into a function and providing it as an extended option for users to customize, seamlessly integrating with the existing multi-edge inference workflow;
2. Providing a multi-edge inference benchmarking job in a high-mobility scenario (such as edge-side LLM inference or image recognition) to verify the effectiveness and benefits of the automatic partitioning module;
3. Adding judgments to the multi-edge inference paradigm process to provide significant scalability for the partitioning algorithm: if the user has implemented a custom partition function, the user-defined partitioning algorithm is called first.

Target users include:

1. Beginners: Become familiar with distributed synergy AI, multi-edge inference, and related concepts.
2. Developers: Quickly integrate multi-edge inference algorithms into other development environments such as Sedna and test their performance for further optimization.

# Design Details
## Process Design
First, taking the existing tracking_job and reid_job as examples, we analyze the workflow of the two benchmarking jobs, clarify the function call logic of Ianvs, and determine where the configuration information should be written and where the partition function should be inserted, so as to ensure high cohesion and low coupling of the overall code. The workflow starts from the main() function in the benchmarking.py file (located in the ianvs/core directory), which reads the user's configuration file reid_job.yaml and creates a BenchmarkingJob. This process parses the configuration parameters of the yaml file and creates instances of classes such as TestEnv, Rank, Simulation, and TestCaseController that match the configuration description.

Subsequently, the run() method of the BenchmarkingJob instance is called, using the build_testcases() method of the TestCaseController instance to create test cases. This step is actually parsing the algorithm configuration specified by _test_object.url_ in the reid_job.yaml file and creating instances of Algorithm and TestCase that match the algorithm configuration description. Then, the run_testcases() method of the TestCaseController instance is called, which ultimately calls the run() method of the corresponding algorithm paradigm, such as the run() method of the MultiedgeInference class instance in this case.
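
To make this call chain easier to follow, the toy mock below sketches the flow in plain Python. It is not the actual Ianvs source: the class and method names simply mirror those mentioned above, the configuration handling is reduced to a dictionary, and the algorithm url is a hypothetical placeholder.

```
# Toy, self-contained mock of the benchmarking call chain (illustration only).

class MultiedgeInference:
    """Stand-in for the multiedge_inference paradigm."""
    def run(self):
        print("running the multiedge_inference paradigm")

class TestCase:
    def __init__(self, paradigm):
        self.paradigm = paradigm

class TestCaseController:
    def __init__(self):
        self.testcases = []

    def build_testcases(self, algorithm_urls):
        # In Ianvs this parses the algorithm yaml referenced by test_object.url
        # and builds Algorithm/TestCase instances; here it is stubbed out.
        self.testcases = [TestCase(MultiedgeInference()) for _ in algorithm_urls]

    def run_testcases(self):
        # Each test case ultimately calls the run() of its paradigm.
        for testcase in self.testcases:
            testcase.paradigm.run()

class BenchmarkingJob:
    def __init__(self, config):
        # In Ianvs this step also creates TestEnv, Rank and Simulation instances.
        self.algorithm_urls = config["algorithm_urls"]
        self.controller = TestCaseController()

    def run(self):
        self.controller.build_testcases(self.algorithm_urls)
        self.controller.run_testcases()

def main():
    # Stands in for parsing reid_job.yaml in benchmarking.py.
    config = {"algorithm_urls": ["examples/.../algorithm.yaml"]}
    BenchmarkingJob(config).run()

if __name__ == "__main__":
    main()
```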

@@ -54,7 +55,7 @@ def load(self, model_url=None):
The overall process is illustrated in the following diagram:
![old process](images/old_process.png)

Based on the above process analysis, we find that the existing multi-edge inference benchmarking job only uses Ianvs to create and manage test cases, while the core algorithmic processes such as multi-device parallelism and model partitioning are left to the user to implement. It is also worth mentioning that the nn.DataParallel(self.model) used in this case only achieves data parallelism; for scenarios with low computing power on the edge and large models, relying solely on data parallelism is clearly insufficient to support edge inference needs. Therefore, this project needs to implement model parallel capabilities based on model partitioning and encapsulate these capabilities (partitioning and scheduling) into a function, separated from the user's code, as an optional feature in the multiedge_inference paradigm supported by Ianvs.
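
As a minimal illustration of the difference, the PyTorch sketch below contrasts the data parallelism used today with the model parallelism this proposal targets. The model and the split point are throwaway assumptions, and both halves stay on CPU; in the real scenario the sub-graphs would come from the partition step and be placed on different edge devices.

```
import torch
from torch import nn

# A throwaway model standing in for the real initial model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Data parallelism: the whole model is replicated and each replica processes a
# slice of the batch. This is what nn.DataParallel(self.model) provides today.
replicated = nn.DataParallel(model)

# Model parallelism: the model itself is split into sub-graphs; each sub-graph
# could be placed on a different edge device. Both halves remain on CPU here,
# purely to show the data flow between partitions.
part_a = nn.Sequential(*list(model.children())[:2])  # first computational subgraph
part_b = nn.Sequential(*list(model.children())[2:])  # second computational subgraph

x = torch.randn(8, 16)
intermediate = part_a(x)       # would run on edge device 0
output = part_b(intermediate)  # would run on edge device 1
print(output.shape)            # torch.Size([8, 4])
```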

## Module Design and Code Integration
From the above process analysis, it is known that to provide automatic graph partitioning and scheduling capabilities within the Ianvs framework, the optimal code integration point is in the Algorithm Paradigm module of the Test Case Controller component, specifically in the directory core/testcasecontroller/algorithm/paradigm. The current structure of this directory is:
@@ -79,45 +80,32 @@ paradigm
└── singletask_learning_tta.py
```

Based on the process analysis, this project intends to add a _partition function under the multiedge_inference paradigm and implement our computation graph partitioning and scheduling capabilities within it. The overall process includes:

- Input: Initial model data and the user-declared devices.yaml file, which contains the number of heterogeneous devices the user simulates on a single machine, information about each device (such as GPU memory, number of GPUs, etc.), as well as communication bandwidth between devices.

- Parsing: The user-declared devices.yaml file is parsed to obtain device data, and the initial model computation graph is parsed to obtain model data.

- Modeling (optional): Joint analysis of the parsed device data and model data is performed to enable the algorithm to calculate a matching list of devices and computational subgraphs.

- Partitioning: The model is partitioned based on the decided computational subgraphs.

- Output: The matching list of devices and computational subgraphs, as well as the partitioned computational subgraphs.

It is worth noting that we have implemented a general interface and a simple partitioning algorithm here (which partitions by analyzing the partitioning points specified by the user). More partitioning algorithms will be added in the future, and users can also customize their own partition methods in basemodel.py; they only need to comply with the input and output specifications defined by the interface, as follows:

```
def _partition(self, initial_model):
    ## 1. parse the user-declared devices.yaml to obtain device information
    ## 2. jointly model the devices and the computation graph (optional)
    ## 3. partition the initial model into computational subgraphs
    return models_dir, map_info
```

Subsequently, modify the logic in multiedge_inference.py to decide whether to use the automatic partitioning capability based on the user's code. If it is chosen, pass the url of the initial_model and the key information of devices.yaml into the automatic partitioning algorithm, and then pass the returned matching list of devices and computational subgraphs, as well as the partitioned computational subgraphs, to the user's code.

Further, the load method of the BaseModel class in the benchmarking job is provided to receive these parameters and use them to complete the multi-edge inference process.
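
The self-contained sketch below shows one way this dispatch could look. The names follow the description in this proposal, but the method bodies are placeholder assumptions rather than the real partitioning logic or the actual Ianvs code.

```
import os
import tempfile

class BaseModel:
    """Stand-in for the user's estimator in the benchmarking example."""

    def partition(self, initial_model):
        # A user-defined partition function: returns (models_dir, map_info).
        models_dir = tempfile.mkdtemp()
        map_info = {"edge-0": "sub_model_0.onnx", "edge-1": "sub_model_1.onnx"}
        return models_dir, map_info

    def load(self, models_dir=None, map_info=None):
        print(f"loading sub-models from {models_dir} with device mapping {map_info}")

class MultiedgeInference:
    """Stand-in for the multiedge_inference paradigm."""

    def _partition(self, initial_model):
        # Built-in partitioning based on the user-declared partition points.
        models_dir = os.path.dirname(initial_model) or "."
        map_info = {"edge-0": os.path.basename(initial_model)}
        return models_dir, map_info

    def run(self, estimator, initial_model="initial_model/model.onnx"):
        if hasattr(estimator, "partition"):
            # A user-defined partitioning algorithm takes priority.
            models_dir, map_info = estimator.partition(initial_model)
        else:
            models_dir, map_info = self._partition(initial_model)
        estimator.load(models_dir=models_dir, map_info=map_info)

MultiedgeInference().run(BaseModel())
```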

At the same time, the corresponding multi-edge inference benchmarking job for high-mobility scenarios will be provided in the _examples_ folder.

@@ -128,13 +116,11 @@ In conjunction with the process design, the newly added automatic partitioning m
![image](images/partition_method.png)
We implement the heterogeneous neural network multi-edge collaborative inference for high-mobility scenarios using the method shown in the above figure.

First, the user-declared devices.yaml file is parsed to obtain device data; our current partitioning logic partitions and schedules the model based on the partitioning points declared in that file. However, considering further updates to the partitioning algorithm in the future (such as joint modeling based on device capabilities and model computational complexity), we have reserved sufficient fields in devices.yaml to capture information such as device type, memory, frequency, and bandwidth.
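
For illustration, the parsed device data might look like the dictionary below. Every field name here is an assumption made for this sketch; the actual schema is whatever the benchmarking example defines in devices.yaml.

```
# Hypothetical result of parsing devices.yaml (field names are illustrative only).
devices = {
    "devices": [
        {
            "name": "edge-0",
            "type": "gpu",
            "memory": "8Gi",                       # GPU memory available on this device
            "frequency": "1.5GHz",
            "bandwidth": "100Mbps",                # link bandwidth to the next device
            "partition_point": "backbone.layer4",  # user-declared cut point in the graph
        },
        {
            "name": "edge-1",
            "type": "cpu",
            "memory": "4Gi",
            "frequency": "2.0GHz",
            "bandwidth": "100Mbps",
            "partition_point": "head.output",
        },
    ]
}

print(f"{len(devices['devices'])} devices declared for partitioning")
```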

Subsequently, based on the matching relationship, the model is partitioned and each computational subgraph is scheduled to its matching device (simulated by the user with Docker containers or GPUs) to achieve the best collaborative effect.

It is worth noting that the parallelism we implement here is model parallelism. When multiple inference tasks are carried out simultaneously, models that have completed inference in the current round do not have to wait for those that have not yet finished; instead, they can proceed with the inference of the next task in parallel, thus forming a simple pipeline parallelism. More complex and efficient pipeline parallelization strategies are left for future work.
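
The queue-based sketch below illustrates this simple pipeline parallelism under stated assumptions: two sub-models, each simulated by a thread, with a string transformation standing in for actual sub-model inference. Once a stage finishes its part of task i, it immediately starts on task i+1 instead of waiting for the whole pipeline to drain.

```
import queue
import threading

def stage(name, in_q, out_q):
    # One pipeline stage: repeatedly take a task, "run" the sub-model, pass it on.
    while True:
        task = in_q.get()
        if task is None:              # shutdown signal propagates down the pipeline
            out_q.put(None)
            break
        out_q.put(f"{task}->{name}")  # stand-in for running this device's sub-model

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=("sub_model_on_edge0", q0, q1)),
    threading.Thread(target=stage, args=("sub_model_on_edge1", q1, q2)),
]
for t in threads:
    t.start()

for i in range(3):                    # three inference tasks flow through the pipeline
    q0.put(f"task{i}")
q0.put(None)

while (result := q2.get()) is not None:
    print(result)

for t in threads:
    t.join()
```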

## Roadmap
**July**
@@ -144,7 +130,7 @@

**August**

- Implement the automatic graph scheduling and partitioning algorithm based on Ianvs.

**September**
