migrate runtime to modern ET libraries (pytorch#2994)
Summary:
Pull Request resolved: pytorch#2994

## Overview
Migrated methods from ET libraries to replace our home-brewed logic.
- The model and input flatbuffers are migrated to a bundled program flatbuffer (.bpte).
- Jarvis memory allocation in the runtime is migrated to the ExecuTorch memory manager, with memory defined by ExecuTorch `Span`.
- Input memory allocation is migrated to method-based data pointer assignment.
- Output and debug buffers are **partially** migrated to ETDump.
- Model output validation is **partially** migrated to method-based verification in the bundled program.

## Input flow
- Take the edge program manager.
- Build test suites from methods. Only the FORWARD method is applied, and it is hardcoded.
- Build the bundled program.
- Serialize the bundled program and store it in a flatbuffer.

## Output flow
- A bundled program is loaded from the serialized flatbuffer.
- The program is executed on a selected backend.
- The output is generated.
- Validation: the expected output is compared with the actual output by
  1. the original Jarvis compare method (ENABLED), and
  2. the method-based `VerifyResultWithBundledExpectedOutput` (DISABLED).
- **Note**: the sink flow was reverted to a series of .npy output files, which are unflattened by `torch.utils._pytree.tree_unflatten`, to re-enable legacy tests. ET/Bolt adopted a new flow that saves outputs as `.bin` files and loads them with `np.fromfile`. ETDump gets outputs from the debug buffer. **These will be investigated in stage2.** TODO: T185104750 T185106115

## Memory Allocation
Re-enabled Jarvis custom memory planning and added support for running on different backends (e.g. HIFI4).
- Enabled `alloc_graph_input` and `alloc_graph_output`.
- Defined memory in `torch::Span`.
- **Note**: `alloc_graph_output` uses deprecated ET APIs: `set_data()` and `mutable_data_ptr()`. It has a memory misalignment issue when migrating to the new flow. **These will be investigated in stage2.** TODO: T185104439

## Output Validation
Verify outputs with `torch::executor::bundled_program::VerifyResultWithBundledExpectedOutput`.
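The two validation paths above can be illustrated with a minimal Python sketch. It assumes that the tolerance check follows the common `allclose` rule, |actual - expected| <= atol + rtol * |expected| (as in `numpy.allclose` and `torch.allclose`), and that the Jarvis compare reduces to an RMS error. The names `rms_error` and `allclose` are hypothetical stand-ins, not the Jarvis or ExecuTorch APIs. The last check shows why thresholds as large as 1e5 and 1e7 make the tolerance check pass for almost any output.

```python
import math
from typing import Sequence


def rms_error(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Root-mean-square of the elementwise difference.

    Hypothetical stand-in for the Jarvis RMS compare; the real metric
    and its threshold are not described in this commit.
    """
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    if not actual:
        return 0.0
    squared = sum((a - e) ** 2 for a, e in zip(actual, expected))
    return math.sqrt(squared / len(actual))


def allclose(actual: Sequence[float], expected: Sequence[float],
             rtol: float, atol: float) -> bool:
    """Elementwise |a - e| <= atol + rtol * |e|.

    This is the numpy/torch allclose rule; it is only assumed, not
    confirmed, to match VerifyResultWithBundledExpectedOutput.
    """
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))


expected = [0.5, -1.25, 2.0, 3.0]
close = [0.501, -1.249, 2.0, 2.998]   # small numerical noise
wrong = [100.0, -50.0, 7.0, -3.0]     # clearly incorrect output

print(allclose(close, expected, rtol=1e-2, atol=1e-3))  # True
print(allclose(wrong, expected, rtol=1e-2, atol=1e-3))  # False
# With very large tolerances even the wrong output passes.
print(allclose(wrong, expected, rtol=1e5, atol=1e7))    # True
print(rms_error(close, expected))
```

Reporting an RMS value alongside the pass/fail tolerance check keeps a quantitative signal even while the tolerances are set to placeholder values.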
This validation is currently a placeholder for quantized tests, which have a high rtol, so their error thresholds are set to arbitrarily large values (1e5 and 1e7). **These will be investigated in stage2.** TODO: T180249993 T185104615 T185104862

# Design
Major design decisions (ADR).

## Method 1 [ADOPTED]
Modify `executor.cpp` to consume a bundled_program flatbuffer and execute it on a different BUCK host.
- Pros: maximum reuse of the existing configuration for custom Jarvis ops.
- Cons: runtime performance cost of starting a new host.

## Method 2 [ABANDONED]
Use ET pybinding APIs to consume the bundled program as an input and execute it in the runtime.
- Pros: all ET APIs are encapsulated in Python, which fits well with the existing infrastructure.
- Cons: poor extensibility, because the backend is fixed (CPU) at startup and cannot be switched on the fly.
- Cons: custom ops are missing in the runtime on the same BUCK host, so dependencies would have to be duplicated and hardcoded.

# Progress
Program Ingestion (input)
- [x] POC run of aten_relu_out and quantized_linear_out
- [x] Obtain Jarvis custom ops in the runtime

Program Sink (Output)
- [x] Get etdump as etdp
- [x] Get Inspector object from etdump
- [x] Get program output from method
- [x] Re-enable scuba profile
- [x] Get debug buffer binary
- [x] Enable dumping output from etdump
- [x] Get output from etdump
- [ ] Migrate sink flow to etdump
- [ ] Adjust memory config for dump

Verification
- [x] verify_result_with_bundled_expected_output with rtol and atol. A very large rtol and atol will be set to pass validation for quantized tests.
- [x] Compare output with expected_output using the original Jarvis compare (RMS)

Memory Planning
- [x] Define memory planning input: MemoryConfig
- [x] Understand what ET MemoryManager actually takes
- [x] Migrate to ET MemoryManager with three new arguments
- [x] Re-enable alloc_graph_input
- [x] Re-enable alloc_graph_output
- [x] Update legacy usage of HierarchicalAllocator
- [x] Verify that the sizes of planned buffers are correct

Misc.
- [ ] Verify whether input has been memcpy'd into a custom input buffer in the bundled program when input memory is not allocated. Use `set_input`.
- [ ] Investigate whether test suites run serially or, like Buck, in parallel
- [ ] Investigate the output.bin workflow, using Bolt as a reference
- [ ] Refactor to reuse module.h, module.cpp, data_module.cpp
- [ ] Refactor based on TODOs
- [x] Clean up legacy code

Reviewed By: tarun292, skrtskrtfb, mcremon-meta

Differential Revision: D53870154

fbshipit-source-id: 05efdd48da040f089c0cc65ee7ad5f2cb14be5bd