[IR] Improve external data handling #2020

Merged
merged 32 commits into from
Jan 22, 2025

Conversation

justinchuby (Collaborator) commented on Jan 17, 2025

  1. Add an external_data option to ir.save. This saves initializers as external tensors. It is robust against data loss when overwriting, and is idempotent when the current model does not already contain external tensors referencing the same path.
  2. Expose ir.external_data as a public module that users can use to manipulate external data.
     It defines the following functions:
     ["set_base_dir", "unload_from_model", "load_to_model", "convert_tensors_to_external", "convert_tensors_from_external"]
     I renamed to_external_data to unload_from_model for clarity. Reviewers, please let me know if the naming sounds good.
  3. Support a size_threshold_bytes option to control which tensors are offloaded.
  4. Simplified the torch_apis logic by leveraging the updated ir.save method.
  5. Updated the to_external_data function to load data into memory if and only if the tensor references an external data file that is being written to. This simplifies the logic and avoids creating and managing temporary files.
  6. Implemented a polyfill of the zip() function's strict mode to support Python <= 3.9.
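
Item 6 mentions a polyfill for the strict mode of zip(). A minimal sketch of what such a polyfill could look like follows. The name zip_strict and the error message are illustrative assumptions and are not necessarily what onnxscript/ir/_polyfill.py contains.

```python
import sys
from typing import Any, Iterable, Iterator, Tuple

_SENTINEL = object()  # Marks an exhausted iterator without clashing with real values.


def zip_strict(*iterables: Iterable[Any]) -> Iterator[Tuple[Any, ...]]:
    """Behave like zip(..., strict=True) on interpreters that lack it."""
    if sys.version_info >= (3, 10):
        # Delegate to the built-in when it supports strict mode.
        yield from zip(*iterables, strict=True)
        return
    iterators = [iter(it) for it in iterables]
    while True:
        items = [next(it, _SENTINEL) for it in iterators]
        if all(item is _SENTINEL for item in items):
            return  # Every input ended on the same step: lengths match.
        if any(item is _SENTINEL for item in items):
            raise ValueError("zip_strict() arguments have different lengths")
        yield tuple(items)
```

Because zip_strict is a generator, a length mismatch surfaces only when the shorter input runs out during iteration, which matches how the built-in strict mode reports it.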

Note

We do not need to add external data options to ir.load. The external data is always loaded lazily in the IR. If users want to transfer the data to memory at loading, they can use ir.external_data.load_to_model().
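
To illustrate the difference between lazy loading and moving data into memory, here is a standalone sketch. LazyExternalTensor is a hypothetical class written for this example and does not mirror the IR's actual external tensor implementation. It only shows that a lazily loaded tensor depends on its data file until the bytes have been read.

```python
import os
import tempfile
from typing import Optional


class LazyExternalTensor:
    """Hypothetical stand-in for a tensor whose bytes live in a separate file."""

    def __init__(self, path: str, offset: int, length: int) -> None:
        self.path = path
        self.offset = offset
        self.length = length
        self._cache: Optional[bytes] = None  # Nothing is read at construction time.

    @property
    def is_loaded(self) -> bool:
        return self._cache is not None

    def tobytes(self) -> bytes:
        # Read from disk only on first access, then reuse the cached bytes.
        if self._cache is None:
            with open(self.path, "rb") as f:
                f.seek(self.offset)
                self._cache = f.read(self.length)
        return self._cache


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "weights.data")
        with open(data_path, "wb") as f:
            f.write(b"\x00" * 4 + b"\x01\x02\x03\x04")
        tensor = LazyExternalTensor(data_path, offset=4, length=4)
        assert not tensor.is_loaded  # Constructing the tensor did not touch the file.
        tensor.tobytes()  # First access pulls the bytes into memory.
    # The data file is gone now, but the in-memory copy is still usable.
    return tensor.tobytes()
```

In this sketch, calling tobytes() before the file disappears plays the role that ir.external_data.load_to_model() plays for a whole model.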

Example usage

from onnxscript import ir

# model is an existing ir.Model
ir.save(model, "model.onnx", external_data="model.onnx.data")
# Can save many times
ir.save(model, "model_copy.onnx", external_data="model_copy.onnx.data")
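
The overwrite safety described in items 1 and 5 can be pictured with the following standalone sketch. ExternalRef and save_external are hypothetical names, and the file layout (tensors written back to back, no alignment) is a simplifying assumption. The real ir.save logic may differ.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class ExternalRef:
    """Hypothetical reference to bytes stored in a side file."""

    path: str
    offset: int
    length: int

    def read(self) -> bytes:
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            return f.read(self.length)


Tensor = Union[bytes, ExternalRef]


def save_external(tensors: Dict[str, Tensor], target: str) -> Dict[str, ExternalRef]:
    """Write every tensor to `target` and return the new references."""
    target_real = os.path.realpath(target)
    staged: Dict[str, Tensor] = {}
    for name, tensor in tensors.items():
        if isinstance(tensor, ExternalRef) and os.path.realpath(tensor.path) == target_real:
            # This tensor lives in the file we are about to truncate: read it now.
            staged[name] = tensor.read()
        else:
            staged[name] = tensor  # Other tensors can be read while writing.
    refs: Dict[str, ExternalRef] = {}
    with open(target, "wb") as f:  # Opening in "wb" mode truncates the old contents.
        for name, tensor in staged.items():
            data = tensor if isinstance(tensor, bytes) else tensor.read()
            refs[name] = ExternalRef(target, f.tell(), len(data))
            f.write(data)
    return refs


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.onnx.data")
        first = save_external({"w": b"abcd", "b": b"xy"}, path)
        # Saving again to the same path must not lose the data it references.
        second = save_external(first, path)
        return second["w"].read() + second["b"].read()
```

Only tensors that point at the target file are copied into memory, so no temporary file is needed and tensors stored elsewhere are read one at a time.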

Copilot AI (Contributor) left a comment

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

onnxscript/ir/_external_data_test.py:352

  • The tobytes method should raise a TypeError instead of returning it.
return TypeError
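
The distinction behind this suggestion can be shown with a small standalone sketch. The class names below are hypothetical and are not taken from the test file. Returning the exception class hands it to the caller as an ordinary value, while raising it actually signals the failure.

```python
class ReturnsError:
    def tobytes(self):
        # Bug: the exception class comes back to the caller as a normal value.
        return TypeError


class RaisesError:
    def tobytes(self):
        # Fix: the failure is signalled, so callers cannot mistake it for data.
        raise TypeError("tobytes() is not supported for this tensor")


def call_outcome(obj) -> str:
    """Report whether calling tobytes() raised or returned."""
    try:
        obj.tobytes()
    except TypeError:
        return "raised"
    return "returned"
```

A test that expects tobytes() to fail would pass silently against the first class, because no exception ever propagates.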

onnxscript/ir/_io.py (resolved)
onnxscript/ir/_external_data.py (outdated, resolved; 2 threads)
codecov bot commented Jan 17, 2025

❌ 32 Tests Failed:

Tests completed: 10336 | Failed: 32 | Passed: 10304 | Skipped: 2454
View the top 2 failed tests by shortest run time
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0365_test_erf
Stack Traces | 0.003s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_erf'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_erf' (e=No module named 'tests.onnx_backend_test_code.test_erf') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_erf.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_erf.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import FLOAT
E   from onnxscript.onnx_opset import opset13
E   
E   @script()
E   def bck_test_erf(x: FLOAT[1,3,32,32]) -> (FLOAT[1,3,32,32]):
E       y = opset13.Erf(x)
E       return y
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0411_test_gemm_transposeA
Stack Traces | 0.003s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_gemm_transposeA'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_gemm_transposeA' (e=No module named 'tests.onnx_backend_test_code.test_gemm_transposeA') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_gemm_transposeA.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_gemm_transposeA.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import FLOAT
E   from onnxscript.onnx_opset import opset13
E   
E   @script()
E   def bck_test_gemm_transposeA(a: FLOAT[6,3], b: FLOAT[6,4], c: FLOAT[1,4]) -> (FLOAT[3,4]):
E       y = opset13.Gemm(a, b, c, transA=1)
E       return y
View the full list of 1 ❄️ flaky tests
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0752_test_or2d

Flake rate in main: 8.82% (Passed 31 times, Failed 3 times)

Stack Traces | 0.003s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_or2d'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_or2d' (e=No module named 'tests.onnx_backend_test_code.test_or2d') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_or2d.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_or2d.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import BOOL
E   from onnxscript.onnx_opset import opset7
E   
E   @script()
E   def bck_test_or2d(x: BOOL[3,4], y: BOOL[3,4]) -> (BOOL[3,4]):
E       r_or = opset7.Or(x, y)
E       return r_or


onnxscript/ir/_io.py (outdated, resolved)
@justinchuby added the "hold on merging" and "module: IR" labels on Jan 17, 2025

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

onnxscript/ir/_external_data.py:174

  • The variable name 'base_path' should be renamed to 'base_dir' for consistency.
base_path: str | os.PathLike,

onnxscript/ir/_external_data.py:252

  • The word 'unneccesarry' should be corrected to 'unnecessary'.
# Sort all tensors based on tensor sizes, in order to avoid unneccesarry alignment.
onnxscript/ir/_io_test.py: Fixed (18 entries)
onnxscript/ir/_io.py (resolved)
onnxscript/_framework_apis/torch_2_5.py (outdated, resolved)
@justinchuby marked this pull request as draft on January 21, 2025 16:44
onnxscript/ir/_external_data.py: Fixed (6 entries)
onnxscript/ir/_io_test.py: Fixed (4 entries)
@justinchuby marked this pull request as ready for review on January 22, 2025 07:06
onnxscript/ir/_external_data.py: Fixed (2 entries)
onnxscript/ir/_polyfill.py: Fixed (1 entry)
onnxscript/ir/external_data.py: Fixed (3 entries)
@justinchuby requested a review from Copilot on January 22, 2025 18:10
Copilot AI (Contributor) left a comment

Copilot reviewed 5 out of 9 changed files in this pull request and generated 1 comment.

Files not reviewed (4)
  • onnxscript/ir/_external_data.py: Evaluated as low risk
  • onnxscript/ir/external_data_test.py: Evaluated as low risk
  • onnxscript/ir/__init__.py: Evaluated as low risk
  • onnxscript/_framework_apis/torch_2_5.py: Evaluated as low risk
Comments suppressed due to low confidence (8)

onnxscript/ir/external_data.py:383

  • The 'strict=True' argument in the 'zip' function is only available in Python 3.10 and later. This could cause compatibility issues with earlier versions of Python.
for value, external_tensor in zip(initializers_to_become_external, external_tensors, strict=True)

onnxscript/ir/external_data.py:387

  • The 'strict=True' argument in the 'zip' function is only available in Python 3.10 and later. This could cause compatibility issues with earlier versions of Python.
for value, memory_tensor in zip(initializers_to_load_to_memory, memory_tensors, strict=True)

onnxscript/ir/_io_test.py:131

  • Move the with self.assertRaisesRegex(ValueError, "is invalidated") statement to directly wrap the _io.save call to ensure the expected exception is raised by the correct code.
with self.assertRaisesRegex(ValueError, "is invalidated"):

onnxscript/ir/_io_test.py:139

  • In test_save_with_external_data_invalidates_obsolete_external_tensors, check that the new initializer is correctly saved and loaded.
_io.save(loaded_model, path, external_data=external_data_file, size_threshold_bytes=0)

onnxscript/ir/_core.py:614

  • Ensure that the tensor is valid before loading its data.
self._check_validity()

onnxscript/ir/_core.py:653

  • Ensure that the tensor is valid before converting it to a NumPy array.
self._check_validity()

onnxscript/ir/_core.py:682

  • Ensure that the tensor is valid before returning its NumPy representation.
self._check_validity()

onnxscript/ir/_core.py:693

  • Ensure that the tensor is valid before returning its byte representation.
self._check_validity()
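
For the _io_test.py:131 suggestion in the list above, the following standalone sketch shows the pattern of wrapping only the call that is expected to fail. setup_step and save_step are hypothetical placeholders, not functions from the test suite.

```python
import unittest


def setup_step() -> None:
    """Hypothetical preparation code that should never raise."""


def save_step() -> None:
    """Hypothetical call under test; stands in for the save that must fail."""
    raise ValueError("tensor is invalidated")


class NarrowAssertRaisesTest(unittest.TestCase):
    def test_only_the_save_call_is_wrapped(self):
        setup_step()  # Outside the context manager: an error here fails the test.
        with self.assertRaisesRegex(ValueError, "is invalidated"):
            save_step()  # Only this call is allowed to raise.


def run() -> bool:
    """Run the test case quietly and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NarrowAssertRaisesTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Keeping the context manager tight means that a ValueError raised by unrelated setup code cannot satisfy the assertion by accident.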

onnxscript/ir/external_data.py (resolved)
@justinchuby removed the "hold on merging" label on Jan 22, 2025
@justinchuby requested a review from Copilot on January 22, 2025 18:28

Copilot reviewed 5 out of 9 changed files in this pull request and generated no comments.

Files not reviewed (4)
  • onnxscript/ir/_external_data.py: Evaluated as low risk
  • onnxscript/ir/external_data_test.py: Evaluated as low risk
  • onnxscript/ir/__init__.py: Evaluated as low risk
  • onnxscript/_framework_apis/torch_2_5.py: Evaluated as low risk
Comments suppressed due to low confidence (5)

onnxscript/ir/_io.py:92

  • The use of the 'strict' argument in the 'zip' function is only available in Python 3.10 and later. Ensure that the polyfill for older versions works correctly.
for initializer, tensor in zip(initializer_values, tensors, strict=True):

onnxscript/ir/_core.py:614

  • Ensure that the tensor's validity is checked before loading its data.
self._check_validity()

onnxscript/ir/_core.py:653

  • Ensure that the tensor's validity is checked before converting it to a numpy array.
self._check_validity()

onnxscript/ir/_core.py:682

  • Ensure that the tensor's validity is checked before accessing its numpy representation.
self._check_validity()

onnxscript/ir/_core.py:693

  • Ensure that the tensor's validity is checked before converting it to bytes.
self._check_validity()
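
The repeated self._check_validity() calls quoted above follow a guard pattern that can be sketched in isolation. InvalidatableTensor, its invalidate method, and the error message are hypothetical and written only for this example. They are not the code in onnxscript/ir/_core.py.

```python
class InvalidatableTensor:
    """Hypothetical tensor that can be marked stale after its backing file changes."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._valid = True

    def invalidate(self) -> None:
        # Called when the file this tensor refers to has been overwritten.
        self._valid = False

    def _check_validity(self) -> None:
        if not self._valid:
            raise ValueError("The external tensor is invalidated.")

    def tobytes(self) -> bytes:
        self._check_validity()  # Guard every accessor before touching the data.
        return self._data
```

Placing the check at the top of each accessor turns a stale reference into an explicit error instead of silently returning bytes from a rewritten file.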
@justinchuby merged commit b8d3179 into main on Jan 22, 2025
20 of 27 checks passed
@justinchuby deleted the justinchu/ir-save branch on January 22, 2025 20:27
Labels
module: IR Intermediate representation
4 participants