Update ort CIs (slow, gpu, train) #2024

IlyasMoutawwakil · 2024-09-14T19:20:29Z

What does this PR do?

This PR fixes:

Onnxruntime Training testing and examples.
Onnxruntime GPU testing (CUDA & TRT EPs) and the io binding implementation.

I tried my best to make things work with the fewest changes. In some cases that was hard: for example, io binding was dependent on the input order in the forward signature, which made some model types incompatible with the ORTModel signature.
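The ordering problem can be illustrated without onnxruntime itself. What follows is a minimal sketch, not the code from this PR: `forward`, `bind_by_position`, `bind_by_name`, and the `graph_input_names` list are all hypothetical names chosen for illustration. It assumes an exported graph that declares its inputs in a different order than the Python forward signature, and contrasts pairing values by position with looking them up by name.

```python
import inspect
from typing import Any, Dict, List


def forward(input_ids=None, attention_mask=None, token_type_ids=None):
    """Hypothetical forward signature standing in for a model class."""


def bind_by_position(values: List[Any]) -> Dict[str, Any]:
    """Fragile: pair values with names purely by order in the signature."""
    names = list(inspect.signature(forward).parameters)
    return dict(zip(names, values))


def bind_by_name(graph_input_names: List[str], **inputs: Any) -> Dict[str, Any]:
    """Robust: look up each graph input by name, ignoring call order."""
    missing = [name for name in graph_input_names if inputs.get(name) is None]
    if missing:
        raise ValueError(f"missing model inputs: {missing}")
    return {name: inputs[name] for name in graph_input_names}


# Assumed exported graph that declares its inputs in a different order
# than the forward signature does.
graph_input_names = ["attention_mask", "input_ids"]

# Values handed over in graph order get attached to the wrong names.
by_position = bind_by_position(["mask", "ids"])

# Name-based lookup is unaffected by either ordering.
by_name = bind_by_name(graph_input_names, input_ids="ids", attention_mask="mask")
```

In this sketch the positional variant silently swaps the two tensors, while the name-based variant either binds each input correctly or raises when a graph input has no matching argument.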

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

echarlaix

Thanks a lot for the great PR, @IlyasMoutawwakil!

tests/onnxruntime/test_diffusion.py

echarlaix · 2025-01-21T15:46:58Z

optimum/onnxruntime/trainer.py

        self.state.is_hyper_param_search = trial is not None
+        self.state.train_batch_size = self._train_batch_size



Also cc @JingyaHuang, who took care of the ORT training integrations.

optimum/onnxruntime/modeling_ort.py

optimum/utils/import_utils.py

IlyasMoutawwakil · 2025-01-29T13:54:32Z

setup.py

-        "accelerate",  # ORTTrainer requires it.
+        "transformers>=4.36,<4.49.0",
+    ],
+    "onnxruntime-training": [


The README and docs should reflect this change.

added in #2173

echarlaix · 2025-01-29T17:08:35Z

Thanks a lot, @IlyasMoutawwakil, for adding so many fixes!

update ort CIs

17bc171

IlyasMoutawwakil added training gpu-test trigger GPU tests labels Sep 14, 2024

IlyasMoutawwakil added 8 commits September 14, 2024 21:50

fix train ci

fbaa980

fix gpu ci

90aa85d

gpus all

87c9f3e

devel

0c1c6bd

enable trt

430260e

fix

00e51c7

fix

3fc5486

fix

8044232

IlyasMoutawwakil mentioned this pull request Sep 26, 2024

Added image-to-image task for ORT Pipeline #2031

Merged

3 tasks

test

2fd4d47

IlyasMoutawwakil added onnxruntime-gpu and removed training gpu-test trigger GPU tests labels Sep 27, 2024

IlyasMoutawwakil and others added 5 commits September 27, 2024 16:00

rename

1f322fc

change instance

6f7c599

test

806faca

use available

3eecee6

Merge branch 'main' into enable-ort-gpu-tests

ab62319

IlyasMoutawwakil added the training label Dec 10, 2024

IlyasMoutawwakil added 2 commits January 10, 2025 12:24

Merge branch 'main' into enable-ort-gpu-tests

1b7e652

update

cebe6bf

IlyasMoutawwakil added onnxruntime-training onnxruntime-slow and removed training labels Jan 10, 2025

IlyasMoutawwakil and others added 3 commits January 10, 2025 12:52

shorter labels as well

d0f62b0

add onnxruntime-traning

d001b9b

Merge branch 'main' into enable-ort-gpu-tests

d271637

echarlaix approved these changes Jan 21, 2025

IlyasMoutawwakil commented Jan 22, 2025

optimum/onnxruntime/modeling_ort.py (outdated, resolved)

echarlaix reviewed Jan 27, 2025

optimum/utils/import_utils.py (outdated, resolved)

Merge branch 'main' into enable-ort-gpu-tests

7e122c0

IlyasMoutawwakil commented Jan 28, 2025

setup.py (outdated, resolved)

IlyasMoutawwakil and others added 4 commits January 28, 2025 09:15

Update setup.py

dc2361d

update import utils

4eb95f1

Update optimum/onnxruntime/modeling_ort.py

7f1fc40

fix vision encoder decoder io binding

696cc95

IlyasMoutawwakil force-pushed the enable-ort-gpu-tests branch from 0be7883 to 696cc95 Compare January 28, 2025 12:46

IlyasMoutawwakil and others added 12 commits January 28, 2025 15:05

enable bigbird and bigbirg pegasus and seperate timm slow tests to untangle them

1827450

use bigger machine for slow tests

41abf7f

lower atol and rtol for image classification logits

6f3084a

fix

010030e

large

445b291

enable more Longformer and MCTCT

04c8904

enable commented models in export as well

18e1844

uncomment timm slow models, big bird optimization and marian pkv comparison

4487c74

Merge branch 'main' into enable-ort-gpu-tests

24d682e

Merge branch 'main' into enable-ort-gpu-tests

def5fdb

fix whisper/speech_to_text test and make convolution deterministic

458355d

pin torch for ort training

881015c

IlyasMoutawwakil force-pushed the enable-ort-gpu-tests branch from eedff54 to 881015c Compare January 29, 2025 13:32

IlyasMoutawwakil commented Jan 29, 2025

IlyasMoutawwakil added 2 commits January 29, 2025 15:11

ctc and speech also uses convolution so has to be deterministic

7c8c56f

revert vison2seq atol

3a4bac9

echarlaix merged commit b755036 into main Jan 29, 2025
40 checks passed

echarlaix deleted the enable-ort-gpu-tests branch January 29, 2025 17:11

echarlaix mentioned this pull request Jan 29, 2025

Enhance package availability check for multiple distributions #2164

Closed
