TensorRT & Flux Dev #4484
-
@comfyanonymous What am I doing wrong? Does TensorRT not support 16GB of VRAM? Thank you in advance.
-
Can someone please help?
-
TensorRT currently needs more than 24GB of VRAM to convert a Flux model; even a 4090 isn't enough.
-
Couldn't it be split into chunks and saved, looping the work between GPU and CPU?
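For what it's worth, here is a rough sketch of that layer-by-layer offloading idea in plain PyTorch. The names (`run_blocks_offloaded`, `blocks`) are made up for illustration, and as noted above the TensorRT conversion itself doesn't appear to stream the model this way today, which is where the 24GB+ requirement comes from:

```python
import torch

def run_blocks_offloaded(blocks, x, device="cuda"):
    """Illustrative only: run a list of submodules one at a time,
    staging each block on the GPU and returning it to the CPU afterwards,
    so peak VRAM stays near the size of a single block."""
    x = x.to(device)
    for block in blocks:
        block.to(device)            # move one chunk of weights into VRAM
        with torch.no_grad():
            x = block(x)
        block.to("cpu")             # evict it before loading the next chunk
        torch.cuda.empty_cache()
    return x
```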
-
I am going to get so mad if Nvidia doesn't start putting out affordable 48GB cards soon. Come on, even a 4090 Ti with 36GB of VRAM and 20,000 cores would be great.
-
Couldn't someone rent an H100 to convert it?
-
I could convert it at my workplace, but if I remember correctly the TRT engine will be specific to the GPU I used and not portable to another? I tried it with the Schnell fp8 checkpoint and ran into an error:
[09/24/2024-07:07:26] [TRT] [E] IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (Networks with BF16 precision require hardware with BF16 support.)
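If anyone else hits that BF16 error, a quick way to check whether the build machine's GPU reports BF16 support before attempting the conversion (this is just a plain PyTorch query, not part of the node pack):

```python
import torch

# Quick sanity check before building an engine with BF16 layers:
# BF16 generally requires an Ampere-or-newer GPU (compute capability >= 8.0).
print("GPU:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))
print("BF16 supported:", torch.cuda.is_bf16_supported())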
-
You can try converting the Flux model on an RTX 6000 Ada and then running the engine on a 4090. However, this only works if nothing else is running on the 4090: no monitor plugged into it and no applications using it. Alternatively, you can try fp8 + --fast + the torch compile node.
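In case it helps, here is a rough sketch in plain PyTorch of what the fp8 + torch.compile route buys you. The tensor names are made up and this is not the ComfyUI code path itself (there you would load an fp8 checkpoint, launch with the --fast flag, and add the torch compile node to the workflow):

```python
import torch

# Rough sketch (plain PyTorch, hypothetical names, not the ComfyUI code path):
# fp8 storage halves weight memory vs fp16, and torch.compile speeds up the
# forward pass by compiling/fusing kernels.

w_fp16 = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
w_fp8 = w_fp16.to(torch.float8_e4m3fn)    # 2 bytes/elem -> 1 byte/elem
print(w_fp16.element_size(), "vs", w_fp8.element_size(), "bytes per element")

def forward(x):
    # upcast fp8 weights at compute time; plain fp8 matmuls need special kernels
    return x @ w_fp8.to(torch.float16)

compiled_forward = torch.compile(forward)
x = torch.randn(16, 4096, dtype=torch.float16, device="cuda")
print(compiled_forward(x).shape)          # torch.Size([16, 4096])
```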
-
I compiled Flux with TensorRT, using the defaults in the TensorRT node pack, and observed better performance (RTX A5000, PyTorch 2.4.1, TensorRT 10.5.0, Windows).